Trying to find the frequency of words in a file using a script

The file I have is called test and it contains the following lines:

This is a test Test test test There are multiple tests.

I want the output to be:

test@3 tests@1 multiple@1 is@1 are@1 a@1 This@1 There@1 Test@1

I have the following script:

 cat $1 | tr ' ' '\n' > temp # put all words to a new line
 echo -n > file2.txt # clear file2.txt
 for line in $(cat temp) # trace each line from temp file
 do
 # check if the current line is visited
 grep -q $line file2.txt 
 if [ $line==$temp] 
 then
 count= expr `$count + 1` #count the number of words
 echo $line"@"$count >> file2.txt # add word and frequency to file
 fi
 done
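For comparison, here is a minimal sketch (not the OP's code) that keeps the required for loop and if. In the script above, `[ $line==$temp]` is missing the spaces around `==` and before `]`, `$temp` is never set, and `count= expr ...` runs `expr` instead of assigning anything to `count`. The sketch assumes bash 4 or later for the associative array, treats a word as a run of letters and digits (so the `.` in `tests.` is dropped), and uses the hypothetical function name `word_freq`:

```bash
# Sketch: word frequencies with a for loop and an if.
# Assumes bash 4+ (associative arrays). A "word" is taken to be a run of
# letters/digits, so the "." in "tests." is dropped.
word_freq() (                        # ( ) body runs in a subshell, so set -f stays local
    set -f                           # no globbing on the unquoted $(...) below
    declare -A count                 # word -> number of occurrences
    for word in $(tr -cs '[:alnum:]' '\n' < "$1")   # unquoted on purpose: one word per iteration
    do
        if [ -n "${count[$word]}" ]  # word seen before?
        then
            count[$word]=$(( ${count[$word]} + 1 ))
        else
            count[$word]=1
        fi
    done
    for word in "${!count[@]}"       # iteration order is unspecified
    do
        printf '%s@%s ' "$word" "${count[$word]}"
    done
    echo
)
```

Called as `word_freq test` on the sample file, it should print each distinct word once with its count (`test@3`, `Test@1`, `tests@1`, ...), in no particular order, because bash does not keep associative arrays sorted.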

edited Apr 1 at 17:26

Yurij Goncharuk

2,2582521

asked Apr 1 at 16:38

Samurai Bale

7 Answers

Use sort | uniq -c | sort -rn to create a frequency table. Some more tweaking is needed to get the desired format:

 tr ' ' '\n' < "$1" \
 | sort \
 | uniq -c \
 | sort -rn \
 | awk '{print $2"@"$1}' \
 | tr '\n' ' '
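The pipeline above splits only on spaces, so the trailing period stays attached and the last word is counted as `tests.` rather than `tests`. A minimal variant, assuming a `tr` that supports `-c`/`-s` with the `[:alnum:]` class (GNU and BSD both do), splits on any run of non-alphanumeric characters instead; the function name `freq_table` is hypothetical:

```bash
# Sketch: same sort | uniq -c | sort -rn idea, but punctuation is not kept.
freq_table() {
    tr -cs '[:alnum:]' '\n' < "$1" \
    | grep . \
    | sort \
    | uniq -c \
    | sort -rn \
    | awk '{print $2"@"$1}' \
    | tr '\n' ' '
    echo    # finish the single output line with a newline
}
```

Here `grep .` only drops empty lines, which can appear when the input starts with punctuation. On the sample file the output should start with `test@3`; the order of the single-occurrence words depends on how `sort` breaks ties.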

answered Apr 1 at 16:45

choroba

24.3k33967

For the assignment I have to use this format, with a for loop and an if.
– Samurai Bale
Apr 1 at 17:52

$ cat >wdbag.py
#!/usr/bin/python

from collections import Counter
import re, sys

text=' '.join(sys.argv[1:])

t=Counter(re.findall(r"[\w']+", text.lower()))

for item in t:
 print item+"@"+str(t[item])

$ chmod 755 wdbag.py 

$ ./wdbag.py "This is a test Test test test There are multiple tests."
a@1
tests@1
multiple@1
this@1
is@1
there@1
are@1
test@4

$ ./wdbag.py This is a test Test test test There are multiple tests.
a@1
tests@1
multiple@1
this@1
is@1
there@1
are@1
test@4

Ref: https://stackoverflow.com/a/11300418/3720510
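The Python script lowercases the text before counting, which is why it reports `test@4` where the question expects `test@3` and `Test@1` separately. Staying with the shell tools used elsewhere in this thread, a roughly equivalent sketch can fold case with `tr` first. It assumes words are runs of letters and digits, so, unlike the Python regex `[\w']+`, apostrophes split words; the function name `word_freq_ci` is hypothetical:

```bash
# Sketch: case-insensitive word count, one word@count pair per line.
# Assumes words are runs of letters/digits (apostrophes act as separators).
word_freq_ci() {
    tr '[:upper:]' '[:lower:]' < "$1" \
    | tr -cs '[:alnum:]' '\n' \
    | grep . \
    | sort \
    | uniq -c \
    | awk '{print $2"@"$1}'
}
```

On the sample sentence this should report `test@4`, `this@1` and `there@1`, matching the Python output apart from ordering.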

answered Apr 1 at 21:39

Hannu

32916

Add a , at the end of the print line to get the list on a single row.
– Hannu
Apr 1 at 21:48

grep + sort + uniq + sed pipeline:

grep -o '[[:alnum:]]*' file | sort | uniq -c | sed -E 's/[[:space:]]*([0-9]+) (.+)/\2@\1/'

The output:

a@1
are@1
is@1
multiple@1
test@3
Test@1
tests@1
There@1
This@1

answered Apr 1 at 16:45

RomanPerekhrest

22.4k12144

With awk only:

 awk -v RS='[ .\n]+' '{s[$0]++}
     END{for (x in s) {printf "%s%s", SEP, x"@"s[x]; SEP=" "}; print ""}' infile

This sets the record separator (RS) to a regular expression matching spaces, dots and newlines, so every record is a single word. Each word is used as a key in the array s, and its value is incremented every time the word is seen, so it ends up holding that word's number of occurrences.

In the END block, loop over the elements of the array and print each key x (the word), an @, and its value s[x] (the number of occurrences).

SEP is empty for the first word and set to a space after it, so the words are separated by single spaces without a leading space.
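A regular-expression RS is a GNU awk and mawk extension; POSIX leaves a multi-character RS unspecified. A minimal sketch that should also run in a POSIX awk loops over the fields of each line instead, strips everything except ASCII letters and digits from each field (an assumption about what counts as a word), and reuses the same SEP idea for the spacing. The function name `awk_freq` is hypothetical:

```bash
# Sketch: the same count without a regular-expression RS.
# Assumes ASCII words: everything except A-Z, a-z and 0-9 is stripped from each field.
awk_freq() {
    awk '
    {
        for (i = 1; i <= NF; i++) {
            w = $i
            gsub(/[^A-Za-z0-9]/, "", w)   # e.g. "tests." -> "tests"
            if (w != "") s[w]++
        }
    }
    END {
        for (x in s) { printf "%s%s", SEP, x "@" s[x]; SEP = " " }
        print ""
    }' "$1"
}
```

Called as `awk_freq infile`, it should print the nine distinct words with their counts on one line, in the unspecified order of `for (x in s)`.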

edited Apr 2 at 2:59

answered Apr 1 at 17:54

αғsнιη

14.8k82462

Using grep and awk:

 grep -o '[[:alnum:]]*' file | awk '{count[$0]++; next} END {ORS=" "; for (x in count) print x"@"count[x]; printf "\n"}'

tests@1 Test@1 multiple@1 a@1 This@1 There@1 are@1 test@3 is@1

answered Apr 1 at 17:24

Bharat

3058

gawk '
{
    for(i = 1; i <= NF; i++)
        arr[$i]++
}

END {
    PROCINFO["sorted_in"] = "@val_num_desc"

    for(i in arr)
        printf "%s@%s ", i, arr[i]

    print ""
}
' FPAT='[a-zA-Z]+' input.txt

Explanation

PROCINFO["sorted_in"] = "@val_num_desc" - Order by element values in descending order (rather than by indices). Scalar values are compared as numbers. See Predefined Array Scanning Orders in the gawk manual.

FPAT='[a-zA-Z]+' - A regular expression describing the contents of the fields in a record. When set, gawk parses the input into fields, where the fields match the regular expression, instead of using the value of the FS variable as the field separator.
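Both PROCINFO["sorted_in"] and FPAT are specific to GNU awk. Where only another awk is available, one option is to emulate the FPAT rule by turning every run of non-letters into a space, print one `word@count` pair per line, and hand the ordering to `sort`. This is a sketch under those assumptions; the function name `sorted_freq` is hypothetical:

```bash
# Sketch: descending-by-count output without gawk-only features.
sorted_freq() {
    awk '
    {
        # Emulate FPAT="[a-zA-Z]+": every run of non-letters becomes a space,
        # and modifying $0 makes awk re-split the record into fields.
        gsub(/[^a-zA-Z]+/, " ")
        for (i = 1; i <= NF; i++) arr[$i]++
    }
    END { for (w in arr) print w "@" arr[w] }' "$1" \
    | sort -t@ -k2,2nr \
    | paste -sd ' ' -
}
```

On the three-line input shown below, the first pair should be `test@9`; words with equal counts follow in whatever order `sort` chooses for ties.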

Input

This is a test Test test test There are multiple tests.
This is a test Test test test There are multiple tests.
This is a test Test test test There are multiple tests.

Output

test@9 tests@3 Test@3 multiple@3 a@3 This@3 There@3 are@3 is@3

edited Apr 1 at 20:52

answered Apr 1 at 20:45

MiniMax

2,681718

Could you use the code I already posted? I'm restricted to that format.
– Samurai Bale
Apr 1 at 20:49

@SamuraiBale No, because of this: Why is using a shell loop to process text considered bad practice? You are using an inappropriate approach for this task. It is like writing a game in bash: it can be done (for example, "tetris" in a terminal window), but bash is not the right tool for it. The same goes for text processing: awk is the right tool for this, a bash loop is not.
– MiniMax
Apr 1 at 21:06

Since the OP asked for a solution in the same kind of format:

bash-4.1$ cat test.sh
#!/bin/bash

tr ' ' '\n' < "$1" > temp
while read -r line
do
 count=$(grep -cwF -- "$line" temp)
 echo -n "$line@$count "
done < temp
echo ""

bash-4.1$ bash test.sh test.txt
This@1 is@1 a@1 test@3 Test@1 test@3 test@3 There@1 are@1 multiple@1 tests.@1

bash-4.1$ cat test.txt
This is a test Test test test There are multiple tests.
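The loop above prints a pair for every occurrence, so `test@3` appears three times, and the trailing `.` stays attached to `tests.`. A sketch of the same while/grep idea that prints each word only once, in first-seen order, assuming bash, splitting on non-alphanumeric characters, and using the hypothetical function name `loop_freq`:

```bash
# Sketch: while-read + grep -c, but each word is reported only once.
loop_freq() (                           # subshell body keeps tmp/seen local
    tmp=$(mktemp) || exit 1
    # One word per line; runs of non-alphanumerics become a single newline.
    tr -cs '[:alnum:]' '\n' < "$1" | grep . > "$tmp"
    seen=" "                            # space-delimited list of words already printed
    while IFS= read -r word
    do
        case "$seen" in
            *" $word "*) continue ;;    # already printed, skip it
        esac
        seen="$seen$word "
        # -x: whole-line match, -F: fixed string, so only exact repeats are counted
        printf '%s@%s ' "$word" "$(grep -cxF -- "$word" "$tmp")"
    done < "$tmp"
    echo
    rm -f "$tmp"
)
```

On the sample file it should print `This@1 is@1 a@1 test@3 Test@1 There@1 are@1 multiple@1 tests@1`. Because grep rescans the whole temporary file once per distinct word, it slows down noticeably on large inputs.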

edited Apr 11 at 22:37

Drakonoved

674518

answered Apr 2 at 2:42

Kamaraj

2,5891312
