Trying to find the frequency of words in a file using a script


The file I have is called test and it contains the following lines:



This is a test Test test test There are multiple tests.


I want the output to be:



test@3 tests@1 multiple@1 is@1 are@1 a@1 This@1 There@1 Test@1


I have the following script:



cat $1 | tr ' ' '\n' > temp # put all words to a new line
echo -n > file2.txt # clear file2.txt
for line in $(cat temp) # trace each line from temp file
do
# check if the current line is visited
grep -q $line file2.txt
if [ $line==$temp]
then
count= expr `$count + 1` #count the number of words
echo $line"@"$count >> file2.txt # add word and frequency to file
fi
done






asked Apr 1 at 16:38 by Samurai Bale
edited Apr 1 at 17:26 by Yurij Goncharuk




















          7 Answers

          Use sort | uniq -c | sort -n to create a frequency table. Some more tweaking needed to get the desired format.



          tr ' ' '\n' < "$1" \
          | sort \
          | uniq -c \
          | sort -rn \
          | awk '{print $2"@"$1}' \
          | tr '\n' ' '
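
As one possible tweak, the following sketch reproduces the order shown in the question. It rests on assumptions that are not in the answer: dots are treated as separators like spaces, the C locale is used so that uppercase sorts before lowercase, and ties in the count are broken in reverse alphabetical order. The function name word_freq_sorted is made up for the illustration.

```bash
# Hypothetical helper: print "word@count" pairs for the file named in $1,
# highest count first, ties in reverse byte order (C locale).
word_freq_sorted() {
    tr -s ' .' '\n\n' < "$1" |            # spaces and dots become newlines; -s squeezes repeats
        LC_ALL=C sort |                   # group identical words together
        uniq -c |                         # prefix each distinct word with its count
        LC_ALL=C sort -k1,1nr -k2,2r |    # count descending, then word descending
        awk '{ printf "%s%s@%s", sep, $2, $1; sep = " " } END { print "" }'
}
```

Called as word_freq_sorted test on the file from the question, this should print the single line the asker listed as the desired output.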





          answered Apr 1 at 16:45 by choroba

          • For the assignment I have to use this format, of a for loop and if,
            – Samurai Bale
            Apr 1 at 17:52


















          $ cat >wdbag.py
          #!/usr/bin/python
          # Python 2 syntax (print statement)

          from collections import Counter
          import re, sys

          text=' '.join(sys.argv[1:])

          t=Counter(re.findall(r"[\w']+", text.lower()))

          for item in t:
              print item+"@"+str(t[item])

          $ chmod 755 wdbag.py

          $ ./wdbag.py "This is a test Test test test There are multiple tests."
          a@1
          tests@1
          multiple@1
          this@1
          is@1
          there@1
          are@1
          test@4

          $ ./wdbag.py This is a test Test test test There are multiple tests.
          a@1
          tests@1
          multiple@1
          this@1
          is@1
          there@1
          are@1
          test@4


          Ref: https://stackoverflow.com/a/11300418/3720510






          answered Apr 1 at 21:39 by Hannu

          • Add a , at the end of the print line to get the list on a single row.
            – Hannu
            Apr 1 at 21:48


















          grep + sort + uniq + sed pipeline:



          grep -o '[[:alnum:]]*' file | sort | uniq -c | sed -E 's/[[:space:]]*([0-9]+) (.+)/\2@\1/'


          The output:



          a@1
          are@1
          is@1
          multiple@1
          test@3
          Test@1
          tests@1
          There@1
          This@1





          answered Apr 1 at 16:45 by RomanPerekhrest

            With awk only:



            awk -v RS='( |\\.|\n)+' '{s[$0]++}
            END{for (x in s) {printf "%s%s", SEP,x"@"s[x]; SEP=" "}; print ""}' infile

            This sets the record separator (RS) to one or more spaces, dots or newlines, so every word becomes its own record. Each record is used as a key in the array s, and the value stored under that key is incremented each time the word is seen, so s[word] holds the number of occurrences.

            In the END block, loop over the elements of the array and print each key x (the word), an @, and its count s[x].

            SEP is a variable that puts a space between the words when printing: it is empty for the first word and is set to a space for every word after it.
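
A regular expression as RS is an extension offered by gawk and mawk; POSIX leaves a multi-character RS unspecified. As a hedged alternative, the sketch below applies the same counting array and SEP idea using ordinary field splitting. The wrapper name word_freq_awk is hypothetical, and treating dots as separators is carried over from the command above.

```bash
# Hypothetical wrapper around a portable awk program; $1 is the input file.
word_freq_awk() {
    awk '
    {
        gsub(/\./, " ")                  # turn dots into spaces; $0 is re-split into fields
        for (i = 1; i <= NF; i++)
            s[$i]++                      # one counter per distinct word
    }
    END {
        for (x in s) {                   # traversal order is unspecified in awk
            printf "%s%s@%s", SEP, x, s[x]
            SEP = " "                    # empty before the first word, a space afterwards
        }
        print ""
    }' "$1"
}
```

Because awk does not define the order of for (x in s), the words can come out in any order; pipe the result through sort if a stable order matters.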






            answered Apr 1 at 17:54 by αғsнιη (edited Apr 2 at 2:59)

              Using grep and awk..



              grep -o '[[:alnum:]]*' file | awk '{ count[$0]++; next } END { ORS=" "; for (x in count) print x"@"count[x]; print "\n" }'


              tests@1 Test@1 multiple@1 a@1 This@1 There@1 are@1 test@3 is@1






              answered Apr 1 at 17:24 by Bharat

                gawk '
                {
                    for(i = 1; i <= NF; i++)
                        arr[$i]++
                }

                END {
                    PROCINFO["sorted_in"] = "@val_num_desc"

                    for(i in arr)
                        printf "%s@%s ", i, arr[i]

                    print ""
                }
                ' FPAT='[a-zA-Z]+' input.txt


                Explanation



                PROCINFO["sorted_in"] = "@val_num_desc" - Order by element values in descending order (rather than by indices). Scalar values are compared as numbers. See Predefined Array Scanning Orders.



                FPAT='[a-zA-Z]+' - A regular expression describing the contents of the fields in a record. When set, gawk
                parses the input into fields, where the fields match the regular expression, instead of
                using the value of the FS variable as the field separator.



                Input



                This is a test Test test test There are multiple tests.
                This is a test Test test test There are multiple tests.
                This is a test Test test test There are multiple tests.


                Output



                test@9 tests@3 Test@3 multiple@3 a@3 This@3 There@3 are@3 is@3 






                • Could you use the code I already posted? I'm restricted to that format.
                  – Samurai Bale
                  Apr 1 at 20:49

                • @SamuraiBale No, because of this: "Why is using a shell loop to process text considered bad practice?". You are using an inappropriate approach to solve your task. It is like coding a game in bash: it can be done (for example, "tetris" in the terminal window), but bash is not the right instrument for it. The same goes for text processing: awk is the right tool for this, a bash loop is not.
                  – MiniMax
                  Apr 1 at 21:06



















                As OP asked in the same kind of format...



                bash-4.1$ cat test.sh
                #!/bin/bash

                tr ' ' '\n' < "$1" > temp
                while read -r line
                do
                    count=$(grep -cw "$line" temp)
                    echo -n "$line@$count "
                done < temp
                echo ""

                bash-4.1$ bash test.sh test.txt
                This@1 is@1 a@1 test@3 Test@1 test@3 test@3 There@1 are@1 multiple@1 tests.@1

                bash-4.1$ cat test.txt
                This is a test Test test test There are multiple tests.
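
The run above prints test@3 once for every occurrence of test. If each word should appear only once, one way to stay within the for-loop-and-if format is to remember which words were already printed, as in this sketch. The names word_freq_loop, words and seen are made up here, and splitting at dots as well as spaces is an assumption.

```bash
# Hypothetical helper: report every distinct word of the file in $1 once.
# The body runs in a subshell, so set -f and the variables stay local.
word_freq_loop() (
    set -f                                  # no filename expansion of words like *
    words=$(mktemp) && seen=$(mktemp) || exit 1
    tr -s ' .' '\n\n' < "$1" > "$words"     # one word per line, dots dropped
    for word in $(cat "$words"); do
        # -x: whole line, -F: fixed string, -q: quiet; true if word was printed before
        if ! grep -qxF -e "$word" "$seen"; then
            count=$(grep -cxF -e "$word" "$words")
            printf '%s@%s ' "$word" "$count"
            printf '%s\n' "$word" >> "$seen"
        fi
    done
    echo
    rm -f "$words" "$seen"
)
```

Words are reported in order of first appearance. Each word triggers grep runs over the temporary files, so this is only reasonable for small inputs.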























































































                                          up vote
                                          0
                                          down vote













                                          gawk '

                                          for(i = 1; i <= NF; i++)
                                          arr[$i]++


                                          END
                                          PROCINFO["sorted_in"] = "@val_num_desc"

                                          for(i in arr)
                                          printf "%s@%s ", i, arr[i]

                                          print ""

                                          ' FPAT='[a-zA-Z]+' input.txt


                                          Explanation



                                          PROCINFO["sorted_in"] = "@val_num_desc" - Order by element values in descending order (rather than by indices). Scalar values are compared as numbers. See Predefined Array Scanning Orders.



                                          FPAT='[a-zA-Z]+' - A regular expression describing the contents of the fields in a record. When set, gawk
                                          parses the input into fields, where the fields match the regular expression, instead of
                                          using the value of the FS variable as the field separator.



                                          Input



                                          This is a test Test test test There are multiple tests.
                                          This is a test Test test test There are multiple tests.
                                          This is a test Test test test There are multiple tests.


                                          Output



                                          test@9 tests@3 Test@3 multiple@3 a@3 This@3 There@3 are@3 is@3 
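
Both `FPAT` and `PROCINFO["sorted_in"]` are gawk extensions. Where only a POSIX awk is available, a rough equivalent (a sketch under stated assumptions, not the answer's method) is to set `FS` to the complement of the word pattern, skip the empty fields this produces at line edges, and leave the ordering to `sort`. The function name `count_words` and the tie-breaking order are illustrative choices.

```shell
# Hypothetical helper: counts words in the given files (or stdin) with plain awk.
count_words() {
    awk -F '[^a-zA-Z]+' '
    {
        for (i = 1; i <= NF; i++)
            if ($i != "")        # a separator at the start or end of a line yields an empty field
                arr[$i]++
    }
    END {
        for (w in arr)
            printf "%s@%d\n", w, arr[w]
    }' "$@" |
        LC_ALL=C sort -t@ -k2,2nr -k1,1r |   # stands in for "@val_num_desc"
        paste -s -d ' ' -
}

printf 'This is a test Test test test There are multiple tests.\n' | count_words
# prints: test@3 tests@1 multiple@1 is@1 are@1 a@1 This@1 There@1 Test@1
```

The difference between the two approaches is what the regular expression describes: `FPAT` matches the fields themselves, while `FS` matches the text between them.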



























                                          • Could you use the code I already posted? I'm restricted to that format.
                                            – Samurai Bale
                                            Apr 1 at 20:49










                                          • @SamuraiBale No, because of this: Why is using a shell loop to process text considered bad practice? You are using an inappropriate approach for this task. It is like coding a game in bash: it can be done (for example, "tetris" in a terminal window), but bash is not the right instrument for it. The same goes for text processing: awk is the right tool for this, a bash loop is not.
                                            – MiniMax
                                            Apr 1 at 21:06















                                          edited Apr 1 at 20:52

























                                          answered Apr 1 at 20:45









                                          MiniMax














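                                          Since the OP asked for a solution in the same kind of format:



                                          bash-4.1$ cat test.sh
                                          #!/bin/bash

                                          # put each word on its own line
                                          tr ' ' '\n' < "$1" > temp
                                          while read -r line
                                          do
                                              # count the lines of temp that contain this word as a whole word
                                              count=$(grep -cw -- "$line" temp)
                                              echo -n "$line@$count "
                                          done < temp
                                          echo ""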

                                          bash-4.1$ bash test.sh test.txt
                                          This@1 is@1 a@1 test@3 Test@1 test@3 test@3 There@1 are@1 multiple@1 tests.@1

                                          bash-4.1$ cat test.txt
                                          This is a test Test test test There are multiple tests.
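
The run above prints `test@3` three times and keeps the trailing period on `tests.`, because the loop visits every line of `temp`, duplicates included. One possible refinement that stays with a shell loop is sketched below; it is not the answer's code. It strips punctuation with `tr -cs`, visits each distinct word once via `sort -u`, and orders the result at the end. The function name `word_freq_loop` and the variable names are hypothetical, and the word list is held in a variable rather than a `temp` file purely to avoid leaving a file behind.

```shell
# Hypothetical helper: $1 is the input file; prints "word@count" pairs on one line.
word_freq_loop() {
    # every run of non-alphanumeric characters becomes a single newline
    words=$(tr -cs '[:alnum:]' '\n' < "$1")

    printf '%s\n' "$words" | LC_ALL=C sort -u | while IFS= read -r word
    do
        [ -n "$word" ] || continue                 # skip a possible empty first line
        # -x: whole-line match, -F: treat the word as a fixed string, not a regex
        count=$(printf '%s\n' "$words" | grep -cxF -- "$word")
        printf '%s@%s\n' "$word" "$count"
    done | LC_ALL=C sort -t@ -k2,2nr -k1,1r | paste -s -d ' ' -
}

demo=$(mktemp)
printf 'This is a test Test test test There are multiple tests.\n' > "$demo"
word_freq_loop "$demo"
# prints: test@3 tests@1 multiple@1 is@1 are@1 a@1 This@1 There@1 Test@1
rm -f "$demo"
```

This still starts one `grep` per distinct word, so it remains far slower than the awk solutions on large inputs.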





                                              edited Apr 11 at 22:37









                                              Drakonoved











                                              answered Apr 2 at 2:42









                                              Kamaraj
