Counting occurrences of word in text file

I have a text file containing tweets and I need to count the number of times a word is mentioned in the tweets. For example, the file contains:



Apple iPhone X is going to worth a fortune
The iPhone X is Apple's latest flagship iPhone. How will it pit against it's competitors?


And let's say I want to count how many times the word iPhone is mentioned in the file. So here's what I've tried.



cut -f 1 Tweet_Data | grep -i "iPhone" | wc -l


It certainly works, but I'm confused about the wc command in Unix. What is the difference if I try something like:



cut -f 1 Tweet_Data | grep -c "iPhone"


where -c is used instead? The two commands yield different results on a large file full of tweets, and I'm confused about how they work. Which method is the correct way to count the occurrences?







edited Oct 16 '17 at 13:26 by Jeff Schaller
asked Oct 16 '17 at 13:23 by Maxxx











  • cut -f1 is cutting based on tabs, which isn't doing much here. Are you sure that wc -l is really giving you the correct count? It would show 2 here, but I count 3 instances of "iPhone".
    – Jeff Schaller
    Oct 16 '17 at 13:28










  • Another technique: unix.stackexchange.com/q/39039/117549
    – Jeff Schaller
    Oct 16 '17 at 13:33










  • Also similar: unix.stackexchange.com/q/60727/117549
    – Jeff Schaller
    Oct 16 '17 at 13:37
















1 Answer
accepted










Given such a requirement, I would use GNU grep (for its -o option) and pipe the output through wc to count the total number of occurrences:



$ grep -o -i iphone Tweet_Data | wc -l
3


Plain grep -c on the data counts the number of lines that match, not the total number of matches. The -o option tells grep to output each match on its own line, no matter how many times the pattern matches within a line.



wc -l tells the wc utility to count the number of lines. Because grep has put each match on its own line, that line count is the total number of occurrences of the word in the input.
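
To make the difference between counting matching lines and counting matches concrete, here is a small sketch that recreates the two sample tweets from the question in a scratch file and runs both counts. The scratch file and the variable names are made up for illustration, and it assumes a grep that supports -o (such as GNU grep).

```shell
#!/bin/sh
# Recreate the two sample tweets from the question in a scratch file.
# ("tweets", "line_count" and "match_count" are illustrative names.)
tweets=$(mktemp)
cat > "$tweets" <<'EOF'
Apple iPhone X is going to worth a fortune
The iPhone X is Apple's latest flagship iPhone. How will it pit against it's competitors?
EOF

# grep -c counts matching LINES: a tweet counts once,
# however many times the pattern appears in it.
line_count=$(grep -c -i iphone "$tweets")

# grep -o prints every match on its own line, so wc -l counts MATCHES.
# tr -d ' ' strips the padding that some wc implementations add.
match_count=$(grep -o -i iphone "$tweets" | wc -l | tr -d ' ')

echo "matching lines: $line_count"
echo "total matches: $match_count"

rm -f "$tweets"
```

On the sample data this reports 2 matching lines but 3 matches. Note also that the first command in the question uses -i while the second does not, which by itself can make the two counts differ on real data.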




If GNU grep is not available (or desired), you could transform the input with tr so that each word is on its own line, then use grep -c to count:



$ tr '[:space:]' '[\n*]' < Tweet_Data | grep -i -c iphone
3
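
One caveat with the tr approach: grep -c still counts lines, so a single whitespace-delimited token that contains the pattern twice (a made-up hashtag such as #iPhoneiPhone, say) is counted once. As a rough sketch of another way to count without grep -o, awk's gsub() returns the number of replacements it made, and that can be summed over all lines. This is a swapped-in technique rather than part of the answer above, and the scratch file and variable names are assumptions for illustration.

```shell
#!/bin/sh
# Scratch copy of the sample tweets from the question (illustrative file name).
tweets=$(mktemp)
cat > "$tweets" <<'EOF'
Apple iPhone X is going to worth a fortune
The iPhone X is Apple's latest flagship iPhone. How will it pit against it's competitors?
EOF

# tr turns every whitespace character into a newline, so each word is on
# its own line; grep -c then counts the words that contain the pattern.
word_count=$(tr '[:space:]' '[\n*]' < "$tweets" | grep -i -c iphone)

# awk: lower-case a copy of each line, then add up how many matches
# gsub() replaced (each match is replaced with itself via "&").
match_count=$(awk '{ line = tolower($0); n += gsub(/iphone/, "&", line) }
                   END { print n + 0 }' "$tweets")

echo "words containing the pattern: $word_count"
echo "total matches: $match_count"

rm -f "$tweets"
```

Both counts agree on the sample tweets; they would diverge only on input where one token contains the pattern more than once.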





edited Oct 16 '17 at 14:04
answered Oct 16 '17 at 13:31 by Jeff Schaller



























             
