Counting occurrences of word in text file

I have a text file containing tweets and I'm required to count the number of times a word is mentioned in the tweets. For example, the file contains:
Apple iPhone X is going to worth a fortune
The iPhone X is Apple's latest flagship iPhone. How will it pit against it's competitors?
And let's say I want to count how many times the word iPhone is mentioned in the file. So here's what I've tried.
cut -f 1 Tweet_Data | grep -i "iPhone" | wc -l
It certainly works, but I'm confused about the 'wc' command in Unix. What is the difference if I try something like:
cut -f 1 Tweet_Data | grep -c "iPhone"
where -c is used instead? The two commands yield different results on a large file full of tweets, and I'm confused about how they work. Which is the correct way to count the occurrences?
text-processing grep cut
cut -f1 is cutting based on tabs, which isn't doing much here. Are you sure that wc -l is really giving you the correct count? It would show 2 here, but I count 3 instances of "iPhone".
– Jeff Schaller
Oct 16 '17 at 13:28
Another technique: unix.stackexchange.com/q/39039/117549
– Jeff Schaller
Oct 16 '17 at 13:33
Also similar: unix.stackexchange.com/q/60727/117549
– Jeff Schaller
Oct 16 '17 at 13:37
edited Oct 16 '17 at 13:26
Jeff Schaller
asked Oct 16 '17 at 13:23
Maxxx
1 Answer (accepted)
Given such a requirement, I would use GNU grep (for its -o option) and pipe its output through wc to count the total number of occurrences:
$ grep -o -i iphone Tweet_Data | wc -l
3
Plain grep -c on the data will count the number of lines that match, not the total number of words that match. Using the -o option tells grep to output each match on its own line, no matter how many times the match occurs in the line.
wc -l tells the wc utility to count the number of lines. After grep puts each match in its own line, this is the total number of occurrences of the word in the input.
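To make the distinction concrete, here is a small sketch that runs all three counts on the two sample tweets from the question. The temporary file and the variable names are only for illustration. It also hints at one likely reason the two commands in the question disagree on a large file: only the first one uses -i, so the second misses spellings such as "iphone" or "IPHONE". With the same options, grep | wc -l and grep -c both count matching lines and should agree.

```shell
# Sample data: the two tweets quoted in the question (temporary file, for illustration only).
sample=$(mktemp)
cat > "$sample" <<'EOF'
Apple iPhone X is going to worth a fortune
The iPhone X is Apple's latest flagship iPhone. How will it pit against it's competitors?
EOF

# Both of these count matching LINES, so they agree when given the same options.
lines_wc=$(grep -i "iPhone" "$sample" | wc -l)
lines_c=$(grep -i -c "iPhone" "$sample")

# -o prints each match on its own line, so wc -l now counts OCCURRENCES.
occurrences=$(grep -o -i "iPhone" "$sample" | wc -l)

echo "matching lines (grep | wc -l): $lines_wc"
echo "matching lines (grep -c):      $lines_c"
echo "occurrences (grep -o | wc -l): $occurrences"

rm -f "$sample"
```

The second tweet contains "iPhone" twice, which is why the line counts and the occurrence count differ on this sample.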
If GNU grep is not available (or desired), you could transform the input with tr so that each word is on its own line, then use grep -c to count:
$ tr '[:space:]' '[\n*]' < Tweet_Data | grep -i -c iphone
3
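Another option that does not depend on grep -o is to count in awk. This is not part of the answer above, just a minimal sketch that assumes a POSIX awk is available. The function name count_occurrences is hypothetical. It treats the word as a fixed string rather than a regular expression, and it counts every occurrence on a line, including ones that are not separated by whitespace.

```shell
# count_occurrences WORD FILE
# Hypothetical helper: prints how many times WORD appears in FILE, ignoring case.
# WORD is matched as a fixed string via index(), not as a regular expression.
count_occurrences() {
    awk -v word="$1" '
        BEGIN { word = tolower(word) }
        {
            line = tolower($0)
            # Find the next occurrence, count it, then continue after it.
            while (word != "" && (pos = index(line, word)) > 0) {
                n++
                line = substr(line, pos + length(word))
            }
        }
        END { print n + 0 }
    ' "$2"
}
```

It could be called as count_occurrences iphone Tweet_Data. Note that awk -v interprets backslash escapes in the assigned value, so a search word containing backslashes would need extra care.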
edited Oct 16 '17 at 14:04
answered Oct 16 '17 at 13:31
Jeff Schaller