Count uniq instances of blocks of 2 lines [closed]
Given input:
144.252.36.69
afrloop=32235330165603
144.252.36.69
afrloop=32235330165603
144.252.36.69
afrloop=32235330165603
222.252.36.69
afrloop=31135330165603
222.252.36.69
afrloop=31135330165603
222.252.36.69
afrloop=31135330165603
222.252.36.69
afrloop=31135330165603
How can I output:
144.252.36.69
afrloop=32235330165603 3 times
222.252.36.69
afrloop=31135330165603 4 times
uniq
edited Dec 15 '17 at 3:38 by Wildcard
asked Dec 14 '17 at 1:53 by Đặng Thắng
closed as unclear what you're asking by Jeff Schaller, Michael Homer, hildred, Stephen Rauch, G-Man Dec 14 '17 at 4:54
Please clarify your specific problem or add additional details to highlight exactly what you need. As it's currently written, it's hard to tell exactly what you're asking. See the How to Ask page for help clarifying this question. If this question can be reworded to fit the rules in the help center, please edit the question.
PSA: Please don't post images of text
– Wildcard
Dec 15 '17 at 3:36
2 Answers
paste - - < file | sort | uniq -c
answered Dec 14 '17 at 3:20 by n.caillou (accepted answer)
Nice! Took me a little while to understand what paste does in this case :-)
– NickD
Dec 14 '17 at 3:27
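For anyone else puzzled by the paste step: each "-" tells paste to read one line from standard input, so "paste - -" merges every two consecutive lines into a single tab-separated line. After that it is the usual sort | uniq -c idiom. A minimal sketch, assuming a small temporary sample file (the variable names and the optional reshaping step at the end are my own additions, not part of the answer):

```shell
# Hypothetical sample file with three two-line records from the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
144.252.36.69
afrloop=32235330165603
222.252.36.69
afrloop=31135330165603
144.252.36.69
afrloop=32235330165603
EOF

# Step 1: each "-" reads one line from stdin, so two of them
# merge every pair of lines into one tab-separated line.
joined=$(paste - - < "$sample")
printf '%s\n' "$joined"

# Step 2: sort so identical records are adjacent, then count them.
counts=$(printf '%s\n' "$joined" | sort | uniq -c)
printf '%s\n' "$counts"

# Step 3 (optional): reshape "count ip afrloop" into the layout asked for.
formatted=$(printf '%s\n' "$counts" | awk '{ print $2 "\n" $3, $1, "times" }')
printf '%s\n' "$formatted"

rm -f "$sample"
```

The sort is needed because uniq only collapses adjacent identical lines; without it, records that repeat later in the file would be counted separately.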
Here is a solution with awk if you want a customized output format:
NR%2==1 { ip=$0; next }
NR%2==0 { a[ip"\n"$0]++ }
END {
    for (i in a)
        printf "%s %d times\n", i, a[i]
}
The script can be saved as main.awk and executed as
awk -f main.awk file
Explanation
First, we use NR%2==1 to match odd-numbered lines, since an odd number modulo 2 equals 1. If a line matches this condition, we save the whole line ($0) into a variable called ip. We then use next to skip any further processing and go straight to the next input line.
Second, we use NR%2==0 to match even-numbered lines. If a line matches, we use ip"\n"$0 as an index into the array a and increment the count stored under that index. For example, an equivalent expansion would be a["144.252.36.69 afrloop=32235330165603"] += 1 (I replaced the newline \n with a space in this example just for simplicity).
Finally, in the END block, after every line has been processed, we use a for loop to print each index of the array a together with its value, which in our case is the count for each unique index. Note that awk does not guarantee any particular iteration order for for (i in a), so the output order may differ from the input order.
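To make the explanation concrete, here is a small self-contained sketch of the same idea wrapped in a shell function and run on inline sample data. Because the iteration order of for (i in a) is unspecified, this variant also records each key the first time it is seen, so results come out in input order. The function name count_pairs, the order array, and the counter n are my own additions, not part of the script above.

```shell
# count_pairs: hypothetical helper; reads two-line records from stdin or files.
count_pairs() {
    awk '
    NR % 2 == 1 { ip = $0; next }      # odd line: remember the IP
    {
        key = ip "\n" $0               # even line: build the two-line key
        if (!(key in count))
            order[++n] = key           # first sighting: remember its position
        count[key]++
    }
    END {
        for (j = 1; j <= n; j++)
            printf "%s %d times\n", order[j], count[order[j]]
    }' "$@"
}

count_pairs <<'EOF'
222.252.36.69
afrloop=31135330165603
144.252.36.69
afrloop=32235330165603
222.252.36.69
afrloop=31135330165603
EOF
# prints:
# 222.252.36.69
# afrloop=31135330165603 2 times
# 144.252.36.69
# afrloop=32235330165603 1 times
```

The extra array costs a little memory per distinct key, which seems a fair trade when stable output order matters.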
Fun Benchmark
Test file generation (roughly 10 million records, since the loop runs up to 10000000)
awk '
BEGIN {
    for (i = 1; i < 10000000; i++)
        printf "%d\nafrLoop=%d\n", int(rand()*100), int(rand()*10)
}' > test
$ head test
23
afrLoop=2
84
afrLoop=1
58
@n.caillou's paste solution
$ time paste - - < test | sort | uniq -c > /dev/null
real 0m11.250s
user 0m11.352s
sys 0m0.272s
awk solution
$ time awk -f main.awk test > /dev/null
real 0m5.673s
user 0m5.636s
sys 0m0.036s
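Timings like these depend on the machine, the awk implementation, and the locale that sort runs under, so they are best read as a single data point. Before comparing speed, it may be worth checking that both approaches actually produce the same counts. A rough sketch, assuming a much smaller generated file; the srand(42) seed, the record count of 10000, and the temporary file are my own choices and not part of the benchmark above:

```shell
testfile=$(mktemp)

# Generate 10000 two-line records with a fixed seed.
awk 'BEGIN {
    srand(42)
    for (i = 1; i <= 10000; i++)
        printf "%d\nafrLoop=%d\n", int(rand()*100), int(rand()*10)
}' > "$testfile"

# Approach 1: paste + sort + uniq, normalized to "first second count".
via_paste=$(paste - - < "$testfile" | sort | uniq -c |
            awk '{ print $2, $3, $1 }' | sort)

# Approach 2: awk associative array, printed in the same normalized format.
via_awk=$(awk 'NR % 2 == 1 { ip = $0; next }
               { a[ip " " $0]++ }
               END { for (k in a) print k, a[k] }' "$testfile" | sort)

if [ "$via_paste" = "$via_awk" ]; then
    echo "both approaches agree"
else
    echo "outputs differ"
fi

rm -f "$testfile"
```

Both results are sorted before comparison because the awk array is traversed in an unspecified order.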
edited Dec 15 '17 at 4:52
answered Dec 14 '17 at 3:54 by etopylight
I have a 1.4 GB log file, and when I use awk it takes a very long time, but thank you.
– Đặng Thắng
Dec 14 '17 at 6:55
@ĐặngThắng Thanks for the feedback! It seems a bit strange to me that you would find the awk solution slower. From experience, it should be faster, since it doesn't go through any additional pipes. I added a benchmark section to my original answer in case you want to try it out :)
– etopylight
Dec 14 '17 at 7:40
Can you explain your script to me? Thanks.
– Đặng Thắng
Dec 15 '17 at 1:42
@ĐặngThắng Sure, glad to. Just updated the answer. Let me know if anything is still unclear to you.
– etopylight
Dec 15 '17 at 3:26
Much more elegant than my
awk '!(NR%2){print$0" " p}{p=$0}' | uniq -c | awk '{print $3"\n"$2" "$1" times"}'
– Tim Kennedy
Dec 21 '17 at 4:26