Count uniq instances of blocks of 2 lines [closed]

Given input:



144.252.36.69
afrloop=32235330165603
144.252.36.69
afrloop=32235330165603
144.252.36.69
afrloop=32235330165603
222.252.36.69
afrloop=31135330165603
222.252.36.69
afrloop=31135330165603
222.252.36.69
afrloop=31135330165603
222.252.36.69
afrloop=31135330165603


How can I output:



144.252.36.69
afrloop=32235330165603 3 times
222.252.36.69
afrloop=31135330165603 4 times






closed as unclear what you're asking by Jeff Schaller, Michael Homer, hildred, Stephen Rauch, G-Man Dec 14 '17 at 4:54


Please clarify your specific problem or add additional details to highlight exactly what you need. As it's currently written, it’s hard to tell exactly what you're asking. See the How to Ask page for help clarifying this question. If this question can be reworded to fit the rules in the help center, please edit the question.














  • PSA: Please don't post images of text
    – Wildcard
    Dec 15 '17 at 3:36














asked Dec 14 '17 at 1:53 by Đặng Thắng
edited Dec 15 '17 at 3:38 by Wildcard




2 Answers

Accepted answer (score 2)










paste - - < file | sort | uniq -c
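
For readers who need a moment with `paste - -`: each `-` reads one line from standard input, so two of them join every pair of consecutive lines into one tab-separated line, which `sort | uniq -c` can then count. The sketch below runs this on the question's sample. The final awk step is an addition of this sketch, not part of the answer; it rearranges the `uniq -c` output into the two-line layout the question asks for, and it assumes neither line of a block contains whitespace. The temporary file and the `counted` variable are illustrative names.

```shell
# Sample input from the question, written to a temporary file.
tmp=$(mktemp)
printf '%s\n' \
  144.252.36.69 afrloop=32235330165603 \
  144.252.36.69 afrloop=32235330165603 \
  144.252.36.69 afrloop=32235330165603 \
  222.252.36.69 afrloop=31135330165603 \
  222.252.36.69 afrloop=31135330165603 \
  222.252.36.69 afrloop=31135330165603 \
  222.252.36.69 afrloop=31135330165603 > "$tmp"

# Step 1: "paste - -" joins lines 1+2, 3+4, ... with a tab.
# Step 2: sort groups identical pairs; uniq -c prefixes each with its count.
# Step 3 (assumption of this sketch): awk splits "count ip afrloop" on
#         whitespace and prints the layout requested in the question.
counted=$(paste - - < "$tmp" | sort | uniq -c |
  awk '{ printf "%s\n%s %d times\n", $2, $3, $1 }')

printf '%s\n' "$counted"
rm -f "$tmp"
```

Without the last awk step, each output line has the form count, IP, tab, afrloop value, which may already be enough for many uses.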





  • Nice! Took me a little while to understand what paste does in this case :-)
    – NickD
    Dec 14 '17 at 3:27

















Answer (score 2)













Here is a solution with awk, if you want a customized output format:



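# Odd lines: remember the IP and move on.
NR%2==1 {ip=$0; next}
# Even lines: count the two-line block (IP, newline, second line).
NR%2==0 {a[ip"\n"$0]++}
END {
    for(i in a)
        printf "%s %d times\n", i, a[i]
}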



Save it as main.awk and run it as:



awk -f main.awk file


Explanation



  • First, NR%2==1 matches odd-numbered lines, since an odd number modulo 2 equals 1. When a line matches, we save the whole line $0 in a variable called ip, then use next to skip any further processing and go straight to the next line.



  • Second, NR%2==0 matches even-numbered lines. When a line matches, we build the index ip"\n"$0 in an array a and increment the count stored at that index. For example, an equivalent expansion would be



    a["144.252.36.69 afrloop=32235330165603"] += 1


    I ignored the newline \n in this example just for simplicity.



  • Finally, in the END block, after every line has been processed, we use a for loop to print each index of array a together with its value, which in our case is the count for each unique block.
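
One caveat about the END loop: awk does not define the order in which `for (i in a)` visits the indices, so the blocks may be printed in any order. If first-seen order matters, a minimal sketch is to record each key the first time it appears. The `order` array, the `n` counter, and the rearranged sample input are additions of this sketch, not part of the answer.

```shell
# Made-up arrangement of the question's values, so that order is visible.
tmp=$(mktemp)
printf '%s\n' \
  222.252.36.69 afrloop=31135330165603 \
  144.252.36.69 afrloop=32235330165603 \
  222.252.36.69 afrloop=31135330165603 > "$tmp"

ordered=$(awk '
  NR % 2 == 1 { ip = $0; next }            # odd line: remember the IP
  {
    key = ip "\n" $0                       # even line: build the 2-line key
    if (!(key in count)) order[++n] = key  # first sighting: record position
    count[key]++
  }
  END {
    for (j = 1; j <= n; j++)               # replay keys in first-seen order
      printf "%s %d times\n", order[j], count[order[j]]
  }
' "$tmp")

printf '%s\n' "$ordered"
rm -f "$tmp"
```

Here the 222.252.36.69 block is printed first with a count of 2, followed by the 144.252.36.69 block with a count of 1.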


Fun Benchmark




  • Test file generation (originally labeled 1 million records; the loop bound below actually produces 9,999,999 records)



    awk '
    BEGIN{for(i=1;i<10000000;i++)
    printf "%d\nafrLoop=%d\n", int(rand()*100), int(rand()*10)}
    ' > test

    $ head test
    23
    afrLoop=2
    84
    afrLoop=1
    58



  • @n.caillou's paste solution



    $ time paste - - < test | sort | uniq -c > /dev/null
    real 0m11.250s
    user 0m11.352s
    sys 0m0.272s



  • awk solution



    $ time awk -f main.awk test > /dev/null
    real 0m5.673s
    user 0m5.636s
    sys 0m0.036s
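
For very large logs, it may be worth checking whether identical blocks are always adjacent, as they are in the question's sample. If so, the sort step of the paste pipeline can be dropped, because `uniq -c` counts runs of adjacent identical lines. This is an assumption about the input, not something the benchmark above measured. The sketch below deliberately breaks that assumption to show the consequence: a block that reappears later is reported once per run. The final awk formatting step and the `runs` variable are illustrative.

```shell
# Made-up input in which the first block reappears after a different one.
tmp=$(mktemp)
printf '%s\n' \
  144.252.36.69 afrloop=32235330165603 \
  144.252.36.69 afrloop=32235330165603 \
  222.252.36.69 afrloop=31135330165603 \
  144.252.36.69 afrloop=32235330165603 > "$tmp"

# No sort: uniq -c only merges adjacent identical lines, so the data is
# processed in a single pass in its original order.
runs=$(paste - - < "$tmp" | uniq -c |
  awk '{ printf "%s\n%s %d times\n", $2, $3, $1 }')

printf '%s\n' "$runs"
rm -f "$tmp"
```

The 144.252.36.69 block shows up twice in the result (counts 2 and 1), so this variant only gives totals when duplicates are guaranteed to be contiguous.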






  • I have a 1.4 GB log file, and awk takes a very long time on it, but thank you.
    – Đặng Thắng
    Dec 14 '17 at 6:55










  • @ĐặngThắng Thanks for the feedback! It seems a bit strange to me that you would find the awk solution to be slower. From experience, it should be faster since it doesn't go through any additional pipes. I added a benchmark section to my original answer in case you want to try it out :)
    – etopylight
    Dec 14 '17 at 7:40










  • Can you explain your script for me? Thanks.
    – Đặng Thắng
    Dec 15 '17 at 1:42











  • @ĐặngThắng Sure, glad to. Just updated the answer. Let me know if there is still anything unclear to you.
    – etopylight
    Dec 15 '17 at 3:26






  • much more elegant than my awk '!(NR%2){print$0" " p}{p=$0}' | uniq -c | awk '{print $3"\n"$2" "$1" times"}'
    – Tim Kennedy
    Dec 21 '17 at 4:26

















paste answer: answered Dec 14 '17 at 3:20 by n.caillou











awk answer: answered Dec 14 '17 at 3:54 by etopylight, edited Dec 15 '17 at 4:52










