Merge two lists while removing duplicates

14 votes, 5 favorites

I have an embedded Linux system running BusyBox (OpenWrt), so the available commands are limited. I have two files that look like this:

First file:

aaaaaa
bbbbbb
cccccc
mmmmmm
nnnnnn


Second file:

mmmmmm
nnnnnn
yyyyyy
zzzzzz


I need to merge these two lists into one file and remove the duplicates. I don't have diff (space is limited), so we get to use the great awk, sed, and grep (or other tools that might be included in a standard BusyBox instance). Going through an intermediate merge file, like:

command1 > mylist.merge
command2 mylist.merge > originallist

is totally OK. It doesn't have to be a single-line command.



Commands currently available in the BusyBox instance I am using (default OpenWrt):
[, [[, arping, ash, awk, basename, brctl, bunzip2, bzcat, cat, chgrp, chmod, chown, chroot, clear, cmp,
cp, crond, crontab, cut, date, dd, df, dirname, dmesg, du, echo, egrep, env, expr, false, fgrep, find,
free, fsync, grep, gunzip, gzip, halt, head, hexdump, hostid, hwclock, id, ifconfig, init, insmod, kill,
killall, klogd, less, ln, lock, logger, logread, ls, lsmod, md5sum, mkdir, mkfifo, mknod, mktemp, mount,
mv, nc, netmsg, netstat, nice, nslookup, ntpd, passwd, pgrep, pidof, ping, ping6, pivot_root, pkill,
poweroff, printf, ps, pwd, reboot, reset, rm, rmdir, rmmod, route, sed, seq, sh, sleep, sort,
start-stop-daemon, strings, switch_root, sync, sysctl, syslogd, tail, tar, tee, telnet, telnetd, test,
time, top, touch, tr, traceroute, true, udhcpc, umount, uname, uniq, uptime, vconfig, vi, watchdog, wc,
wget, which, xargs, yes, zcat











migrated from serverfault.com Oct 6 '12 at 15:56


This question came from our site for system and network administrators.


















Tags: bash, grep, sed, awk, busybox






asked Oct 2 '12 at 18:42 by slthomason; edited May 6 at 17:52 by learningbee


          5 Answers

23 votes, accepted

I think

sort file1 file2 | uniq

will do what you want. With the two files from the question, it prints:

aaaaaa
bbbbbb
cccccc
mmmmmm
nnnnnn
yyyyyy
zzzzzz

Additional documentation: uniq, sort
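
A minimal sketch of why the sort step matters, assuming the two sample lists from the question are saved as file1 and file2 in a scratch directory (the directory and the variable names are only for illustration). uniq only collapses adjacent repeats, so feeding it unsorted, concatenated input leaves the duplicates in place:

```shell
# Sketch only: recreate the sample lists from the question in a scratch directory.
LC_ALL=C; export LC_ALL                 # byte-wise ordering, independent of locale
tmpdir=$(mktemp -d)
printf '%s\n' aaaaaa bbbbbb cccccc mmmmmm nnnnnn > "$tmpdir/file1"
printf '%s\n' mmmmmm nnnnnn yyyyyy zzzzzz > "$tmpdir/file2"

# uniq alone: the repeated lines are not next to each other, so they survive.
unsorted_count=$(cat "$tmpdir/file1" "$tmpdir/file2" | uniq | wc -l)

# Sorting first brings equal lines together, so uniq can drop the repeats.
sorted_count=$(sort "$tmpdir/file1" "$tmpdir/file2" | uniq | wc -l)

echo "cat | uniq:  $unsorted_count lines"
echo "sort | uniq: $sorted_count lines"
rm -rf "$tmpdir"
```

With the sample data, the first count is 9 and the second is 7.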






answered Oct 2 '12 at 18:46 by Iain; edited 30 mins ago by Jon

• busybox sort supports the unique flag -u. – Thor, Oct 2 '12 at 18:53 (7 votes)
• @Thor: oooh cheers that's not a switch I'm familiar with. – Iain, Oct 2 '12 at 20:18












8 votes

In just one command, without any pipe:

sort -u FILE1 FILE2

To find the option in the documentation, search for "Suppress duplicate lines" at http://www.busybox.net/downloads/BusyBox.html
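
To end up with the merged result back in the original file, as the question asks, the same command can write to a scratch file first. This is only a sketch: originallist and mylist.merge are the names used in the question, while newlist and the scratch directory are assumptions made for the example.

```shell
LC_ALL=C; export LC_ALL                 # byte-wise ordering, independent of locale
tmpdir=$(mktemp -d)                     # scratch directory, for illustration only

printf '%s\n' aaaaaa bbbbbb cccccc mmmmmm nnnnnn > "$tmpdir/originallist"
printf '%s\n' mmmmmm nnnnnn yyyyyy zzzzzz > "$tmpdir/newlist"   # hypothetical name

# Merge into a scratch file, and replace the original only if sort succeeded.
sort -u "$tmpdir/originallist" "$tmpdir/newlist" > "$tmpdir/mylist.merge" &&
    mv "$tmpdir/mylist.merge" "$tmpdir/originallist"

merged=$(cat "$tmpdir/originallist")
echo "$merged"
rm -rf "$tmpdir"
```

Chaining the two steps with && keeps the original list untouched if the merge step fails, for example when the device runs out of space.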






answered Oct 6 '12 at 16:52 by Gilles Quenot

3 votes

Another solution, which keeps the first occurrence of each line and preserves the input order instead of sorting:

awk '!a[$0]++' file_1 file_2
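
Because this awk one-liner prints lines in the order it first sees them, the order of the file arguments decides the order of the output. A small sketch, assuming the sample lists from the question are stored as file_1 and file_2 in a scratch directory (the variable names are invented for the example):

```shell
tmpdir=$(mktemp -d)                     # scratch directory, for illustration only
printf '%s\n' aaaaaa bbbbbb cccccc mmmmmm nnnnnn > "$tmpdir/file_1"
printf '%s\n' mmmmmm nnnnnn yyyyyy zzzzzz > "$tmpdir/file_2"

# a[$0]++ is 0 (false) the first time a line is seen, so !a[$0]++ is true
# exactly once per distinct line, and awk prints the line at that moment.
one_then_two=$(awk '!a[$0]++' "$tmpdir/file_1" "$tmpdir/file_2" | tr '\n' ' ')
two_then_one=$(awk '!a[$0]++' "$tmpdir/file_2" "$tmpdir/file_1" | tr '\n' ' ')

echo "file_1 first: $one_then_two"
echo "file_2 first: $two_then_one"
rm -rf "$tmpdir"
```

Both runs contain the same seven lines; only their order differs. Piping either result through sort gives the same output as sort -u.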





answered Oct 6 '12 at 19:08 by nowy1

• I saw that it made a difference which argument came first. Otherwise great solution, thanks. – dezza, Jan 8 '17 at 15:21

















1 vote

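To remove duplicates based on key columns rather than the whole line, use the following:

awk '!duplicate[$1,$2,$3]++' file_1 file_2

Here the first, second, and third columns are treated as the primary key; only the first line seen for each key is kept. The output is not sorted.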
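
As an illustration, here is a sketch with made-up, whitespace-separated records (the names and values are invented for the example). Two lines that agree in their first three columns count as duplicates even when a later column differs:

```shell
tmpdir=$(mktemp -d)                     # scratch directory, for illustration only
printf '%s\n' 'alice 10 x from_file_1' 'bob 20 y from_file_1' > "$tmpdir/file_1"
printf '%s\n' 'alice 10 x from_file_2' 'carol 30 z from_file_2' > "$tmpdir/file_2"

# The array index is built from columns 1 to 3 only; column 4 is ignored
# when deciding whether a line has been seen before.
deduped=$(awk '!duplicate[$1,$2,$3]++' "$tmpdir/file_1" "$tmpdir/file_2")

echo "$deduped"
rm -rf "$tmpdir"
```

The second "alice 10 x" record is dropped because its key was already seen in file_1, so the version from the file listed first wins.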






answered Feb 17 '17 at 4:26 by Prem Joshi

1 vote

The files in your question are sorted.

If the source files are indeed sorted, you can remove duplicates and merge in one step:

sort -um file1 file2 > mylist.merge

For a numeric sort (not alphanumeric), use:

sort -num file1 file2 > mylist.merge

This cannot be done in place: redirecting the output to one of the source files truncates that file before sort reads it.
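
The in-place caveat comes from the shell rather than from sort: the redirection empties the target before sort starts. A sketch under the assumption that the sample lists live in a scratch directory (clobbered is a hypothetical copy, used so the failure can be shown without harming file1):

```shell
LC_ALL=C; export LC_ALL                 # byte-wise ordering, independent of locale
tmpdir=$(mktemp -d)                     # scratch directory, for illustration only
printf '%s\n' aaaaaa bbbbbb cccccc mmmmmm nnnnnn > "$tmpdir/file1"
printf '%s\n' mmmmmm nnnnnn yyyyyy zzzzzz > "$tmpdir/file2"
cp "$tmpdir/file1" "$tmpdir/clobbered"

# Pitfall: "> clobbered" empties the file before sort opens it for reading,
# so the lines that existed only in that file are lost.
sort -um "$tmpdir/clobbered" "$tmpdir/file2" > "$tmpdir/clobbered"
clobbered_count=$(wc -l < "$tmpdir/clobbered")

# Safer: write to a separate file, then rename it over the source.
sort -um "$tmpdir/file1" "$tmpdir/file2" > "$tmpdir/mylist.merge" &&
    mv "$tmpdir/mylist.merge" "$tmpdir/file1"
safe_count=$(wc -l < "$tmpdir/file1")

echo "clobbered: $clobbered_count lines, safe: $safe_count lines"
rm -rf "$tmpdir"
```

The safe variant ends with all seven distinct lines, while the clobbered copy comes out shorter.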



If the files are not sorted, sort them first. This sort can be done in place with the sort option -o, although the whole file needs to be loaded into memory:

sort -uo file1 file1
sort -uo file2 file2
sort -um file1 file2 > mylist.merge
mv mylist.merge originallist

That should be faster than the simpler single command that sorts everything at once:

cat file1 file2 | sort -u > mylist.merge

However, this line could be useful for small files.
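
Whether the merge-only path is safe depends on the inputs really being sorted. One way to decide at run time is sketched here with a hypothetical helper named merge_lists, assuming the local sort supports the -c (check) option; the file names are invented for the example:

```shell
LC_ALL=C; export LC_ALL                 # byte-wise ordering, independent of locale

# merge_lists FILE_A FILE_B: hypothetical helper, prints the merged unique lines.
merge_lists() {
    if sort -c "$1" 2>/dev/null && sort -c "$2" 2>/dev/null; then
        sort -um "$1" "$2"              # both inputs already sorted: merge only
    else
        sort -u "$1" "$2"               # otherwise fall back to a full sort
    fi
}

tmpdir=$(mktemp -d)                     # scratch directory, for illustration only
printf '%s\n' aaaaaa bbbbbb cccccc mmmmmm nnnnnn > "$tmpdir/file1"
printf '%s\n' mmmmmm nnnnnn yyyyyy zzzzzz > "$tmpdir/file2"
printf '%s\n' cccccc aaaaaa bbbbbb > "$tmpdir/unsorted"
printf '%s\n' bbbbbb zzzzzz > "$tmpdir/other"

from_sorted=$(merge_lists "$tmpdir/file1" "$tmpdir/file2" | tr '\n' ' ')
from_unsorted=$(merge_lists "$tmpdir/unsorted" "$tmpdir/other" | tr '\n' ' ')

echo "sorted inputs:   $from_sorted"
echo "unsorted inputs: $from_unsorted"
rm -rf "$tmpdir"
```

The check reads each file once more, so it only pays off when the full sort it avoids is the expensive part.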






answered May 6 at 23:47 by Isaac