Merge two lists while removing duplicates

I have an embedded Linux system using BusyBox (OpenWRT), so the available commands are limited. I have two files that look like this:
First file:
aaaaaa
bbbbbb
cccccc
mmmmmm
nnnnnn
Second file:
mmmmmm
nnnnnn
yyyyyy
zzzzzz
I need to merge these two lists into one file and remove the duplicates. I don't have diff (space is limited), so we get to use the great awk, sed, and grep (or other tools that might be included in a standard BusyBox instance). Going through a merge file like:
command1 > mylist.merge
command2 mylist.merge > originallist
is totally OK. It doesn't have to be a single-line command.
Currently defined functions in the instance of Busybox that I am using (default OpenWRT):
[, [[, arping, ash, awk, basename, brctl, bunzip2, bzcat, cat, chgrp, chmod, chown, chroot, clear, cmp,
cp, crond, crontab, cut, date, dd, df, dirname, dmesg, du, echo, egrep, env, expr, false, fgrep, find,
free, fsync, grep, gunzip, gzip, halt, head, hexdump, hostid, hwclock, id, ifconfig, init, insmod, kill,
killall, klogd, less, ln, lock, logger, logread, ls, lsmod, md5sum, mkdir, mkfifo, mknod, mktemp, mount,
mv, nc, netmsg, netstat, nice, nslookup, ntpd, passwd, pgrep, pidof, ping, ping6, pivot_root, pkill,
poweroff, printf, ps, pwd, reboot, reset, rm, rmdir, rmmod, route, sed, seq, sh, sleep, sort,
start-stop-daemon, strings, switch_root, sync, sysctl, syslogd, tail, tar, tee, telnet, telnetd, test,
time, top, touch, tr, traceroute, true, udhcpc, umount, uname, uniq, uptime, vconfig, vi, watchdog, wc,
wget, which, xargs, yes, zcat
Tags: bash, grep, sed, awk, busybox
migrated from serverfault.com Oct 6 '12 at 15:56
This question came from our site for system and network administrators.
asked Oct 2 '12 at 18:42 by slthomason; edited May 6 at 17:52 by learningbee
5 Answers
Accepted answer (23 votes):
I think
sort file1 file2 | uniq
aaaaaa
bbbbbb
cccccc
mmmmmm
nnnnnn
yyyyyy
zzzzzz
will do what you want.
Additional documentation: uniq, sort
Comments:
busybox sort supports the unique flag -u. - Thor, Oct 2 '12 at 18:53 (7 votes)
@Thor: oooh cheers that's not a switch I'm familiar with. - Iain, Oct 2 '12 at 20:18
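To make the answer and the comment concrete, here is a minimal sketch that recreates the two sample lists from the question and checks that the pipeline and the -u variant give the same result. The scratch directory and the variable names piped and direct are assumptions made for illustration only.

```shell
#!/bin/sh
# Recreate the two sample lists from the question in a scratch directory.
# (The scratch directory is an assumption; any writable location works.)
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' aaaaaa bbbbbb cccccc mmmmmm nnnnnn > file1
printf '%s\n' mmmmmm nnnnnn yyyyyy zzzzzz > file2

# Pipeline from the answer: sort brings equal lines next to each other,
# then uniq drops the adjacent repeats.
piped=$(sort file1 file2 | uniq)

# Variant from the comment: let sort drop the duplicates itself.
direct=$(sort -u file1 file2)

echo "$piped"
[ "$piped" = "$direct" ] && echo "both variants agree"
```

Note that uniq only removes adjacent duplicates, which is why the sort has to come first in the pipeline.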
Answer (8 votes):
In just one command, without any pipe:
sort -u FILE1 FILE2
For the documentation, search for "Suppress duplicate lines" in http://www.busybox.net/downloads/BusyBox.html
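As a rough sketch of how this fits the two-step workflow from the question, the following merges a second list into originallist by way of mylist.merge. The file name newlist is hypothetical and stands in for the second file; the scratch directory is also an assumption.

```shell
#!/bin/sh
# Scratch directory and the name "newlist" are assumptions for illustration.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' aaaaaa bbbbbb cccccc mmmmmm nnnnnn > originallist
printf '%s\n' mmmmmm nnnnnn yyyyyy zzzzzz > newlist

# Write to a separate merge file first. Redirecting straight onto
# originallist would truncate it before sort has read it.
sort -u originallist newlist > mylist.merge && mv mylist.merge originallist

cat originallist
```

The mv only runs if sort succeeded, so a failed merge leaves originallist untouched.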
Answer (3 votes):
Another solution:
awk '!a[$0]++' file_1 file_2
Comments:
I saw that it made a difference which argument came first. Otherwise great solution, thanks. - dezza, Jan 8 '17 at 15:21
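The difference mentioned in the comment comes from awk keeping the first occurrence of each line in input order, rather than sorting. A small sketch using the sample lists from the question (the scratch directory and the variable names forward and reverse are assumptions):

```shell
#!/bin/sh
# Scratch directory is an assumption; file_1/file_2 hold the question's sample data.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' aaaaaa bbbbbb cccccc mmmmmm nnnnnn > file_1
printf '%s\n' mmmmmm nnnnnn yyyyyy zzzzzz > file_2

# a[$0]++ is 0 (false) the first time a line is seen, so !a[$0]++ is true
# only for first occurrences; awk prints those in the order they are read.
forward=$(awk '!a[$0]++' file_1 file_2)
reverse=$(awk '!a[$0]++' file_2 file_1)

echo "$forward"
echo "---"
echo "$reverse"
```

With file_1 first the output starts with aaaaaa; with file_2 first it starts with mmmmmm. Both runs contain the same seven lines, only in a different order.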
Answer (1 vote):
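To remove duplicates according to some key columns (rather than the whole line), use the following:
awk '!duplicate[$1,$2,$3]++' file_1 file_2
Here the first, second and third columns are treated as the primary key; only the first line seen for each key is kept.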
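A short sketch of the keyed variant. The sample records below are invented for illustration (they are not from the question), with a fourth column that differs so the effect of the key is visible:

```shell
#!/bin/sh
# Hypothetical sample data: three key columns plus a fourth column that differs.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 'alice admin 1001 first' 'bob user 1002 first' > file_1
printf '%s\n' 'alice admin 1001 second' 'carol user 1003 second' > file_2

# Lines are considered duplicates when columns 1-3 match, even if the rest differs.
merged=$(awk '!duplicate[$1,$2,$3]++' file_1 file_2)

echo "$merged"
```

The alice record from file_2 is dropped because its first three columns were already seen in file_1, even though its fourth column is different.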
Answer (1 vote):
The files in your question are sorted.
If the source files are indeed sorted, you can uniq and merge in one step:
sort -um file1 file2 > mylist.merge
For a numeric sort (not alphanumeric), use:
sort -num file1 file2 > mylist.merge
This cannot be done in place: redirecting the output onto one of the source files would truncate that file before sort reads it.
If the files are not sorted, sort them first (this sort can be done in place with the sort option -o; however, the whole file needs to be loaded into memory):
sort -uo file1 file1
sort -uo file2 file2
sort -um file1 file2 > mylist.merge
mv mylist.merge originallist
That would be faster than the simpler "one command line" that sorts everything:
cat file1 file2 | sort -u >mylist.merge
However, that simpler line could still be useful for small files.
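The unsorted case can be sketched end to end as follows. The shuffled sample data and the scratch directory are assumptions; the flags are written separately (-u -o, -u -m) but mean the same as the combined forms above.

```shell
#!/bin/sh
# Scratch directory and the shuffled sample data are assumptions for illustration.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' nnnnnn aaaaaa mmmmmm cccccc bbbbbb aaaaaa > file1
printf '%s\n' zzzzzz mmmmmm yyyyyy nnnnnn > file2

# Step 1: sort each file in place; -o names the output file, which sort
# opens only after it has read all of its input.
sort -u -o file1 file1
sort -u -o file2 file2

# Step 2: merge the two already-sorted files and drop duplicates.
sort -u -m file1 file2 > mylist.merge

cat mylist.merge
```

Using -o is what makes the in-place step safe; a plain shell redirection onto file1 would empty the file before sort started reading.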
Accepted answer (sort file1 file2 | uniq): answered Oct 2 '12 at 18:46 by Iain; edited 30 mins ago by Jon
Answer "sort -u FILE1 FILE2": answered Oct 6 '12 at 16:52 by Gilles Quenot
Answer "awk '!a[$0]++' file_1 file_2": answered Oct 6 '12 at 19:08 by nowy1
Answer "awk '!duplicate[$1,$2,$3]++' file_1 file_2": answered Feb 17 '17 at 4:26 by Prem Joshi
Answer "sort -um file1 file2": answered May 6 at 23:47 by Isaac