Merge two lists while removing duplicates

14 votes

I have an embedded Linux system running BusyBox (OpenWrt), so the available commands are limited. I have two files that look like this:

first file

aaaaaa
bbbbbb
cccccc
mmmmmm
nnnnnn

second file

mmmmmm
nnnnnn
yyyyyy
zzzzzz

I need to merge these two lists into one file and remove the duplicates. I don't have diff (space is limited), so I'm restricted to awk, sed, and grep (or other tools that might be included in a standard BusyBox build). Going through an intermediate merge file, like this:

command1 > mylist.merge 
command2 mylist.merge > originallist

is totally ok. It doesn't have to be a single-line command.

Currently defined functions in the instance of Busybox that I am using (default OpenWRT):
[, [[, arping, ash, awk, basename, brctl, bunzip2, bzcat, cat, chgrp, chmod, chown, chroot, clear, cmp,
cp, crond, crontab, cut, date, dd, df, dirname, dmesg, du, echo, egrep, env, expr, false, fgrep, find,
free, fsync, grep, gunzip, gzip, halt, head, hexdump, hostid, hwclock, id, ifconfig, init, insmod, kill,
killall, klogd, less, ln, lock, logger, logread, ls, lsmod, md5sum, mkdir, mkfifo, mknod, mktemp, mount,
mv, nc, netmsg, netstat, nice, nslookup, ntpd, passwd, pgrep, pidof, ping, ping6, pivot_root, pkill,
poweroff, printf, ps, pwd, reboot, reset, rm, rmdir, rmmod, route, sed, seq, sh, sleep, sort,
start-stop-daemon, strings, switch_root, sync, sysctl, syslogd, tail, tar, tee, telnet, telnetd, test,
time, top, touch, tr, traceroute, true, udhcpc, umount, uname, uniq, uptime, vconfig, vi, watchdog, wc,
wget, which, xargs, yes, zcat

edited May 6 at 17:52

learningbee

asked Oct 2 '12 at 18:42

slthomason

migrated from serverfault.com Oct 6 '12 at 15:56

This question came from our site for system and network administrators.

bash grep sed awk busybox


5 Answers

23 votes, accepted

I think

sort file1 file2 | uniq

will do what you want. With the example files, it prints:

aaaaaa
bbbbbb
cccccc
mmmmmm
nnnnnn
yyyyyy
zzzzzz

Additional documentation: uniq, sort
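A small sketch of why the sort step matters here: uniq only collapses repeated lines that are adjacent, so piping the unsorted concatenation through uniq leaves the duplicates in place. The scratch directory and the line counts are illustrative additions, not part of the original answer.

```shell
# Recreate the two example lists from the question in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' aaaaaa bbbbbb cccccc mmmmmm nnnnnn > "$tmp/file1"
printf '%s\n' mmmmmm nnnnnn yyyyyy zzzzzz > "$tmp/file2"

# Without sorting, the repeated lines are not adjacent, so uniq keeps them.
unsorted=$(cat "$tmp/file1" "$tmp/file2" | uniq | wc -l | tr -d ' ')

# After sorting, duplicates sit next to each other and uniq collapses them.
sorted=$(sort "$tmp/file1" "$tmp/file2" | uniq | wc -l | tr -d ' ')

echo "cat | uniq:  $unsorted lines"
echo "sort | uniq: $sorted lines"

rm -rf "$tmp"
```

With the question's data the first count is 9 and the second is 7.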

edited 30 mins ago

Jon

answered Oct 2 '12 at 18:46

Iain

busybox sort supports the unique flag -u.
– Thor
Oct 2 '12 at 18:53

@Thor: oooh cheers that's not a switch I'm familiar with.
– Iain
Oct 2 '12 at 20:18

8 votes

In just one command, without any pipe:

sort -u FILE1 FILE2

The -u option suppresses duplicate lines; see http://www.busybox.net/downloads/BusyBox.html
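As a rough sketch, the two-step workflow from the question could look like this with sort -u. The names originallist and mylist.merge come from the question; newlist and the scratch directory are hypothetical stand-ins for the second file and its location.

```shell
# Scratch copies of the question's example data (newlist is a made-up name).
tmp=$(mktemp -d)
printf '%s\n' aaaaaa bbbbbb cccccc mmmmmm nnnnnn > "$tmp/originallist"
printf '%s\n' mmmmmm nnnnnn yyyyyy zzzzzz > "$tmp/newlist"

# Write to a separate merge file first: redirecting straight into
# originallist would truncate it before sort had read it.
# The original is replaced only if sort succeeded.
sort -u "$tmp/originallist" "$tmp/newlist" > "$tmp/mylist.merge" &&
    mv "$tmp/mylist.merge" "$tmp/originallist"

merged=$(cat "$tmp/originallist")
printf '%s\n' "$merged"

rm -rf "$tmp"
```

Chaining the mv with && is a design choice in this sketch, so that a failed sort (for example, a full filesystem) does not overwrite the original list with a partial result.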

answered Oct 6 '12 at 16:52

Gilles Quenot

3 votes

Another solution:

awk '!a[$0]++' file_1 file_2

answered Oct 6 '12 at 19:08

nowy1

I saw that it made a difference which argument came first. Otherwise great solution, thanks.
– dezza
Jan 8 '17 at 15:21
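The difference noted in the comment can be illustrated with a short sketch: the awk filter prints each line the first time it is seen and keeps input order, so swapping the file arguments changes the order of the output (though not which lines appear). The sample data below is hypothetical and deliberately unsorted.

```shell
tmp=$(mktemp -d)
# Made-up, unsorted lists sharing two lines.
printf '%s\n' cccccc aaaaaa bbbbbb > "$tmp/file_1"
printf '%s\n' bbbbbb zzzzzz aaaaaa > "$tmp/file_2"

# a[$0]++ evaluates to 0 (false) the first time a line is seen, so
# !a[$0]++ is true only for first occurrences; awk prints those lines
# and skips every later repeat.
first=$(awk '!a[$0]++' "$tmp/file_1" "$tmp/file_2")
second=$(awk '!a[$0]++' "$tmp/file_2" "$tmp/file_1")

echo "file_1 first:"; printf '%s\n' "$first"
echo "file_2 first:"; printf '%s\n' "$second"

rm -rf "$tmp"
```

Unlike the sort-based answers, this approach does not sort anything, which is useful when the original line order should be preserved.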

1 vote

To remove duplicates based on some key columns (this does not sort the output), use the following:

awk '!duplicate[$1,$2,$3]++' file_1 file_2

Here the first, second, and third columns are treated as your primary key.
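A minimal sketch of the key-column variant, assuming whitespace-separated records; the sample rows are invented for illustration. Two rows count as duplicates when their first three fields match, even if the rest of the line differs, and the first row seen for each key is the one kept.

```shell
tmp=$(mktemp -d)
# Hypothetical records: three key columns followed by a free-text note.
printf '%s\n' 'a 1 x first' 'b 2 y only' > "$tmp/file_1"
printf '%s\n' 'a 1 x repeat' 'a 1 z different-key' > "$tmp/file_2"

# 'a 1 x repeat' is dropped because the key (a, 1, x) was already seen;
# 'a 1 z different-key' survives because its third column differs.
kept=$(awk '!duplicate[$1,$2,$3]++' "$tmp/file_1" "$tmp/file_2")
printf '%s\n' "$kept"

rm -rf "$tmp"
```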

answered Feb 17 '17 at 4:26

Prem Joshi

1 vote

The files in your question are sorted.

If the source files are indeed sorted, you can uniq and merge in one step:

sort -um file1 file2 > mylist.merge

For numeric sort (not alphanumeric), use:

sort -num file1 file2 > mylist.merge

This cannot be done in place: redirecting the output to one of the source files would truncate that file before sort reads it.

If the files are not sorted, sort them first. This sort can be done in place with the sort option -o, although the whole file has to be loaded into memory:

sort -uo file1 file1
sort -uo file2 file2
sort -um file1 file2 > mylist.merge
mv mylist.merge originallist
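To make the in-place point concrete, here is a sketch contrasting a plain redirection with -o on a single file. The sample data and the copy file are hypothetical; the behaviour shown for -o is what POSIX sort specifies (the output file may be one of the inputs), so it is worth checking on the specific BusyBox build in use.

```shell
tmp=$(mktemp -d)
# Made-up unsorted list containing one repeated line.
printf '%s\n' nnnnnn aaaaaa mmmmmm aaaaaa > "$tmp/file1"

# Plain redirection: the shell empties the target before sort starts,
# so sort reads nothing. Done on a copy to show the data loss safely.
cp "$tmp/file1" "$tmp/copy"
sort -u "$tmp/copy" > "$tmp/copy"
lost=$(wc -c < "$tmp/copy" | tr -d ' ')

# With -o, sort reads all of its input before writing the output file,
# so input and output can be the same file.
sort -u -o "$tmp/file1" "$tmp/file1"
inplace=$(cat "$tmp/file1")

echo "bytes left after plain redirect: $lost"
printf '%s\n' "$inplace"

rm -rf "$tmp"
```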

That would be faster than the simpler "one command line" to sort all:

cat file1 file2 | sort -u >mylist.merge

However, the simpler one-liner is fine for small files.

answered May 6 at 23:47

Isaac
