Combine text files and delete duplicate lines
How do I efficiently combine multiple text files and remove duplicate lines in the final file in Ubuntu?
I have these files:
file1.txt contains
alpha
beta
gamma
delta
file2.txt contains
beta
gamma
delta
epsilon
file3.txt contains
delta
epsilon
zeta
eta
I would like the final.txt file to contain:
alpha
beta
gamma
delta
epsilon
zeta
eta
I would appreciate the help.
linux ubuntu command-line
asked Jul 20 at 1:04 by AvidLearner
Does the order of the lines in the final file matter? Otherwise, sort -u all the input files > output would do it.
Jeff Schaller, Jul 20 at 1:27

The order of lines doesn't matter. The result of sort -u file1.txt file2.txt file3.txt > final.txt contains 2 of delta and 2 of epsilon. I was looking for something that matches the final.txt.
AvidLearner, Jul 20 at 1:35
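To illustrate the exchange above, this is a sketch of the command Jeff Schaller describes, assuming the three files contain exactly the lines shown in the question (no trailing whitespace):

$ sort -u file1.txt file2.txt file3.txt > final.txt
$ cat final.txt
alpha
beta
delta
epsilon
eta
gamma
zeta

The result is the deduplicated union of the three files, but in sorted order rather than the order listed in the question.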
2 Answers
(accepted)
If you want to print only the first instance of each line without sorting:
$ awk '!seen[$0]++' file1.txt file2.txt file3.txt
alpha
beta
gamma
delta
epsilon
zeta
eta
answered Jul 20 at 1:58 by steeldriver
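As a usage note (not part of the original answer), the same one-liner can be redirected into final.txt; the comment explains how the idiom works:

# seen[$0]++ evaluates to 0 (false) the first time a line appears and to a
# positive count afterwards, so !seen[$0]++ prints each distinct line once,
# in order of first appearance
$ awk '!seen[$0]++' file1.txt file2.txt file3.txt > final.txt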
The output for awk '!seen[$0]++' file1.txt file2.txt file3.txt contains 2 lines of delta and 2 lines of epsilon. I am looking to remove any additional duplicates.
AvidLearner, Jul 20 at 2:04

@AvidLearner I tested it with the exact input you posted - if you are seeing something different, then your files are not the same (i.e. some apparently duplicate lines are actually distinct - for example, they have trailing whitespace).
steeldriver, Jul 20 at 2:08

Thank you. The trailing whitespace was the issue. I should have included the output of the commands I tried in the original post for clarity.
AvidLearner, Jul 20 at 2:15

@AvidLearner if the inputs consist of single words per line, then you can avoid the trailing whitespace issue by keying on $1 rather than $0.
steeldriver, Jul 20 at 2:17
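Pulling the thread above together: if the apparent duplicates differ only in trailing whitespace, you can either key on the first field as steeldriver suggests, or normalize the lines first. Both commands below are sketches based on that assumption (a single word per line):

# key on the first field so trailing whitespace is ignored
$ awk '!seen[$1]++' file1.txt file2.txt file3.txt > final.txt

# or strip trailing whitespace first, then deduplicate whole lines
$ sed 's/[[:space:]]*$//' file1.txt file2.txt file3.txt | awk '!seen[$0]++' > final.txt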
Very simple:
sort -u file[123].txt
answered Jul 20 at 3:12 by Isaac
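As with the accepted answer, the result can be redirected into final.txt; the file[123].txt glob is just shorthand for the three file names in the question, and the output is sorted rather than in the original order:

$ sort -u file[123].txt > final.txt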