Combine text files and delete duplicate lines
How do I efficiently combine multiple text files and remove duplicate lines in the final file in Ubuntu?
I have these files:
file1.txt contains
alpha
beta
gamma
delta
file2.txt contains
beta
gamma
delta
epsilon
file3.txt contains
delta
epsilon
zeta
eta
I would like the final.txt file to contain:
alpha
beta
gamma
delta
epsilon
zeta
eta
I would appreciate the help.
linux ubuntu command-line
asked Jul 20 at 1:04 by AvidLearner
Does the order of the lines in the final file matter? Otherwise, sort -u all the input files > output would do it.
Jeff Schaller, Jul 20 at 1:27

The order of lines doesn't matter. The result of sort -u file1.txt file2.txt file3.txt > final.txt contains 2 of delta and 2 of epsilon. I was looking for something that matches the final.txt.
AvidLearner, Jul 20 at 1:35
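To illustrate the exchange above, this is a sketch of the command Jeff Schaller describes, assuming the three files contain exactly the lines shown in the question (no trailing whitespace):

$ sort -u file1.txt file2.txt file3.txt > final.txt
$ cat final.txt
alpha
beta
delta
epsilon
eta
gamma
zeta

The result is the deduplicated union of the three files, but in sorted order rather than the order listed in the question.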
2 Answers
(accepted)
If you want to print only the first instance of each line without sorting:
$ awk '!seen[$0]++' file1.txt file2.txt file3.txt
alpha
beta
gamma
delta
epsilon
zeta
eta
answered Jul 20 at 1:58 by steeldriver
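As a usage note (not part of the original answer), the same one-liner can be redirected into final.txt; the comment explains how the idiom works:

# seen[$0]++ evaluates to 0 (false) the first time a line appears and to a
# positive count afterwards, so !seen[$0]++ prints each distinct line once,
# in order of first appearance
$ awk '!seen[$0]++' file1.txt file2.txt file3.txt > final.txt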
The output for awk '!seen[$0]++' file1.txt file2.txt file3.txt contains 2 lines of delta and 2 lines of epsilon. I am looking to remove any additional duplicates.
AvidLearner, Jul 20 at 2:04

@AvidLearner I tested it with the exact input you posted - if you are seeing something different, then your files are not the same (i.e. some apparently duplicate lines are actually distinct - for example, they have trailing whitespace).
steeldriver, Jul 20 at 2:08

Thank you. The trailing whitespace was the issue. I should have included the output of the commands I tried in the original post for clarity.
AvidLearner, Jul 20 at 2:15

@AvidLearner if the inputs consist of single words per line, then you can avoid the trailing whitespace issue by keying on $1 rather than $0.
steeldriver, Jul 20 at 2:17
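Pulling the thread above together: if the apparent duplicates differ only in trailing whitespace, you can either key on the first field as steeldriver suggests, or normalize the lines first. Both commands below are sketches based on that assumption (a single word per line):

# key on the first field so trailing whitespace is ignored
$ awk '!seen[$1]++' file1.txt file2.txt file3.txt > final.txt

# or strip trailing whitespace first, then deduplicate whole lines
$ sed 's/[[:space:]]*$//' file1.txt file2.txt file3.txt | awk '!seen[$0]++' > final.txt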
Very simple:
sort -u file[123].txt
answered Jul 20 at 3:12 by Isaac
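As with the accepted answer, the result can be redirected into final.txt; the file[123].txt glob is just shorthand for the three file names in the question, and the output is sorted rather than in the original order:

$ sort -u file[123].txt > final.txt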