Combine text files and delete duplicate lines

How do I efficiently combine multiple text files and remove duplicate lines in the final file in Ubuntu?



I have these files:

file1.txt contains

alpha
beta
gamma
delta

file2.txt contains

beta
gamma
delta
epsilon

file3.txt contains

delta
epsilon
zeta
eta

I would like the final.txt file to contain:

alpha
beta
gamma
delta
epsilon
zeta
eta

I would appreciate the help.







  • 1
    Does the order of the lines in the final file matter? Otherwise, sort -u all the input files > output would do it.
    – Jeff Schaller
    Jul 20 at 1:27










  • The order of lines doesn't matter. The result of sort -u file1.txt file2.txt file3.txt > final.txt contains two copies of delta and two of epsilon. I was looking for something that matches the final.txt shown above.
    – AvidLearner
    Jul 20 at 1:35

















asked Jul 20 at 1:04 – AvidLearner

2 Answers
















accepted










If you want to print only the first instance of each line without sorting:



$ awk '!seen[$0]++' file1.txt file2.txt file3.txt
alpha
beta
gamma
delta
epsilon
zeta
eta
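As an aside on how the idiom works: seen is an awk associative array keyed on the whole input line ($0). The post-increment ++ returns the old count, so !seen[$0]++ is true only the first time a given line appears. A long-form sketch of the same logic, using the file names from the question:

```shell
# Long-form equivalent of awk '!seen[$0]++': print a line only on its
# first occurrence, counting occurrences in the seen[] array
awk '{ if (seen[$0] == 0) print $0; seen[$0]++ }' file1.txt file2.txt file3.txt > final.txt
```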





  • The output for awk '!seen[$0]++' file1.txt file2.txt file3.txt contains 2 lines of delta and 2 lines of epsilon. I am looking to remove any additional duplicates.
    – AvidLearner
    Jul 20 at 2:04











  • @AvidLearner I tested it with the exact input you posted - if you are seeing something different, then your files are not the same (i.e. some apparently duplicate lines are actually distinct - for example, they have trailing whitespace)
    – steeldriver
    Jul 20 at 2:08











  • Thank you. The trailing white spaces were the issue. I should have added my output of commands I tried in the original post for clarity.
    – AvidLearner
    Jul 20 at 2:15











  • @AvidLearner if the inputs consist of single words per line, then you can avoid the trailing whitespace issue by keying on $1 rather than $0
    – steeldriver
    Jul 20 at 2:17
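Following up on the trailing-whitespace point above: one option is to strip trailing blanks before deduplicating, so that lines differing only in invisible whitespace collapse together. A sketch using POSIX sed:

```shell
# Remove trailing whitespace from every line, then keep only the
# first occurrence of each cleaned line
sed 's/[[:space:]]*$//' file1.txt file2.txt file3.txt | awk '!seen[$0]++' > final.txt
```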






































answered Jul 20 at 1:58 – steeldriver











Very Simple

sort -u file[123].txt
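For completeness: sort -u merges its inputs, keeps one copy of each distinct line, and writes the result to stdout in sorted order (the question notes that order does not matter); the glob file[123].txt expands to the three input files. Redirect to produce the final file:

```shell
# Merge the inputs, drop duplicate lines, and write the sorted result
sort -u file1.txt file2.txt file3.txt > final.txt
```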





answered Jul 20 at 3:12 – Isaac






















             
