Common lines between two files [duplicate]

This question already has an answer here:



  • Output the common lines (similarities) of two text files (the opposite of diff)?

    5 answers



I have the following code that I run on my Terminal.



LC_ALL=C && grep -F -f genename2.txt hg38.hgnc.bed > hg38.hgnc.goi.bed


This doesn't give me the common lines between the two files. What am I missing there?







marked as duplicate by jasonwryan, don_crissti, Anthony Geoghegan, Archemar, Satō Katsura Oct 15 '17 at 17:34


This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.


















asked Oct 14 '17 at 18:46 by Marwah Soliman
edited Oct 16 '17 at 4:38 by αғsнιη

          2 Answers
          Use comm -12 file1 file2 to get the lines common to both files.

          comm expects both inputs to be sorted, so sort them first if they are not:



          comm -12 <(sort file1) <(sort file2)


          From man comm:

          -1 suppress column 1 (lines unique to FILE1)
          -2 suppress column 2 (lines unique to FILE2)

          With both columns suppressed, only column 3 (the lines that appear in both files) is printed.
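As a rough illustration of the comm approach, here is a minimal sketch on made-up data. The file names and contents are assumptions for the example only, and temporary sorted copies are used instead of process substitution so that it also runs in a plain POSIX shell.

```shell
# Hypothetical sample data; names and contents are illustrative only.
tmp=$(mktemp -d)
printf '%s\n' cherry apple banana > "$tmp/file1"
printf '%s\n' banana durian apple > "$tmp/file2"

# comm needs sorted input, so write a sorted copy of each file first.
sort "$tmp/file1" > "$tmp/file1.sorted"
sort "$tmp/file2" > "$tmp/file2.sorted"

# -1 and -2 hide the lines unique to each file; only the common lines remain.
common=$(comm -12 "$tmp/file1.sorted" "$tmp/file2.sorted")
printf '%s\n' "$common"

rm -r "$tmp"
```

With this sample data the two common lines, apple and banana, are printed in sorted order.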


          With grep, add the -x option so that a pattern matches only when it equals the whole line. The -F option tells grep to treat each pattern as a fixed string rather than a regular expression, and -f file1 reads the patterns from file1.



          grep -Fxf file1 file2
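To see what -x changes, the following sketch compares grep with and without it. The pattern and data files are invented for the example (they are not the files from the question), and the variable names are arbitrary.

```shell
# Hypothetical pattern and data files, for illustration only.
tmp=$(mktemp -d)
printf '%s\n' 'ABC' > "$tmp/patterns"
printf '%s\n' 'ABC' 'ABC1' 'xABCx' > "$tmp/data"

# Without -x, every line that merely contains a pattern is printed.
substring_hits=$(grep -F -f "$tmp/patterns" "$tmp/data")

# With -x, only lines identical to a pattern are printed.
whole_line_hits=$(grep -Fx -f "$tmp/patterns" "$tmp/data")

printf 'without -x:\n%s\n' "$substring_hits"
printf 'with -x:\n%s\n' "$whole_line_hits"

rm -r "$tmp"
```

Here the first call prints all three data lines, while the second prints only ABC, which is the behaviour wanted when looking for lines shared by two files.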


          Or use awk:

          awk 'NR==FNR{seen[$0]=1; next} seen[$0]' file1 file2


          The first block stores every line of file1 in an array called seen, using the whole line as the key (in awk, $0 is the whole current line).

          The condition NR==FNR makes that block run only for the first input file, file1, and not for file2. NR is the number of lines read so far across all input files, while FNR is the line number within the current file and restarts at 1 for each new file. The two are therefore equal only while awk is reading the first file.

          next tells awk to skip the rest of the program and move on to the next input line, so the second pattern is never evaluated for lines of file1.

          The final seen[$0] pattern is therefore evaluated only for file2: for each line of file2, awk looks the line up in the array and prints it if it was seen in file1.
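The behaviour of NR and FNR is easy to check on a toy input. The sketch below uses two invented files, prints both counters for every line, and then runs the two-file idiom on the same data; the file contents are assumptions for the example.

```shell
# Hypothetical two-line and three-line inputs, for illustration only.
tmp=$(mktemp -d)
printf '%s\n' a b > "$tmp/file1"
printf '%s\n' b c d > "$tmp/file2"

# Print NR, FNR and the line itself for every input line.
# NR keeps counting across files; FNR restarts at 1 for file2.
counters=$(awk '{ print NR, FNR, $0 }' "$tmp/file1" "$tmp/file2")
printf '%s\n' "$counters"

# Remember the lines of file1, then print the lines of file2 that were seen.
common=$(awk 'NR==FNR{seen[$0]=1; next} seen[$0]' "$tmp/file1" "$tmp/file2")
printf 'common: %s\n' "$common"

rm -r "$tmp"
```

The counter listing shows NR running from 1 to 5 while FNR goes 1, 2 and then 1, 2, 3, and the only common line reported is b.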



          Another simple option is using sort and uniq:



          sort file1 file2|uniq -d


          sort merges and sorts both files, and uniq -d then prints only the lines that occur more than once. This is reliable only when neither file contains duplicate lines of its own; otherwise a line repeated within a single file is reported as if it were common. The following version removes duplicates within each file first, so it works in either case:



          uniq -d <(sort <(sort -u file1) <(sort -u file2))
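The difference between the two sort and uniq variants shows up as soon as one file repeats a line. This is a small sketch with invented data; it uses temporary de-duplicated copies rather than process substitution so that it does not depend on bash.

```shell
# Hypothetical data: "x" is repeated inside file1 but absent from file2.
tmp=$(mktemp -d)
printf '%s\n' x x y > "$tmp/file1"
printf '%s\n' y z > "$tmp/file2"

# Naive version: "x" is wrongly reported because it is duplicated in file1.
naive=$(sort "$tmp/file1" "$tmp/file2" | uniq -d)

# Safer version: drop duplicates within each file before merging.
sort -u "$tmp/file1" > "$tmp/file1.uniq"
sort -u "$tmp/file2" > "$tmp/file2.uniq"
safe=$(sort "$tmp/file1.uniq" "$tmp/file2.uniq" | uniq -d)

printf 'naive:\n%s\n' "$naive"
printf 'safe:\n%s\n' "$safe"

rm -r "$tmp"
```

On this input the naive pipeline prints x and y, whereas the de-duplicated one prints only y.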





            Since you're running on Linux, I suppose it's GNU/Linux and you are using the GNU diff command.



            If you're running the GNU diff command, this is how to see all changed lines as well as common lines:



            diff \
            --old-line-format='-%l
            ' \
            --new-line-format='+%l
            ' \
            --unchanged-line-format=' %l
            ' \
            "$@"


            The output resembles classic diff output, but without file names or separator lines: old lines are prefixed with -, new lines with +, and common lines with a space.



            Here's an example shell script and the resulting output on test files:



            $ cat diffcomm.sh
            #!/bin/sh
            diff \
            --old-line-format='-%l
            ' \
            --new-line-format='+%l
            ' \
            --unchanged-line-format=' %l
            ' \
            "$@"
            $ cat > filea
            a
            b
            c
            d
            $ cat > fileb
            a
            z
            d
            $ ./diffcomm.sh filea fileb
             a
            -b
            -c
            +z
             d
            $


            You can modify the output format for each class of line.



            See man diff or info diff or the GNU diffutils documentation for more information.
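Since each class of line has its own format, one possible variation is to give the old and new lines an empty format so that only the unchanged lines are printed. The sketch below assumes GNU diff and reuses the contents of filea and fileb from the example; the temporary directory and the variable name are arbitrary. Note that diff compares the files in order, so this is not the same as the sorted set intersection that comm produces.

```shell
# Recreate the two sample files from the example in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' a b c d > "$tmp/filea"
printf '%s\n' a z d > "$tmp/fileb"

# Empty formats suppress old and new lines; %L prints an unchanged line
# together with its trailing newline. diff exits with status 1 when the
# files differ, so "|| true" keeps that from being treated as an error.
common=$(diff \
    --old-line-format='' \
    --new-line-format='' \
    --unchanged-line-format='%L' \
    "$tmp/filea" "$tmp/fileb" || true)
printf '%s\n' "$common"

rm -r "$tmp"
```

For these two files the lines a and d are printed.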






First answer (accepted): answered Oct 14 '17 at 18:50 by αғsнιη, edited Oct 14 '17 at 20:08






















Second answer: answered Oct 14 '17 at 19:35 by RobertL











