Common lines between two files [duplicate]

This question already has an answer here:

Output the common lines (similarities) of two text files (the opposite of diff)?

5 answers

I have the following code that I run on my Terminal.

LC_ALL=C && grep -F -f genename2.txt hg38.hgnc.bed > hg38.hgnc.goi.bed

This doesn't give me the common lines between the two files. What am I missing there?

edited Oct 16 '17 at 4:38

αғsнιη


asked Oct 14 '17 at 18:46

Marwah Soliman


marked as duplicate by jasonwryan, don_crissti, Anthony Geoghegan, Archemar, Satō Katsura Oct 15 '17 at 17:34

This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.


2 Answers

Score 5, accepted

Use comm -12 file1 file2 to get common lines in both files.

comm expects both files to be sorted, so sort them first if they are not:

comm -12 <(sort file1) <(sort file2)

From man comm:

-1 suppress column 1 (lines unique to FILE1)
-2 suppress column 2 (lines unique to FILE2)
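The process substitution above needs a shell such as bash, zsh or ksh. As a rough sketch of the same idea for a plain POSIX shell, the sorted copies can go into temporary files instead. The file names and contents here are made up for illustration.

```shell
# Hypothetical sample data; both inputs are deliberately unsorted.
tmp=$(mktemp -d)
printf '%s\n' d b a c > "$tmp/file1"
printf '%s\n' z d a > "$tmp/file2"

# comm needs sorted input, so write a sorted copy of each file first.
sort "$tmp/file1" > "$tmp/file1.sorted"
sort "$tmp/file2" > "$tmp/file2.sorted"

# -1 and -2 hide the lines unique to each file, leaving only the common ones.
common=$(comm -12 "$tmp/file1.sorted" "$tmp/file2.sorted")
printf '%s\n' "$common"

rm -r "$tmp"
```

With this sample data the only lines printed should be a and d.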

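Alternatively, with grep, add the -x option so that a pattern matches only when it equals the whole line; without it, any line of file2 that merely contains a pattern as a substring is printed. The -F option tells grep to treat each pattern as a fixed string rather than a regular expression.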

grep -Fxf file1 file2
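To see what -x changes, here is a small sketch that runs grep with and without it. The two files and the names in them are invented sample data, not taken from the question.

```shell
# Hypothetical sample data: one pattern, three candidate lines.
tmp=$(mktemp -d)
printf '%s\n' TP53 > "$tmp/patterns"
printf '%s\n' TP53 TP53BP1 BRCA1 > "$tmp/lines"

# Without -x, the fixed string TP53 also matches inside TP53BP1.
loose=$(grep -F -f "$tmp/patterns" "$tmp/lines")

# With -x, a line is printed only if it equals a pattern exactly.
exact=$(grep -Fxf "$tmp/patterns" "$tmp/lines")

printf 'without -x:\n%s\nwith -x:\n%s\n' "$loose" "$exact"

rm -r "$tmp"
```

The first command reports both TP53 and TP53BP1, the second only TP53.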

Or using awk.

awk 'NR==FNR{seen[$0]=1; next} seen[$0]' file1 file2

This reads every line of file1 into an array called seen, using the whole line as the key (in awk, $0 is the whole current line).

The condition NR==FNR makes the block run only for the first input, file1. NR is the number of lines read so far across all inputs, while FNR is the line number within the current file and restarts at 1 for each new file. The two are therefore equal only while awk is reading the first file.

next tells awk to skip the rest of the program for the current line and move on to the next input line, so the final pattern is never evaluated for lines of file1.

The final pattern, seen[$0], is therefore evaluated only for lines of file2. It has no action attached, so awk applies the default action and prints the line whenever it already exists as a key in the array, that is, whenever it also appears in file1.
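The behaviour of NR and FNR is easiest to check by printing them. The following is a minimal sketch on made-up files; the first awk call only prints the two counters next to each line, the second is the one-liner from this answer written with its braces.

```shell
# Hypothetical sample data.
tmp=$(mktemp -d)
printf '%s\n' a b c d > "$tmp/file1"
printf '%s\n' a z d > "$tmp/file2"

# Print NR, FNR and the line itself: NR keeps counting, FNR restarts for file2.
counters=$(awk '{ print NR, FNR, $0 }' "$tmp/file1" "$tmp/file2")
printf '%s\n' "$counters"

# First file: remember each line. Second file: print lines already remembered.
common=$(awk 'NR==FNR { seen[$0] = 1; next } seen[$0]' "$tmp/file1" "$tmp/file2")
printf '%s\n' "$common"

rm -r "$tmp"
```

One caveat of the NR==FNR idiom: if file1 is empty, the condition is also true while reading file2, so nothing is printed.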

Another simple option is using sort and uniq:

sort file1 file2|uniq -d

sort prints the lines of both files in sorted order, and uniq -d then prints only the lines that occur more than once. This is reliable only when neither file contains duplicate lines of its own. If either file may contain duplicates, de-duplicate each file first:

uniq -d <(sort <(sort -u file1) <(sort -u file2))
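A short sketch of the difference, on invented data where one line is repeated inside file1 only. It uses temporary files instead of process substitution so that it also runs in a plain POSIX shell.

```shell
# Hypothetical sample data: "a" appears twice in file1 and never in file2.
tmp=$(mktemp -d)
printf '%s\n' a a b > "$tmp/file1"
printf '%s\n' b c > "$tmp/file2"

# Naive version: "a" is reported although file2 does not contain it.
naive=$(sort "$tmp/file1" "$tmp/file2" | uniq -d)

# De-duplicate each file first, so a repeat can only come from the other file.
sort -u "$tmp/file1" > "$tmp/file1.uniq"
sort -u "$tmp/file2" > "$tmp/file2.uniq"
safe=$(sort "$tmp/file1.uniq" "$tmp/file2.uniq" | uniq -d)

printf 'naive:\n%s\nsafe:\n%s\n' "$naive" "$safe"

rm -r "$tmp"
```

The naive pipeline prints a and b, while the de-duplicated one prints only b.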

edited Oct 14 '17 at 20:08

answered Oct 14 '17 at 18:50

αғsнιη


Score 2

Since you're running on Linux, I suppose it's GNU/Linux and you are using the GNU diff command.

If you're running the GNU diff command, this is how to see all changed lines as well as common lines:

diff \
--old-line-format='-%l
' \
--new-line-format='+%l
' \
--unchanged-line-format=' %l
' \
"$@"

This is similar to classic diff output, but no file names or separator lines appear, old lines are prefixed with -, new lines with +, and common lines with a space.

Here's an example shell script and the resulting output on test files:

$ cat diffcomm.sh
#!/bin/sh
diff \
--old-line-format='-%l
' \
--new-line-format='+%l
' \
--unchanged-line-format=' %l
' \
"$@"
$ cat > filea
a
b
c
d
$ cat > fileb
a
z
d
$ ./diffcomm.sh filea fileb
 a
-b
-c
+z
 d
$

You can modify the output format for each class of line.

See man diff or info diff or the GNU diffutils documentation for more information.
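As one possible variation on those formats, and assuming GNU diff, emptying the old and new line formats leaves only the common lines. The sketch below reuses the filea and fileb contents from the example; the temporary directory is just for illustration.

```shell
# Same sample data as in the example above.
tmp=$(mktemp -d)
printf '%s\n' a b c d > "$tmp/filea"
printf '%s\n' a z d > "$tmp/fileb"

# Empty formats drop old and new lines; %L prints an unchanged line with its newline.
# diff exits with status 1 when the files differ, hence the "|| true".
common=$(diff --old-line-format='' --new-line-format='' \
              --unchanged-line-format='%L' "$tmp/filea" "$tmp/fileb" || true)
printf '%s\n' "$common"

rm -r "$tmp"
```

Unlike comm or grep -Fxf, diff compares the files in order, so a line that occurs in both files but at very different positions may not be reported as unchanged.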

answered Oct 14 '17 at 19:35

RobertL

