Common lines between two files [duplicate]
This question already has an answer here:
Output the common lines (similarities) of two text files (the opposite of diff)?
5 answers
I run the following command in my terminal:
LC_ALL=C && grep -F -f genename2.txt hg38.hgnc.bed > hg38.hgnc.goi.bed
This doesn't give me the common lines between the two files. What am I missing there?
text-processing uniq
edited Oct 16 '17 at 4:38 by αғsнιη
asked Oct 14 '17 at 18:46 by Marwah Soliman
marked as duplicate by jasonwryan, don_crissti, Anthony Geoghegan, Archemar, Satō Katsura Oct 15 '17 at 17:34
This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.
2 Answers
Accepted answer (5 votes)
Use comm -12 file1 file2 to get the lines common to both files.
comm expects both inputs to be sorted, so sort them first if they are not:
comm -12 <(sort file1) <(sort file2)
From man comm:
-1 suppress column 1 (lines unique to FILE1)
-2 suppress column 2 (lines unique to FILE2)
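As a small worked example of the comm approach, here is a sketch on made-up data. The file names and contents are hypothetical, and the inputs are sorted into temporary files so that the snippet does not depend on process substitution:

```shell
# Hypothetical sample data; a.txt and b.txt are illustrative names only.
tmp=$(mktemp -d)
printf '%s\n' TP53 BRCA1 EGFR > "$tmp/a.txt"
printf '%s\n' EGFR MYC TP53 > "$tmp/b.txt"

# comm needs sorted input, so sort each file first (same locale for both).
LC_ALL=C sort "$tmp/a.txt" > "$tmp/a.sorted"
LC_ALL=C sort "$tmp/b.txt" > "$tmp/b.sorted"

# -1 and -2 suppress the lines unique to each file, leaving only column 3,
# the lines that appear in both files.
common=$(LC_ALL=C comm -12 "$tmp/a.sorted" "$tmp/b.sorted")
printf '%s\n' "$common"

rm -r "$tmp"
```

With this data the snippet prints EGFR and TP53, one per line.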
Or, using grep, add the -x option so that a pattern only matches when it matches the whole line. The -F option tells grep to treat each pattern as a fixed string, not as a regular expression.
grep -Fxf file1 file2
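To see what -x changes, here is a minimal sketch on made-up data (the file names patterns.txt and data.txt are assumptions for illustration). Without -x, grep -F prints every line that contains a pattern anywhere in it; with -x, only lines that are exactly equal to a pattern are printed:

```shell
# Hypothetical sample data for illustration.
tmp=$(mktemp -d)
printf '%s\n' ABC > "$tmp/patterns.txt"
printf '%s\n' ABC ABC1 XABCX DEF > "$tmp/data.txt"

# Substring match: every line containing "ABC" is printed.
loose=$(grep -F -f "$tmp/patterns.txt" "$tmp/data.txt")

# Whole-line match: only the line that is exactly "ABC" is printed.
exact=$(grep -Fxf "$tmp/patterns.txt" "$tmp/data.txt")

printf 'without -x:\n%s\n' "$loose"
printf 'with -x:\n%s\n' "$exact"

rm -r "$tmp"
```

Here the first command prints ABC, ABC1 and XABCX, while the second prints only ABC.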
Or using awk:
awk 'NR==FNR{seen[$0]=1; next} seen[$0]' file1 file2
This reads every line of file1 into an array called seen, using the whole line as the key (in awk, $0 is the whole current line).
The condition NR==FNR makes the block that follows it run only for the first input, file1, and not for file2. NR is the number of lines read so far across all inputs, while FNR is the line number within the current file. FNR restarts at 1 for each input file, so the two are equal only while awk is reading file1.
The next statement tells awk to skip the rest of the code and start again with the next line. This repeats until NR is no longer equal to FNR, which means awk has read all the lines of file1.
The remaining pattern, seen[$0], is therefore evaluated only for file2: for each line of file2, awk looks the line up in the array and prints it if it is there.
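The difference between NR and FNR is easiest to see by printing both. The following sketch uses two small made-up files (the contents are assumptions for illustration) and then runs the idiom described above on them:

```shell
# Hypothetical sample data for illustration.
tmp=$(mktemp -d)
printf '%s\n' a b c > "$tmp/file1"
printf '%s\n' b x > "$tmp/file2"

# Print the file name, NR and FNR for every input line.
# NR keeps counting across files; FNR restarts at 1 for file2.
counters=$(cd "$tmp" && awk '{ print FILENAME, NR, FNR }' file1 file2)
printf '%s\n' "$counters"

# Remember the lines of file1, then print the lines of file2 that were seen.
common=$(awk 'NR==FNR { seen[$0] = 1; next } seen[$0]' "$tmp/file1" "$tmp/file2")
printf '%s\n' "$common"

rm -r "$tmp"
```

The first command prints "file1 1 1", "file1 2 2", "file1 3 3", "file2 4 1" and "file2 5 2", so NR==FNR holds only for file1. The second prints b, the only line present in both files.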
Another simple option is to use sort and uniq:
sort file1 file2|uniq -d
This sorts the lines of both files together, and uniq -d then prints only the duplicated lines. BUT this is only correct when neither file contains duplicated lines itself. The command below is correct even when a line is repeated within a file:
uniq -d <(sort <(sort -u file1) <(sort -u file2))
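The caveat about duplicates can be checked with a small sketch. The data is made up, and temporary files stand in for the process substitutions so that the snippet does not depend on a particular shell:

```shell
# Hypothetical sample data: "a" is repeated inside file1 and absent from file2.
tmp=$(mktemp -d)
printf '%s\n' a a b > "$tmp/file1"
printf '%s\n' b c > "$tmp/file2"

# Naive version: "a" is reported although it does not occur in file2.
naive=$(sort "$tmp/file1" "$tmp/file2" | uniq -d)

# De-duplicate each file first, then look for lines that occur twice.
sort -u "$tmp/file1" > "$tmp/file1.u"
sort -u "$tmp/file2" > "$tmp/file2.u"
safe=$(sort "$tmp/file1.u" "$tmp/file2.u" | uniq -d)

printf 'naive:\n%s\n' "$naive"
printf 'safe:\n%s\n' "$safe"

rm -r "$tmp"
```

The naive pipeline prints a and b, while the de-duplicated one prints only b.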
edited Oct 14 '17 at 20:08
answered Oct 14 '17 at 18:50 by αғsнιη
Answer (2 votes)
Since you're running on Linux, I assume it's GNU/Linux and that you are using the GNU diff command.
With GNU diff, this is how to see all changed lines as well as the common lines:
diff \
    --old-line-format='-%l
' \
    --new-line-format='+%l
' \
    --unchanged-line-format=' %l
' \
    "$@"
This is similar to classic diff output, but no file names or separator lines appear in the output: old lines are prefixed with -, new lines are prefixed with +, and common lines are prefixed with a space.
Here's an example shell script and the resulting output on test files:
$ cat diffcomm.sh
#!/bin/sh
diff \
    --old-line-format='-%l
' \
    --new-line-format='+%l
' \
    --unchanged-line-format=' %l
' \
    "$@"
$ cat > filea
a
b
c
d
$ cat > fileb
a
z
d
$ ./diffcomm.sh filea fileb
 a
-b
-c
+z
 d
$
You can modify the output format for each class of line.
See man diff, info diff, or the GNU diffutils documentation for more information.
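As one way of adapting these formats to the original question, the following sketch prints only the common lines by giving old and new lines an empty format. It assumes GNU diff (these options are GNU extensions) and reuses the filea and fileb contents from the example above:

```shell
# Same sample data as in the example above.
tmp=$(mktemp -d)
printf '%s\n' a b c d > "$tmp/filea"
printf '%s\n' a z d > "$tmp/fileb"

# Empty formats suppress old and new lines; %L prints an unchanged line
# together with its trailing newline. diff exits with status 1 when the
# files differ, hence the "|| true".
common=$(diff --old-line-format= --new-line-format= \
              --unchanged-line-format='%L' "$tmp/filea" "$tmp/fileb" || true)
printf '%s\n' "$common"

rm -r "$tmp"
```

This prints a and d. Note that diff compares the files in order, so a line that occurs in both files but at incompatible positions may not be reported as unchanged; for an order-independent intersection, comm or grep -Fxf is the safer choice.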
answered Oct 14 '17 at 19:35 by RobertL