What do comm and diff try to accomplish at input/output level?
Given two files, for each line in each file, how do comm and diff determine
- whether the line also occurs in the other file?
- if it does, whether its occurrences in the two files are the same or different?
and how do they take the order of the lines in each file into account?
Why does diff decide that some lines occur in both files but differ, and that other lines exist in one file but not the other, instead of the other way around?
My guess (just ignore the following if you are not interested in some elementary mathematics):
In mathematics, a set doesn't impose an order on its elements. (If it does, it is called an ordered set, which is a different concept.)
"S1-S2", the set difference operation on two sets S1 and S2, results in the set of elements that are in the first set but not in the second.
When taking the intersection of two sets, if an element is in both sets, it doesn't matter where it appears in each set.
Operations similar to set difference exist on files, such as comm from coreutils and diff from diffutils. But we can't think of a file as a set of lines; it is rather an ordered set of lines, because the lines are naturally ordered by their line numbers.
Moreover, comm and diff also work differently from each other.
What do comm and diff each try to accomplish at the conceptual level (at the input and output level)? If you can also use mathematics to explain, that might be clearer (I suspect I may need some basic knowledge of ordered sets). I don't expect an explanation at the implementation level, but that may help.
Thanks.
text-processing diff version-control math comm
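One way to make the set analogy in the question concrete for comm: comm expects both inputs to be sorted and reports three columns (lines only in the first file, lines only in the second, lines in both). The sketch below is a rough Python model of that input/output behavior, not the coreutils implementation. The function name comm and the sample lists are made up for illustration, and Python's default string ordering stands in for the locale collation that the real tool uses.

```python
def comm(a, b):
    """Merge two sorted lists of lines into (only_a, only_b, both).

    Hypothetical model of comm's three output columns; assumes both
    inputs are already sorted under Python's default string ordering.
    """
    only_a, only_b, both = [], [], []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            # Equal lines are paired one-to-one, so duplicates count.
            both.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            only_a.append(a[i])
            i += 1
        else:
            only_b.append(b[j])
            j += 1
    # Whatever is left over on either side is unique to that side.
    only_a.extend(a[i:])
    only_b.extend(b[j:])
    return only_a, only_b, both


file1 = ["apple", "banana", "banana", "cherry"]
file2 = ["banana", "cherry", "date"]
result = comm(file1, file2)
set_difference = set(file1) - set(file2)
```

With these sample inputs the second "banana" in file1 has no partner in file2, so it lands in the first column, whereas a plain set difference would drop it. In that sense this model behaves like a multiset operation on sorted inputs rather than a set operation.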
With diff, it might be easier to start with the concept of a patch. A patch is a mathematical operation on a file. The diff command then asks: what patch will convert the first file into the second? So, diff is the inverse of patch. Edit: also, you have a typo in the title, comm -> coom ;)
– cryptarch
58 mins ago
Have you tried studying the source code? This is a detailed question, but have you done any research on your own? If you want definitive knowledge, look at the source code; if approximations are acceptable, have you tried any reverse engineering?
– Wildcard
57 mins ago
@cry I get the rough idea of "diff" meaning subtraction of one file from the other. But at the line level, how does it make a decision about each line?
– Tim
57 mins ago
@Wildcard: It is always much more difficult to deduce what a program tries to accomplish from its implementation. It is always much easier to first know roughly what it tries to accomplish and then read its implementation.
– Tim
55 mins ago
@Wildcard: you are not talking about diff, but rather about gdiff. The recent diff source code is here: sourceforge.net/p/schillix-on/schillix-on/ci/default/tree/usr/… The algorithm used by diff is called stone. BTW: gdiff uses a different algorithm that is typically 30% slower and only wins when the files are much larger than a megabyte; IIRC, you need 100MB files to make gdiff faster than diff.
– schily
37 mins ago
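The "diff is the inverse of patch" framing from cryptarch's comment can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the standard library's difflib.SequenceMatcher, whose matching heuristic is not the algorithm used by any diff implementation mentioned above, and the helper names make_patch and apply_patch and the patch format (start, end, replacement lines) are invented here for the example.

```python
import difflib


def make_patch(old, new):
    """Return a list of (start, end, replacement_lines) edits.

    Each edit means: replace old[start:end] with replacement_lines.
    Hypothetical format, chosen only to keep the sketch short.
    """
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    return [
        (i1, i2, new[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"  # unchanged runs need no edit
    ]


def apply_patch(old, patch):
    """Apply edits produced by make_patch to old, in order."""
    out, pos = [], 0
    for start, end, replacement in patch:
        out.extend(old[pos:start])  # copy the untouched lines
        out.extend(replacement)     # empty for a pure deletion
        pos = end
    out.extend(old[pos:])
    return out


old = ["a", "b", "c", "d"]
new = ["a", "c", "d", "e"]
patched = apply_patch(old, make_patch(old, new))

# Same lines, different order: equal as sets, but not as sequences.
reorder_patch = make_patch(["x", "y"], ["y", "x"])
```

The point of the sketch is the contract rather than the algorithm: whatever edits make_patch picks, applying them to the first sequence must reproduce the second. Because the inputs are sequences, two files with the same lines in a different order still produce a non-empty patch, which is where the set analogy stops working for diff.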
asked 1 hour ago by Tim, edited 5 mins ago