What do comm and diff try to accomplish at the input/output level?

Given two files, for each line in each file, how do comm and diff determine

whether the line also occurs in the other file?

if it does, whether its occurrences in the two files are the same or different?

and do they do so by taking into account the order of the lines in each file?

Why does diff decide that some lines occur in both files but differ, while other lines exist in one file but not the other, instead of the other way around?

My guess: (just ignore the following if you are not interested in some elementary mathematics)

In mathematics, a set doesn't impose an order on its elements. (If it does, it is called an ordered set, which is a different concept.)

"S1-S2", the set difference operation on two sets S1 and S2, results in the set of elements that are in the first set but not in the second.

When taking the intersection of two sets, an element counts as being in both sets regardless of where it appears in each set.
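To make the set view concrete, here is a minimal Python sketch. The variable names (`file1_lines`, `file2_lines`, `only_in_first`, `in_both`) are made up for illustration, and treating a file as a set is exactly the simplification being questioned here: both order and duplicate lines are lost.

```python
# Two "files" viewed as unordered sets of lines.
# Converting to a set discards both line order and duplicate lines.
file1_lines = ["b", "a", "c", "a"]
file2_lines = ["c", "d", "b"]

s1 = set(file1_lines)
s2 = set(file2_lines)

only_in_first = s1 - s2   # set difference S1 - S2
in_both = s1 & s2         # intersection of S1 and S2

# Reordering a file does not change the result of either operation.
assert set(reversed(file1_lines)) - s2 == only_in_first
assert set(reversed(file1_lines)) & s2 == in_both
```

Neither result says anything about where a line sits in its file, which is the information comm and diff additionally have to deal with.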

Operations similar to set difference exist on files, such as comm from coreutils and diff from diffutils. But we can't think of a file as a set of lines; rather, it is an ordered set of lines, because the lines are naturally ordered by their line numbers.

Moreover, comm and diff also work differently from each other.
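As a rough illustration of the comm side: comm expects both inputs to be sorted and reports three columns (lines only in the first file, lines only in the second, lines in both). The Python sketch below imitates that with a merge-style walk over two sorted lists. The function name `comm_like` is hypothetical, and the sketch assumes plain Python string ordering rather than the locale collation a real comm would use.

```python
def comm_like(a, b):
    """Split two sorted lists of lines into (only_a, only_b, both).

    Lines are paired one-to-one, so a line that appears twice in `a`
    and once in `b` ends up once in `both` and once in `only_a`.
    """
    only_a, only_b, both = [], [], []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            # Same line at the front of both inputs: it is common.
            both.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            # a[i] sorts first, so it cannot appear later in b.
            only_a.append(a[i])
            i += 1
        else:
            # b[j] sorts first, so it cannot appear later in a.
            only_b.append(b[j])
            j += 1
    # Whatever is left over in either input has no partner.
    only_a.extend(a[i:])
    only_b.extend(b[j:])
    return only_a, only_b, both


first = ["apple", "banana", "banana", "cherry"]
second = ["banana", "cherry", "date"]
print(comm_like(first, second))
```

Because the inputs are sorted, the original line order carries no extra information here, so the result behaves like a difference and intersection of multisets rather than of ordered files.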

What do comm and diff each try to accomplish at the conceptual level (that is, at the input and output level)? If you can also use mathematics to explain, that might be clearer (I suspect I may need some basic knowledge of ordered sets). I don't expect an explanation at the implementation level, but that may help.

Thanks.

edited 5 mins ago

asked 1 hour ago

Tim


1

With diff, it might be easier to start with the concept of a patch. A patch is a mathematical operation on a file. The diff command then asks, what patch will convert the first file into the second? So, diff is the inverse of patch. Edit: also, you have a typo in the title, comm -> coom ;)
– cryptarch
58 mins ago
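The "diff is the inverse of patch" idea in the comment above can be sketched in Python as follows. This is only an illustration under stated assumptions: it treats the lines that two files share as a longest common subsequence (so order matters), marks everything else as a deletion or insertion, and uses a simple quadratic table. Real diff implementations use more efficient algorithms and different output formats. The names `lcs_table`, `edit_script` and `apply_script` are hypothetical.

```python
def lcs_table(a, b):
    """t[i][j] = length of the longest common subsequence of a[i:] and b[j:]."""
    t = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) - 1, -1, -1):
        for j in range(len(b) - 1, -1, -1):
            if a[i] == b[j]:
                t[i][j] = t[i + 1][j + 1] + 1
            else:
                t[i][j] = max(t[i + 1][j], t[i][j + 1])
    return t


def edit_script(a, b):
    """Return a list of (op, line): ' ' keeps, '-' deletes from a, '+' inserts from b."""
    t = lcs_table(a, b)
    i = j = 0
    script = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            script.append((" ", a[i]))
            i += 1
            j += 1
        elif t[i + 1][j] >= t[i][j + 1]:
            # Dropping a[i] loses no common lines, so treat it as deleted.
            script.append(("-", a[i]))
            i += 1
        else:
            script.append(("+", b[j]))
            j += 1
    script.extend(("-", line) for line in a[i:])
    script.extend(("+", line) for line in b[j:])
    return script


def apply_script(a, script):
    """Replay the script against a, playing the role of patch."""
    out, i = [], 0
    for op, line in script:
        if op == "+":
            out.append(line)
        else:
            assert a[i] == line  # kept or deleted lines must match the input
            i += 1
            if op == " ":
                out.append(line)
    return out


old = ["a", "b", "c", "d"]
new = ["a", "c", "d", "e"]
script = edit_script(old, new)
print(script)
assert apply_script(old, script) == new
```

Note how order changes the answer: for `["x", "y"]` versus `["y", "x"]` a set comparison sees no difference at all, while this sketch reports one line deleted and re-inserted, because only one of the two lines can stay in place.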

Have you tried studying the source code? This is a detailed question, but have you done any research on your own? If you want definitive knowledge, look at the source code; if approximations are acceptable, have you tried any reverse engineering?
– Wildcard
57 mins ago

@cry I get the rough idea of "diff" meaning subtraction of one file from the other. But at the line level, how does it make a decision on each line?
– Tim
57 mins ago

@Wildcard: It is always much more difficult to deduce what a program tries to accomplish from its implementation. It is always much easier to first know roughly what it tries to accomplish and then read its implementation.
– Tim
55 mins ago

1

@Wildcard: you are not talking about diff, but rather about gdiff. The recent diff source code is here: sourceforge.net/p/schillix-on/schillix-on/ci/default/tree/usr/… The algorithm used by diff is called stone. BTW: gdiff uses a different algorithm that is typically 30% slower and only wins when the files are much larger than a megabyte. IIRC, you need 100MB files to make gdiff faster than diff.
– schily
37 mins ago


text-processing diff version-control math comm

