What do comm and diff try to accomplish at input/output level?

Given two files, for each line in each file, how do comm and diff determine

  • whether the line also occurs in the other file?

  • if it does, whether its occurrences in the two files are the same or different?

while taking into account the order of the lines in each file?

Why does diff decide that some lines occur in both files but differ, and that other lines exist in one file but not the other, instead of the other way around?




My guess: (just ignore the following if you are not interested in some elementary mathematics)



In mathematics, a set doesn't impose an order on its elements. (If it does, it is called an ordered set, which is a different concept.)

  • "S1-S2", the set difference operation on two sets S1 and S2, results in the set of elements that are in the first set but not in the second.

  • When taking the intersection of two sets, if an element is in both sets, it doesn't matter where it appears in each set.

Operations similar to set difference exist for files, such as comm from coreutils and diff from diffutils. But we can't think of a file as a set of lines; we have to think of it as an ordered set of lines, because the lines are naturally ordered by their line numbers.
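To make the contrast concrete, here is a minimal Python sketch. The sample lines and the helper name `comm_like` are made up for illustration, and this is not the coreutils implementation. Plain set operations discard both order and duplicates. comm, which expects both inputs to be sorted, can be pictured as a single merge pass that puts each line into one of three columns: only in the first file, only in the second file, or in both.

```python
def comm_like(a, b):
    """Classify the lines of two sorted lists into comm's three columns.

    Returns (only_a, only_b, both). Assumes a and b are already sorted,
    which is the precondition comm places on its input files.
    """
    only_a, only_b, both = [], [], []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            # Same line at the front of both inputs: pair them up.
            both.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            # a[i] sorts first, so it cannot appear later in b.
            only_a.append(a[i])
            i += 1
        else:
            only_b.append(b[j])
            j += 1
    # Whatever is left in either input has no partner in the other.
    only_a.extend(a[i:])
    only_b.extend(b[j:])
    return only_a, only_b, both


file1 = ["apple", "apple", "banana", "cherry"]
file2 = ["apple", "cherry", "date"]

# Set view: the order and the duplicate "apple" are lost.
print(set(file1) - set(file2))   # {'banana'}
print(set(file1) & set(file2))   # contains 'apple' and 'cherry'

# comm-like view: the second "apple" has no partner in file2.
print(comm_like(file1, file2))
# (['apple', 'banana'], ['date'], ['apple', 'cherry'])
```

In this picture, comm relies on sort order rather than line numbers, which is why it needs sorted input and why it can pair up duplicates one by one.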



Moreover, comm and diff also work differently from each other.



What do comm and diff each try to accomplish at the conceptual level (that is, at the input and output level)? If you can also explain it with mathematics, that might be clearer (I suspect I may need some basic knowledge of ordered sets). I don't expect an explanation at the implementation level, but that may help.
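One way to model diff at the input/output level is as a longest common subsequence (LCS) problem: treat each file as a sequence of lines, keep a longest subsequence common to both, and report every other line as deleted from the first file or added in the second. The Python sketch below uses a textbook dynamic-programming LCS; the function name `lcs_diff` and the sample data are assumptions for illustration. Real diff implementations use faster algorithms and may pick a different, equally short answer.

```python
def lcs_diff(a, b):
    """Return (tag, line) pairs: ' ' common, '-' only in a, '+' only in b."""
    n, m = len(a), len(b)
    # length[i][j] = length of a longest common subsequence of a[i:] and b[j:]
    length = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                length[i][j] = length[i + 1][j + 1] + 1
            else:
                length[i][j] = max(length[i + 1][j], length[i][j + 1])

    # Walk both sequences, following the table toward a longest match.
    out = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            out.append((" ", a[i]))
            i += 1
            j += 1
        elif length[i + 1][j] >= length[i][j + 1]:
            out.append(("-", a[i]))
            i += 1
        else:
            out.append(("+", b[j]))
            j += 1
    out.extend(("-", line) for line in a[i:])
    out.extend(("+", line) for line in b[j:])
    return out


old = ["a", "b", "c", "d"]
new = ["a", "c", "d", "b"]
for tag, line in lcs_diff(old, new):
    print(tag + line)
# prints " a", "-b", " c", " d", "+b" on separate lines
```

Note that the line "b" occurs in both files, yet it is reported once as deleted and once as added. Because order matters, only lines that can be matched without crossing other matches count as common.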



Thanks.










  • With diff, it might be easier to start with the concept of a patch. A patch is a mathematical operation on a file. The diff command then asks: what patch will convert the first file into the second? So diff is the inverse of patch. Edit: also, you have a typo in the title, comm -> coom ;)
    – cryptarch
    58 mins ago

  • Have you tried studying the source code? This is a detailed question, but have you done any research on your own? If you want definitive knowledge, look at the source code; if approximations are acceptable, have you tried any reverse engineering?
    – Wildcard
    57 mins ago

  • @cry I get the rough idea of "diff" meaning subtraction of one file from the other. But at the line level, how does it make a decision on each line?
    – Tim
    57 mins ago

  • @Wildcard: It is always much more difficult to deduce what a program tries to accomplish from its implementation. It is always much easier to first know roughly what it tries to accomplish and then read its implementation.
    – Tim
    55 mins ago

  • @Wildcard: you are not talking about diff, but rather about gdiff. The recent diff source code is here: sourceforge.net/p/schillix-on/schillix-on/ci/default/tree/usr/… The algorithm used by diff is called stone. BTW: gdiff uses a different algorithm that is typically 30% slower and only wins when the files are much larger than a megabyte. IIRC, you need 100MB files to make gdiff faster than diff.
    – schily
    37 mins ago
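cryptarch's remark that diff is the inverse of patch can be sketched in Python as a round trip: compute an edit script from the old lines to the new lines, then apply it to the old lines and check that the result equals the new lines. This sketch uses the standard library's `difflib.SequenceMatcher`, whose matching algorithm is not the one used by any diff program; the names `make_patch` and `apply_patch` and the patch representation are assumptions made for illustration.

```python
from difflib import SequenceMatcher


def make_patch(old, new):
    """Build an edit script that turns `old` into `new`.

    Each entry is (start, end, replacement): replace old[start:end]
    with the replacement lines. Unchanged regions are not recorded.
    """
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    return [
        (i1, i2, new[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def apply_patch(old, patch):
    """Apply an edit script produced by make_patch to `old`."""
    result = []
    pos = 0
    for start, end, replacement in patch:
        result.extend(old[pos:start])  # copy the untouched lines
        result.extend(replacement)     # then the changed region
        pos = end
    result.extend(old[pos:])
    return result


old = ["a", "b", "c", "d"]
new = ["a", "c", "d", "b"]
patch = make_patch(old, new)
assert apply_patch(old, patch) == new
```

The patch only makes sense relative to the first file: its positions refer to line numbers in `old`, which is where the order of the lines enters the picture.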















text-processing diff version-control math comm














edited 5 mins ago

























asked 1 hour ago









Tim
