Shell: compare file content, not checksums

I need to compare the content of two files. Both are generated by a third-party application.



The files look like an env file:



VAR1=VAL1
VAR2=VAL2
VAR3=VAL3
...


The problem is that sometimes the application generates the same content in a different order:



VAR2=VAL2
VAR1=VAL1
VAR3=VAL3
...


I'm using md5sum to generate a file with checksums and then cmp to compare them.
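
Roughly, the current check looks something like this (the file names here are just placeholders; reading the files via stdin keeps the file name out of the checksum):

md5sum < file1.env > sums1.txt
md5sum < file2.env > sums2.txt
cmp -s sums1.txt sums2.txt && echo "content matches" || echo "content differs"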



So the content is the same, but the checksums differ.



Any ideas to solve that?







asked Jul 4 at 10:57 by Jordi




  • why not simply sort file, remove comment if any, and compare-checksum them ? – Archemar, Jul 4 at 11:07

  • The contents may be equivalent for your specific purpose, but they're not the same. Big difference. – marcelm, Jul 4 at 15:34
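
A sketch of what the first comment suggests, with hypothetical file names: drop comment lines, sort the rest, checksum, and compare the results.

# Strip comment lines, sort, and checksum the remaining content.
sum1=$(grep -v '^#' file1.env | sort | md5sum)
sum2=$(grep -v '^#' file2.env | sort | md5sum)
[ "$sum1" = "$sum2" ] && echo "equivalent" || echo "different"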












1 Answer
If the files only contain constant assignments, you can sort them first. With process substitution (Bash/zsh):



cmp <(sort foo) <(sort bar)


(or cmp -s as usual)
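
For example, to branch on the result in a Bash script (the echo messages are only illustrative):

if cmp -s <(sort foo) <(sort bar); then
    echo "files have the same content"
else
    echo "files differ"
fi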



If you have to make do with a standard (POSIX) shell, you'll need temporary files:



a=$(mktemp) b=$(mktemp)
sort foo > "$a"; sort bar > "$b"
cmp "$a" "$b"
rm "$a" "$b"


In any case, you'll have to be sure that the files' lines can be sorted without changing the meaning. Multi-line strings would be broken by the sort, and if any assignments refer to other variables, the assignment order also matters.



If you want the hash, do something like:



cksum1=$(sort foo | sha256sum)
cksum2=$(sort bar | sha256sum)
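
To compare the two results in the shell, keep just the hex digest with parameter expansion (strictly optional here, since both pipelines read stdin and therefore print the same "-" file-name field):

# Compare only the digest field of each sha256sum output.
if [ "${cksum1%% *}" = "${cksum2%% *}" ]; then
    echo "same content"
else
    echo "content differs"
fi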


But if you're doing the comparison locally, hashing probably doesn't buy you much over comparing the files directly: verifying that they're identical requires reading them in full anyway, and cmp can stop early as soon as it spots a difference, while sha256sum can't.



If the files are on separate machines, then passing just the hash is of course easier. But even then, I'd suggest using SHA-256 (as above) or SHA-512 instead of MD5, if possible(*). Even busybox implements sha256sum, so you might be able to get it.



Of course, even process substitution may resort to temporary files, so the direct compare needs both sorted temporary copies to exist at the same time, while taking the hash doesn't. But this should only matter if the files are large enough that duplicating them would exhaust the storage on the system.



(* MD5 has known weaknesses that allow generating collisions, while the SHA-2 hashes are considered stronger. You could get away with MD5 in some use cases, but that depends on the details and it's better to fail safe.)






answered Jul 4 at 11:15 by ilkkachu (edited Jul 4 at 12:29)
  • I'm using #!/usr/bin/env sh. Could I use something like sort t-secret.env | md5sum | awk '{print $1}'? – Jordi, Jul 4 at 11:54

  • @Jordi You really do not need to compute MD5 checksums for this. Doing so would be much slower than using cmp. – Kusalananda, Jul 4 at 11:54

  • @Jordi, yes, if you really need the hash. (Or cksum=$(sort foo | md5sum); cksum=${cksum%% *} to use the shell instead of awk.) – ilkkachu, Jul 4 at 12:22

  • Using cmp, you don't read the files in full unless they are identical. This is why using cmp is quicker. – Kusalananda, Jul 4 at 12:23

  • @Kusalananda, yeah, that too. – ilkkachu, Jul 4 at 12:25









