Shell: compare content file not checksums
I need to compare the contents of two files. Both are generated by a third-party application.
The files look like an env file:
    VAR1=VAL1
    VAR2=VAL2
    VAR3=VAL3
    ...
The problem is that the application sometimes generates the content in a different order:
    VAR2=VAL2
    VAR1=VAL1
    VAR3=VAL3
    ...
I'm using `md5sum` to generate a file with checksums, and then I use `cmp` to compare them.
So the content is the same, but the checksums differ.
Any ideas on how to solve that?
shell
asked Jul 4 at 10:57
Jordi
Why not simply sort the files, remove any comments, and compare their checksums? – Archemar, Jul 4 at 11:07
The contents may be equivalent for your specific purpose, but they're not the same. Big difference. – marcelm, Jul 4 at 15:34
1 Answer
accepted
If the files only contain constant assignments, you can sort them first. With process substitution (Bash/zsh):
    cmp <(sort foo) <(sort bar)
(or `cmp -s`, as usual, to suppress the output and rely on the exit status)
If you have to make do with a standard shell, you'll need temporary files:
    a=$(mktemp) b=$(mktemp)
    sort foo > "$a"; sort bar > "$b"
    cmp "$a" "$b"
    rm "$a" "$b"
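As a minimal sketch, the temporary-file approach could be wrapped in a function so the cleanup always runs. The function name `same_lines` and the demo file names are hypothetical, and `LC_ALL=C` is an added assumption to keep the sort order independent of the locale; `mktemp -d` is widely available but was not part of older POSIX standards.

```shell
#!/bin/sh
# same_lines FILE1 FILE2
# Succeeds (exit status 0) if both files contain the same lines in any order.
# Hypothetical helper; plain sh has no local variables, so tmpdir and status
# are global.
same_lines() {
    tmpdir=$(mktemp -d) || return 2
    # Sort both files into the temporary directory, then compare bytewise.
    LC_ALL=C sort -- "$1" > "$tmpdir/a" &&
    LC_ALL=C sort -- "$2" > "$tmpdir/b" &&
    cmp -s "$tmpdir/a" "$tmpdir/b"
    status=$?
    rm -rf -- "$tmpdir"
    return "$status"
}

# Demonstration with two made-up files that differ only in line order.
demo=$(mktemp -d) || exit 1
printf 'VAR1=VAL1\nVAR2=VAL2\nVAR3=VAL3\n' > "$demo/foo"
printf 'VAR2=VAL2\nVAR1=VAL1\nVAR3=VAL3\n' > "$demo/bar"
if same_lines "$demo/foo" "$demo/bar"; then
    echo "same"
else
    echo "different"
fi
rm -rf -- "$demo"
```

Using one temporary directory rather than two temporary files means a single `rm -rf` cleans up everything, even when one of the `sort` calls fails.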
In any case, you'll have to be sure that the files' lines can be sorted without changing the meaning. Multi-line strings would be broken by the sort, and if you have assignments that refer to the other variables, the assignment order would also matter.
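One rough way to check that precondition before sorting is sketched below. The function name `sortable` and the pattern are assumptions for illustration: the check accepts only lines of the form NAME=VALUE whose value contains no `$`, so it also rejects comments, blank lines and the continuation lines of multi-line values. It is a heuristic, not a parser for env files.

```shell
#!/bin/sh
# sortable FILE
# Succeeds only if every line looks like a simple NAME=VALUE assignment
# with no '$' in the value. Hypothetical helper, heuristic only.
sortable() {
    [ -r "$1" ] || return 2
    # grep -v selects lines that do NOT match the pattern; -q prints nothing
    # and only sets the exit status. The leading '!' inverts that status, so
    # the function fails as soon as one line does not fit.
    ! grep -qvE '^[A-Za-z_][A-Za-z0-9_]*=[^$]*$' -- "$1"
}

# Demonstration with two made-up files.
demo=$(mktemp -d) || exit 1
printf 'VAR1=VAL1\nVAR2=VAL2\n' > "$demo/plain"
printf 'VAR1=VAL1\nVAR2=$VAR1/sub\n' > "$demo/refs"
for f in "$demo/plain" "$demo/refs"; do
    if sortable "$f"; then
        echo "${f##*/}: safe to sort"
    else
        echo "${f##*/}: order may matter"
    fi
done
rm -rf -- "$demo"
```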
If you want the hash, do something like:
    cksum1=$(sort foo | sha256sum)
    cksum2=$(sort bar | sha256sum)
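A short sketch of how the two hashes might then be compared. The file names are made up for the demonstration, and `LC_ALL=C` is an added assumption so that both sides sort identically regardless of locale, which matters if the hashes are computed on different machines.

```shell
#!/bin/sh
# Sketch: compare two files by the SHA-256 hash of their sorted contents.
demo=$(mktemp -d) || exit 1
printf 'VAR1=VAL1\nVAR2=VAL2\nVAR3=VAL3\n' > "$demo/foo"
printf 'VAR2=VAL2\nVAR1=VAL1\nVAR3=VAL3\n' > "$demo/bar"

# sha256sum prints the hash followed by a file name field ("-" for stdin).
cksum1=$(LC_ALL=C sort "$demo/foo" | sha256sum)
cksum2=$(LC_ALL=C sort "$demo/bar" | sha256sum)

# Keep only the hash: strip everything from the first space onward.
cksum1=${cksum1%% *}
cksum2=${cksum2%% *}

if [ "$cksum1" = "$cksum2" ]; then
    result=same
else
    result=different
fi
echo "$result"
rm -rf -- "$demo"
```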
But if you're doing the comparison locally, you may as well compare the files directly: verifying that they are the same requires reading both in full either way, and `cmp` can stop early if it spots a difference, while `sha256sum` can't.
If the files are on separate machines, then passing just the hash is of course easier. But even then, I'd suggest using SHA-256 (as above) or SHA-512 instead of MD5, if possible (*). Even busybox implements `sha256sum`, so you might be able to get it.
Of course, even process substitution may resort to temporary files, so the direct comparison needs both sorted temporary files to exist at the same time, while taking the hash doesn't. But this should only matter if the files are large enough that duplicating them would exhaust the storage on the system.
(* MD5 has known weaknesses that allow generating collisions, while the SHA-2 hashes are considered stronger. You could get away with MD5 in some use cases, but that depends on the details and it's better to fail safe.)
edited Jul 4 at 12:29
answered Jul 4 at 11:15
ilkkachu
I'm using `#!/usr/bin/env sh`. Could I use something like `sort t-secret.env | md5sum | awk '{print $1}'`? – Jordi, Jul 4 at 11:54
@Jordi You really do not need to compute MD5 checksums for this. Doing so would be much slower than using `cmp`. – Kusalananda, Jul 4 at 11:54
@Jordi, yes, if you really need the hash. (Or `cksum=$(sort foo | md5sum); cksum=${cksum%% *}` to use the shell instead of `awk`.) – ilkkachu, Jul 4 at 12:22
Using `cmp`, you don't read the files in full unless they are identical. This is why using `cmp` is quicker. – Kusalananda, Jul 4 at 12:23
@Kusalananda, yeah, that too. – ilkkachu, Jul 4 at 12:25