Script not correctly printing the correct elements of the array
So basically I have to write a bash script that examines some specific files in the working directory (which are named file.00.txt up to file.24.txt). The thing is, 3 of them are exactly the same, and my assignment is to create a script that tells me which 3 are the same.
Here is my code
#!/bin/bash
f0=file.00.txt
f1=file.01.txt
f2=file.02.txt
f3=file.03.txt
f4=file.04.txt
f5=file.05.txt
f6=file.06.txt
f7=file.07.txt
f8=file.08.txt
f9=file.09.txt
f10=file.10.txt
f11=file.11.txt
f12=file.12.txt
f13=file.13.txt
f14=file.14.txt
f15=file.15.txt
f16=file.16.txt
f17=file.17.txt
f18=file.18.txt
f19=file.19.txt
f20=file.20.txt
f21=file.21.txt
f22=file.22.txt
f23=file.23.txt
f24=file.24.txt
array=($f0 $f1 $f2 $f3 $f4 $f5 $f6 $f7 $f8 $f9 $f10 $f11 $f12 $f13 $f14 $f15 $f16 $f17 $f18 $f19 $f20 $f21 $f22 $f23 $f24)
i=0
touch placeholder
while [ $i -lt ${#array[@]} ]
do
DIFF=$(diff ${array[i]} ${array[i+1]})
if [ "$DIFF" = "" ]
then
echo "${array[i]} y ${array[i+1]}" >> placeholder
fi
i=$((i+1))
done
cat placeholder
The idea of this code is to compare each file with the following one in the array, store the pairs that are the same in a file called placeholder, and finally reveal the contents of that file with the cat command.
However, every time I run the script I get the following output:
file.00.txt y file.00.txt
file.01.txt y file.01.txt
file.02.txt y file.02.txt
and so on for each file. This should not happen, since I am clearly using
echo "${array[i]} y ${array[i+1]}" >> placeholder
to echo both positions. Why is this happening, and how can I solve this?
bash shell-script diff
You do add a comma to the filename in the call to diff. It may be worth removing that... Apart from that, is there a reason for creating all those variables and then adding them to an array? Note, too, that your loop is not terminated by a done, that you don't increment i, that the length of the array is ${#array[@]}, not 27, and that it's easier to loop over the elements of the array ("${array[@]}") than to use an index (that is, if you need an array at all and can't just loop over file.*.txt).
– Kusalananda♦
Mar 17 at 17:16
Also, you could just use brace expansion, array=( file.{00..24}.txt ), to generate the file names. Note, though, that if you only diff files n and n+1, you won't notice if nonconsecutive files (e.g. 3 and 8) are the same, so the logic in the loop may need some more thought. (You'd need to compare each file against every other file.)
– ilkkachu
Mar 17 at 17:37
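To see what that brace expansion produces, here is a small check, assuming bash 4 or later (older versions do not keep the zero padding). Unlike a glob such as file.*.txt, brace expansion generates the names whether or not the files exist.

```shell
#!/bin/bash
# Zero-padded endpoints keep their padding (bash 4+), giving
# file.00.txt, file.01.txt, ..., file.24.txt without touching the disk.
array=( file.{00..24}.txt )

# Number of elements, then the first and last names.
printf '%s\n' "${#array[@]}" "${array[0]}" "${array[24]}"
```

This prints 25, file.00.txt and file.24.txt, one per line.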
OK, so I corrected all these things, but I am still not getting what I want. I will edit the question to clarify what I am getting.
– pericothebig
Mar 17 at 17:41
@Kusalananda OK, I am still new, so I did not know I could just loop over the files. How would I do that without an array? Thanks to both of you.
– pericothebig
Mar 17 at 17:46
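A minimal sketch of looping over the files without an array, for the question above. The function name list_files is hypothetical; nullglob is a bash option that makes a pattern with no matches expand to nothing, so the loop body is skipped instead of running once with the literal string file.*.txt.

```shell
#!/bin/bash
# Without nullglob, an unmatched pattern would be passed through literally.
shopt -s nullglob

# Hypothetical helper: visit every file matching the glob, in sorted order.
list_files() {
    local f
    for f in file.*.txt; do
        printf 'checking %s\n' "$f"
    done
}
```

Calling list_files in a directory holding file.00.txt to file.24.txt would print one "checking" line per file, from file.00.txt up to file.24.txt.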
Very bad algorithm anyway, because you will do 27*26 comparisons. Try instead: 1) get the MD5 hash of the files (md5sum command), 2) count the instances of each hash (sort | uniq -c after extracting only the hashes from the md5sum output), 3) find the biggest count (add sort -n | tail -n 1 to the pipeline), 4) extract the hash, 5) use grep $hash on the complete md5sum output to retrieve the names of the files with that hash. If paranoid, 6) use cmp to check that all three files are the same.
– xenoid
Mar 17 at 17:58
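The five steps in that comment can be sketched as one function, assuming GNU md5sum output (a hex digest, two spaces, then the filename) and filenames without whitespace. The name find_largest_group is hypothetical, and the optional cmp check from step 6 is left out.

```shell
#!/bin/bash
# Hypothetical helper: print the files that share the most common MD5 hash.
find_largest_group() {
    local sums hash
    # 1) checksum every file exactly once
    sums=$(md5sum file.*.txt) || return 1
    # 2) keep only the hashes and count each one (sort | uniq -c),
    # 3) order by count and keep the biggest (sort -n | tail -n 1),
    # 4) pull the hash out of the "count hash" line
    hash=$(printf '%s\n' "$sums" | cut -d ' ' -f 1 | sort | uniq -c |
           sort -n | tail -n 1 | awk '{ print $2 }')
    # 5) print the names of the files with that hash (field 3 onward,
    #    because md5sum separates hash and name with two spaces)
    printf '%s\n' "$sums" | grep "^$hash " | cut -d ' ' -f 3-
}
```

Run in the directory with the 25 files, it should print the three matching names, one per line; cmp could then confirm them byte for byte.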
edited Mar 17 at 18:06 by pericothebig
asked Mar 17 at 17:13 by pericothebig
2 Answers
The seemingly strange output of your code could possibly be explained by the fact that your script only ever adds to the output file. This means that you might have had some error in your code previously (now corrected), but that you still see the output of that run in the output file, since the output file is never removed or emptied by the script.
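A sketch of a fix for that, assuming the script should start from an empty placeholder on every run: truncate the file with a redirection instead of creating it with touch. The function name write_results is hypothetical.

```shell
#!/bin/bash
# Hypothetical demo: record one result, always starting from an empty file.
write_results() {
    # ": >" truncates the file (creating it if needed); touch would only
    # update the timestamp and keep whatever earlier runs appended.
    : > placeholder
    echo "file.00.txt y file.01.txt" >> placeholder
}

write_results
write_results    # running it again does not duplicate the line
cat placeholder
```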
You could shorten your script to
#!/bin/bash
array=( file.*.txt )
for name in "${array[@]}"; do
    if [ -n "$prev_name" ] && cmp -s "$prev_name" "$name"
    then
        printf '%s y %s\n' "$prev_name" "$name"
    fi
    prev_name=$name
done
This uses a globbing pattern to populate the array with the filenames matching the pattern.
It then loops over the names, comparing files that occur next to each other in the array using cmp -s. The cmp utility exits with a true (zero) exit status if the contents of the two files it compares are identical.
The loop uses $prev_name to hold the name of the previous file in the array. In the first iteration of the loop, this variable is empty, so the actual comparison of files is skipped.
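To see the exit status that the if relies on, here is a quick check with three throwaway files (the names a.txt, b.txt and c.txt are only for illustration):

```shell
#!/bin/bash
# cmp -s prints nothing; only its exit status says whether the files match.
printf 'one\n' > a.txt
printf 'one\n' > b.txt
printf 'two\n' > c.txt

if cmp -s a.txt b.txt; then
    echo "a.txt and b.txt are identical"    # cmp exited with status 0
fi
if ! cmp -s a.txt c.txt; then
    echo "a.txt and c.txt differ"           # cmp exited with status 1
fi
```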
What you are possibly expected to write is a double loop. Something like
for nameA in "${array[@]}"; do
    for nameB in "${array[@]}"; do
        if [ "$nameA" != "$nameB" ] && cmp -s "$nameA" "$nameB"
        then
            printf '%s y %s\n' "$nameA" "$nameB"
        fi
    done
done
But this would compare A against B and B against A, and the number of calls to cmp would grow quadratically with the number of files involved, which would be both resource intensive (on disks; it would read each file as many times as there are filenames in the array) and slow.
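If a double loop is required anyway, a sketch that starts the inner index just after the outer one compares each unordered pair only once, which halves the number of cmp calls (n(n-1)/2 instead of n(n-1)); it is still quadratic. The function name compare_pairs is hypothetical and, like the script above, it assumes the files match file.*.txt.

```shell
#!/bin/bash
# Hypothetical helper: compare every unordered pair of files exactly once.
compare_pairs() {
    local -a files=( file.*.txt )
    local i j
    for (( i = 0; i < ${#files[@]}; i++ )); do
        # j starts after i, so A is compared with B but B never with A again
        for (( j = i + 1; j < ${#files[@]}; j++ )); do
            if cmp -s "${files[i]}" "${files[j]}"; then
                printf '%s y %s\n' "${files[i]}" "${files[j]}"
            fi
        done
    done
}
```

With three identical files, this prints three lines, one for each pair among them.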
A common way to find sets of files with identical contents is fdupes:
$ fdupes --sameline .
./file.1.txt ./file.2.txt ./file.7.txt
If you want to do something similar without fdupes, you could compute and compare a checksum of each file using, e.g., md5sum:
#!/bin/bash
declare -A names count
while read -r cksum name; do
    names[$cksum]+=${names[$cksum]:+,}$name
    count[$cksum]=$(( count[$cksum] + 1 ))
done < <( md5sum file.*.txt )
for cksum in "${!count[@]}"; do
    if [ "${count[$cksum]}" -gt 1 ]; then
        printf '%s\n' "${names[$cksum]}"
    fi
done
The first loop reads the output of md5sum, which is executed across all the relevant files. The output of md5sum may look something like
897316929176464ebc9ad085f31e7284 file.1.txt
8c9eb686bf3eb5bd83d9373eadf6504b file.10.txt
897316929176464ebc9ad085f31e7284 file.2.txt
26ab0db90d72e28ad0ba1e22ee510510 file.3.txt
84bc3da1b3e33a18e8d5e1bdd7a18d7a file.4.txt
aa6ed9e0f26a6eba784aae8267df1951 file.5.txt
6d7fce9fee471194aa8b5b6e47267f03 file.6.txt
897316929176464ebc9ad085f31e7284 file.7.txt
c30f7472766d25af1dc80b3ffc9a58c7 file.8.txt
9ae0ea9e3c9c6e1b9b6252c8395efdc1 file.9.txt
The checksum in the first column is read into cksum, and the filename is read into name.
Inside the first loop, we append the name to an entry in an associative array that is indexed by the checksum. The way the assignment to names[$cksum] is done there makes sure that we add a comma before each new name if needed (which it is if the entry already contains some other names). We then update a count of the number of times that we've seen this particular checksum (this will be used in the second loop).
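The ${var:+word} expansion used in that assignment expands to word only when var is set and non-empty, and to nothing otherwise. A stand-alone illustration with a plain variable (list is a made-up name):

```shell
#!/bin/bash
# The comma is only produced once "list" already holds something,
# so every name except the first is preceded by a comma.
list=''
for name in file.1.txt file.2.txt file.7.txt; do
    list+=${list:+,}$name
done
printf '%s\n' "$list"    # file.1.txt,file.2.txt,file.7.txt
```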
In the second loop, we go through the checksums ("${!count[@]}" expands to the list of keys (checksums) in the count associative array), and for each checksum we test whether its count is greater than 1, meaning we've found a duplicate file (if you're looking for groups of exactly three identical files, you may want to use -eq 3 instead of -gt 1 here). If it is, we print the names associated with that checksum.
Testing it:
$ bash script.sh
file.1.txt,file.2.txt,file.7.txt
Here's a more efficient way of doing what you're trying to do. I use a smaller sample set to keep things clearer:
#!/bin/bash
# clear placeholder
printf "Files with no diff:\n" > placeholder
# set up sample data
echo "one" > file.00.txt
echo "one" > file.01.txt
echo "foo" > file.02.txt
echo "bar" > file.03.txt
echo "two" > file.04.txt
echo "two" > file.05.txt
# generate array
i=0
while [ $i -lt 6 ]; do
array+=( file.`printf %02d $i`.txt )
((i++))
done
i=0
while [ $i -lt 5 ]; do
diff --brief "${array[i]}" "${array[i+1]}" &&
echo "${array[i]} ${array[i+1]}" >> placeholder
((i++))
done
Results:
$ bash ./test.sh
Files file.01.txt and file.02.txt differ
Files file.02.txt and file.03.txt differ
Files file.03.txt and file.04.txt differ
$ cat placeholder
Files with no diff:
file.00.txt file.01.txt
file.04.txt file.05.txt
You don't need to generate sample data if you actually already have data.
The code, explained:
Building an array in a loop (in Bash) can be done by iterating the way you already know; the array+= notation appends an element.
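As an aside, printf -v can write the zero-padded number straight into a variable, which avoids the backtick command substitution. A minimal sketch of the same array-building loop written that way (num is a made-up variable name):

```shell
#!/bin/bash
# printf -v stores the formatted number in "num" instead of printing it;
# array+=( ... ) then appends one element per iteration.
array=()
for (( i = 0; i < 6; i++ )); do
    printf -v num '%02d' "$i"
    array+=( "file.$num.txt" )
done
printf '%s\n' "${array[@]}"    # file.00.txt through file.05.txt
```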
The ((i++)) increments your counter.
For the diff, I use the --brief option. The diff man page says that --brief reports only when the files differ, so nothing is printed for identical files. Independently of that, diff exits with status 0 when the files are identical and 1 when they differ, so the diff command succeeds if no difference is found.
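A quick way to see those exit statuses, using throwaway files x.txt, y.txt and z.txt (hypothetical names); a status greater than 1 would signal trouble, such as a missing file.

```shell
#!/bin/bash
printf 'one\n' > x.txt
printf 'one\n' > y.txt
printf 'two\n' > z.txt

# Identical files: diff prints nothing and exits 0, so the echo after && runs.
diff --brief x.txt y.txt && echo "x.txt vs y.txt: exit status $?"
# Different files: diff reports the difference and exits 1, so the echo after || runs.
diff --brief x.txt z.txt || echo "x.txt vs z.txt: exit status $?"
```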
Using the && (AND) notation, this code echoes the names of the files being compared into your placeholder file if and only if the diff command exits with status 0, i.e. the files are identical.
If there is a difference between the files, diff prints a one-line "Files ... differ" notice to the terminal and exits with status 1. The && (AND) therefore short-circuits, so nothing is written to the placeholder file.
If you have any further questions about the syntax, feel free to ask.
"Because Bash interprets silence as success...", noo, not really. && and if check the exit code/exit status of the command. diff just happens to return a truthy exit code (i.e. zero) if the files are identical, and a falsy exit code (a one) if they differ. But you can have commands that produce output and return a truthy value, or the other way around, e.g. echo foo && echo "it's true" or false && echo "it's true".
– ilkkachu
Mar 17 at 19:37
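The comment's point can be checked directly. The third line, which uses grep -q, adds a command that prints nothing and still succeeds:

```shell
#!/bin/bash
# && looks only at the exit status of the left-hand command,
# never at whether that command printed anything.
echo foo && echo "it's true"                # output and success: prints foo, then it's true
false && echo "it's true"                   # no output and failure: prints nothing
grep -q foo <<< "foo bar" && echo "match"   # no output, yet success: prints match
```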
Yes, you're right, my comment was unintentionally broad there. I'll edit for accuracy. Thanks.
– Klaatu von Schlacker
Mar 22 at 9:53
add a comment |
Your Answer
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "106"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2funix.stackexchange.com%2fquestions%2f506847%2fscript-not-correctly-printing-the-correct-elements-of-the-array%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
The seemingly strange output of your code could possibly be explained by the fact that your script only ever adds to the output file. This means that you might have had some error in your code previously (now corrected), but that you still see the output of that run in the output file, since the output file is never removed or emptied by the script.
You could shorten your script into
#!/bin/bash
array=( file.*.txt )
for name in "$array[@]"; do
if [ -n "$prev_name" ] && cmp -s "$prev_name" "$name"
then
printf '%s y %sn' "$prev_name" "$name"
fi
prev_name=$name
done
This uses a globbing pattern to populate the array with the filenames matching the pattern.
It then loops over the names, comparing filenames that occur adjacent to each other in the array using cmp -s
. The cmp
utility will exit with a true exit status if the contents of the two files that it's comparing are identical.
The loop uses $prev_name
to hold the name of the previous file in the array. In the first iteration of the loop, this variable is empty, so the actual comparison of files is skipped.
What you are possibly expected to write is a double loop. Something like
for nameA in "$array[@]"; do
for nameB in "$array[@]"; do
if [ "$nameA" != "$nameB" ] && cmp -s "$nameA" "$nameB"
then
printf '%s y %sn' "$nameA" "$nameB"
fi
done
done
But this would compare A
against B
and B
against A
, and the number of calls to cmp
would grow quadratically with the number of files involved, which would be both resource intensive (on disks; it would read each file as many times as there are filenames in the array) and slow.
A common way to find sets of files is with identical contents is with fdupes
:
$ fdupes --sameline .
./file.1.txt ./file.2.txt ./file.7.txt
Would you want to do something similar without fdupes
, you could do that by computing and comparing a checksum of each file using e.g. md5sum
:
#!/bin/bash
declare -A names count
while read -r cksum name; do
names[$cksum]+=$names[$cksum]:+,$name
count[$cksum]=$(( count[$cksum] + 1 ))
done < <( md5sum file.*.txt )
for cksum in "$!count[@]"; do
if [ "$count[$cksum]" -gt 1 ]; then
printf '%sn' "$names[$cksum]"
fi
done
The first loop reads the output of md5sum
which is is executed across all the relevant files. The output of md5sum
may look something like
897316929176464ebc9ad085f31e7284 file.1.txt
8c9eb686bf3eb5bd83d9373eadf6504b file.10.txt
897316929176464ebc9ad085f31e7284 file.2.txt
26ab0db90d72e28ad0ba1e22ee510510 file.3.txt
84bc3da1b3e33a18e8d5e1bdd7a18d7a file.4.txt
aa6ed9e0f26a6eba784aae8267df1951 file.5.txt
6d7fce9fee471194aa8b5b6e47267f03 file.6.txt
897316929176464ebc9ad085f31e7284 file.7.txt
c30f7472766d25af1dc80b3ffc9a58c7 file.8.txt
9ae0ea9e3c9c6e1b9b6252c8395efdc1 file.9.txt
The checksum in the first column is read into cksum
and filename gets read into name
.
Inside the first loop, we append the name to an entry in an associative array that is indexed by the checksum. The way the assignment to names[$cksum]
is done there makes sure that we add a comma before each new name if needed (which it is if the entry already contains some other names). We then update a count of the number of times that we've seen this particular checksum (this will be used in the second loop).
In the second loop, we go through the checksums ("$!count[@]"
expands to a list of keys (checksums) in the count
associative array), and for each checksum we test whether its count is greater than 1, meaning we've found a duplicate file (if you're looking for groups of exactly three identical files, you may want to use -eq 3
instead of -gt 1
here). If it is, we print the names associated with that checksum.
Testing it:
$ bash script.sh
file.1.txt,file.2.txt,file.7.txt
add a comment |
The seemingly strange output of your code could possibly be explained by the fact that your script only ever adds to the output file. This means that you might have had some error in your code previously (now corrected), but that you still see the output of that run in the output file, since the output file is never removed or emptied by the script.
You could shorten your script into
#!/bin/bash
array=( file.*.txt )
for name in "$array[@]"; do
if [ -n "$prev_name" ] && cmp -s "$prev_name" "$name"
then
printf '%s y %sn' "$prev_name" "$name"
fi
prev_name=$name
done
This uses a globbing pattern to populate the array with the filenames matching the pattern.
It then loops over the names, comparing filenames that occur adjacent to each other in the array using cmp -s
. The cmp
utility will exit with a true exit status if the contents of the two files that it's comparing are identical.
The loop uses $prev_name
to hold the name of the previous file in the array. In the first iteration of the loop, this variable is empty, so the actual comparison of files is skipped.
What you are possibly expected to write is a double loop. Something like
for nameA in "$array[@]"; do
for nameB in "$array[@]"; do
if [ "$nameA" != "$nameB" ] && cmp -s "$nameA" "$nameB"
then
printf '%s y %sn' "$nameA" "$nameB"
fi
done
done
But this would compare A
against B
and B
against A
, and the number of calls to cmp
would grow quadratically with the number of files involved, which would be both resource intensive (on disks; it would read each file as many times as there are filenames in the array) and slow.
A common way to find sets of files is with identical contents is with fdupes
:
$ fdupes --sameline .
./file.1.txt ./file.2.txt ./file.7.txt
Would you want to do something similar without fdupes
, you could do that by computing and comparing a checksum of each file using e.g. md5sum
:
#!/bin/bash
declare -A names count
while read -r cksum name; do
names[$cksum]+=$names[$cksum]:+,$name
count[$cksum]=$(( count[$cksum] + 1 ))
done < <( md5sum file.*.txt )
for cksum in "$!count[@]"; do
if [ "$count[$cksum]" -gt 1 ]; then
printf '%sn' "$names[$cksum]"
fi
done
The first loop reads the output of md5sum
which is is executed across all the relevant files. The output of md5sum
may look something like
897316929176464ebc9ad085f31e7284 file.1.txt
8c9eb686bf3eb5bd83d9373eadf6504b file.10.txt
897316929176464ebc9ad085f31e7284 file.2.txt
26ab0db90d72e28ad0ba1e22ee510510 file.3.txt
84bc3da1b3e33a18e8d5e1bdd7a18d7a file.4.txt
aa6ed9e0f26a6eba784aae8267df1951 file.5.txt
6d7fce9fee471194aa8b5b6e47267f03 file.6.txt
897316929176464ebc9ad085f31e7284 file.7.txt
c30f7472766d25af1dc80b3ffc9a58c7 file.8.txt
9ae0ea9e3c9c6e1b9b6252c8395efdc1 file.9.txt
The checksum in the first column is read into cksum
and filename gets read into name
.
Inside the first loop, we append the name to an entry in an associative array that is indexed by the checksum. The way the assignment to names[$cksum]
is done there makes sure that we add a comma before each new name if needed (which it is if the entry already contains some other names). We then update a count of the number of times that we've seen this particular checksum (this will be used in the second loop).
In the second loop, we go through the checksums ("$!count[@]"
expands to a list of keys (checksums) in the count
associative array), and for each checksum we test whether its count is greater than 1, meaning we've found a duplicate file (if you're looking for groups of exactly three identical files, you may want to use -eq 3
instead of -gt 1
here). If it is, we print the names associated with that checksum.
Testing it:
$ bash script.sh
file.1.txt,file.2.txt,file.7.txt
edited Mar 17 at 20:14
answered Mar 17 at 19:04
Kusalananda♦
Here's a working version of what you're trying to do. I use a smaller sample set to keep things clearer:
#!/bin/bash
# create (or empty) placeholder and write a header line
printf "Files with no diff:\n" > placeholder
# set up sample data (this overwrites file.00.txt ... file.05.txt)
echo "one" > file.00.txt
echo "one" > file.01.txt
echo "foo" > file.02.txt
echo "bar" > file.03.txt
echo "two" > file.04.txt
echo "two" > file.05.txt
# generate array
i=0
while [ "$i" -lt 6 ]; do
    array+=( "file.$(printf %02d "$i").txt" )
    ((i++))
done
# compare each file with the next one
i=0
while [ "$i" -lt 5 ]; do
    diff --brief "${array[i]}" "${array[i+1]}" &&
        echo "${array[i]} ${array[i+1]}" >> placeholder
    ((i++))
done
Results:
$ bash ./test.sh
Files file.01.txt and file.02.txt differ
Files file.02.txt and file.03.txt differ
Files file.03.txt and file.04.txt differ
$ cat placeholder
Files with no diff:
file.00.txt file.01.txt
file.04.txt file.05.txt
If you already have the real files, leave out the sample-data lines; otherwise they will overwrite file.00.txt through file.05.txt.
The code, explained:
In Bash, you can build an array in a loop with the array+=( ... ) notation, which appends an element to the end of the array.
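A sketch of the same append pattern written as a for loop, assuming bash: printf -v stores the formatted name in the variable name instead of printing it, which avoids a command substitution per iteration:

```shell
#!/bin/bash
array=()
for i in 0 1 2; do
    # printf -v writes the zero-padded name into $name instead of printing it.
    printf -v name 'file.%02d.txt' "$i"
    # += with parentheses appends one element to the end of the array.
    array+=( "$name" )
done
printf '%s\n' "${array[@]}"
```

This prints file.00.txt, file.01.txt and file.02.txt, one per line.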
((i++)) increments the counter i.
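For the comparison, I use diff's --brief option, which makes diff report only whether two files differ (as a one-line "Files ... differ" message) instead of listing the differences. What the script actually tests is diff's exit status: 0 if the files are identical, 1 if they differ, and 2 if something went wrong (for example, a missing file).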
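A quick way to check those exit statuses yourself, as a sketch that assumes GNU diff (which provides --brief and documents the 0/1/2 exit codes) and uses throwaway files in a temporary directory:

```shell
#!/bin/bash
# Throwaway sample files in a temporary directory.
tmp=$(mktemp -d) || exit 1
printf 'one\n' > "$tmp/a.txt"
printf 'one\n' > "$tmp/b.txt"
printf 'two\n' > "$tmp/c.txt"
# Discard diff's messages and keep only its exit status ($?).
diff --brief "$tmp/a.txt" "$tmp/b.txt" > /dev/null; same=$?
diff --brief "$tmp/a.txt" "$tmp/c.txt" > /dev/null; differ=$?
diff --brief "$tmp/a.txt" "$tmp/missing.txt" > /dev/null 2>&1; trouble=$?
echo "identical: $same, different: $differ, missing file: $trouble"
rm -r "$tmp"
```

It should print identical: 0, different: 1, missing file: 2.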
With the && (AND) operator, the echo that appends both file names to your placeholder file runs only if diff exits with status 0, that is, only if the two files are identical.
If the files differ, diff prints its "Files ... differ" message to the terminal and exits with status 1, so the command after && is skipped and nothing is written to the placeholder file.
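To illustrate that && reacts to the exit status rather than to whether anything was printed, here is a small sketch. The last part writes the same diff test as an if statement, using two hypothetical files in a temporary directory:

```shell
#!/bin/bash
# echo prints text and exits with status 0, so the command after && runs.
echo foo && echo "echo succeeded"
# false prints nothing and exits with status 1, so the command after && is skipped.
false && echo "this line is never printed"
# The script's diff test written as an if statement, on two hypothetical files.
tmp=$(mktemp -d) || exit 1
echo one > "$tmp/x.txt"
echo one > "$tmp/y.txt"
result=""
if diff --brief "$tmp/x.txt" "$tmp/y.txt"; then
    # Reached only when diff exits with status 0 (identical files).
    result="x.txt y.txt"
    echo "$result"
fi
rm -r "$tmp"
```

The output is foo, echo succeeded and x.txt y.txt; the line after false never appears.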
If you have any further questions about the syntax, feel free to ask.
edited Mar 22 at 9:54
answered Mar 17 at 18:00
Klaatu von Schlacker
"Because Bash interprets silence as success...", noo, not really. && and if check the exit code/exit status of the command. diff just happens to return a truthy exit code (i.e. zero) if the files are identical, and a falsy exit code (a one) if they differ. But you can have commands that produce output and return a truthy value, or the other way around, e.g. echo foo && echo "it's true" or false && echo "it's true"
– ilkkachu
Mar 17 at 19:37
Yes, you're right, my comment was unintentionally broad there. I'll edit for accuracy. Thanks.
– Klaatu von Schlacker
Mar 22 at 9:53
You do add a comma to the filename in the call to diff. It may be worth removing that... Apart from that, is there a reason for creating all those variables and then adding them to an array? Note, too, that your loop is not terminated by a done, that you don't increment i, that the length of the array is ${#array[@]}, not 27, and that it's easier to loop over the elements of the array ("${array[@]}") than to use an index (that is, if you need an array at all and can't just loop over file.*.txt).
– Kusalananda♦
Mar 17 at 17:16
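A minimal sketch of the two looping styles this comment suggests, assuming bash and three empty, hypothetical sample files created in a temporary directory:

```shell
#!/bin/bash
# Three empty, hypothetical sample files in a temporary directory.
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
touch file.00.txt file.01.txt file.02.txt
# The glob expands to the matching names in sorted order.
array=( file.*.txt )
# ${#array[@]} gives the number of elements.
echo "length: ${#array[@]}"
# Loop over the elements directly; no index is needed.
for f in "${array[@]}"; do
    echo "element: $f"
done
# Or skip the array and loop over the glob itself.
for f in file.*.txt; do
    echo "glob: $f"
done
cd / && rm -r "$dir"
```

Both loops visit file.00.txt, file.01.txt and file.02.txt in that order, and the length line reports 3.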
Also, you could just use brace expansion:
array=( file.{00..24}.txt )
to generate the file names. Note, though, that if you only diff files n and n+1, you won't notice if nonconsecutive files (e.g. 3 and 8) are the same, so the logic in the loop may need some more thought. (You'd need to compare each file against every other file.)
– ilkkachu
Mar 17 at 17:37
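A sketch of the pairwise comparison suggested here, assuming bash 4 or later (needed for zero-padded brace expansion such as {00..05}) and six hypothetical sample files in a temporary directory, three of which are identical. The inner loop starts at i + 1, so each pair is compared only once:

```shell
#!/bin/bash
# Hypothetical sample files in a temporary directory: 01, 03 and 04 are identical.
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
for n in {00..05}; do
    echo "content $n" > "file.$n.txt"
done
for n in 01 03 04; do
    echo same > "file.$n.txt"
done
# Brace expansion generates the names; bash 4+ keeps the zero padding.
array=( file.{00..05}.txt )
pairs=()
total=${#array[@]}
# Compare each file with every later file, so A/B is checked once and B/A never.
for (( i = 0; i < total; i++ )); do
    for (( j = i + 1; j < total; j++ )); do
        # cmp -s is silent and exits with status 0 when the files are identical.
        if cmp -s "${array[i]}" "${array[j]}"; then
            pairs+=( "${array[i]} y ${array[j]}" )
        fi
    done
done
printf '%s\n' "${pairs[@]}"
cd / && rm -r "$dir"
```

With this sample data it reports file.01.txt y file.03.txt, file.01.txt y file.04.txt and file.03.txt y file.04.txt. The number of comparisons still grows quadratically with the number of files.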
OK, so I corrected all these things, but I am still not getting what I want. I will edit the question to clarify what I'm getting.
– pericothebig
Mar 17 at 17:41
@Kusalananda OK, so I am still new, and I did not know I could just loop over the files. How would I do that without an array? Thanks to both of you.
– pericothebig
Mar 17 at 17:46
Very bad algorithm anyway, because you will do 27*26 comparisons. Try instead: 1) get the MD5 hash of the files (md5sum command), 2) count the instances of each hash (sort | uniq -c after extracting only the hashes from the md5sum output), 3) find the biggest count (add sort -n | tail -n 1 to the pipeline), 4) extract the hash, 5) use grep $hash on the complete md5sum output to retrieve the names of the files with that hash. If paranoid, 6) use cmp to check that all three files are the same.
– xenoid
Mar 17 at 17:58
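The steps in this comment can be sketched as follows, assuming GNU coreutils (md5sum, uniq -c), filenames without whitespace (awk splits on blanks), and hypothetical sample files in a temporary directory. Each file is hashed once, instead of being diffed against every other file:

```shell
#!/bin/bash
# Hypothetical sample files in a temporary directory: 01, 03 and 04 are identical.
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
for n in 00 01 02 03 04 05; do
    echo "content $n" > "file.$n.txt"
done
for n in 01 03 04; do
    echo same > "file.$n.txt"
done
# 1) Hash every file once (md5sum prints "hash  filename").
sums=$(md5sum file.*.txt)
# 2)-4) Count each hash, sort by count and keep the most frequent hash.
hash=$(printf '%s\n' "$sums" | awk '{ print $1 }' | sort | uniq -c | sort -n | tail -n 1 | awk '{ print $2 }')
# 5) List the files that carry that hash.
dupes=$(printf '%s\n' "$sums" | grep "^$hash" | awk '{ print $2 }')
printf '%s\n' "$dupes"
# 6) Optional paranoia: confirm byte-for-byte equality with cmp.
first=$(printf '%s\n' "$dupes" | head -n 1)
all_same=yes
while read -r f; do
    cmp -s "$first" "$f" || all_same=no
done <<< "$dupes"
echo "all identical: $all_same"
cd / && rm -r "$dir"
```

With this sample data it lists file.01.txt, file.03.txt and file.04.txt, followed by all identical: yes.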