Script not correctly printing the correct elements of the array


-2

So basically I have to write a bash script that examines some specific files in the working directory (named file.00.txt up to file.24.txt). The thing is, 3 of them are exactly the same, and my assignment is to create a script that tells me which 3 are the same.

Here is my code

#!/bin/bash 
f0=file.00.txt
f1=file.01.txt
f2=file.02.txt
f3=file.03.txt
f4=file.04.txt
f5=file.05.txt
f6=file.06.txt
f7=file.07.txt
f8=file.08.txt
f9=file.09.txt
f10=file.10.txt
f11=file.11.txt
f12=file.12.txt
f13=file.13.txt
f14=file.14.txt
f15=file.15.txt
f16=file.16.txt
f17=file.17.txt
f18=file.18.txt
f19=file.19.txt
f20=file.20.txt
f21=file.21.txt
f22=file.22.txt
f23=file.23.txt
f24=file.24.txt

array=($f0 $f1 $f2 $f3 $f4 $f5 $f6 $f7 $f8 $f9 $f10 $f11 $f12 $f13 $f14 $f15 $f16 $f17 $f18 $f19 $f20 $f21 $f22 $f23 $f24)

i=0
touch placeholder

while [ $i -lt ${#array} ]
do
 DIFF=$(diff ${array[i]} ${array[i+1]})
 if [ "$DIFF" = "" ]
 then
 echo "${array[i]} y ${array[i+1]}" >> placeholder
 fi
i=$((i+1))
done

 cat placeholder

The idea of this code is to compare each file with the following one in the array, store the pairs that are the same in a file called placeholder, and finally show the contents of that file with the cat command.

However, every time I run the script I get the output

file.00.txt y file.00.txt
file.01.txt y file.01.txt
file.02.txt y file.02.txt

and so on for each file. This should not happen, since I am clearly using

echo "${array[i]} y ${array[i+1]}" >> placeholder

to echo both positions. Why is this happening, and how can I solve it?

edited Mar 17 at 18:06

asked Mar 17 at 17:13

pericothebig

3

You do add a comma to the filename in the call to diff. It may be worth removing that... Apart from that, is there a reason for creating all those variables and then adding them to an array? Note too that your loop is not terminated by a done and that you don't increment i, that the length of the array is ${#array[@]}, not 27, and that it's easier to loop over the elements of the array ("${array[@]}") than to use an index (that is, if you need an array at all and can't just loop over file.*.txt).

– Kusalananda♦
Mar 17 at 17:16

Also you could just use brace expansion: array=( file.{00..24}.txt ) to generate the file names. Though note that if you only diff files n and n+1, you won't notice if nonconsecutive files (e.g. 3 and 8) are the same, so the logic in the loop may need some more thought. (You'd need to look over each other file for each file.)

– ilkkachu
Mar 17 at 17:37

Ok so I corrected all these things but I am still not getting what I want. I will edit to clarify what I am getting

– pericothebig
Mar 17 at 17:41

@Kusalananda Ok so I am still new, so I did not know I could just loop over the files. How would I do that without an array? Thanks to both of you guys

– pericothebig
Mar 17 at 17:46

2

Very bad algorithm anyway because you will do 27*26 comparisons. Try instead: 1) get the MD5 hash of the files (md5sum command), 2) count the instances of each hash (sort | uniq -c after extracting only the hashes from the md5sum output), 3) find the biggest count (add sort -n | tail -1 to the pipeline), 4) extract the hash, 5) use grep $hash on the complete md5sum output to retrieve the names of the files with that hash. If paranoid, 6) use cmp to check that all three files are the same.

– xenoid
Mar 17 at 17:58
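
A rough sketch of the steps listed in this comment, for illustration only. The scratch directory, the five sample files and the variable names (sums, top_hash, dupes) are made up for the demonstration, and it assumes md5sum from GNU coreutils is available:

```bash
#!/bin/bash
# Hypothetical sample data in a scratch directory (not the asker's real files).
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1
printf 'same\n'  > file.00.txt
printf 'other\n' > file.01.txt
printf 'same\n'  > file.02.txt
printf 'third\n' > file.03.txt
printf 'same\n'  > file.04.txt

# 1) hash every file exactly once
sums=$(md5sum file.*.txt)

# 2-4) count each hash, sort by count, keep the hash with the biggest count
top_hash=$(printf '%s\n' "$sums" | awk '{ print $1 }' |
           sort | uniq -c | sort -n | tail -n 1 | awk '{ print $2 }')

# 5) list the names of the files that carry that hash
dupes=$(printf '%s\n' "$sums" | grep "^$top_hash" | awk '{ print $2 }')
printf '%s\n' "$dupes"
```

Each file is read once here, instead of once per comparison. Step 6 from the comment (confirming the matches with cmp) is left out to keep the sketch short.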


bash shell-script diff


2 Answers

The seemingly strange output of your code could possibly be explained by the fact that your script only ever adds to the output file. This means that you might have had some error in your code previously (now corrected), but that you still see the output of that run in the output file, since the output file is never removed or emptied by the script.
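
As a small sketch of that point (it uses a scratch directory and a made-up stale line rather than your real data), emptying the output file at the start of the script, instead of only calling touch, makes sure each run shows nothing but its own results:

```bash
#!/bin/bash
dir=$(mktemp -d)            # scratch directory, made up for this demo
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

# Pretend an earlier, buggy run left this line behind.
echo 'file.00.txt y file.00.txt' > placeholder

# Start of the "real" script: truncate the output file.
# (touch would leave the stale line in place.)
: > placeholder

# Anything appended from here on belongs to the current run only.
echo 'file.03.txt y file.04.txt' >> placeholder
cat placeholder
```

The pair of names in the last echo is only an illustrative value, not output from your files.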

You could shorten your script into

#!/bin/bash

array=( file.*.txt )

for name in "${array[@]}"; do
    if [ -n "$prev_name" ] && cmp -s "$prev_name" "$name"
    then
        printf '%s y %s\n' "$prev_name" "$name"
    fi

    prev_name=$name
done

This uses a globbing pattern to populate the array with the filenames matching the pattern.

It then loops over the names, comparing filenames that occur adjacent to each other in the array using cmp -s. The cmp utility will exit with a true exit status if the contents of the two files that it's comparing are identical.

The loop uses $prev_name to hold the name of the previous file in the array. In the first iteration of the loop, this variable is empty, so the actual comparison of files is skipped.
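
To see the exit status of cmp -s in isolation, here is a minimal sketch. The three files a.txt, b.txt and c.txt are invented for the demonstration and live in a scratch directory:

```bash
#!/bin/bash
dir=$(mktemp -d)            # scratch directory, made up for this demo
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1
printf 'one\n' > a.txt
printf 'one\n' > b.txt
printf 'two\n' > c.txt

# cmp -s prints nothing; only its exit status says whether the files match.
if cmp -s a.txt b.txt; then ab=identical; else ab=different; fi
if cmp -s a.txt c.txt; then ac=identical; else ac=different; fi

printf 'a.txt vs b.txt: %s\n' "$ab"
printf 'a.txt vs c.txt: %s\n' "$ac"
```

This is the same test that the if statement in the loop relies on.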

What you are possibly expected to write is a double loop. Something like

for nameA in "${array[@]}"; do
    for nameB in "${array[@]}"; do
        if [ "$nameA" != "$nameB" ] && cmp -s "$nameA" "$nameB"
        then
            printf '%s y %s\n' "$nameA" "$nameB"
        fi
    done
done

But this would compare A against B and B against A, and the number of calls to cmp would grow quadratically with the number of files involved, which would be both resource intensive (on disks; it would read each file as many times as there are filenames in the array) and slow.
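
One way to at least avoid comparing both A against B and B against A is to loop over indexes and let the inner loop start just after the outer one. This is only a sketch with four invented sample files; the number of cmp calls still grows quadratically, it is just roughly halved:

```bash
#!/bin/bash
dir=$(mktemp -d)            # scratch directory, made up for this demo
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1
printf 'x\n' > file.00.txt
printf 'y\n' > file.01.txt
printf 'x\n' > file.02.txt
printf 'x\n' > file.03.txt

array=( file.*.txt )
n=${#array[@]}
pairs=()                    # collected "A y B" lines

for (( i = 0; i < n; i++ )); do
    # Start at i + 1 so that each unordered pair is visited only once.
    for (( j = i + 1; j < n; j++ )); do
        if cmp -s "${array[i]}" "${array[j]}"; then
            pairs+=( "${array[i]} y ${array[j]}" )
        fi
    done
done

printf '%s\n' "${pairs[@]}"
```

With three identical files this reports three pairs, one for each combination of two names.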

A common way to find sets of files with identical contents is with fdupes:

$ fdupes --sameline .
./file.1.txt ./file.2.txt ./file.7.txt

Should you want to do something similar without fdupes, you could do that by computing and comparing a checksum of each file using e.g. md5sum:

#!/bin/bash

declare -A names count

while read -r cksum name; do
    names[$cksum]+=${names[$cksum]:+,}$name
    count[$cksum]=$(( count[$cksum] + 1 ))
done < <( md5sum file.*.txt )

for cksum in "${!count[@]}"; do
    if [ "${count[$cksum]}" -gt 1 ]; then
        printf '%s\n' "${names[$cksum]}"
    fi
done

The first loop reads the output of md5sum, which is executed across all the relevant files. The output of md5sum may look something like

897316929176464ebc9ad085f31e7284 file.1.txt
8c9eb686bf3eb5bd83d9373eadf6504b file.10.txt
897316929176464ebc9ad085f31e7284 file.2.txt
26ab0db90d72e28ad0ba1e22ee510510 file.3.txt
84bc3da1b3e33a18e8d5e1bdd7a18d7a file.4.txt
aa6ed9e0f26a6eba784aae8267df1951 file.5.txt
6d7fce9fee471194aa8b5b6e47267f03 file.6.txt
897316929176464ebc9ad085f31e7284 file.7.txt
c30f7472766d25af1dc80b3ffc9a58c7 file.8.txt
9ae0ea9e3c9c6e1b9b6252c8395efdc1 file.9.txt

The checksum in the first column is read into cksum and the filename gets read into name.

Inside the first loop, we append the name to an entry in an associative array that is indexed by the checksum. The way the assignment to names[$cksum] is done there makes sure that we add a comma before each new name if needed (which it is if the entry already contains some other names). We then update a count of the number of times that we've seen this particular checksum (this will be used in the second loop).
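
The comma handling can be tried on its own with a plain string. In this sketch the variable list is a stand-in for one entry of the names array, and the three filenames are simply written out by hand:

```bash
#!/bin/bash
list=''
for name in file.1.txt file.2.txt file.7.txt; do
    # ${list:+,} expands to a comma only if list is already non-empty,
    # so the first name is added without a leading comma.
    list+=${list:+,}$name
done
printf '%s\n' "$list"
```

The `:+` form of parameter expansion substitutes the given text when the variable is set and non-empty, and nothing otherwise.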

In the second loop, we go through the checksums ("${!count[@]}" expands to a list of keys (checksums) in the count associative array), and for each checksum we test whether its count is greater than 1, meaning we've found a duplicate file (if you're looking for groups of exactly three identical files, you may want to use -eq 3 instead of -gt 1 here). If it is, we print the names associated with that checksum.

Testing it:

$ bash script.sh
file.1.txt,file.2.txt,file.7.txt

edited Mar 17 at 20:14

answered Mar 17 at 19:04

Kusalananda♦


Here's a more efficient way of doing what you're trying to do. I use a smaller sample set to keep things clearer:

#!/bin/bash

# clear placeholder
printf "Files with no diff:\n" > placeholder

# set up sample data
echo "one" > file.00.txt
echo "one" > file.01.txt
echo "foo" > file.02.txt
echo "bar" > file.03.txt
echo "two" > file.04.txt
echo "two" > file.05.txt 

# generate array
i=0
while [ $i -lt 6 ]; do 
 array+=( file.`printf %02d $i`.txt )
 ((i++))
done

i=0
while [ $i -lt 5 ]; do
 diff --brief "${array[i]}" "${array[i+1]}" &&
 echo "${array[i]} ${array[i+1]}" >> placeholder
 ((i++))
done

Results:

$ sh ./test.sh 
Files file.01.txt and file.02.txt differ
Files file.02.txt and file.03.txt differ
Files file.03.txt and file.04.txt differ
$ cat placeholder 
Files with no diff:
file.00.txt file.01.txt
file.04.txt file.05.txt

You don't need to generate sample data if you actually already have data.

The code, explained:

Building an array in a loop (in Bash) can be done by iterating the way you obviously already know, but the array+= notation appends an element.
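
For comparison, here is a sketch of two ways to fill such an array. The names looped and braced are made up for the example, and the zero-padded brace expansion assumes bash 4 or newer:

```bash
#!/bin/bash
# Appending one element at a time with +=
looped=()
for (( i = 0; i < 6; i++ )); do
    printf -v num '%02d' "$i"       # zero-pad into num without a subshell
    looped+=( "file.$num.txt" )
done

# Brace expansion generates the same six names in one step
braced=( file.{00..05}.txt )

printf '%s\n' "${looped[@]}"
```

Brace expansion does not look at the file system, so the names are generated whether or not the files exist.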

The ((i++)) increments your counter.

Performing the diff, I use the --brief option. If you read the diff man page, it tells you that --brief only reports whether the files differ. More importantly, diff exits with status 0 if no difference is found and with status 1 if the files differ.

Using the && (AND) notation, this code echoes the names of the files being compared into your placeholder file if and only if the diff command exits successfully, that is, when the two files are identical.

If there is a difference between the files, diff prints a "Files ... differ" message to the terminal and exits with a non-zero status. This causes the && (AND) to short-circuit, so nothing is written to the placeholder file.
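
A short sketch of what && actually reacts to. The sample files and the log variable are invented for the demonstration, and -q is the short form of --brief:

```bash
#!/bin/bash
dir=$(mktemp -d)            # scratch directory, made up for this demo
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1
printf 'one\n' > a.txt
printf 'one\n' > b.txt
printf 'two\n' > c.txt

# diff's output is thrown away, yet && still tells the two cases apart,
# because it looks at the exit status (0 = identical, 1 = different).
log=''
diff -q a.txt b.txt > /dev/null && log+='a=b '
diff -q a.txt c.txt > /dev/null && log+='a=c '

# A command can print text and still succeed...
echo 'some output' && log+='echo-ok '
# ...or print nothing and still fail.
false && log+='false-ok '

printf '%s\n' "$log"
```

Only the commands that exit with status 0 leave a mark in log, regardless of whether they printed anything.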

If you have any further questions about the syntax, feel free to ask.

edited Mar 22 at 9:54

answered Mar 17 at 18:00

Klaatu von Schlacker


2

"Because Bash interprets silence as success...", noo, not really. && and if check the exit code/exit status of the command. diff just happens to return a truthy exit code (i.e. zero) if the files are identical, and a falsy exit code (a one) if they differ. But you can have commands that produce output and return a truthy value, or the other way around, e.g. echo foo && echo "it's true" or false && echo "it's true"

– ilkkachu
Mar 17 at 19:37

Yes, you're right, my comment was unintentionally broad there. I'll edit for accuracy. Thanks.

– Klaatu von Schlacker
Mar 22 at 9:53

add a comment |

Your Answer

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "106"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);

);

draft saved

draft discarded

StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2funix.stackexchange.com%2fquestions%2f506847%2fscript-not-correctly-printing-the-correct-elements-of-the-array%23new-answer', 'question_page');

);

Post as a guest

Name

Required, but never shown

2 Answers
2

active

oldest

votes

2 Answers
2

active

oldest

votes

You could shorten your script into

#!/bin/bash

array=( file.*.txt )

for name in "$array[@]"; do
 if [ -n "$prev_name" ] && cmp -s "$prev_name" "$name"
 then
 printf '%s y %sn' "$prev_name" "$name"
 fi

 prev_name=$name
done

This uses a globbing pattern to populate the array with the filenames matching the pattern.

The loop uses $prev_name to hold the name of the previous file in the array. In the first iteration of the loop, this variable is empty, so the actual comparison of files is skipped.

What you are possibly expected to write is a double loop. Something like

for nameA in "$array[@]"; do
 for nameB in "$array[@]"; do
 if [ "$nameA" != "$nameB" ] && cmp -s "$nameA" "$nameB"
 then
 printf '%s y %sn' "$nameA" "$nameB"
 fi
 done
done

A common way to find sets of files is with identical contents is with fdupes:

$ fdupes --sameline .
./file.1.txt ./file.2.txt ./file.7.txt

Would you want to do something similar without fdupes, you could do that by computing and comparing a checksum of each file using e.g. md5sum:

#!/bin/bash

declare -A names count

while read -r cksum name; do
 names[$cksum]+=$names[$cksum]:+,$name
 count[$cksum]=$(( count[$cksum] + 1 ))
done < <( md5sum file.*.txt )

for cksum in "$!count[@]"; do
 if [ "$count[$cksum]" -gt 1 ]; then
 printf '%sn' "$names[$cksum]"
 fi
done

The first loop reads the output of md5sum which is is executed across all the relevant files. The output of md5sum may look something like

897316929176464ebc9ad085f31e7284 file.1.txt
8c9eb686bf3eb5bd83d9373eadf6504b file.10.txt
897316929176464ebc9ad085f31e7284 file.2.txt
26ab0db90d72e28ad0ba1e22ee510510 file.3.txt
84bc3da1b3e33a18e8d5e1bdd7a18d7a file.4.txt
aa6ed9e0f26a6eba784aae8267df1951 file.5.txt
6d7fce9fee471194aa8b5b6e47267f03 file.6.txt
897316929176464ebc9ad085f31e7284 file.7.txt
c30f7472766d25af1dc80b3ffc9a58c7 file.8.txt
9ae0ea9e3c9c6e1b9b6252c8395efdc1 file.9.txt

The checksum in the first column is read into cksum and filename gets read into name.

Testing it:

$ bash script.sh
file.1.txt,file.2.txt,file.7.txt

edited Mar 17 at 20:14

answered Mar 17 at 19:04

Kusalananda♦

142k18265440

add a comment |

You could shorten your script into

#!/bin/bash

array=( file.*.txt )

for name in "$array[@]"; do
 if [ -n "$prev_name" ] && cmp -s "$prev_name" "$name"
 then
 printf '%s y %sn' "$prev_name" "$name"
 fi

 prev_name=$name
done

This uses a globbing pattern to populate the array with the filenames matching the pattern.

The loop uses $prev_name to hold the name of the previous file in the array. In the first iteration of the loop, this variable is empty, so the actual comparison of files is skipped.

What you are possibly expected to write is a double loop. Something like

for nameA in "$array[@]"; do
 for nameB in "$array[@]"; do
 if [ "$nameA" != "$nameB" ] && cmp -s "$nameA" "$nameB"
 then
 printf '%s y %sn' "$nameA" "$nameB"
 fi
 done
done

A common way to find sets of files is with identical contents is with fdupes:

$ fdupes --sameline .
./file.1.txt ./file.2.txt ./file.7.txt

Would you want to do something similar without fdupes, you could do that by computing and comparing a checksum of each file using e.g. md5sum:

#!/bin/bash

declare -A names count

while read -r cksum name; do
 names[$cksum]+=$names[$cksum]:+,$name
 count[$cksum]=$(( count[$cksum] + 1 ))
done < <( md5sum file.*.txt )

for cksum in "$!count[@]"; do
 if [ "$count[$cksum]" -gt 1 ]; then
 printf '%sn' "$names[$cksum]"
 fi
done

The first loop reads the output of md5sum which is is executed across all the relevant files. The output of md5sum may look something like

897316929176464ebc9ad085f31e7284 file.1.txt
8c9eb686bf3eb5bd83d9373eadf6504b file.10.txt
897316929176464ebc9ad085f31e7284 file.2.txt
26ab0db90d72e28ad0ba1e22ee510510 file.3.txt
84bc3da1b3e33a18e8d5e1bdd7a18d7a file.4.txt
aa6ed9e0f26a6eba784aae8267df1951 file.5.txt
6d7fce9fee471194aa8b5b6e47267f03 file.6.txt
897316929176464ebc9ad085f31e7284 file.7.txt
c30f7472766d25af1dc80b3ffc9a58c7 file.8.txt
9ae0ea9e3c9c6e1b9b6252c8395efdc1 file.9.txt

The checksum in the first column is read into cksum and filename gets read into name.

Testing it:

$ bash script.sh
file.1.txt,file.2.txt,file.7.txt

edited Mar 17 at 20:14

answered Mar 17 at 19:04

Kusalananda♦

142k18265440

add a comment |

You could shorten your script into

#!/bin/bash

array=( file.*.txt )

for name in "$array[@]"; do
 if [ -n "$prev_name" ] && cmp -s "$prev_name" "$name"
 then
 printf '%s y %sn' "$prev_name" "$name"
 fi

 prev_name=$name
done

This uses a globbing pattern to populate the array with the filenames matching the pattern.

The loop uses $prev_name to hold the name of the previous file in the array. In the first iteration of the loop, this variable is empty, so the actual comparison of files is skipped.

What you are possibly expected to write is a double loop. Something like

for nameA in "$array[@]"; do
 for nameB in "$array[@]"; do
 if [ "$nameA" != "$nameB" ] && cmp -s "$nameA" "$nameB"
 then
 printf '%s y %sn' "$nameA" "$nameB"
 fi
 done
done

A common way to find sets of files is with identical contents is with fdupes:

$ fdupes --sameline .
./file.1.txt ./file.2.txt ./file.7.txt

Would you want to do something similar without fdupes, you could do that by computing and comparing a checksum of each file using e.g. md5sum:

#!/bin/bash

declare -A names count

while read -r cksum name; do
 names[$cksum]+=$names[$cksum]:+,$name
 count[$cksum]=$(( count[$cksum] + 1 ))
done < <( md5sum file.*.txt )

for cksum in "$!count[@]"; do
 if [ "$count[$cksum]" -gt 1 ]; then
 printf '%sn' "$names[$cksum]"
 fi
done

The first loop reads the output of md5sum which is is executed across all the relevant files. The output of md5sum may look something like

897316929176464ebc9ad085f31e7284 file.1.txt
8c9eb686bf3eb5bd83d9373eadf6504b file.10.txt
897316929176464ebc9ad085f31e7284 file.2.txt
26ab0db90d72e28ad0ba1e22ee510510 file.3.txt
84bc3da1b3e33a18e8d5e1bdd7a18d7a file.4.txt
aa6ed9e0f26a6eba784aae8267df1951 file.5.txt
6d7fce9fee471194aa8b5b6e47267f03 file.6.txt
897316929176464ebc9ad085f31e7284 file.7.txt
c30f7472766d25af1dc80b3ffc9a58c7 file.8.txt
9ae0ea9e3c9c6e1b9b6252c8395efdc1 file.9.txt

The checksum in the first column is read into cksum and filename gets read into name.

Testing it:

$ bash script.sh
file.1.txt,file.2.txt,file.7.txt

edited Mar 17 at 20:14

answered Mar 17 at 19:04

Kusalananda♦

142k18265440

You could shorten your script into

#!/bin/bash

array=( file.*.txt )

for name in "$array[@]"; do
 if [ -n "$prev_name" ] && cmp -s "$prev_name" "$name"
 then
 printf '%s y %sn' "$prev_name" "$name"
 fi

 prev_name=$name
done

This uses a globbing pattern to populate the array with the filenames matching the pattern.

The loop uses $prev_name to hold the name of the previous file in the array. In the first iteration of the loop, this variable is empty, so the actual comparison of files is skipped.

What you are possibly expected to write is a double loop. Something like

for nameA in "$array[@]"; do
 for nameB in "$array[@]"; do
 if [ "$nameA" != "$nameB" ] && cmp -s "$nameA" "$nameB"
 then
 printf '%s y %sn' "$nameA" "$nameB"
 fi
 done
done

A common way to find sets of files is with identical contents is with fdupes:

$ fdupes --sameline .
./file.1.txt ./file.2.txt ./file.7.txt

Would you want to do something similar without fdupes, you could do that by computing and comparing a checksum of each file using e.g. md5sum:

#!/bin/bash

declare -A names count

while read -r cksum name; do
 names[$cksum]+=$names[$cksum]:+,$name
 count[$cksum]=$(( count[$cksum] + 1 ))
done < <( md5sum file.*.txt )

for cksum in "$!count[@]"; do
 if [ "$count[$cksum]" -gt 1 ]; then
 printf '%sn' "$names[$cksum]"
 fi
done

The first loop reads the output of md5sum which is is executed across all the relevant files. The output of md5sum may look something like

897316929176464ebc9ad085f31e7284 file.1.txt
8c9eb686bf3eb5bd83d9373eadf6504b file.10.txt
897316929176464ebc9ad085f31e7284 file.2.txt
26ab0db90d72e28ad0ba1e22ee510510 file.3.txt
84bc3da1b3e33a18e8d5e1bdd7a18d7a file.4.txt
aa6ed9e0f26a6eba784aae8267df1951 file.5.txt
6d7fce9fee471194aa8b5b6e47267f03 file.6.txt
897316929176464ebc9ad085f31e7284 file.7.txt
c30f7472766d25af1dc80b3ffc9a58c7 file.8.txt
9ae0ea9e3c9c6e1b9b6252c8395efdc1 file.9.txt

The checksum in the first column is read into cksum and filename gets read into name.

Testing it:

$ bash script.sh
file.1.txt,file.2.txt,file.7.txt

edited Mar 17 at 20:14

answered Mar 17 at 19:04

Kusalananda♦

142k18265440

edited Mar 17 at 20:14

answered Mar 17 at 19:04

Kusalananda♦

142k18265440

answered Mar 17 at 19:04

Kusalananda♦

142k18265440

answered Mar 17 at 19:04

Kusalananda♦

142k18265440

add a comment |

Here's a more efficient way of doing what you're trying to do. I use a smaller sample set to keep things clearer:

#!/bin/bash

# clear placeholder
printf "Files with no diff:n" > placeholder

# set up sample data
echo "one" > file.00.txt
echo "one" > file.01.txt
echo "foo" > file.02.txt
echo "bar" > file.03.txt
echo "two" > file.04.txt
echo "two" > file.05.txt 

# generate array
i=0
while [ $i -lt 6 ]; do 
 array+=( file.`printf %02d $i`.txt )
 ((i++))
done

i=0
while [ $i -lt 5 ]; do
 diff --brief $array[i] $array[i+1] && 
 echo "$array[i] $array[i+1]" >> placeholder 
 ((i++))
done

Results:

$ sh ./test.sh 
Files file.01.txt and file.02.txt differ
Files file.02.txt and file.03.txt differ
Files file.03.txt and file.04.txt differ
$ cat placeholder 
Files with no diff:
file.00.txt file.01.txt
file.04.txt file.05.txt

You don't need to generate sample data if you actually already have data.

The code, explained:

Building an array in a loop (in Bash) can be done by iterating the way you obviously already know, but the array+= notation appends an element.

The ((i++)) increments your counter.

Using the && (AND) notation, this code echoes the names of the files being compared into your placeholder file if and only if the diff command exits with status 0, which it does when the two files are identical.

If the files differ, diff --brief prints a one-line message to the terminal and exits with status 1. It is that non-zero exit status, not the output, that stops the && (AND) list, so nothing is written to the placeholder file.
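To make the role of the exit status concrete, here is a minimal sketch. The files a.txt, b.txt and c.txt are hypothetical and are created in a scratch directory; -q is the short form of --brief:

```bash
#!/bin/bash
# Hypothetical sample files in a scratch directory.
tmpdir=$(mktemp -d)
printf 'one\n' > "$tmpdir/a.txt"
printf 'one\n' > "$tmpdir/b.txt"
printf 'two\n' > "$tmpdir/c.txt"

# Identical files: diff prints nothing and exits with status 0.
diff -q "$tmpdir/a.txt" "$tmpdir/b.txt"
same_status=$?

# Different files: diff exits with status 1 (its message is discarded here).
diff -q "$tmpdir/a.txt" "$tmpdir/c.txt" > /dev/null
diff_status=$?

echo "identical: $same_status, different: $diff_status"

# A command can print output and still succeed, so && carries on.
echo foo && echo "it's true"

rm -r "$tmpdir"
```

The second diff is silenced by the redirection but still returns 1, so a following && would not run; the echo prints output and still returns 0, so its && does run.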

If you have any further questions about the syntax, feel free to ask.

edited Mar 22 at 9:54

answered Mar 17 at 18:00

Klaatu von Schlacker

2,337710

2

"Because Bash interprets silence as success...", noo, not really. && and if check the exit code/exit status of the command. diff just happens to return a truthy exit code (i.e. zero) if the files are identical, and a falsy exit code (a one) if they differ. But you can have commands that produce output and return a truthy value, or the other way around, e.g. echo foo && echo "it's true" or false && echo "it's true"

– ilkkachu
Mar 17 at 19:37

Yes, you're right, my comment was unintentionally broad there. I'll edit for accuracy. Thanks.

– Klaatu von Schlacker
Mar 22 at 9:53
