Bash script only functioning on certain inputs

I have a bash script that I've been working on for a while. Basically, it searches text for blocks of several consecutive lines that are repeated. Here is what I have so far:



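#!/bin/bash

# count TEXT FIRST LAST: how often lines FIRST..LAST of TEXT occur in TEXT
count() {
    # Assumed: the pattern is lines $2..$3 of the text in $1
    pattern=$(echo "$1" | sed -n "$2,$3p")
    # \Q...\E quotes the pattern literally; -M lets it span several lines
    echo "$1" | pcregrep -Mc "^\Q$(echo "$pattern")\E$"
}

file=$1
# Drop lines containing '=', '!' or '*' and keep only the numbers
fileprep=$(grep -v '=' "$file" | grep -v '!' | grep -v '*' | grep -o '[[:digit:]]*' | grep . )
linecount=$(echo "$fileprep" | wc -l)
len=10
start=1
end=$(( linecount - len + 1 ))

# Record the occurrence count of every len-line window
for i in $(seq $start $end); do
    test="$test\n$(count "$fileprep" $i $((i+len-1)))"
done

# Ignore windows that occur only once
a=$(printf "$test" | grep -v '\b1\b' )

mostrepetitions=$(echo "$a" | sort -rn | head -n1)

# A pattern occurring i times shows up i times in $a, so divide by i
for i in $(seq 1 $mostrepetitions); do
    var1=$(printf "$a" | grep '\b'$i'\b' | wc -l)
    var2="$var2\n$(echo $(( var1 / i )))"
done

# Sum the per-frequency counts with bc
printf "$var2" | tr '\n' '+' | awk '{print "0"$0}' | bc -l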


I have found that this works correctly on a simple file that has the numbers 1-10 repeated twice (like so):



1
2
3
4
5
6
7
8
9
10
1
2
3
4
5
6
7
8
9
10


On this file it correctly outputs 1 (with len set to 10). When len is changed to 9, it correctly outputs 2, because both 1-9 and 2-10 are nine-line patterns that occur at least twice.
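As an independent cross-check of these expected values, here is a minimal sketch that counts the distinct len-line windows occurring at least twice directly in awk, without pcregrep. The function name `count_repeated` is hypothetical, and it assumes overlapping occurrences should count (for example, 1 2 1 occurs twice in 1 2 1 2 1).

```bash
#!/bin/bash
# count_repeated LEN: read lines on stdin and print how many distinct
# LEN-line windows occur at least twice (overlapping occurrences count).
count_repeated() {
    awk -v len="$1" '
        { line[NR] = $0 }
        END {
            # Slide a window of len lines over the whole input
            for (i = 1; i + len - 1 <= NR; i++) {
                key = line[i]
                for (j = i + 1; j < i + len; j++)
                    key = key "\n" line[j]
                seen[key]++
            }
            # Count the windows that were seen more than once
            n = 0
            for (k in seen)
                if (seen[k] >= 2) n++
            print n
        }'
}

# The test file from the question: 1..10 written twice
{ seq 10; seq 10; } | count_repeated 10   # prints 1
{ seq 10; seq 10; } | count_repeated 9    # prints 2
```

Because it counts distinct windows directly, this sketch skips the divide-by-frequency step of the original script; on the 1-10 file it agrees with the results described for len=10 and len=9.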



However, when I run this on my target files (an example of which can be found here), I get impossible results.



In this script, the number of nine-line patterns found should always be at least double the number of ten-line patterns. Take the 1-10 example above: 1-10 is the only ten-line pattern, but it contains both 1-9 and 2-10, each of which is repeated twice. When I run the script on my target file, however, I get an output of 2 for ten-line repeated patterns and also 2 for nine-line patterns. This is clearly incorrect. Why is this happening?



Note: the fileprep variable turns the input file into a list of numbers (see the sample file I linked).
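To check what `fileprep` contains, the same pipeline can be wrapped in a function and run on a small input. This is a sketch: `prep_numbers` is a hypothetical name, and the sample input is made up, since the real sample file is only linked.

```bash
#!/bin/bash
# prep_numbers: the question's fileprep pipeline, reading stdin.
# Drops lines containing '=', '!' or '*', then prints each run of
# digits from the remaining lines on a line of its own.
prep_numbers() {
    grep -v '=' | grep -v '!' | grep -v '*' | grep -o '[[:digit:]]*' | grep .
}

# Hypothetical sample input; the real sample file is only linked
printf 'a = 1\nx 12 y 7\n! 3\n* 4\n5\n' | prep_numbers
```

On that input it prints 12, 7 and 5, one per line: the lines containing `=`, `!` or `*` are dropped, and each remaining run of digits becomes its own line.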










  • Some comments in the code would help us understand the idea behind the various parts and what they should do.

    – nohillside
    Jan 27 at 16:33











  • What's the overall purpose? To find cycles? Or to do a sort of frequency analysis?

    – Kusalananda
    Jan 27 at 17:11















bash text-processing






asked Jan 27 at 15:29









ToasterFrogs

1 Answer

The phenomenon you describe is actually not impossible, so your script is not the problem. The smallest example I can think of compares len=3 with len=2 on the input file



1
2
1
2
1
2


With len=3 the result is 2 (the patterns 1 2 1 and 2 1 2). With len=2 you do not get 4 or more, as you might expect, but again 2 (1 2 and 2 1), because both three-line patterns are made of the same two two-line patterns. To get the same number of distinct repeated patterns for len=10 and len=9, just extend the alternating sequence to 13 lines.
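Listing the repeated windows themselves makes this example concrete. The following sketch uses a hypothetical awk helper, `list_repeated`, which counts overlapping occurrences and joins the lines of each window with commas; it is not part of either script.

```bash
#!/bin/bash
# list_repeated LEN: print every distinct LEN-line window on stdin that
# occurs at least twice (overlapping), with its count, in first-seen order.
# The lines of a window are joined with "," so each window fits on one line.
list_repeated() {
    awk -v len="$1" '
        { line[NR] = $0 }
        END {
            for (i = 1; i + len - 1 <= NR; i++) {
                key = line[i]
                for (j = i + 1; j < i + len; j++)
                    key = key "," line[j]
                if (!(key in seen))
                    order[++n] = key    # remember first-seen order
                seen[key]++
            }
            for (m = 1; m <= n; m++)
                if (seen[order[m]] >= 2)
                    print order[m] ": " seen[order[m]]
        }'
}

printf '%s\n' 1 2 1 2 1 2 | list_repeated 3
printf '%s\n' 1 2 1 2 1 2 | list_repeated 2
```

For len=3 it prints `1,2,1: 2` and `2,1,2: 2`; for len=2 it prints `1,2: 3` and `2,1: 2`. Two distinct repeated patterns in both cases, so shortening the window does not double the count.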



Addendum:



I modified the count() function as follows:



count() {
    pattern=$(echo "$1" | sed -n "$2,$3p")   # assumed: lines $2..$3 of $1
    occur=$(echo "$1" | pcregrep -Mc "^\Q$(echo "$pattern")\E$")
    [ "$occur" -ge 2 ] && echo "$pattern occurs $occur times." >&2
    echo "$occur"
}



So it prints each pattern that repeats to standard error. It reports that the 10-line pattern



16
...
16


appears 360 times, while the 10-line pattern



16
...
16
8


appears twice. On the other hand, the 9-line pattern



16
...
16


appears 362 times, while



16
...
16
8


appears twice. Your file contains many blocks of consecutive lines with 16. What puzzles me is why the nine-line run of 16s does not occur once more per block than the ten-line run, but only two times more in total.
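For reference, this is the arithmetic behind that expectation, sketched under the assumption that occurrences are counted with overlap: a run of n identical lines contains n - k + 1 windows of k lines (when n ≥ k), so every run of at least nine 16s contributes exactly one more nine-line window than ten-line windows. The helper `windows_in_runs` and the sample input are hypothetical; the real file is only linked.

```bash
#!/bin/bash
# windows_in_runs VALUE K: for each maximal run of consecutive stdin lines
# equal to VALUE, count the overlapping K-line windows inside it
# (n - K + 1 for a run of length n >= K) and print the total.
windows_in_runs() {
    awk -v val="$1" -v k="$2" '
        # Close the current run and add its windows to the total
        function flush() { if (run >= k) total += run - k + 1; run = 0 }
        $0 == val { run++; next }
        { flush() }
        END { flush(); print total + 0 }'
}

# Hypothetical input: 12 lines of 16, one line of 8, then 10 lines of 16
sample() { seq 12 | sed 's/.*/16/'; echo 8; seq 10 | sed 's/.*/16/'; }

sample | windows_in_runs 16 10   # prints 4 (3 + 1)
sample | windows_in_runs 16 9    # prints 6 (4 + 2)
```

With two runs the nine-line total exceeds the ten-line total by two, one per run, which is the per-block difference the answer expected to see.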






  • Thank you very much, I had never thought about the fact that there are more distinct 10-line possibilities than 9-line ones.

    – ToasterFrogs
    Feb 9 at 17:24










edited Jan 28 at 15:34

























answered Jan 27 at 17:18









Stefan Hamcke
