Bash script only functioning on certain inputs
I have a bash script that I've been working on for a while. Basically, it searches through text to find repetitions of multiple lines. Here is what I have so far:
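    #!/bin/bash
    # NOTE: the body of count() was lost when this page was extracted; it is
    # reconstructed here from the call site and from the answer's version.
    # count TEXT FIRST LAST: how often lines FIRST..LAST of TEXT occur in TEXT.
    count() {
        pattern=$(echo "$1" | sed -n "$2,$3p")
        echo "$1" | pcregrep -Mc "^\Q$(echo "$pattern")\E$"
    }
    file=$1
    # Keep only the digit runs from lines without '=', '!' or '*'.
    fileprep=$(grep -v '=' "$file" | grep -v '!' | grep -v '*' | grep -o '[[:digit:]]*' | grep . )
    linecount=$(echo "$fileprep" | wc -l)
    len=10
    start=1
    end=$(( linecount - len + 1 ))
    # One count per len-line window, newline-separated.
    for i in $(seq $start $end); do
        test="$test\n$(count "$fileprep" $i $((i+len-1)))"
    done
    # Drop windows that occur only once.
    a=$(printf "$test" | grep -v '\b1\b' )
    mostrepetitions=$(echo "$a" | sort -rn | head -n1)
    # A pattern seen i times contributes i windows, so divide by i.
    for i in $(seq 1 $mostrepetitions); do
        var1=$(printf "$a" | grep '\b'$i'\b' | wc -l)
        var2="$var2\n$(echo $(( var1 / i )))"
    done
    printf "$var2" | tr '\n' '+' | awk '{print "0"$0}' | bc -l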
I have found that this works correctly on a simple file that has the numbers 1-10 repeated twice (like so):
1
2
3
4
5
6
7
8
9
10
1
2
3
4
5
6
7
8
9
10
On this, it will correctly output 1 (with the len variable at 10). When the len variable is changed to 9, it will correctly output 2, because both 1-9 and 2-10 are nine-line patterns that occur at least twice.
However, when I run this on my target files (an example of which can be found here), I get impossible results.
In this script, the number of nine-line patterns found should always be at least double the number of ten-line patterns. Take the above example of 1-10: 1-10 is the only ten-line pattern, but within it are both 1-9 and 2-10, each of which is repeated twice. When I run my script on a target file, though, I get an output of 2 for ten-line repeated patterns and also an output of 2 for nine-line patterns. This is clearly incorrect. Why is this happening?
Note: the fileprep variable holds the list of numbers extracted from the input file (see the sample file I linked).
bash text-processing
Some comments in the code would help to understand what the idea behind the various parts is, and what they should do.
– nohillside
Jan 27 at 16:33
What's the overall purpose? To find cycles? Or to do a sort of frequency analysis?
– Kusalananda
Jan 27 at 17:11
asked Jan 27 at 15:29
ToasterFrogs
1 Answer
The phenomenon you describe is actually not impossible, so your script is not the problem. The smallest example I can think of compares len=3 with len=2 on the input file
1
2
1
2
1
2
With len=3, you get the result 2, but with len=2, you don't get some number ≥4 as you might expect, but again the result 2. To get the same number of distinct repeating patterns with len=10 as with len=9, you just need to extend the same alternating file to 13 lines.
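The counting argument above can be checked with a small sketch. This is not the script from the question: it assumes one token per line on standard input, counts overlapping occurrences with awk instead of calling pcregrep, and count_repeats is a name made up for illustration.

```shell
#!/bin/bash
# count_repeats LEN (hypothetical helper): read one token per line on stdin
# and print how many distinct LEN-line windows occur at least twice.
# Overlapping occurrences are counted.
count_repeats() {
    awk -v len="$1" '
        { line[NR] = $0 }
        END {
            # Build every window of len consecutive lines and tally it.
            for (i = 1; i + len - 1 <= NR; i++) {
                key = line[i]
                for (j = 1; j < len; j++) key = key "\n" line[i + j]
                seen[key]++
            }
            n = 0
            for (k in seen) if (seen[k] >= 2) n++
            print n
        }'
}

printf '1\n2\n1\n2\n1\n2\n' | count_repeats 3   # prints 2
printf '1\n2\n1\n2\n1\n2\n' | count_repeats 2   # prints 2
```

Both window lengths give 2 on the six-line file, so a shorter window does not have to produce more distinct repeating patterns than a longer one.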
Addendum:
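I modified the count() function to

    # The first body line was lost in extraction and is reconstructed
    # from how count() is called in the question.
    count() {
        pattern=$(echo "$1" | sed -n "$2,$3p")
        occur=$(echo "$1" | pcregrep -Mc "^\Q$(echo "$pattern")\E$")
        [ $occur -ge 2 ] && echo "$pattern occurs $occur times." >&2
        echo $occur
    }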
So it prints each repeating pattern to standard error. It says that the 10-line pattern
16
...
16
appears 360 times, while the 10-line pattern
16
...
16
8
appears twice. On the other hand, the 9-line pattern
16
...
16
appears 362 times, while
16
...
16
8
appears twice. Your file contains many blocks of consecutive lines containing 16. What puzzles me is why the 9 lines of 16 do not occur once more for each such block, but only two times more than the 10 lines in total.
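One possible explanation, offered as an assumption and not confirmed anywhere in this thread: if the matcher resumes scanning after the end of each match, occurrences inside a block of identical lines are counted without overlap. The sketch below is pure arithmetic and does not call pcregrep; windows_in_run is a hypothetical helper that compares the two ways of counting for a run of N identical lines.

```shell
#!/bin/bash
# windows_in_run N LEN (hypothetical helper): for a run of N identical lines,
# print the number of overlapping LEN-line windows, then the number of
# non-overlapping ones.
windows_in_run() {
    local n=$1 len=$2
    local overlapping=$(( n >= len ? n - len + 1 : 0 ))
    local disjoint=$(( n / len ))
    echo "$overlapping $disjoint"
}

windows_in_run 20 10   # prints "11 2"
windows_in_run 20 9    # prints "12 2"
```

Under that assumption, a block of 20 identical lines contributes the same count for len=9 and len=10, so the totals would differ only for blocks whose length crosses a multiple of 9 or 10.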
Thank you very much, I had never thought about the fact that there are more distinct 10 line possibilities than 9 line.
– ToasterFrogs
Feb 9 at 17:24
edited Jan 28 at 15:34
answered Jan 27 at 17:18
Stefan Hamcke