Count multi-line patterns in file

I am looking for a way to search for a multi-line pattern across a file.

For example, say this list of numbers was my input file:

If I wanted to search for instances of lines 2-4 (inclusive), I would like the result to be:

Since that is the number of times those particular lines are repeated exactly. I would also like this to work with any number of lines, and with any line-number range in the file.

asked Jan 25 at 22:56 by ToasterFrogs; edited Jan 26 at 0:15 by Sparhawk

If it's inclusive then only the value in line 3 is repeated three times. The values in lines 2 and 4 are repeated four times.
– Nasir Riley, Jan 25 at 23:10

@NasirRiley I think they are asking for a multi-line grep, i.e. 2\n5\n4
– Sparhawk, Jan 25 at 23:16

I really can't tell what OP is looking for. Is it possible to reword it in simpler terms?
– Jesse_b, Jan 25 at 23:17

What @Sparhawk said is correct - I am looking for something like a multi-line grep.
– ToasterFrogs, Jan 25 at 23:29

Is the input to this script "lines 2 through 4" or is it "the sequence of numbers 2,5,4"?
– Jeff Schaller, Jan 25 at 23:46

bash text-processing

5 Answers

You could use pcregrep, which is available in most distros. The following command matches a fixed string.

pcregrep -Mc '^2\n5\n4$' input.txt

Explanation

From the man page, pcregrep is "a grep with Perl-compatible regular expressions."

-M: match the regex over multiple lines

-c: output the number of matches (count), instead of the matches themselves

^2\n5\n4$: regex for 2, 5, 4, each on a separate line.

Pattern from specific lines instead

Later comments in the question suggest that the pattern to be matched is not a fixed string, but instead a general "lines 2 through 4". Here, you can use command substitution to parse the lines from the input file instead.

pcregrep -Mc "^\Q$(sed -n 2,4p input.txt)\E$" input.txt

Explanation

sed -n 2,4p input.txt: output only lines 2 through 4 (inclusive) of the file

\Q...\E: quote the ... part so that it is matched as a literal string rather than as a regexp (assumes the output of the command doesn't contain \E).

Note that it assumes the last lines of the output of sed ... input.txt are not empty as command substitution ($(...)) strips all trailing newline characters.

answered Jan 26 at 0:05 by Sparhawk; edited Jan 26 at 22:31

sed -n 2,4p input.txt is, I think, clearer than the tail|head pipeline, and makes it simpler to plug in the start and end line numbers.
– glenn jackman, Jan 26 at 14:43

Thanks @glennjackman. Good point.
– Sparhawk, Jan 26 at 22:31

$ perl -l -0777pe '$_=()=/^2\n5\n4$/mg' input_file
3

Working:

-0777 => slurp mode: read the whole file in as a single record.

-p => run the code once per record (here, once for the whole file), then print $_ to stdout.

-l => set the output record separator to "\n", so the printed count ends with a newline.

The regex /^2\n5\n4$/mg is implicitly applied to $_, which in our case is the whole file. The /m modifier makes ^ and $ match at the beginning and end of every line, not only at the beginning and end of the string. The /g modifier finds all the matches in $_.

We do the match in list context and assign the result to an empty list. $_ is then assigned the number of elements in that list, which is the number of times the regex matched.

HTH

answered Jan 26 at 11:34 by Rakesh Sharma

Without hardcoding the pattern, you can pass the start and end lines of the file and extract it within the perl code: perl -s -l -0777pe '$p = join "\n", (split /\n/)[$start-1 .. $end-1]; $_ = ()=/^$p$/mg' -- -start=2 -end=4 input_file
– glenn jackman, Jan 26 at 14:40

Thanks @glenn jackman, for providing the generalization.
– Rakesh Sharma, Jan 27 at 3:23

Your post doesn't mention any requirement for regular expression support, so I'm going to assume that you will be searching for fixed, literal text strings.

This probably isn't the fastest algorithm you've ever seen, but it works, if you have enough time. It has the slight defect that if two different N-line blocks begin with the same first line and have the same SHA256 hash, it will give incorrect results. In other words, it assumes that all possible N-line patterns have unique SHA256 hashes.

It will be tediously slow on large files, especially those which contain numerous occurrences of the first line of the pattern.

#!/usr/bin/env bash

# What's the name of the list file?
LIST=list

# What's the name of the pattern file?
PATTERN=pattern

# We'll figure out how many times the pattern lines appear (consecutively) in the list.

# Where's your SHA256 tool?
SHA256=/sbin/sha256

# What's the first line of the pattern?
PATTERN_START="$(head -n 1 "$PATTERN")"

# Where in the list does that single line appear (what line numbers)?
# -F makes grep treat the line as a fixed string, not a regular expression.
START_LINES="$(grep -nxF -- "$PATTERN_START" "$LIST" | sed -e 's/:.*//')"

# How many lines long is the pattern?
PAT_LEN="$(grep -c ^ < "$PATTERN")"

echo "Pattern is $PAT_LEN lines long, and might start at any of these lines:"
# Left unquoted on purpose so the line numbers print on one line.
echo $START_LINES

PAT_HASH="$($SHA256 < "$PATTERN")"

# So how many times does $PATTERN appear consecutively in $LIST?
PAT_COUNT=0

# Left unquoted on purpose so the loop sees one line number per iteration.
for LINE in $START_LINES
do
    # Hash the PAT_LEN lines starting at this candidate line.
    HASH="$(tail -n +"$LINE" "$LIST" | head -n "$PAT_LEN" | $SHA256 -q)"
    if [ "$HASH" = "$PAT_HASH" ]
    then
        echo "match at line $LINE"
        PAT_COUNT=$((PAT_COUNT + 1))
    fi
done

echo "The pattern was found $PAT_COUNT times"

The output:

$ cat list
3
2
5
4
8
2
5
4
2
4
2
5
4
$ cat pattern
2
5
4
$ . foo.sh 
Pattern is 3 lines long, and might start at any of these lines:
2 6 9 11
match at line 2
match at line 6
match at line 11
The pattern was found 3 times
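A direct line-by-line comparison avoids both the hashing and the repeated tail | head passes. The following is a rough sketch of that alternative, not the script above: the function name count_block is invented here, it assumes the pattern file is non-empty, and unlike the regex-based approaches it also counts overlapping occurrences.

```shell
#!/usr/bin/env bash
# count_block PATTERN_FILE LIST_FILE   (hypothetical helper name)
count_block() {
    awk '
        # First file: store the pattern lines and remember how many there are.
        NR == FNR { pat[FNR] = $0; n = FNR; next }
        {
            win[FNR % n] = $0            # ring buffer holding the last n list lines
            if (FNR < n) next            # not enough lines read yet
            ok = 1
            # Appending "" forces a string comparison, so "02" differs from "2".
            for (i = 1; i <= n; i++)
                if ((win[(FNR - n + i) % n] "") != (pat[i] "")) { ok = 0; break }
            count += ok
        }
        END { print count + 0 }
    ' "$1" "$2"
}

# Same sample data as in the transcript above, written to temporary files
pattern=$(mktemp); list=$(mktemp)
printf '%s\n' 2 5 4 > "$pattern"
printf '%s\n' 3 2 5 4 8 2 5 4 2 4 2 5 4 > "$list"
count_block "$pattern" "$list"    # prints 3
```

The list is read once, and only the last N lines are held in memory at any time.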

answered Jan 25 at 23:30 by Jim L.; edited Jan 26 at 0:04

mpc() {
    line_count=$(($2 - $1 + 1))
    multiline_pattern=$(head -n "$2" "$3" | tail -n "$line_count")
    awk -v RS='' -v FPAT="$multiline_pattern" '{print NF}' "$3"
}

# count how many times the multi-line pattern defined by lines 2 to 4 (inclusive) occurs
mpc 2 4 input_file

Requirement:

The second argument must be greater than or equal to the first argument. I make no guarantee about the output if you violate that.

Disclaimer:

This doesn't work if characters and/or $ appear in any of the lines included as a pattern. awk struggles to process those characters as parts of a pattern even if they're backslash-escaped.

answered Jan 26 at 2:21 by Niko Gambt; edited Jan 26 at 2:56

How about

a="2 5 4"; tr '\n' ' ' < test | grep -o "[^0-9]$a[^0-9]" | wc -l

With the separator of your choice....

You need the regex to prevent a match in the event of .... 22 5 44 ... or similar
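One thing to watch with the bracket expressions: each match needs a non-digit on both sides, so a block at the very start of the file, or one that directly follows another match, is not counted. A possible variation, sketched here under the assumptions that GNU or BSD grep is available (-o and -w are not in POSIX) and that the lines consist of word characters only, is to let -w do the boundary check. The temporary file and its contents are made up for the demonstration.

```shell
#!/usr/bin/env bash
a="2 5 4"
test_file=$(mktemp)
# Two back-to-back copies of the block, the first at the very start of the file
printf '%s\n' 2 5 4 2 5 4 > "$test_file"

# Original form: only the second copy has a non-digit on both sides
original=$(tr '\n' ' ' < "$test_file" | grep -o "[^0-9]$a[^0-9]" | wc -l)

# -w variant: a match only has to start and end on a word boundary
variant=$(tr '\n' ' ' < "$test_file" | grep -ow "$a" | wc -l)

echo "original: $((original)) variant: $((variant))"    # prints "original: 1 variant: 2"
```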

answered Jan 26 at 14:55 by bu5hman; edited Jan 26 at 15:02

