Count multi-line patterns in file

I am looking for a way to search for a multi-line pattern across a file and count how many times it occurs.



For example, say this list of numbers was my input file:



3
2
5
4
8
2
5
4
2
4
2
5
4


If I wanted to search for instances of lines 2-4 (inclusive), I would like the result to be:



3


That is the number of times that exact sequence of lines (2, 5, 4) occurs in the file. I would also like this to work for any number of lines, and for any line-number range in the file.







Tags: bash, text-processing

asked Jan 25 at 22:56 by ToasterFrogs (edited Jan 26 at 0:15 by Sparhawk)

  • If it's inclusive then only the value in line 3 is repeated three times. The values in lines 2 and 4 are repeated four times.

    – Nasir Riley
    Jan 25 at 23:10







  • @NasirRiley I think they are asking for a multi-line grep, i.e. 2\n5\n4

    – Sparhawk
    Jan 25 at 23:16











  • I really can't tell what OP is looking for. Is it possible to reword it in simpler terms?

    – Jesse_b
    Jan 25 at 23:17











  • What @Sparhawk said is correct - I am looking for something like a multi line grep.

    – ToasterFrogs
    Jan 25 at 23:29






  • Is the input to this script "lines 2 through 4" or is it "the sequence of numbers 2,5,4"?

    – Jeff Schaller
    Jan 25 at 23:46

5 Answers

You could use pcregrep, which is available in most distros. The following command matches a fixed string.



pcregrep -Mc '^2\n5\n4$' input.txt


Explanation



From the man page, pcregrep is "a grep with Perl-compatible regular expressions."




  • -M: match the regex over multiple lines


  • -c: output the number of matches (count), instead of the matches themselves


  • ^2\n5\n4$: regex for 2, 5, 4, each on a separate line.

Pattern from specific lines instead



Later comments in the question suggest that the pattern to be matched is not a fixed string, but instead a general "lines 2 through 4". Here, you can use command substitution to parse the lines from the input file instead.



pcregrep -Mc "^\Q$(sed -n 2,4p input.txt)\E$" input.txt


Explanation



  • sed -n 2,4p input.txt: output only lines 2 through 4 (inclusive) of the file


  • \Q...\E: quote the ... part so that it is matched as a literal string rather than as a regexp (assumes the output of the command doesn't contain \E).

Note that this assumes the last lines of the output of sed ... input.txt are not empty, as command substitution ($(...)) strips all trailing newline characters.
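
To see the trailing-newline caveat in isolation, here is a small sketch. The demo file contents and the variable names plain and kept are made up for illustration. The sentinel trick (append a character inside the substitution, then strip it again) is a common shell idiom and not part of the pcregrep command above.

```bash
# Hypothetical demo file: line 2 ("b") is followed by two empty lines.
tmp=$(mktemp)
printf 'a\nb\n\n\nc\n' > "$tmp"

# Plain command substitution: the empty lines 3 and 4 are lost.
plain=$(sed -n 2,4p "$tmp")

# Workaround: emit a sentinel character last, then strip only that sentinel.
kept=$(sed -n 2,4p "$tmp"; printf x)
kept=${kept%x}

printf 'plain length: %s\n' "${#plain}"   # prints: plain length: 1
printf 'kept length:  %s\n' "${#kept}"    # prints: kept length:  4
rm -f "$tmp"
```

So if the selected range may end in empty lines, the pattern has to be captured in a way that keeps them before it is handed to the matcher.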






answered Jan 26 at 0:05 by Sparhawk (edited Jan 26 at 22:31)

  • sed -n 2,4p input.txt is I think more clear than the tail|head pipeline, and simpler to plug in the start and end line numbers.

    – glenn jackman
    Jan 26 at 14:43












  • Thanks @glennjackman. Good point.

    – Sparhawk
    Jan 26 at 22:31

$ perl -l -0777pe '$_=()=/^2\n5\n4$/mg' input_file
3


Working:




  • -0777 => slurp mode, meaning the whole file is read in as a single record.


  • -p => after the code has run on a record, print $_ to stdout.


  • -l => set the output record separator (ORS) to the current input record separator, "\n", so the count is printed with a trailing newline. It has to come before -0777, which then switches the input record separator to slurp mode.

  • The regex /^2\n5\n4$/mg is implicitly applied to $_, which in our case is the whole file. The /m modifier makes ^ and $ match at the beginning and end of every line, not only at the beginning and end of the string. The /g modifier finds all the matches in $_.

  • We do this in list context and assign the result to an empty list. Assigning that list assignment to $_ in turn gives $_ the number of elements returned, which is the number of times the regex matched.

HTH






answered Jan 26 at 11:34 by Rakesh Sharma

  • Without hardcoding the pattern, you can pass the start and end lines of the file and extract it within the perl code: perl -s -l -0777pe '$p = join "\n", (split /\n/)[$start-1 .. $end-1]; $_ = ()=/^$p$/mg' -- -start=2 -end=4 input_file

    – glenn jackman
    Jan 26 at 14:40











  • Thanks @glenn jackman, for providing the generalization.

    – Rakesh Sharma
    Jan 27 at 3:23

Your post doesn't mention any requirement for regular expression support, so I'm going to assume that you will be searching for fixed, literal text strings.



This probably isn't the fastest algorithm you've ever seen, but it works, if you have enough time. It has a slight (theoretical) defect: if two different N-line blocks that begin with the same first line have the same SHA256 hash, it will give incorrect results. In other words, it assumes that distinct N-line blocks always have distinct SHA256 hashes.



It will be tediously slow on large files, especially those which contain numerous occurrences of the first line of the pattern.



#!/usr/bin/env bash

# What's the name of the list file?
LIST=list

# What's the name of the pattern file?
PATTERN=pattern

# We'll figure out how many times the pattern lines appear (consecutively) in the list.

# Where's your SHA256 tool?
SHA256=/sbin/sha256

# what's the first line of pattern?
PATTERN_START="$(head -n 1 "$PATTERN")"

# where in the list does that single line appear (what line numbers?)
# -F: treat the line as a fixed string, -x: match whole lines only
START_LINES="$(grep -nxF -- "$PATTERN_START" "$LIST" | sed -e 's/:.*//')"

# how many lines long is the pattern?
PAT_LEN="$(grep -c ^ < "$PATTERN")"

echo "Pattern is $PAT_LEN lines long, and might start at any of these lines:"
# unquoted on purpose, so the line numbers are printed on one line
echo $START_LINES

PAT_HASH="$($SHA256 < "$PATTERN")"

# So how many times does $PATTERN appear consecutively in $LIST?
PAT_COUNT=0

for LINE in $START_LINES
do
    HASH="$(tail -n +"$LINE" "$LIST" | head -n "$PAT_LEN" | $SHA256 -q)"
    if [ "$HASH" = "$PAT_HASH" ]
    then
        echo "match at line $LINE"
        PAT_COUNT=$((PAT_COUNT+1))
    fi
done

echo "The pattern was found $PAT_COUNT times"


The output:



$ cat list
3
2
5
4
8
2
5
4
2
4
2
5
4
$ cat pattern
2
5
4
$ . foo.sh
Pattern is 3 lines long, and might start at any of these lines:
2 6 9 11
match at line 2
match at line 6
match at line 11
The pattern was found 3 times
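
If hashing is not needed, the same idea (compare every N-line window of the list against the pattern file) can be sketched in a single awk pass that compares the lines directly. This is an illustration, not the script above: the function name count_block is made up, the pattern file is assumed to be non-empty, and unlike the regex-based answers it also counts occurrences that overlap each other.

```bash
# count_block PATTERN_FILE LIST_FILE  (hypothetical helper name)
count_block() {
    awk '
        # First file: remember the pattern lines.
        NR == FNR { pat[++n] = $0; next }
        # Second file: keep a sliding window of the last n lines.
        {
            buf[FNR] = $0
            if (FNR < n) next                 # window not full yet
            for (i = 1; i <= n; i++)
                # Appending "" forces a string comparison, so "02" != "2".
                if ((buf[FNR - n + i] "") != (pat[i] "")) break
            if (i > n) count++                # every line in the window matched
            delete buf[FNR - n + 1]           # drop the line leaving the window
        }
        END { print count + 0 }
    ' "$1" "$2"
}

# Usage: count_block pattern list
```

Because only the last N lines are kept in memory, this reads the list once instead of re-reading it for every candidate start line.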





    mpc() {
        line_count=$(($2 - $1 + 1))
        multiline_pattern=$(head -n "$2" "$3" | tail -n $line_count)
        awk -v RS='' -v FPAT="$multiline_pattern" '{print NF}' "$3"
    }


    # count how many times multiline-pattern defined by lines 2 to 4 (inclusive) occurs
    mpc 2 4 input_file


    Requirement:



    The second argument must be greater than or equal to the first argument. I make no guarantee about the output if you violate that. FPAT is a GNU awk extension, so this needs gawk.



    Disclaimer:



    This doesn't work if characters and/or $ appear in any of the lines included as a pattern. awk struggles to process those characters as parts of a pattern even if they're backslash-escaped.
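
One way around the limitation in the disclaimer is to avoid regex matching altogether and search for the block as a plain string with awk's index(). The sketch below is an assumption-laden illustration rather than a drop-in replacement: the function name count_literal is made up, it keeps the whole file in memory, and it counts overlapping occurrences as well.

```bash
# count_literal START END FILE  (hypothetical helper name)
count_literal() {
    awk -v s="$1" -v e="$2" '
        { line[NR] = $0 }
        END {
            # Wrap both the block and the file in newlines so that only
            # whole lines can match.
            pat = "\n";  for (i = s; i <= e; i++)  pat = pat line[i] "\n"
            text = "\n"; for (i = 1; i <= NR; i++) text = text line[i] "\n"
            count = 0
            # index() is a literal substring search, so no character is special.
            while ((pos = index(text, pat)) > 0) {
                count++
                text = substr(text, pos + 1)   # step past the start of this match
            }
            print count
        }
    ' "$3"
}

# Usage: count_literal 2 4 input_file
```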






      How about



      a="2 5 4"; tr '\n' ' ' < test | grep -o "[^0-9]$a[^0-9]" | wc -l


      With the separator of your choice....



      You need the regex to prevent a match in the event of .... 22 5 44 ... or similar
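
One thing to watch with the pipeline above: each match consumes the separator on both sides, so a sequence at the very start of the file, or one that directly follows another match, is not counted. A possible variation, sketched here with a made-up function name count_seq and assuming the lines themselves contain no spaces, pads the front and doubles every separator so each token has its own delimiters:

```bash
# count_seq "2 5 4" FILE  (hypothetical helper; lines must not contain spaces)
count_seq() {
    # Double the separators inside the pattern: "2 5 4" -> " 2  5  4 "
    pat=" $(printf '%s' "$1" | sed 's/ /  /g') "
    # Pad the front, join the lines with spaces, then double every space so
    # that neighbouring tokens no longer share a separator.
    { printf ' '; tr '\n' ' ' < "$2"; } | sed 's/ /  /g' | grep -oF -- "$pat" | wc -l
}

# Usage: count_seq "2 5 4" test
```

The fixed-string match (-F) still rejects cases like 22 5 44, because a separator is required directly before the first token and directly after the last one.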






        5 Answers
        5






        active

        oldest

        votes








        5 Answers
        5






        active

        oldest

        votes









        active

        oldest

        votes






        active

        oldest

        votes









        2














        You could use pcregrep, which is available in most distros. The following command matches a fixed string.



        pcregrep -Mc '^2n5n4$' input.txt


        Explanation



        From the man page, pcregrep is "a grep with Perl-compatible regular expressions."




        • -M: match the regex over multiple lines


        • -c: output the number of matches (count), instead of the matches themselves


        • ^2n5n4$: regex for 2, 5, 4, each on a separate line.

        Pattern from specific lines instead



        Later comments in the question suggest that the pattern to be matched is not a fixed string, but instead a general "lines 2 through 4". Here, you can use command substitution to parse the lines from the input file instead.



        pcregrep -Mc "^Q$(sed -n 2,4p input.txt)E$" input.txt


        Explanation




        • tail -n+2 input.txt: output the file, from line 2 inclusive


        • head -n3: only output the first three lines


        • Q...E: quote the ... part for a basic string matching as opposed to regexp matching (assumes the output of the command doesn't contain E).

        Note that it assumes the last lines of the output of sed ... input.txt are not empty as command substitution ($(...)) strips all trailing newline characters.






        share|improve this answer




















        • 2





          sed -n 2,4p input.txt is I think more clear than the tail|head pipeline, and simpler to plug in the start and end line numbers.

          – glenn jackman
          Jan 26 at 14:43












        • Thanks @glennjackman. Good point.

          – Sparhawk
          Jan 26 at 22:31
















        2














        You could use pcregrep, which is available in most distros. The following command matches a fixed string.



        pcregrep -Mc '^2n5n4$' input.txt


        Explanation



        From the man page, pcregrep is "a grep with Perl-compatible regular expressions."




        • -M: match the regex over multiple lines


        • -c: output the number of matches (count), instead of the matches themselves


        • ^2n5n4$: regex for 2, 5, 4, each on a separate line.

        Pattern from specific lines instead



        Later comments in the question suggest that the pattern to be matched is not a fixed string, but instead a general "lines 2 through 4". Here, you can use command substitution to parse the lines from the input file instead.



        pcregrep -Mc "^Q$(sed -n 2,4p input.txt)E$" input.txt


        Explanation




        • tail -n+2 input.txt: output the file, from line 2 inclusive


        • head -n3: only output the first three lines


        • Q...E: quote the ... part for a basic string matching as opposed to regexp matching (assumes the output of the command doesn't contain E).

        Note that it assumes the last lines of the output of sed ... input.txt are not empty as command substitution ($(...)) strips all trailing newline characters.






        share|improve this answer




















        • 2





          sed -n 2,4p input.txt is I think more clear than the tail|head pipeline, and simpler to plug in the start and end line numbers.

          – glenn jackman
          Jan 26 at 14:43












        • Thanks @glennjackman. Good point.

          – Sparhawk
          Jan 26 at 22:31














        2












        2








        2







        You could use pcregrep, which is available in most distros. The following command matches a fixed string.



        pcregrep -Mc '^2n5n4$' input.txt


        Explanation



        From the man page, pcregrep is "a grep with Perl-compatible regular expressions."




        • -M: match the regex over multiple lines


        • -c: output the number of matches (count), instead of the matches themselves


        • ^2n5n4$: regex for 2, 5, 4, each on a separate line.

        Pattern from specific lines instead



        Later comments in the question suggest that the pattern to be matched is not a fixed string, but instead a general "lines 2 through 4". Here, you can use command substitution to parse the lines from the input file instead.



        pcregrep -Mc "^Q$(sed -n 2,4p input.txt)E$" input.txt


        Explanation




        • tail -n+2 input.txt: output the file, from line 2 inclusive


        • head -n3: only output the first three lines


        • Q...E: quote the ... part for a basic string matching as opposed to regexp matching (assumes the output of the command doesn't contain E).

        Note that it assumes the last lines of the output of sed ... input.txt are not empty as command substitution ($(...)) strips all trailing newline characters.






        share|improve this answer















        You could use pcregrep, which is available in most distros. The following command matches a fixed string.



        pcregrep -Mc '^2n5n4$' input.txt


        Explanation



        From the man page, pcregrep is "a grep with Perl-compatible regular expressions."




        • -M: match the regex over multiple lines


        • -c: output the number of matches (count), instead of the matches themselves


        • ^2n5n4$: regex for 2, 5, 4, each on a separate line.

        Pattern from specific lines instead



        Later comments in the question suggest that the pattern to be matched is not a fixed string, but instead a general "lines 2 through 4". Here, you can use command substitution to parse the lines from the input file instead.



        pcregrep -Mc "^Q$(sed -n 2,4p input.txt)E$" input.txt


        Explanation




        • tail -n+2 input.txt: output the file, from line 2 inclusive


        • head -n3: only output the first three lines


        • Q...E: quote the ... part for a basic string matching as opposed to regexp matching (assumes the output of the command doesn't contain E).

        Note that it assumes the last lines of the output of sed ... input.txt are not empty as command substitution ($(...)) strips all trailing newline characters.







        share|improve this answer














        share|improve this answer



        share|improve this answer








        edited Jan 26 at 22:31

























        answered Jan 26 at 0:05









        SparhawkSparhawk

        9,69764094




        9,69764094







        • 2





          sed -n 2,4p input.txt is I think more clear than the tail|head pipeline, and simpler to plug in the start and end line numbers.

          – glenn jackman
          Jan 26 at 14:43












        • Thanks @glennjackman. Good point.

          – Sparhawk
          Jan 26 at 22:31













        • 2





          sed -n 2,4p input.txt is I think more clear than the tail|head pipeline, and simpler to plug in the start and end line numbers.

          – glenn jackman
          Jan 26 at 14:43












        • Thanks @glennjackman. Good point.

          – Sparhawk
          Jan 26 at 22:31








        2




        2





        sed -n 2,4p input.txt is I think more clear than the tail|head pipeline, and simpler to plug in the start and end line numbers.

        – glenn jackman
        Jan 26 at 14:43






        sed -n 2,4p input.txt is I think more clear than the tail|head pipeline, and simpler to plug in the start and end line numbers.

        – glenn jackman
        Jan 26 at 14:43














        Thanks @glennjackman. Good point.

        – Sparhawk
        Jan 26 at 22:31






        Thanks @glennjackman. Good point.

        – Sparhawk
        Jan 26 at 22:31














        1














        $ perl -l -0777pe '$_=()=/^2n5n4$/mg' input_file
        3


        Working:




        • -0777 => slurp mode, meaning read the whole file in.


        • -p => before reading the next record, print the current record, $_ to stdout.


        • -l => set the RS = ORS = "n"

        • the regex /^2n5n4$/mg is implicitly applied on the $_, which in our case is the whole file remember. the /m regex modifier shall match the line endings and beginnings too apart from string beginnings and string endings. /g modifier will get all the matches in the $_ aka the whole file.

        • We do this in the list-context, and assign it to an empty list. The $_ thus gets re-assigned with the number of elements in the list, which is the number of times the regex matched really.

        HTH






        share|improve this answer























        • Without hardcoding the pattern, you can pass the start and end lines of the file and extract it within the perl code: perl -s -l -0777pe '$p = join "n", (split /n/)[$start-1 .. $end-1]; $_ = ()=/^$p$/mg' -- -start=2 -end=4 input_file

          – glenn jackman
          Jan 26 at 14:40











        • Thanks @glenn jackman, for providing the generalization.

          – Rakesh Sharma
          Jan 27 at 3:23















        1














        $ perl -l -0777pe '$_=()=/^2n5n4$/mg' input_file
        3


        Working:




        • -0777 => slurp mode, meaning read the whole file in.


        • -p => before reading the next record, print the current record, $_ to stdout.


        • -l => set the RS = ORS = "n"

        • the regex /^2n5n4$/mg is implicitly applied on the $_, which in our case is the whole file remember. the /m regex modifier shall match the line endings and beginnings too apart from string beginnings and string endings. /g modifier will get all the matches in the $_ aka the whole file.

        • We do this in the list-context, and assign it to an empty list. The $_ thus gets re-assigned with the number of elements in the list, which is the number of times the regex matched really.

        HTH






        share|improve this answer























        • Without hardcoding the pattern, you can pass the start and end lines of the file and extract it within the perl code: perl -s -l -0777pe '$p = join "n", (split /n/)[$start-1 .. $end-1]; $_ = ()=/^$p$/mg' -- -start=2 -end=4 input_file

          – glenn jackman
          Jan 26 at 14:40











        • Thanks @glenn jackman, for providing the generalization.

          – Rakesh Sharma
          Jan 27 at 3:23













        1












        1








        1







        $ perl -l -0777pe '$_=()=/^2n5n4$/mg' input_file
        3


        Working:




        • -0777 => slurp mode, meaning read the whole file in.


        • -p => before reading the next record, print the current record, $_ to stdout.


        • -l => set the RS = ORS = "n"

        • the regex /^2n5n4$/mg is implicitly applied on the $_, which in our case is the whole file remember. the /m regex modifier shall match the line endings and beginnings too apart from string beginnings and string endings. /g modifier will get all the matches in the $_ aka the whole file.

        • We do this in the list-context, and assign it to an empty list. The $_ thus gets re-assigned with the number of elements in the list, which is the number of times the regex matched really.

        HTH






        share|improve this answer













        $ perl -l -0777pe '$_=()=/^2n5n4$/mg' input_file
        3


        Working:




        • -0777 => slurp mode, meaning read the whole file in.


        • -p => before reading the next record, print the current record, $_ to stdout.


        • -l => set the RS = ORS = "n"

        • the regex /^2n5n4$/mg is implicitly applied on the $_, which in our case is the whole file remember. the /m regex modifier shall match the line endings and beginnings too apart from string beginnings and string endings. /g modifier will get all the matches in the $_ aka the whole file.

        • We do this in the list-context, and assign it to an empty list. The $_ thus gets re-assigned with the number of elements in the list, which is the number of times the regex matched really.

        HTH







        share|improve this answer












        share|improve this answer



        share|improve this answer










        answered Jan 26 at 11:34









        Rakesh SharmaRakesh Sharma

        332




        332












        • Without hardcoding the pattern, you can pass the start and end lines of the file and extract it within the perl code: perl -s -l -0777pe '$p = join "n", (split /n/)[$start-1 .. $end-1]; $_ = ()=/^$p$/mg' -- -start=2 -end=4 input_file

          – glenn jackman
          Jan 26 at 14:40











        • Thanks @glenn jackman, for providing the generalization.

          – Rakesh Sharma
          Jan 27 at 3:23

















        • Without hardcoding the pattern, you can pass the start and end lines of the file and extract it within the perl code: perl -s -l -0777pe '$p = join "n", (split /n/)[$start-1 .. $end-1]; $_ = ()=/^$p$/mg' -- -start=2 -end=4 input_file

          – glenn jackman
          Jan 26 at 14:40











        • Thanks @glenn jackman, for providing the generalization.

          – Rakesh Sharma
          Jan 27 at 3:23
















        Without hardcoding the pattern, you can pass the start and end lines of the file and extract it within the perl code: perl -s -l -0777pe '$p = join "n", (split /n/)[$start-1 .. $end-1]; $_ = ()=/^$p$/mg' -- -start=2 -end=4 input_file

        – glenn jackman
        Jan 26 at 14:40





        Without hardcoding the pattern, you can pass the start and end lines of the file and extract it within the perl code: perl -s -l -0777pe '$p = join "n", (split /n/)[$start-1 .. $end-1]; $_ = ()=/^$p$/mg' -- -start=2 -end=4 input_file

        – glenn jackman
        Jan 26 at 14:40













        Thanks @glenn jackman, for providing the generalization.

        – Rakesh Sharma
        Jan 27 at 3:23





        Thanks @glenn jackman, for providing the generalization.

        – Rakesh Sharma
        Jan 27 at 3:23











        0














        Your post doesn't mention any requirement for regular expression support, so I'm going to assume that you will be searching for fixed, literal text strings.



        This probably isn't the fastest algorithm you've ever seen, but it works, if you have enough time. It has the slight defect that if there are more than one N-line patterns that begin with the same first line and have the same SHA256 hash, it will give incorrect results. It assumes that all possible N-line patterns will have unique SHA256 hashes.



        It will be tediously slow on large files, especially those which contain numerous occurrences of the first line of the pattern.



        #!/usr/bin/env bash

        # What's the name of the list file?
        LIST=list

        # What's the name of the pattern file?
        PATTERN=pattern

        # We'll figure out how many times the pattern lines appear (consecutively) in the list.

        # Where's your SHA256 tool?
        SHA256=/sbin/sha256

        # what's the first line of pattern?
        PATTERN_START="$(head -1 $PATTERN)"

        # where in the list does that single line appear (what line numbers?)
        START_LINES="$(grep -nx "$PATTERN_START" $LIST | sed -e 's/:.*//')"

        # how many lines long is the pattern?
        PAT_LEN="$(grep -c ^ < $PATTERN)"

        echo Pattern is $PAT_LEN lines long, and might start at any of these lines:
        echo $START_LINES

        PAT_HASH="$($SHA256 < "$PATTERN")"

        # So how many times does $PATTERN appear consecutively in $LIST?
        PAT_COUNT=0

        for LINE in $START_LINES
        do
        HASH="$(tail +$LINE $LIST | head -$PAT_LEN | $SHA256 -q)"
        if [ "$HASH" = "$PAT_HASH" ]
        then
        echo match at line $LINE
        PAT_COUNT=$(($PAT_COUNT+1))
        fi
        done

        echo The pattern was found $PAT_COUNT times


        The output:



        $ cat list
        3
        2
        5
        4
        8
        2
        5
        4
        2
        4
        2
        5
        4
        $ cat pattern
        2
        5
        4
        $ . foo.sh
        Pattern is 3 lines long, and might start at any of these lines:
        2 6 9 11
        match at line 2
        match at line 6
        match at line 11
        The pattern was found 3 times













            edited Jan 26 at 0:04

























            answered Jan 25 at 23:30









            Jim L.



































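                # mpc FIRST LAST FILE: lines FIRST..LAST of FILE become the pattern (FPAT requires gawk)
                mpc() {
                    line_count=$(($2 - $1 + 1))
                    multiline_pattern=$(head -n "$2" "$3" | tail -n $line_count)
                    awk -v RS='' -v FPAT="$multiline_pattern" '{print NF}' "$3"
                }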


                # count how many times multiline-pattern defined by lines 2 to 4 (inclusive) occurs
                mpc 2 4 input_file


                Requirement:



                The second argument must be greater than or equal to the first. I make no guarantee about the output if you violate that.



                Disclaimer:



                This doesn't work if regex metacharacters such as $ appear in any of the lines used as the pattern. awk struggles to treat those characters as literal parts of a pattern even if they're backslash-escaped.
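
If literal matching matters more than brevity, one way around that limitation is to skip regular expressions entirely and compare lines as plain strings. The following is only a sketch: the name mpc_literal is made up for illustration, it reads the file twice, and it holds the whole file in memory. It takes the same three arguments as mpc and works with any POSIX awk.

```shell
#!/bin/sh
# mpc_literal FIRST LAST FILE  (hypothetical name, same arguments as mpc)
# Counts how often lines FIRST..LAST of FILE occur as a consecutive block in FILE.
mpc_literal() {
    awk -v s="$1" -v e="$2" '
        # first pass: remember the pattern lines (the "" forces string type)
        NR == FNR { if (FNR >= s && FNR <= e) pat[FNR - s + 1] = $0 ""; next }
        # second pass: store every line so that windows can be compared
        { line[FNR] = $0 ""; n = FNR }
        END {
            len = e - s + 1
            count = 0
            # slide a window of len lines over the file, one line at a time
            for (i = 1; i + len - 1 <= n; i++) {
                ok = 1
                for (j = 1; j <= len; j++)
                    if (line[i + j - 1] != pat[j]) { ok = 0; break }
                count += ok
            }
            print count
        }
    ' "$3" "$3"
}
```

With the sample input from the question, mpc_literal 2 4 input_file should print 3. Because the window advances one line at a time, overlapping occurrences are counted too, which differs from the FPAT version.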














                    edited Jan 26 at 2:56

























                    answered Jan 26 at 2:21









                    Niko Gambt



































                        How about



                        a="2 5 4"; tr '\n' ' ' < test | grep -o "[^0-9]$a[^0-9]" | wc -l


                        With the separator of your choice....



                        You need the [^0-9] guards in the regex to prevent a false match on input such as 22 5 44 or similar.
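
Two edge cases are worth keeping in mind with the one-liner: an occurrence that starts on the very first line has no character in front of it for [^0-9] to match, and two back-to-back occurrences share a single separator, so grep -o can only report one of them. A possible workaround, sketched below under the assumption that no line contains a space, is to wrap every line in its own pair of separators before joining. The function name count_joined is hypothetical, and grep -o is a common extension rather than POSIX.

```shell
#!/bin/sh
# count_joined FILE LINE1 [LINE2 ...]  (hypothetical helper name)
# Assumes the lines themselves contain no spaces.
count_joined() {
    file=$1
    shift
    # build the needle " LINE1  LINE2  ... " with a separator on both sides of every line
    needle=$(printf ' %s ' "$@")
    # wrap each input line the same way, join everything, count fixed-string matches
    sed 's/.*/ & /' "$file" | tr -d '\n' | grep -oF -e "$needle" | wc -l
}
```

For the sample file, count_joined test 2 5 4 should report 3. Matches are counted without overlap, and the joined text is handled as one very long line, so this is probably best kept to small inputs.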














                            edited Jan 26 at 15:02

























                            answered Jan 26 at 14:55









                            bu5hman


























