Is there a way to limit the range of the range pattern in awk?
I am trying to use the awk range pattern to find all SQL select statements in a group of files, inspired by this Stack Overflow answer.

From the awk manual:

"The pattern1, pattern2 form of an expression is called a range pattern. It matches all input records starting with a record that matches pattern1, and continuing until a record that matches pattern2, inclusive."

My initial attempt was

    awk '/select/,/from/' *

where `*` in this case just represents a large number of varied files.

This returned several false hits on HTML select tags, so I refined my command to

    awk '/[^<]select[^>]/,/from/' *

which seems to have eliminated most of those hits.

However, I still get some false hits from occurrences of the word "select" in a comment, and each of those hits produces many lines of noise before it eventually reaches a "from" or the end of the file.

What I would like is for the range pattern not to register a match if there are more than, say, 10 lines between the "select" and the "from".

My question is: Can I make the range pattern fail to match if the number of lines between the match of pattern1 and the match of pattern2 exceeds a given threshold, and if so, how?

awk
asked Apr 30 '15 at 11:08 by Anders Rabo Thorbeck
edited May 23 '17 at 12:40 by Community♦
To address your "false hits for 'select' in comments" issue you'd have to refine your pattern. I suppose 'select' is at the beginning of the line (with optional whitespace)? Then a range pattern `/^[[:space:]]*select/,/from/` should help. (You probably need a similar refinement for the `/from/` part as well.)
– Janis, Apr 30 '15 at 12:04

@Janis: I would, except I want to match all the `select` statements in all the files given, and there is no guarantee that they all conform to such conventions. It is highly possible that a few statements start after some other non-SQL code on a line.
– Anders Rabo Thorbeck, Apr 30 '15 at 14:13

But then there would be some syntactical delimiter (like a semicolon), wouldn't there? So I'd expect you could likely match something like `/(^|;)[[:space:]]*select/,/from/` then.
– Janis, Apr 30 '15 at 17:23
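
To get a feel for what the delimiter-anchored start pattern from the last comment accepts, here is a small sketch. The sample lines and the function name `classify` are made up for illustration. The pattern is written as two alternatives with `[ \t]` in place of `[[:space:]]`, on the assumption that this form is more portable to older awk implementations.

```shell
# Label each input line by whether it looks like the start of a select
# statement: "select" at the start of the line or after a semicolon,
# with optional spaces or tabs in between.
classify() {
    awk '
        /^[ \t]*select/ || /;[ \t]*select/ { print "start: " $0; next }
        { print "other: " $0 }
    '
}

printf '%s\n' \
    'select a' \
    '  select b' \
    'x = 1; select c' \
    '-- see the select below' \
    '<select name="q">' |
classify
```

The first three sample lines are labelled `start:` and the last two `other:`. A `select` that follows other code on the same line without a semicolon would still be missed, which is the concern raised in the reply above.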
2 Answers
You can extend the `/pattern1/,/pattern2/` condition as much as you want by adding an action block after it; the block runs for every record inside the range.

For example, this prints the numbers between 50 and 70, but only the first six lines of each block (`c++ <= 5` is true for c = 0 through 5):

    $ seq 200 | awk '/50/,/70/ {if ($0~/50/) c=0; if (c++ <= 5) print}'
    50
    51
    52
    53
    54
    55
    150
    151
    152
    153
    154
    155

In your case, you could write something like the following, which prints only the first 10 lines of each matched range. The counter has to be reset on every start line; otherwise only the first range would produce any output.

    awk '/[^<]select[^>]/,/from/ {if ($0 ~ /[^<]select[^>]/) c=0; if (c++ < 10) print}' *

A more complex solution would store the lines of each range, for example in an array, and print them only once you know the whole block should be shown (at the latest in the END block). This way you control the block as a whole instead of individual lines.
answered Apr 30 '15 at 11:32 by fedorqui
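
The buffering idea mentioned at the end of this answer, combined with the requirement from the question that over-long ranges print nothing at all, could look roughly like the sketch below. It is not the answerer's code: the function name `limited_range` and the variables `buf`, `n` and `max` are assumptions made for illustration, and the plain `/select/` and `/from/` patterns stand in for whatever refined patterns you settle on.

```shell
# Usage: limited_range MAX [FILE...]
# Print a select..from block only if "from" shows up within MAX lines,
# counting the "select" line itself. Longer blocks are dropped silently.
limited_range() {
    max_lines=$1; shift
    awk -v max="$max_lines" '
        FNR == 1 { n = 0 }                  # new input file: drop any unfinished block
        {
            if (n == 0 && !/select/) next   # outside a block and not a start line
            buf[++n] = $0                   # buffer the current line
            if (/from/) {                   # end found in time: print the block
                for (i = 1; i <= n; i++) print buf[i]
                n = 0
            } else if (n >= max) {          # limit reached without "from": discard
                n = 0
            }
        }
    ' "$@"
}

# Small demonstration with a limit of 3 lines.
printf '%s\n' \
    'select a' \
    'from t' \
    '-- select nothing here' \
    'x1' \
    'x2' \
    'x3' \
    'select b from u' |
limited_range 3
```

With real files this would be called as `limited_range 10 *`. In the demonstration only the two genuine statements are printed; the comment line that mentions "select" starts a block that is discarded after three lines. A second `select` inside an open block does not restart the buffer in this sketch.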
More simply: `seq 200 | awk '/50/{c=0} /50/,/70/ || c++==5'`
– Costas, Apr 30 '15 at 11:58

Cool, it works! But I don't understand how this `|| c++==5` works, and what the "or" operation applies to. Could you expand on it?
– fedorqui, Apr 30 '15 at 12:07

`c++==...` is equivalent to `c==... ; c=c+1`. `/70/` is equivalent to `$0 ~ /70/`. So after the `,` we have the pattern `$0 ~ /70/ OR c == 5`, after which 1 is added to c.
– Costas, Apr 30 '15 at 12:27

@fedorqui: Thank you, this is useful, but not exactly what I'm looking for. For a limit of 10, this will print 10 lines of noise if no match is found for `pattern2`, as opposed to printing no lines at all, which I would prefer. I am a novice at `awk`, so there is probably a way to buffer the lines from the match of `pattern1` until either reaching a match of `pattern2` and outputting them all, or exceeding the line threshold and discarding all the lines of that failed match. However, I do not yet know enough `awk` to express that.
– Anders Rabo Thorbeck, Apr 30 '15 at 14:07

@Costas: Thank you for the improvement to @fedorqui's suggestion. I don't quite understand what is happening in the `pattern1` of your range pattern, i.e. `/50/{c=0}/50/`. Why is `/50/` specified twice? Is it valid to write an action in between two regular expression patterns?
– Anders Rabo Thorbeck, Apr 30 '15 at 14:11
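
The one-liner from the first comment is easier to follow when it is split into its two rules. The sketch below reads `/50/c=0/50/` as a separate rule `/50/ {c=0}` followed by the start of a range pattern, which is an assumption about what the compact form means. It only adds whitespace, parentheses and comments, and wraps the program in a hypothetical function `capped_range`.

```shell
capped_range() {
    awk '
        # Rule 1: an ordinary pattern-action rule that resets the counter
        # on every line matching /50/. It is not part of the range pattern.
        /50/ { c = 0 }

        # Rule 2: a range pattern with no action, so matching lines are printed.
        # The range ends at a line matching /70/ or once c has reached 5;
        # c is incremented each time the end condition is evaluated.
        /50/, (/70/ || c++ == 5)
    '
}

seq 200 | capped_range
```

On this input it prints 50 through 55 and 150 through 155, the same lines as the output shown in the answer. So `/50/` appears twice because it serves two rules, once as the reset trigger and once as the start of the range.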
Range patterns are useful but not flexible. Instead of using them, maintain the between-or-not state in a variable. The awk script `/select/,/from/` is equivalent to

    /select/ {printing = 1}
    printing {print}
    /from/ {printing = 0}

If you want to limit the range to a number of lines, maintain a counter of lines seen and accumulate the output until you've decided whether to display it.

    /select/ {select_text = $0; select_line_count = 1}
    !/select/ && select_line_count {select_text = select_text "\n" $0; select_line_count++}
    /from/ && select_line_count {if (select_line_count <= 10) print select_text; select_line_count = 0}

Here every `select` line restarts the buffer, and the buffered lines are printed only if `from` is reached within 10 lines.

You'll probably want to refine the pattern, for example to require that `select` is at the beginning of the line except for whitespace, and is followed by whitespace: `/^[\t ]*select($|[\t ])/`
answered Apr 30 '15 at 23:06 by Gilles
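
The equivalence claimed at the start of this answer can be checked on a small input. In the sketch below the sample lines and the function names `sample`, `with_range` and `with_flag` are invented for illustration, and `with_flag` assumes the three rules are meant as `pattern { action }` pairs.

```shell
# A few made-up lines: one multi-line statement, one single-line statement.
sample() {
    printf '%s\n' 'a' 'select x' 'y' 'from t' 'b' 'select z from u' 'c'
}

# The range pattern itself.
with_range() { awk '/select/,/from/'; }

# The same selection with an explicit state variable.
with_flag() {
    awk '
        /select/ { printing = 1 }   # a start line switches printing on
        printing { print }          # print while the flag is set
        /from/   { printing = 0 }   # an end line switches it off afterwards
    '
}

if [ "$(sample | with_range)" = "$(sample | with_flag)" ]; then
    echo "same output"
else
    echo "outputs differ"
fi
```

Both variants select `select x`, `y`, `from t` and `select z from u`. The order of the three rules matters: the end line is still printed because the flag is cleared only after the `print` rule has run.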