How shall I perform multiline matching and substitution using awk?

Votes: 1
In a text file, ignoring any trailing whitespace at the end of each line, I assume that if a line does not end with a digit, then there is an unwanted line break between that line and the next one. I would like to find these line breaks and then join the lines they separate into one line. For example:
line 1
li
ne 2
There is a line break between the second and the third lines, and I should modify the file to be:
line 1
line 2
To find such line breaks, I need to do multiline matching. I tried to do it by changing the record separator, but the following doesn't work:
$ awk 'BEGIN{RS="";}; { if (match($0, /[^[:digit:] ] *\n/)) {print $0;} }' inputfile
I am still wondering how to concatenate two lines separated by such a line break.
Thanks.
text-processing awk gawk
asked yesterday
Tim
Setting RS to the empty string turns on paragraph mode (records are separated by runs of empty lines), not 'multiline matching', which is always on in awk. It's no wonder your script doesn't work: it will just treat the whole file as a single record and print it, terminated by an extra newline (ORS). Also, there's absolutely no point in using the match() function if you're not using its return value or the RSTART or RLENGTH variables.
– mosvy
yesterday
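To illustrate the comment above, here is a minimal sketch of how the value of RS changes what awk treats as a record. The helper name count_records and the sample text are assumptions made up for this illustration; they do not come from the thread.

```shell
# count_records RS_VALUE: print how many records awk sees on stdin
# when RS is set to RS_VALUE. (Hypothetical helper for this sketch.)
count_records() {
    awk -v RS="$1" 'END { print NR }'
}

# Two blocks of text separated by one empty line.
sample='line 1
li
ne 2

line 3
'

printf '%s' "$sample" | count_records '\n'   # default RS: one record per line -> 5
printf '%s' "$sample" | count_records ''     # paragraph mode: one record per block -> 2

# Without any empty line, paragraph mode sees the whole input as one record.
printf 'line 1\nli\nne 2\n' | count_records ''   # -> 1
```

With the question's input, which has no empty lines, the whole file becomes one record, so the original script prints it unchanged.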
4 Answers
Votes: 1 (accepted)
You could run something along the lines of
awk 'BEGIN{RS=SUBSEP; ORS=""} {print gensub(/([^0-9])\n/,"\\1","g",$0)}' ex
RS=SUBSEP sets the record separator to a value that should never be present in a text file (so the whole input file is slurped into $0); then do your favorite multiline transformation.
edited 12 hours ago
answered yesterday
JJoao
Thanks. Do you know matching without substitution for multiline case?
– Tim
yesterday
I was wondering if this reply doesn't work well sometimes? Why is this reply downvoted?
– Tim
yesterday
Is RS="\f" also a working solution?
– Tim
23 hours ago
This seems to add an empty line at the end of the output. I'm not sure exactly why at the moment.
– Kusalananda
23 hours ago
@JJoao In general, print non-record data with printf and records with print. Since you're operating in "slurp mode" here (so to speak) and therefore do not really operate on records, it would be appropriate to use printf.
– Kusalananda
12 hours ago
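The points raised in these comments (using match() with RSTART and RLENGTH, and writing slurped text with printf) can be sketched together without gensub(), which is specific to gawk. This is only one possible approach, not the answerer's method: the helper name join_broken_lines is made up for this sketch, the input is slurped by accumulating lines rather than by changing RS, and trailing spaces before the line break are kept.

```shell
# join_broken_lines: read stdin and remove every newline that follows a line
# not ending in a digit (trailing spaces are allowed before the newline).
# (Hypothetical helper name; a sketch, not a definitive implementation.)
join_broken_lines() {
    awk '
        { text = text $0 "\n" }      # slurp: rebuild the input in one string
        END {
            # A character that is not a digit, space or newline,
            # then optional spaces, then the newline to remove.
            while (match(text, /[^0-9 \n] *\n/)) {
                nl = RSTART + RLENGTH - 1        # position of that newline
                text = substr(text, 1, nl - 1) substr(text, nl + 1)
            }
            printf "%s", text        # printf: no extra ORS is appended
        }'
}

printf 'line 1\nli\nne 2\n' | join_broken_lines
```

One caveat of this sketch: if the last line of the input does not end in a digit, its final newline is removed as well.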
Votes: 4
I would address it differently: by looping over the input until you find a "line-ending condition":
awk '{
  line=$0;
  while($0 !~ /[[:digit:]] *$/ && getline > 0)
    line=line$0;
  print line
}' < input
On an extended input file of:
line 1
li
ne 2
li
ne
number 3
line 4
Or, more verbosely (to see the trailing space):
$ cat -e input
line 1$
li$
ne 2$
li$
ne $
number 3$
line 4$
The output is:
line 1
line 2
line number 3
line 4
edited yesterday
qubert
answered yesterday
Jeff Schaller
Thanks. The script in your reply is very specific to the problem. I would like to see if there is a more general script, which can allow me to specify a multiline pattern and match (and substitute) the matches.
– Tim
yesterday
What "multiline patterns" are you thinking of?
– RudiC
yesterday
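One possible reading of a "multiline pattern" is a regular expression that contains a newline and is matched against the whole input instead of one line at a time. The following is a rough sketch under that assumption. The helper name ml_matches and the <NL> marker are invented for this illustration. Note that awk processes escape sequences in -v assignments, so \n in the argument becomes a real newline before it is used as a regular expression.

```shell
# ml_matches PATTERN: print "offset: text" for each match of PATTERN in the
# whole of stdin; PATTERN may contain \n. (Hypothetical helper for this sketch.)
ml_matches() {
    awk -v pat="$1" '
        { text = text $0 "\n" }              # slurp the input into one string
        END {
            offset = 0                       # characters already consumed
            while (match(text, pat)) {
                if (RLENGTH == 0) break      # guard against empty matches
                hit = substr(text, RSTART, RLENGTH)
                gsub(/\n/, "<NL>", hit)      # make newlines visible
                print (offset + RSTART) ": " hit
                offset += RSTART + RLENGTH - 1
                text = substr(text, RSTART + RLENGTH)
            }
        }'
}

# Find every place where a line ends in "li" and the next starts with "ne".
printf 'line 1\nli\nne 2\nli\nne 3\n' | ml_matches 'li\nne'
# 8: li<NL>ne
# 16: li<NL>ne
```

The offsets are 1-based character positions in the slurped text, which makes it possible to report matches without substituting anything.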
Votes: 2
$ cat file
line 1
li
ne 2
lo
ng li
ne 3
$ awk 'line ~ /[0-9]$/ { print line; line = "" } { line = line $0 } END { print line }' file
line 1
line 2
long line 3
This accumulates an "output line" in the variable line, and whenever this variable ends with a digit, it is printed and reset. It is also printed at the very end to output the last line (whether complete or not).
Approximate sed equivalent (but with an explicit loop):
$ sed -e ':again' -e '/[0-9]$/{ p; d; }; N; s/\n//' -e 'tagain' file
line 1
line 2
long line 3
answered 23 hours ago
Kusalananda
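The question also asks to ignore trailing whitespace, which the /[0-9]$/ test in the awk command above does not allow for. A minimal variation on the same accumulator idea, assuming that "whitespace" means spaces and tabs, might look like this. The helper name join_on_digit is made up for this sketch, and unlike the command above it tests the accumulated line right after appending to it.

```shell
# join_on_digit: accumulate input lines and print the result as soon as it
# ends in a digit, optionally followed by spaces or tabs.
# (Hypothetical helper name; a sketch, not a definitive implementation.)
join_on_digit() {
    awk '
        { line = line $0 }                   # append the current input line
        line ~ /[0-9][ \t]*$/ {              # complete: digit, optional blanks
            print line
            line = ""
        }
        END { if (line != "") print line }   # leftover incomplete line, if any
    '
}

printf 'line 1\nli\nne \nnumber 3\nline 4  \n' | join_on_digit
```

Trailing blanks are kept in the output; they are only ignored when deciding whether a line is complete.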
Votes: 0
Small GNU sed?
sed ':L; /[0-9] *$/!{N; bL;}; s/\n//g' file
edited 23 hours ago
Kusalananda
answered yesterday
RudiC
doesn't work for me?
– andrew lorien
23 hours ago