How shall I perform multiline matching and substitution using awk?

In a text file, ignoring any trailing whitespace at the end of each line, I assume that a line which does not end with a digit has been split by a stray line break and continues on the next line. I would like to find these line breaks and join the affected lines back into one line. For example:

line 1
li
ne 2

Here there is a stray line break between the second and third lines, so the file should become

line 1
line 2

To find such line breaks, I need to do multiline matching. I tried to do that by changing the record separator, but the following doesn't work:

$ awk 'BEGIN{RS="";}; { if (match($0, /[^[:digit:] ] *\n/)) {print $0;} }' inputfile

I am also still wondering how to join two lines separated by such a line break.

Thanks.

asked yesterday by Tim

Setting RS to the empty string turns on paragraph mode (records are separated by runs of empty lines), not 'multiline matching', which is always on in awk. It's no wonder your script doesn't work: it just treats the whole file as a single record and prints it, terminated by an extra newline (ORS). Also, there's no point in using the match() function if you're not using its return value or the RSTART or RLENGTH variables.
– mosvy
yesterday
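
To illustrate the paragraph-mode behaviour described in the comment above, here is a minimal sketch. The sample text and the variable names (sample, records, lines_in_first) are made up for the illustration; it only assumes a POSIX awk. With RS set to the empty string, a blank line ends a record, and the newlines inside a paragraph stay in $0.

```shell
# Hypothetical sample: two "paragraphs" separated by one blank line.
sample='line 1\nli\nne 2\n\nline 3\n'

# RS="" turns on paragraph mode, so NR counts paragraphs, not lines.
records=$(printf "$sample" | awk 'BEGIN { RS = "" } END { print NR }')

# The first record still contains its embedded newlines; count its lines.
lines_in_first=$(printf "$sample" | awk 'BEGIN { RS = "" } NR == 1 { print split($0, parts, "\n") }')

echo "records=$records lines_in_first=$lines_in_first"
```

On the question's own example, which has no blank lines, the whole file would be a single record, which is why the original script simply prints everything.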

text-processing awk gawk


4 Answers

1 vote, accepted

You could run something along the lines of

awk 'BEGIN{RS=SUBSEP; ORS=""} {print gensub(/([^0-9])\n/,"\\1","g",$0)}' ex

RS=SUBSEP sets the record separator to a value (by default the \034 character) that is not expected to occur in a text file, so the whole input file is slurped into $0. Note that gensub() is a GNU awk extension.

Then do your favorite multiline transformation.

answered yesterday by JJoao, edited 12 hours ago

Thanks. Do you know matching without substitution for multiline case?
– Tim
yesterday

I was wondering if this reply doesn't work well sometimes? Why is this reply downvoted?
– Tim
yesterday

Is RS="\f" also a working solution?
– Tim
23 hours ago

This seems to add an empty line at the end of the output. I'm not sure exactly why at the moment.
– Kusalananda
23 hours ago

@JJoao In general, print non-record data with printf and records with print. Since you're operating in "slurp mode" here (so to speak) and therefore do not really operate on records, it would be appropriate to use printf.
– Kusalananda
12 hours ago
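
For the follow-up about matching without substituting, the following is a rough sketch and not part of the answer above. It slurps the input the same way (RS=SUBSEP) and then walks through $0 with match(), using RSTART and RLENGTH to report where each stray line break sits. The variable names (text, offset, matches) and the message format are invented for the example. Since nothing here is a record, the report is written with printf, as suggested in the last comment.

```shell
# Sample input from the question: "li" / "ne 2" is one line split in two.
matches=$(printf 'line 1\nli\nne 2\n' | awk '
BEGIN { RS = SUBSEP }   # SUBSEP defaults to "\034", so $0 holds the whole input
{
    text = $0
    offset = 0
    # Find a newline preceded by a non-digit (plus optional spaces).
    while (match(text, /[^0-9 ] *\n/)) {
        # RSTART + RLENGTH - 1 is the position of the newline in text.
        printf "break at character %d\n", offset + RSTART + RLENGTH - 1
        offset += RSTART + RLENGTH - 1
        text = substr(text, RSTART + RLENGTH)
    }
}')
echo "$matches"
```

For the sample above, the only reported break is the newline after "li". Nothing in the input is modified; the positions could equally be collected in an array for later processing.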

4 votes

I would address it differently: by looping over the input until you find a "line-ending condition":

awk '{
  line = $0
  while ($0 !~ /[[:digit:]] *$/ && getline > 0) {
    line = line $0
  }
  print line
}' < input

On an extended input file of:

line 1
li
ne 2
li
ne 
number 3
line 4

Or, more verbosely (to see the trailing space):

$ cat -e input
line 1$
li$
ne 2$
li$
ne $
number 3$
line 4$

The output is:

line 1
line 2
line number 3
line 4

answered yesterday by Jeff Schaller, edited yesterday by qubert

Thanks. The script in your reply is very specific to the problem. I would like to see if there is a more general script, which can allow me to specify a multiline pattern and match (and substitute) the matches.
– Tim
yesterday

What "multiline patterns" are you thinking of?
– RudiC
yesterday

2 votes

$ cat file
line 1
li
ne 2
lo
ng li
ne 3

$ awk 'line ~ /[0-9]$/ { print line; line = "" } { line = line $0 } END { print line }' file
line 1
line 2
long line 3

This accumulates an "output line" in the variable line, and whenever this variable ends with a digit, it is printed and reset. It is also printed at the very end to output the last line (whether complete or not).

Approximate sed equivalent (but with an explicit loop):

$ sed -e ':again' -e '/[0-9]$/ { p; d; }; N; s/\n//' -e 'tagain' file
line 1
line 2
long line 3
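
The question also asks to ignore trailing whitespace, which the /[0-9]$/ test above does not allow for. One possible variation of the awk accumulator is sketched below. It is an assumption on my part, not part of the answer: the sample input with trailing spaces and a tab is made up, the trailing blanks are stripped before printing, and the END block skips an empty final line.

```shell
# Hypothetical input: "line 1" has trailing spaces, "ne 2" a trailing tab.
out=$(printf 'line 1  \nli\nne 2\t\nlo\nng li\nne 3\n' | awk '
# Flush the accumulated line once it ends in a digit plus optional blanks.
line ~ /[0-9][ \t]*$/ { sub(/[ \t]+$/, "", line); print line; line = "" }
# Append the current input line to the accumulator.
{ line = line $0 }
# Print whatever is left, unless nothing was accumulated.
END { if (line != "") print line }')
echo "$out"
```

The bracket expression [ \t] is used instead of [[:space:]] only because some older awk implementations lack POSIX character classes.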

answered 23 hours ago by Kusalananda

0 votes

A small GNU sed solution?

sed ':L; /[0-9] *$/!{N; bL;}; s/\n//g' file

answered yesterday by RudiC, edited 23 hours ago by Kusalananda

doesn't work for me?
– andrew lorien
23 hours ago
