Sed: how to replace nextline \n symbol in text files?
I need to fix an error and replace the second </time> tag with </tags> in an XML file with the following structure:
<time>20260664</time>
<tags>substancesummit ss</time>
<geo>asdsadsa</geo>
<time>20260664</time>
<tags>substancesummit ss</time>
<geo>asdsadsa</geo>
I'm trying to do it using sed, and since I have two </time> closing tags per item, my idea is to replace </time><geo> with </tags><geo>.
However, there is a next line symbol in between, so I'm using \n, but it doesn't work:
sed 's/time>\n<geo>/tags>\n<geo>/g' old.xml > new.xml
text-processing sed
edited Nov 20 at 22:26 by Rui F Ribeiro
asked Apr 23 '11 at 14:35 by aneuryzm
It is true that \n separates a line from the next line, but it is commonly known as the newline character
– miracle173
Dec 1 '14 at 1:00
3 Answers
Sed processes its input line by line, so a newline character never spontaneously appears in the line your s command works on. What you could do is put lines ending in </time> on hold; then, if the next line begins with <geo>, do the substitution in the previous line. (This is possible in sed, using the “hold space”, but I recommend turning to awk or perl when you need the hold space.)
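A minimal sketch of that "hold the previous line" idea in awk, as the answer recommends. It assumes, as in the question's sample, that the </time> to fix sits at the very end of its line and that <geo> starts the following line; the function name fix_tags is hypothetical.

```sh
#!/bin/sh
# fix_tags (hypothetical helper): reads lines on stdin and keeps one line of
# lookbehind in the awk variable "prev". When the current line starts with
# <geo> and the held line ends in </time>, the held line's closing tag is
# rewritten to </tags> before it is printed.
fix_tags() {
  awk '
    NR > 1 {
      # Now that the next line is visible, decide how to print the held one.
      if (prev ~ /<\/time>$/ && $0 ~ /^<geo>/)
        sub(/<\/time>$/, "</tags>", prev)
      print prev
    }
    { prev = $0 }                  # hold the current line for the next record
    END { if (NR > 0) print prev } # flush the last held line
  '
}
# Usage: fix_tags < old.xml > new.xml
```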
However, given your sample input, you can just change </time> into </tags> when the line begins with <tags>.
sed -e '/^<tags>/ s!</time>$!</tags>!' old.xml > new.xml
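Run against the question's sample data, the one-liner only touches lines that start with <tags>. This relies on each <tags> element opening and closing on a single line, as in the sample; the wrapper name fix_tags_lines is hypothetical.

```sh
#!/bin/sh
# fix_tags_lines (hypothetical helper): on lines starting with <tags>,
# rewrite a trailing </time> to </tags>; all other lines pass through.
fix_tags_lines() {
  sed -e '/^<tags>/ s!</time>$!</tags>!'
}

fix_tags_lines <<'IN'
<time>20260664</time>
<tags>substancesummit ss</time>
<geo>asdsadsa</geo>
IN
# Prints:
# <time>20260664</time>
# <tags>substancesummit ss</tags>
# <geo>asdsadsa</geo>
```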
answered Apr 23 '11 at 14:56 by Gilles (accepted answer)
While the solution to your problem can perhaps be achieved more easily by other means, the answer to your question is a simple one. By default, sed works a line at a time on two buffers: one that persists across line cycles, called the hold space, and one that is refreshed at least once per cycle, called the pattern space. The latter is where all edits are performed.
Lookahead can be gained in one of two ways. You can save old lines and fall behind the line cycle in order to make better use of the commands that swap and compare the buffers. This involves command primitives such as h/H (hold), g/G (get), and x (exchange), which save to, copy from, and swap with the hold space respectively; the lowercase forms overwrite their target buffer, and the uppercase forms append to it after a newline.
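For comparison, a sketch of this first (hold-space) approach applied to the question: it falls one line behind the input, x swaps the current line into the hold space, G appends it back after the previous line so both are visible, and P prints only the (possibly fixed) previous line. The function name is hypothetical, and it assumes POSIX sed, where \n in a regex matches an embedded newline.

```sh
#!/bin/sh
# fix_tags_hold (hypothetical helper): one line of lookbehind via the hold space.
#   1{h;$p;d;}  first line: just hold it (and print it if it is also the last)
#   x;G         pattern space = previous line \n current line; hold = current
#   s|...|...|  fix </time> on the previous line if the current one starts <geo>
#   P           print only the previous line
#   ${g;p;}     on the last line, also print the held current line
# The newline is carried through a \(...\) group so the replacement text
# needs no \n, which POSIX does not define in a replacement.
fix_tags_hold() {
  sed -n -e '1{h;$p;d;}' -e 'x;G;s|</time>\(\n<geo>\)|</tags>\1|;P;${g;p;}'
}
# Usage: fix_tags_hold < old.xml > new.xml
```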
Or you can work future lines into a constant edit algorithm in which you consistently remove as many input lines per cycle as you read. The latter would be my preference here, especially because sed makes it so easy and efficient with the N;P;D commands.
Here's a demo using your example data:
sed '$!N;s/ime\(>\n<geo\)/ags\1/;P;D
' <<IN
<time>20260664</time>
<tags>substancesummit ss</time>
<geo>asdsadsa</geo>
<time>20260664</time>
<tags>substancesummit ss</time>
<geo>asdsadsa</geo>
IN
N (next), P (print), and D (delete), like their lowercase counterparts n, p, and d, respectively get the next line of input into the pattern space, print it, and delete from it. Unlike their lowercase counterparts (if a little less unlike in N's case), these three work on newline boundaries rather than on the pattern space as a whole.
N appends the next input line to the pattern space, following a newline character.
P prints only up to the first occurring newline character in the pattern space.
D deletes only up to and including the first occurring newline in the pattern space, then quits the script for the current cycle and starts the next one with whatever remains in the pattern space, without reading new input; if the pattern space contained no newline, it deletes everything and the next cycle starts with the next input line as usual.
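To watch the window slide, the sketch below feeds three short lines through the same $!N;P;D skeleton and uses l to dump the pattern space before each P. l shows an embedded newline as \n and marks the end of the buffer with $; -n suppresses autoprint so only l and P produce output. The helper name is hypothetical and the exact l formatting shown is GNU sed's.

```sh
#!/bin/sh
# show_window (hypothetical helper): a two-line N;P;D window where l dumps
# the pattern space each cycle before P prints its first line.
show_window() {
  sed -n '$!N;l;P;D'
}

printf 'a\nb\nc\n' | show_window
# With GNU sed this prints:
# a\nb$
# a
# b\nc$
# b
# c$
# c
```

Each cycle starts with the line D left behind, so every input line is seen twice: once as the lookahead and once as the line being printed.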
These three can work together to widen sed's edit window on a file very simply and efficiently: sed slides through the file, printing per cycle only the oldest of a series of lines that it consistently deletes and replenishes according to the script's instructions, which leaves the script author in charge of the line cycle.
And a next-line lookahead is easily expanded upon. If you wanted a 4-line pattern-space window throughout the script, you could do:
sed -e '1{N;N' -e '};N;...;P;D'
...or, perhaps more usefully...
sed -e ':next
$!{/\(.*\n\)\{3\}/!{N;b next' -e '
};};...cmds...;P;D'
...in which sed draws in another input line only if there are fewer than three newline characters in the pattern space and the current line is not the last, and keeps doing so until it has enough before executing any other commands. This happens regardless of what the edits made by the subsequent commands might do.
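A runnable version of that looping window, with l;D standing in for the elided ...cmds...;P;D so that each cycle simply dumps the current window. The helper name is hypothetical, the label and branch are split across -e arguments so that the label name is terminated by a newline as POSIX requires, and the exact l formatting shown is GNU sed's.

```sh
#!/bin/sh
# show_window4 (hypothetical helper): keep up to four lines in pattern space.
# The :next loop appends lines with N until the pattern space holds three
# newlines (four lines) or the last input line has been read; l then dumps
# the window and D drops its oldest line without reading new input.
show_window4() {
  sed -n -e ':next' \
         -e '$!{/\(.*\n\)\{3\}/!{N;b next' \
         -e '};}' \
         -e 'l;D'
}

printf '1\n2\n3\n4\n5\n6\n' | show_window4
# With GNU sed this prints:
# 1\n2\n3\n4$
# 2\n3\n4\n5$
# 3\n4\n5\n6$
# 4\n5\n6$
# 5\n6$
# 6$
```

Once the input runs out, the $! guard stops the refilling, so the window drains one line per cycle instead of calling N past the end of input.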
edited Nov 30 '14 at 21:43
answered Nov 30 '14 at 21:11 by mikeserv
To literally answer the question:
I solve this problem (text to edit spans multiple lines) with a little cheat:
tr '\n' '@' < input.txt | sed -e 's/txt@iam@interestedin/iaminterested@intxt/g' | tr '@' '\n' > output.txt
The only thing you have to be sure of is that the character you replace the newline with does not yet exist in your input.
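A sketch of the same trick applied to the question's file, with the "placeholder must not already exist" check made explicit. The helper name and the choice of @ are assumptions. After tr, the whole file is a single line with no trailing newline; GNU sed passes such a line through unchanged, but POSIX does not guarantee how sed treats it, and the whole file has to fit in the pattern space, as the comments below discuss.

```sh
#!/bin/sh
# swap_via_placeholder (hypothetical helper): the tr trick on a file. It
# refuses to run if the placeholder already occurs in the input, because
# the final tr would otherwise turn those characters into newlines.
swap_via_placeholder() {
  file=$1
  if grep -qF '@' "$file"; then
    echo "placeholder '@' already occurs in $file" >&2
    return 1
  fi
  # The whole file becomes one "line"; </time>@<geo> marks the tag to fix.
  tr '\n' '@' < "$file" |
    sed -e 's|</time>@<geo>|</tags>@<geo>|g' |
    tr '@' '\n'
}
# Usage: swap_via_placeholder old.xml > new.xml
```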
answered Nov 30 '14 at 18:29 by JdeHaan
You must also be careful that the result is not too long to edit: a pattern space consisting of an entire file can be either very difficult or even impossible to work with. A text file is officially only a text file if its newline delimiters are no more than 500 bytes apart.
– mikeserv
Nov 30 '14 at 20:36
@mikeserv: sorry, I beg to differ. It is, after all, a stream editor [link], although there may be platforms that impose a limit. Maybe you can give an example?
– JdeHaan
Dec 1 '14 at 7:55
It is a stream editor for text files. POSIX imposes the 500 char per line limit. But if you want an example, just try your thing on a file with millions of lines: any sed will segfault.
– mikeserv
Dec 1 '14 at 15:44
@mikeserv : I did and do, daily. A syslog of over 5GB (didn't care for the number of lines) is no problem at all for 'my thing' and if you had cared to read the link in my previous reply you would have read that GNU-sed, as do others, imposes "no limit" for the maximum line length. An assertion that is not easily made and neither would go unchallenged for long. POSIX is a minimum standard. Anyone can go beyond its limits.
– JdeHaan
Dec 1 '14 at 17:27
It imposes no limit, but memory does. The entire file must fit into memory at once, and probably more than once: each edit represents an altered version of the line that must also be stored in memory at least until the line is written. It is amazing to me that you do this at all with a 5GB file; either you have a good deal of memory or a good deal of swap, but it is horribly inefficient in any case. And there is the problem of the replacement delimiter, as you mention. POSIX is a minimum standard, yes; that's what a standard is. You make no mention of GNU sed in your answer.
– mikeserv
Dec 1 '14 at 17:31
add a comment |
You must also be careful that the result is not too long to edit - a pattern space consisting of an entire file is one that can either very difficult or even impossible to work with - a text file is officially only a text file if its newline delimiters are no more than 500 bytes apart.
– mikeserv
Nov 30 '14 at 20:36
@mikeserv : sorry, care to differ. It is after all a stream editor link Although there may be platforms that impose a limit. Maybe you can give an example?
– JdeHaan
Dec 1 '14 at 7:55
it is a stream editor for text files. POSIX imposes the 500 char per line limit. But if you want an example just try your thing on a file with millions of lines - anysed
will segfault.
– mikeserv
Dec 1 '14 at 15:44
@mikeserv : I did and do, daily. A syslog of over 5GB (didn't care for the number of lines) is no problem at all for 'my thing' and if you had cared to read the link in my previous reply you would have read that GNU-sed, as do others, imposes "no limit" for the maximum line length. An assertion that is not easily made and neither would go unchallenged for long. POSIX is a minimum standard. Anyone can go beyond its limits.
– JdeHaan
Dec 1 '14 at 17:27
It imposes no limit - but memory does. The entire file must fit into memory at once. And probably more than once - each edit represents an altered version of the line that must also be stored in memory at least until the line is written. It is amazing to me that you do this at all w/ a 5gb file - either you have a good deal of memory or a good deal of swap - but it is horribly inefficient in any case. And there is the problem of the replacement delimiter as you mention. POSIX is a minimum standard yes - that's what a standard is. You make no mention of GNUsed
in your answer.
– mikeserv
Dec 1 '14 at 17:31