sed: regex input buffer length larger than INT_MAX
I have a big file on which I am doing various operations, and this error just came up. I tried googling it but didn't find any result with this.
sed: regex input buffer length larger than INT_MAX
My purpose is to quote every line, appending a comma,
and subsequently enclose the entirety of the file with square brackets
(as a single line).
For example, an input of
The quick brown fox
jumps over
the lazy dog.
should yield a result of
["The quick brown fox","jumps over","the lazy dog.",]
Assume that the input file doesn't contain any quotes.
The code I run is this:
cat "$FILE" | sed -e 's/.*/"&",/' | sponge "$FILE"
truncate --size=-1 "$FILE"
cat "$FILE" | sed -z 's/.*/[&]/' | tr --delete '\n' | sponge "$FILE"
sed version:
sed --version
sed (GNU sed) 4.5
Any thoughts?
text-processing sed scripting regular-expression
without the expression, you'll get less useful answers.
- Thomas Dickey, Jul 4 at 23:37
@ThomasDickey updated the post with the sed expression
- Chris, Jul 4 at 23:42
It's an unanchored regular expression, matching everything without limits, and quoting, then adding a character after each match. That sounds consistent with the error message. By the way, programming questions are in a different forum.
- Thomas Dickey, Jul 4 at 23:52
Rather than slurping the whole file into memory (using -z), why not insert the [ at the start of the first line and the ] at the end of the last? sed -e '1s/^/[/' -e '$s/$/]/'
- steeldriver, Jul 5 at 0:43
@ThomasDickey: (1) Yes, the regular expressions in Chris's commands are "unanchored": they don't begin with ^ or end with $. What does that have to do with the question? How does that comment help the OP? (2) Command line utilities are on-topic at U&L. So are shell scripts, within reason; a three-line script is certainly reasonable for discussion here. (3) We prefer not to use the word "forum" when talking about Stack Exchange.
- G-Man, Jul 5 at 5:10
asked Jul 4 at 23:35 by Chris; edited Jul 5 at 4:59 by G-Man
2 Answers
Your question is strange.
You say "… this error just came up.
I tried googling it but didn't find any result with this.",
making it sound like you have no idea what's happening.
But you do understand it, don't you?
When you say sed -z, you're telling sed to read the input,
treating NUL as the record (line) separator instead of newline.
But text files typically don't have NUL characters in them,
so, in practical terms,
this means that you want sed to read the entire file
and treat it as one line.
You obviously understand this; your 's/.*/[&]/' command,
to "enclose the entirety of the file with square brackets",
doesn't make sense unless you expect the entirety of the file
to be treated as a single line.
So why are you so surprised that your big file
is too big to be handled as a single line?
You say that your script works sometimes,
presumably when the size of the file
is below the maximum line size permitted by sed.
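To see the effect described above on a small input, here is a minimal sketch, assuming GNU sed (-z is a GNU extension). The helper names wrap_each_line and wrap_whole_input are made up for this illustration:

```shell
#!/bin/sh
# Hypothetical helpers, named only for this illustration.

# Default mode: sed sees one newline-terminated line at a time.
wrap_each_line() {
    sed 's/.*/[&]/'
}

# With -z (GNU sed): records end at NUL, so a file without NULs is one record.
wrap_whole_input() {
    sed -z 's/.*/[&]/'
}

printf 'a\nb\n' | wrap_each_line     # brackets around each line
printf 'a\nb\n' | wrap_whole_input   # one pair of brackets around everything
printf '\n'                          # the -z output above does not end in a newline
```

In the second case the newlines are ordinary characters inside the single record, so the whole input has to be held and matched at once; that is presumably where the INT_MAX limit in the error message comes in for a very large file.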
This script should do the same thing, regardless of the size of the file:
cat "$FILE" | sed -e 's/.*/"&",/' -e '1s/^/[/' -e '$s/$/]/' | tr --delete '\n'
Of course this can still choke if any individual line in the input
is absurdly long.
Notes:
- You don't need the { and }; "$FILE" is fine.
- Following the suggestion made by steeldriver, this inserts a [ at the beginning of the first line and appends a ] at the end of the last line.
- I left off the sponge for illustration purposes. Overwriting your input file may be operationally necessary, but it's a bad thing to do while you're still debugging. Add the sponge command back when you're sure it's doing what you want.
This duplicates your script, so an input of
The quick brown fox
jumps over
the lazy dog.
will yield a result of
["The quick brown fox","jumps over","the lazy dog.",]
with an extra comma before the ].
If that's really what you want, OK, that's fine with me.
If you don't want the comma at the end, do
cat "$FILE" | sed -e 's/.*/"&",/' -e '1s/^/[/' -e '$s/,$/]/' | tr --delete '\n'
where the '$s/,$/]/' command
removes the comma at the end of the file when it appends the ].
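The comma-free variant can be tried on the sample without involving "$FILE" at all. The following is a sketch only; the function name lines_to_array is invented for this example, and it reads standard input rather than a file:

```shell
#!/bin/sh
# Hypothetical helper: stdin lines -> one-line bracketed, quoted list on stdout.
# Assumes, as the question does, that the input contains no double quotes.
lines_to_array() {
    # Quote each line and add a comma; open the bracket on line 1;
    # on the last line swap the trailing comma for the closing bracket.
    sed -e 's/.*/"&",/' \
        -e '1s/^/[/' \
        -e '$s/,$/]/' |
    tr -d '\n'
}

printf 'The quick brown fox\njumps over\nthe lazy dog.\n' | lines_to_array
printf '\n'   # tidy the terminal; the function itself emits no newline
```

Because sed works one line at a time here, memory use depends on the longest line, not on the file size. Note that an empty input produces no output at all rather than [], since sed never runs a cycle; handle that case separately if it can occur.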
Note also that all of the commands discussed so far
will leave you with a file with no newline characters,
not even one at the end.
This is a malformed text file,
and some commands may fail to process it properly.
If that's really what you want, OK, that's fine with me.
Otherwise, add
echo >> "$FILE"
or
printf '\n' >> "$FILE"
at the end of your script.
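A quick way to check the newline point, as a sketch: count newline characters with wc -l, with and without the final newline. The helper name lines_to_array_nl is hypothetical, and the sed expression is the comma-free variant from this answer:

```shell
#!/bin/sh
# Hypothetical helper: the comma-free pipeline, followed by a single newline.
lines_to_array_nl() {
    sed -e 's/.*/"&",/' -e '1s/^/[/' -e '$s/,$/]/' | tr -d '\n'
    printf '\n'
}

# wc -l counts newline characters: 0 for the bare pipeline, 1 for the helper.
printf 'a\nb\n' | sed -e 's/.*/"&",/' -e '1s/^/[/' -e '$s/,$/]/' | tr -d '\n' | wc -l
printf 'a\nb\n' | lines_to_array_nl | wc -l
```

Emitting the newline as part of the pipeline has the same effect as appending it to "$FILE" afterwards; which one fits better depends on whether the output goes through sponge or a plain redirection.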
The output file is supposed to be consumed by other programs as a json array. Do you think the new line is going to play a role in that? thank you for your input. Also please note that I was unaware of the term buffer in the context of the message and what the -z flag was really doing, hence my confusion on this. Thank you for clearing it up. Finally yes, I want the command at the end to be gone, that's why I used truncate
- Chris, Jul 5 at 11:39
** the comma at the end
- Chris, Jul 5 at 11:50
(1) I'm not very familiar with JSON or the programs that manipulate it. It's possible that they won't care about the missing newline. In particular, if you've gotten Program X to work with the output from your current script, you probably won't have any trouble with it in the future. (2) I invite you to run your script (the one you posted in the question) again, on a small sample input (e.g., "The quick brown fox …"). I think you'll find that the extra comma is there; the truncate --size=-1 removes only the final newline. Try running sed -z 's/.*/[&]/' on a small file for comparison.
- G-Man, Jul 5 at 15:43
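Point (2) can be checked without touching a real file. This is a sketch assuming GNU coreutils truncate and mktemp; the function name last_byte_after_truncate is made up here. It runs the first two steps of the question's script on a temporary copy of the sample and prints the last byte that remains:

```shell
#!/bin/sh
# Hypothetical helper: run steps 1 and 2 of the question's script on a temp file,
# then print the file's final byte.
last_byte_after_truncate() {
    tmp=$(mktemp) || return 1
    printf 'The quick brown fox\njumps over\nthe lazy dog.\n' |
        sed -e 's/.*/"&",/' > "$tmp"
    truncate --size=-1 "$tmp"   # shrink by one byte: the final newline
    tail -c 1 "$tmp"            # last remaining byte
    rm -f "$tmp"
}

last_byte_after_truncate
printf '\n'
```

The byte printed is the comma, so the ] added by the following sed -z step lands right after it.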
If you didn't require sed, awk can do this, IMHO a bit more clearly:
Edit: original method (fixed by G-Man, tnx), which I based on looking at the sample output in the Q WITH comma after last string:
awk <"$FILE" -vORS= -vq='"' 'BEGIN{print "["} {print q $0 q ","} END{print "]\n"}' | sponge "$FILE"
- as G-Man said, leave off the sponge part for debugging
- if you don't want the newline at the end, leave out the \n
Add: modified method, based on the request to remove the last comma before adding the brackets:
awk <"$FILE" -vORS= -vq='"' 'BEGIN{print "["} {print sep q $0 q; sep=","} END{print "]\n"}' | sponge "$FILE"
(In awk an uninitialized variable in string context is guaranteed to yield an empty string, but if you prefer to be explicit add -vsep= to the options or ;sep="" to the BEGIN block to initialize it.)
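As a sketch of the separator idiom in the modified method, here it is wrapped in a function (the name lines_to_array_awk is made up for this example) and fed from a pipe instead of "$FILE":

```shell
#!/bin/sh
# Hypothetical helper: awk variant reading stdin, no comma after the last item.
lines_to_array_awk() {
    awk -v ORS= -v q='"' '
        BEGIN { print "[" }
        # sep is empty for the first record and "," for every later one.
        { print sep q $0 q; sep = "," }
        END { print "]\n" }
    '
}

printf 'The quick brown fox\njumps over\nthe lazy dog.\n' | lines_to_array_awk
printf '' | lines_to_array_awk   # empty input
```

Unlike the sed pipeline, this prints [] for an empty input, since the BEGIN and END blocks run even when there are no records.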
got unexpected newline or end of string, hmm let's see
- Chris, Jul 5 at 11:44
@Chris: I fixed the "unexpected newline or end of string" problem.
- G-Man, Jul 5 at 16:06
dave_thompson_085: Your solution behaves the same as the first command in my answer: it includes a comma after the last line of input ("the lazy dog."), before the ], which Chris doesn't want. Can you remove that?
- G-Man, Jul 5 at 16:06
@G-Man: tnx and done
- dave_thompson_085, Jul 6 at 2:13
answered Jul 5 at 4:51 by G-Man (the sed answer above)
The output file is supposed to be consumed by other programs as a json array. Do you think the new line is going to play a role in that? thank you for your input. Also please note that I was unaware of the term buffer in the context of the message and what the -z flag was really doing, hence my confusion on this. Thank you for clearing it up. Finally yes, I want the command at the end to be gone, that's why I used truncate
â Chris
Jul 5 at 11:39
** the comma at the end
â Chris
Jul 5 at 11:50
(1)â¯IâÂÂm not very familiar with JSON or the programs that manipulate it.â ItâÂÂs possible that they wonâÂÂt care about the missing newline.â In particular, if youâÂÂve gotten Program X to work with the output from your current script, you probably wonâÂÂt have any trouble with it in the future.âÂÂ(2)â¯I invite you to run your script (the one you posted in the question) again, on a small sample input (e.g., âÂÂThe quick brown foxâ¯â¦âÂÂ).â I think youâÂÂll find that the extra comma is there; thetruncateâ¯--size=-1
removes only the final newline.â Try runningsedâ¯-zâ¯'s/.*/[&]/'
on a small file for comparison.
â G-Man
Jul 5 at 15:43
add a comment |Â
The output file is supposed to be consumed by other programs as a json array. Do you think the new line is going to play a role in that? thank you for your input. Also please note that I was unaware of the term buffer in the context of the message and what the -z flag was really doing, hence my confusion on this. Thank you for clearing it up. Finally yes, I want the command at the end to be gone, that's why I used truncate
â Chris
Jul 5 at 11:39
** the comma at the end
â Chris
Jul 5 at 11:50
(1)â¯IâÂÂm not very familiar with JSON or the programs that manipulate it.â ItâÂÂs possible that they wonâÂÂt care about the missing newline.â In particular, if youâÂÂve gotten Program X to work with the output from your current script, you probably wonâÂÂt have any trouble with it in the future.âÂÂ(2)â¯I invite you to run your script (the one you posted in the question) again, on a small sample input (e.g., âÂÂThe quick brown foxâ¯â¦âÂÂ).â I think youâÂÂll find that the extra comma is there; thetruncateâ¯--size=-1
removes only the final newline.â Try runningsedâ¯-zâ¯'s/.*/[&]/'
on a small file for comparison.
â G-Man
Jul 5 at 15:43
The output file is supposed to be consumed by other programs as a json array. Do you think the new line is going to play a role in that? thank you for your input. Also please note that I was unaware of the term buffer in the context of the message and what the -z flag was really doing, hence my confusion on this. Thank you for clearing it up. Finally yes, I want the command at the end to be gone, that's why I used truncate
â Chris
Jul 5 at 11:39
The output file is supposed to be consumed by other programs as a json array. Do you think the new line is going to play a role in that? thank you for your input. Also please note that I was unaware of the term buffer in the context of the message and what the -z flag was really doing, hence my confusion on this. Thank you for clearing it up. Finally yes, I want the command at the end to be gone, that's why I used truncate
â Chris
Jul 5 at 11:39
** the comma at the end
â Chris
Jul 5 at 11:50
** the comma at the end
â Chris
Jul 5 at 11:50
(1)â¯IâÂÂm not very familiar with JSON or the programs that manipulate it.â ItâÂÂs possible that they wonâÂÂt care about the missing newline.â In particular, if youâÂÂve gotten Program X to work with the output from your current script, you probably wonâÂÂt have any trouble with it in the future.âÂÂ(2)â¯I invite you to run your script (the one you posted in the question) again, on a small sample input (e.g., âÂÂThe quick brown foxâ¯â¦âÂÂ).â I think youâÂÂll find that the extra comma is there; the
truncateâ¯--size=-1
removes only the final newline.â Try running sedâ¯-zâ¯'s/.*/[&]/'
on a small file for comparison.â G-Man
Jul 5 at 15:43
(1)â¯IâÂÂm not very familiar with JSON or the programs that manipulate it.â ItâÂÂs possible that they wonâÂÂt care about the missing newline.â In particular, if youâÂÂve gotten Program X to work with the output from your current script, you probably wonâÂÂt have any trouble with it in the future.âÂÂ(2)â¯I invite you to run your script (the one you posted in the question) again, on a small sample input (e.g., âÂÂThe quick brown foxâ¯â¦âÂÂ).â I think youâÂÂll find that the extra comma is there; the
truncateâ¯--size=-1
removes only the final newline.â Try running sedâ¯-zâ¯'s/.*/[&]/'
on a small file for comparison.â G-Man
Jul 5 at 15:43
add a comment |Â
up vote
1
down vote
If you don't require sed, awk can do this, IMHO a bit more clearly.
Edit: original method (fixed by G-Man, thanks), which I based on the sample output in the question, WITH a comma after the last string:
awk <"$FILE" -vORS= -vq=\" 'BEGIN{print "["} {print q $0 q ","} END{print "]\n"}' | sponge "$FILE"
As G-Man said, leave off the sponge part for debugging. If you don't want the newline at the end, leave out the \n.
Add: modified method, based on the request to remove the last comma before adding the brackets:
awk <"$FILE" -vORS= -vq=\" 'BEGIN{print "["} {print sep q $0 q; sep=","} END{print "]\n"}' | sponge "$FILE"
(In awk an uninitialized variable in string context is guaranteed to yield an empty string, but if you prefer to be explicit, add -vsep= to the options or ;sep="" to the BEGIN block to initialize it.)
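To compare the two variants on the question's sample input, something like the following can be used. It is a sketch only: the function names with_trailing_comma, with_separator and sample are invented for this illustration, sep is initialized explicitly with -v sep=, and the output goes to stdout rather than through sponge.

```shell
# Hypothetical wrappers around the two awk programs; both read stdin.
with_trailing_comma() {
    awk -v ORS= -v q='"' 'BEGIN { print "[" } { print q $0 q "," } END { print "]\n" }'
}

with_separator() {
    # sep is empty for the first record and "," for every later one,
    # so a comma is printed before each line except the first.
    awk -v ORS= -v q='"' -v sep= 'BEGIN { print "[" } { print sep q $0 q; sep = "," } END { print "]\n" }'
}

# The sample input from the question.
sample() { printf '%s\n' 'The quick brown fox' 'jumps over' 'the lazy dog.'; }

sample | with_trailing_comma   # ["The quick brown fox","jumps over","the lazy dog.",]
sample | with_separator        # ["The quick brown fox","jumps over","the lazy dog."]
```

Printing the separator before each item, rather than a comma after it, means the last line needs no special handling and the file is still processed one line at a time.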
edited Jul 6 at 2:13
answered Jul 5 at 6:39
dave_thompson_085
got unexpected newline or end of string, hmm let's see
– Chris
Jul 5 at 11:44
@Chris: I fixed the "unexpected newline or end of string" problem.
– G-Man
Jul 5 at 16:06
dave_thompson_085: Your solution behaves the same as the first command in my answer: it includes a comma after the last line of input ("the lazy dog."), before the ], which Chris doesn't want. Can you remove that?
– G-Man
Jul 5 at 16:06
@G-Man: tnx and done
– dave_thompson_085
Jul 6 at 2:13
without the expression, you'll get less useful answers.
– Thomas Dickey
Jul 4 at 23:37
@ThomasDickey updated the post with the sed expression
– Chris
Jul 4 at 23:42
It's an unanchored regular expression, matching everything without limits, and quoting, then adding a character after each match. That sounds consistent with the error message. By the way, programming questions are in a different forum.
– Thomas Dickey
Jul 4 at 23:52
1
Rather than slurping the whole file into memory (using -z), why not insert the [ at the start of the first line and the ] at the end of the last? sed -e '1s/^/[/' -e '$s/$/]/'
– steeldriver
Jul 5 at 0:43
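Combined with the per-line quoting from the question, that suggestion gives a pipeline in which sed only ever holds one line at a time. A rough sketch, not tested on a huge file: quote_lines_streaming is a name invented here, the last expression is changed to '$s/,$/]/' so that the final comma is replaced rather than kept, and it assumes no single line is anywhere near INT_MAX bytes. Empty input produces empty output, not [].

```shell
# Hypothetical helper: reads lines on stdin, writes one bracketed line.
quote_lines_streaming() {
    # 1) quote every line and append a comma
    # 2) put [ in front of the first line
    # 3) turn the comma at the end of the last line into ]
    # tr then joins the lines; no stage needs the whole file in memory.
    sed -e 's/.*/"&",/' -e '1s/^/[/' -e '$s/,$/]/' | tr -d '\n'
}

printf '%s\n' 'The quick brown fox' 'jumps over' 'the lazy dog.' | quote_lines_streaming
# prints ["The quick brown fox","jumps over","the lazy dog."] with no trailing newline
```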
1
@ThomasDickey: (1) Yes, the regular expressions in Chris's commands are "unanchored": they don't begin with ^ or end with $. What does that have to do with the question? How does that comment help the OP? (2) Command line utilities are on-topic at U&L. So are shell scripts, within reason; a three-line script is certainly reasonable for discussion here. (3) We prefer not to use the word "forum" when talking about Stack Exchange.
– G-Man
Jul 5 at 5:10