sed: regex input buffer length larger than INT_MAX

I have a big file to which I am doing various operations, and this error just came up. I tried googling it but there didn't find any result with this.



sed: regex input buffer length larger than INT_MAX


My purpose is to quote every line, appending a comma,
and subsequently enclose the entirety of the file with square brackets
(as a single line). 
For example, an input of



The quick brown fox
jumps over
the lazy dog.


should yield a result of



["The quick brown fox","jumps over","the lazy dog.",]


Assume that the input file doesn’t contain any quotes.



The code I run is this:



cat "$FILE" | sed -e 's/.*/"&",/' | sponge "$FILE"

truncate --size=-1 "$FILE"

cat "$FILE" | sed -z 's/.*/[&]/' | tr --delete '\n' | sponge "$FILE"


sed version:



sed --version
sed (GNU sed) 4.5


Any thoughts?







  • without the expression, you'll get less useful answers.
    – Thomas Dickey
    Jul 4 at 23:37










  • @ThomasDickey updated the post with the sed expression
    – Chris
    Jul 4 at 23:42











  • It's an unanchored regular expression, matching everything without limits, and quoting, then adding a character after each match. That sounds consistent with the error message. By the way, programming questions are in a different forum.
    – Thomas Dickey
    Jul 4 at 23:52






  • Rather than slurping the whole file into memory (using -z) why not insert the [ at the start of the first line and the ] at the end of the last? sed -e '1s/^/[/' -e '$s/$/]/'
    – steeldriver
    Jul 5 at 0:43






  • @ThomasDickey: (1) Yes, the regular expressions in Chris’s commands are “unanchored” — they don’t begin with ^ or end with $. What does that have to do with the question? How does that comment help the OP? (2) Command line utilities are on-topic at U&L. So are shell scripts, within reason; a three-line script is certainly reasonable for discussion here. (3) We prefer not to use the word “forum” when talking about Stack Exchange.
    – G-Man
    Jul 5 at 5:10














Question asked Jul 4 at 23:35 by Chris; edited Jul 5 at 4:59 by G-Man





















2 Answers






























Your question is strange. 
You say “… this error just came up. 
I tried googling it but there didn't find any result with this.”,
making it sound like you have no idea what’s happening. 
But you do understand it, don’t you? 
When you say sed -z, you’re telling sed to read the input,
treating NUL as record (line) separators instead of newline. 
But text files typically don’t have NUL characters in them,
so, in practical terms,
this means that you want sed to read the entire file
and treat it as one line. 
You obviously understand this; your 's/.*/[&]/' command,
to “enclose the entirety of the file with square brackets”,
doesn’t make sense unless you expect the entirety of the file
to be treated as a single line.



So why are you so surprised that your big file
is too big to be handled as a single line?
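
To see why -z ends up with a single huge record, it can help to count the separators. The following is a minimal sketch, not part of the original answer; the temporary file and the variable names (sample, lines, nuls) are made up for illustration. It counts the newline-terminated lines and the NUL bytes in a small text file:

```shell
# Hypothetical sample file, created only for this illustration.
sample=$(mktemp)
printf 'The quick brown fox\njumps over\nthe lazy dog.\n' > "$sample"

# Newline-terminated lines: the records plain sed iterates over.
lines=$(wc -l < "$sample" | tr -d ' ')

# NUL bytes: the record separators sed -z would look for.
# tr -cd keeps only the listed byte, so wc -c counts the NULs.
nuls=$(tr -cd '\000' < "$sample" | wc -c | tr -d ' ')

echo "lines=$lines nuls=$nuls"
rm -f "$sample"
```

With no NUL bytes in the file there is nothing for sed -z to split on, so the whole file has to fit into one input buffer.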



You say that your script works sometimes —
presumably when the size of the file
is below the maximum line size permitted by sed. 
This script should do the same thing, regardless of the size of the file:



cat "$FILE" | sed -e 's/.*/"&",/' -e '1s/^/[/' -e '$s/$/]/' | tr --delete '\n'


Of course this can still choke if any individual line in the input
is absurdly long.



Notes:



  • You don’t need the { and }; "$FILE" is fine.

  • Following the suggestion made by steeldriver,
    this inserts a [ at the beginning of the first line
    and appends a ] at the end of the last line.

  • I left off the sponge for illustration purposes. 
    Overwriting your input file may be operationally necessary,
    but it’s a bad thing to do while you’re still debugging. 
    Add the sponge command back when you’re sure it’s doing what you want.

This duplicates your script, so an input of



The quick brown fox
jumps over
the lazy dog.


will yield a result of



["The quick brown fox","jumps over","the lazy dog.",]


with an extra comma before the ]. 
If that’s really what you want, OK, that’s fine with me. 
If you don’t want the comma at the end, do



cat "$FILE" | sed -e 's/.*/"&",/' -e '1s/^/[/' -e '$s/,$/]/' | tr --delete '\n'


where the '$s/,$/]/' command
removes the comma at the end of the file when it appends the ].
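
As a quick check of the line-by-line approach, here is a small sketch that feeds the sample input from the question on stdin rather than reading "$FILE". The variable name result is only for illustration, and tr -d is used as the short spelling of tr --delete. Since the output is meant to be read as a JSON array, dropping the final comma is probably what you want, because JSON does not allow a trailing comma.

```shell
# Quote each line and append a comma, open the bracket on the first line,
# turn the last line's comma into the closing bracket, then join the lines.
result=$(printf 'The quick brown fox\njumps over\nthe lazy dog.\n' |
    sed -e 's/.*/"&",/' -e '1s/^/[/' -e '$s/,$/]/' |
    tr -d '\n')

printf '%s\n' "$result"
```

sed only ever holds one input line at a time here, so the total file size should not matter.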



Note also that all of the commands discussed so far
will leave you with a file with no newline characters,
not even one at the end. 
This is a malformed text file,
and some commands may fail to process it properly. 
If that’s really what you want, OK, that’s fine with me. 
Otherwise, add



echo >> "$FILE"


or



printf '\n' >> "$FILE"


at the end of your script.
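
If the script might run more than once, it may be worth appending the newline only when it is missing. This is a rough sketch under that assumption; the temporary file out and the variable final_lines are hypothetical names used just for the demonstration.

```shell
# Hypothetical output file that, like the pipeline result, has no final newline.
out=$(mktemp)
printf '["a","b"]' > "$out"

# tail -c 1 prints the last byte; wc -l counts it only if it is a newline.
if [ "$(tail -c 1 "$out" | wc -l | tr -d ' ')" -eq 0 ]; then
    printf '\n' >> "$out"
fi

# The file should now consist of exactly one newline-terminated line.
final_lines=$(wc -l < "$out" | tr -d ' ')
echo "lines in output: $final_lines"
rm -f "$out"
```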



























  • The output file is supposed to be consumed by other programs as a json array. Do you think the new line is going to play a role in that? thank you for your input. Also please note that I was unaware of the term buffer in the context of the message and what the -z flag was really doing, hence my confusion on this. Thank you for clearing it up. Finally yes, I want the command at the end to be gone, that's why I used truncate
    – Chris
    Jul 5 at 11:39











  • ** the comma at the end
    – Chris
    Jul 5 at 11:50











  • (1) I’m not very familiar with JSON or the programs that manipulate it.  It’s possible that they won’t care about the missing newline.  In particular, if you’ve gotten Program X to work with the output from your current script, you probably won’t have any trouble with it in the future. (2) I invite you to run your script (the one you posted in the question) again, on a small sample input (e.g., “The quick brown fox …”).  I think you’ll find that the extra comma is there; the truncate --size=-1 removes only the final newline.  Try running sed -z 's/.*/[&]/' on a small file for comparison.
    – G-Man
    Jul 5 at 15:43































If you didn't require sed, awk can do this, IMHO a bit more clearly:



Edit: original method (fixed by G-Man, tnx), which I based on looking at the sample output in the Q WITH comma after last string:



 awk <"$FILE" -vORS= -vq=\" 'BEGIN{print "["} {print q $0 q ","} END{print "]\n"}' | sponge "$FILE"


  • as G-Man said, leave off the sponge part for debugging

  • if you don't want the newline at the end, leave out the \n

Add: modified method, based on the request to remove the last comma before adding the brackets:



 awk <"$FILE" -vORS= -vq=\" 'BEGIN{print "["} {print sep q $0 q; sep=","} END{print "]\n"}' | sponge "$FILE"


(In awk an uninitialized variable in string context is guaranteed to yield an empty string, but if you prefer to be explicit add -vsep= to the options or ;sep="" to the BEGIN block to initialize it.)
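
For testing without touching the input file, the same separator idea can be wrapped in a small function that reads stdin. This is only a sketch; the function name json_array and the variable result are made up for illustration.

```shell
# Quote every input line and join the lines with commas inside [ ].
# sep is empty for the first record and "," afterwards, so no comma
# ends up before the closing bracket.
json_array() {
    awk -v ORS= -v q='"' '
        BEGIN { print "[" }
        { print sep q $0 q; sep = "," }
        END { print "]\n" }
    '
}

result=$(printf 'The quick brown fox\njumps over\nthe lazy dog.\n' | json_array)
printf '%s\n' "$result"
```

Like the one-liners above, this assumes the input lines contain no double quotes.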





























  • got unexpected newline or end of string, hmm let's see
    – Chris
    Jul 5 at 11:44











  • @Chris: I fixed the “unexpected newline or end of string” problem.
    – G-Man
    Jul 5 at 16:06










  • dave_thompson_085: Your solution behaves the same as the first command in my answer — it includes a comma after the last line of input ("the lazy dog."), before the ] — which Chris doesn’t want.  Can you remove that?
    – G-Man
    Jul 5 at 16:06










  • @G-Man: tnx and done
    – dave_thompson_085
    Jul 6 at 2:13








































First answer: answered Jul 5 at 4:51 by G-Man











  • The output file is supposed to be consumed by other programs as a json array. Do you think the new line is going to play a role in that? thank you for your input. Also please note that I was unaware of the term buffer in the context of the message and what the -z flag was really doing, hence my confusion on this. Thank you for clearing it up. Finally yes, I want the command at the end to be gone, that's why I used truncate
    – Chris
    Jul 5 at 11:39











  • ** the comma at the end
    – Chris
    Jul 5 at 11:50











  • (1) I’m not very familiar with JSON or the programs that manipulate it.  It’s possible that they won’t care about the missing newline.  In particular, if you’ve gotten Program X to work with the output from your current script, you probably won’t have any trouble with it in the future. (2) I invite you to run your script (the one you posted in the question) again, on a small sample input (e.g., “The quick brown fox …”).  I think you’ll find that the extra comma is there; the truncate --size=-1 removes only the final newline.  Try running sed -z 's/.*/[&]/' on a small file for comparison.
    – G-Man
    Jul 5 at 15:43

















If you didn't require sed, awk can do this, IMHO a bit more clearly:



Edit: original method (fixed by G-Man, tnx), which I based on looking at the sample output in the Q WITH comma after last string:



 awk <"$FILE" -vORS= -vq=\" 'BEGIN{print "["} {print q $0 q ","} END{print "]\n"}' | sponge "$FILE"


  • as G-Man said, leave off the sponge part for debugging

  • if you don't want the newline at the end, leave out the \n

Add: modified method, based on the request to remove the last comma before adding the brackets:



 awk <"$FILE" -vORS= -vq=\" 'BEGIN{print "["} {print sep q $0 q; sep=","} END{print "]\n"}' | sponge "$FILE"


(In awk an uninitialized variable in string context is guaranteed to yield an empty string, but if you prefer to be explicit add -vsep= to the options or ;sep="" to the BEGIN block to initialize it.)
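To see the separator idiom in isolation, here is a small sketch run on the sample input from the question. The wrapper name `to_json_array` is hypothetical, and it reads standard input and writes to standard output instead of using sponge, so nothing is overwritten.

```shell
# Hypothetical wrapper around the separator-variable idiom described above.
# sep is empty for the first record and "," for every later one, so the
# comma is printed before each element except the first.
to_json_array() {
    awk -v ORS= -v q='"' '
        BEGIN { print "[" }
              { print sep q $0 q; sep = "," }
        END   { print "]\n" }
    '
}

printf '%s\n' 'The quick brown fox' 'jumps over' 'the lazy dog.' | to_json_array
# prints: ["The quick brown fox","jumps over","the lazy dog."]
```

Like the commands above, this assumes the input contains no quotes or backslashes; it does no JSON escaping.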






  • got unexpected newline or end of string, hmm let's see
    – Chris
    Jul 5 at 11:44











  • @Chris: I fixed the “unexpected newline or end of string” problem.
    – G-Man
    Jul 5 at 16:06










  • dave_thompson_085: Your solution behaves the same as the first command in my answer — it includes a comma after the last line of input ("the lazy dog."), before the ] — which Chris doesn’t want.  Can you remove that?
    – G-Man
    Jul 5 at 16:06










  • @G-Man: tnx and done
    – dave_thompson_085
    Jul 6 at 2:13














edited Jul 6 at 2:13


























answered Jul 5 at 6:39









dave_thompson_085
