sed: regex input buffer length larger than INT_MAX


I have a big file to which I am doing various operations, and this error just came up. I tried googling it but there didn't find any result with this.

sed: regex input buffer length larger than INT_MAX

My purpose is to quote every line, appending a comma,
and subsequently enclose the entirety of the file with square brackets
(as a single line).
For example, an input of

The quick brown fox
jumps over
the lazy dog.

should yield a result of

["The quick brown fox","jumps over","the lazy dog.",]

Assume that the input file doesn't contain any quotes.

The code I run is this:

cat "$FILE" | sed -e 's/.*/"&",/' | sponge "$FILE"

truncate --size=-1 "$FILE"

cat "$FILE" | sed -z 's/.*/[&]/' | tr --delete '\n' | sponge "$FILE"

sed version:

sed --version
sed (GNU sed) 4.5

Any thoughts?

edited Jul 5 at 4:59

G-Man


asked Jul 4 at 23:35

Chris

without the expression, you'll get less useful answers.
– Thomas Dickey
Jul 4 at 23:37

@ThomasDickey updated the post with the sed expression
– Chris
Jul 4 at 23:42

It's an unanchored regular expression, matching everything without limits, and quoting, then adding a character after each match. That sounds consistent with the error message. By the way, programming questions are in a different forum.
– Thomas Dickey
Jul 4 at 23:52

Rather than slurping the whole file into memory (using -z) why not insert the [ at the start of the first line and the ] at the end of the last? sed -e '1s/^/[/' -e '$s/$/]/'
– steeldriver
Jul 5 at 0:43

@ThomasDickey: (1) Yes, the regular expressions in Chris's commands are "unanchored": they don't begin with ^ or end with $. What does that have to do with the question? How does that comment help the OP? (2) Command line utilities are on-topic at U&L. So are shell scripts, within reason; a three-line script is certainly reasonable for discussion here. (3) We prefer not to use the word "forum" when talking about Stack Exchange.
– G-Man
Jul 5 at 5:10

2 Answers

Your question is strange.
You say "… this error just came up.
I tried googling it but there didn't find any result with this.",
making it sound like you have no idea what's happening.
But you do understand it, don't you?
When you say sed -z, you're telling sed to read the input,
treating NUL as the record (line) separator instead of newline.
But text files typically don't have NUL characters in them,
so, in practical terms,
this means that you want sed to read the entire file
and treat it as one line.
You obviously understand this; your 's/.*/[&]/' command,
to "enclose the entirety of the file with square brackets",
doesn't make sense unless you expect the entirety of the file
to be treated as a single line.

So why are you so surprised that your big file
is too big to be handled as a single line?
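
For a quick look at that single-record behaviour on a small file, here is a rough sketch. It assumes GNU sed (-z is a GNU extension); the scratch file and the variable names are made up for the demo. It prefixes every record with > and then counts how many of those prefixes ended up at the start of a line:

```shell
# Demo only: the scratch file and variable names are illustrative assumptions.
tmp=$(mktemp)
printf 'The quick brown fox\njumps over\nthe lazy dog.\n' > "$tmp"

# Default mode: every newline-terminated line is its own record,
# so ^ matches once per line.
marks_default=$(sed 's/^/>/' "$tmp" | grep -c '^>')

# -z mode: the file contains no NUL bytes, so the whole file is a single
# record and ^ matches only once, at the very start of the file.
marks_z=$(sed -z 's/^/>/' "$tmp" | grep -c '^>')

echo "default mode: $marks_default, -z mode: $marks_z"
rm -f "$tmp"
```

With GNU sed this should report 3 for the default mode and 1 for -z, which is the "whole file as one line" effect described above.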

You say that your script works sometimes,
presumably when the size of the file
is below the maximum line size permitted by sed.
This script should do the same thing, regardless of the size of the file:

cat "$FILE" | sed -e 's/.*/"&",/' -e '1s/^/[/' -e '$s/$/]/' | tr --delete '\n'

Of course this can still choke if any individual line in the input
is absurdly long.
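
If you want to check whether that is a realistic concern for a given file, one rough approach is to measure its longest line and compare that with INT_MAX (2147483647 on most current platforms, which is an assumption about your system). longest_line is a hypothetical helper name, not something from sed or awk:

```shell
# longest_line FILE: print the length of the longest line in FILE.
# Hypothetical helper for illustration. length() counts characters,
# which can be fewer than bytes in a multibyte locale.
longest_line() {
    awk '{ n = length($0); if (n > max) max = n } END { print max + 0 }' "$1"
}

# Small self-contained demo on a scratch file.
tmp=$(mktemp)
printf 'ab\nabcdef\nabc\n' > "$tmp"
longest=$(longest_line "$tmp")
echo "longest line: $longest"
rm -f "$tmp"
```

Bear in mind that awk also reads one line at a time, so it may have its own trouble with a truly enormous line.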

Notes:

You don't need the { and } in "${FILE}"; "$FILE" is fine.

Following the suggestion made by steeldriver,
this inserts a [ at the beginning of the first line
and appends a ] at the end of the last line.

I left off the sponge for illustration purposes.
Overwriting your input file may be operationally necessary,
but it's a bad thing to do while you're still debugging.
Add the sponge command back when you're sure it's doing what you want.

This duplicates your script, so an input of

The quick brown fox
jumps over
the lazy dog.

will yield a result of

["The quick brown fox","jumps over","the lazy dog.",]

with an extra comma before the ].
If that's really what you want, OK, that's fine with me.
If you don't want the comma at the end, do

cat "$FILE" | sed -e 's/.*/"&",/' -e '1s/^/[/' -e '$s/,$/]/' | tr --delete '\n'

where the '$s/,$/]/' command
removes the comma at the end of the file when it appends the ].
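
To try the comma-free variant without touching any real file, it can be wrapped in a function that reads standard input. This is only a sketch: to_json_array is a made-up name, and tr -d is the short spelling of tr --delete.

```shell
# to_json_array: read lines on stdin, write one ["...","..."] line on stdout.
# Hypothetical helper name; assumes the input contains no double quotes,
# as stated in the question.
to_json_array() {
    # Quote each line and append a comma, open the bracket on line 1,
    # turn the last comma into the closing bracket, then join the lines.
    sed -e 's/.*/"&",/' -e '1s/^/[/' -e '$s/,$/]/' | tr -d '\n'
}

result=$(printf 'The quick brown fox\njumps over\nthe lazy dog.\n' | to_json_array)
printf '%s\n' "$result"
```

Note that empty input produces empty output rather than [], because sed has no first or last line to attach the brackets to.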

Note also that all of the commands discussed so far
will leave you with a file with no newline characters,
not even one at the end.
This is a malformed text file,
and some commands may fail to process it properly.
If that's really what you want, OK, that's fine with me.
Otherwise, add

echo >> "$FILE"

or

printf '\n' >> "$FILE"

at the end of your script.
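
If the script might be run more than once on the same file, a slightly more defensive sketch is to append the newline only when the last byte is not already one. ensure_trailing_newline is a hypothetical helper name; the check relies on command substitution stripping a trailing newline, so the test string is empty exactly when the file already ends with one.

```shell
# ensure_trailing_newline FILE: append a newline unless FILE is empty
# or already ends with one. Hypothetical helper for illustration.
ensure_trailing_newline() {
    if [ -s "$1" ] && [ -n "$(tail -c 1 "$1")" ]; then
        printf '\n' >> "$1"
    fi
}

# Demo on a scratch file holding 9 bytes and no final newline.
tmp=$(mktemp)
printf '["a","b"]' > "$tmp"
ensure_trailing_newline "$tmp"
ensure_trailing_newline "$tmp"   # second call should change nothing
bytes=$(wc -c < "$tmp" | tr -d ' ')
echo "size after two calls: $bytes bytes"
rm -f "$tmp"
```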

answered Jul 5 at 4:51

G-Man


The output file is supposed to be consumed by other programs as a JSON array. Do you think the newline is going to play a role in that? Thank you for your input. Also please note that I was unaware of the term buffer in the context of the message and what the -z flag was really doing, hence my confusion on this. Thank you for clearing it up. Finally yes, I want the command at the end to be gone, that's why I used truncate
– Chris
Jul 5 at 11:39

** the comma at the end
– Chris
Jul 5 at 11:50

(1) I'm not very familiar with JSON or the programs that manipulate it. It's possible that they won't care about the missing newline. In particular, if you've gotten Program X to work with the output from your current script, you probably won't have any trouble with it in the future. (2) I invite you to run your script (the one you posted in the question) again, on a small sample input (e.g., "The quick brown fox …"). I think you'll find that the extra comma is there; the truncate --size=-1 removes only the final newline. Try running sed -z 's/.*/[&]/' on a small file for comparison.
– G-Man
Jul 5 at 15:43
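
The truncate point in the comment above can be checked on a small input. A minimal sketch, assuming GNU coreutils truncate and scratch files from mktemp (the variable names are made up for the demo):

```shell
# Demo only: reproduce the first two steps of the question's script
# on scratch files and look at the last two bytes of the result.
src=$(mktemp)
quoted=$(mktemp)
printf 'The quick brown fox\njumps over\nthe lazy dog.\n' > "$src"

# Step 1 of the question: quote every line and append a comma.
sed -e 's/.*/"&",/' "$src" > "$quoted"

# Step 2 of the question: drop exactly one byte from the end of the file.
truncate --size=-1 "$quoted"

# The byte that was dropped is the final newline, so the comma is still there.
last_two=$(tail -c 2 "$quoted")
printf 'file now ends with: %s\n' "$last_two"
rm -f "$src" "$quoted"
```

The file still ends with a quote followed by a comma, so the trailing comma has to be removed some other way, for example with the '$s/,$/]/' command from the answer.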


If you didn't require sed, awk can do this, IMHO a bit more clearly:

Edit: original method (fixed by G-Man, tnx), which I based on looking at the sample output in the Q WITH comma after last string:

 awk <"$FILE" -vORS= -vq=\" 'BEGIN{print "["} {print q $0 q ","} END{print "]\n"}' | sponge "$FILE"

as G-Man said, leave off the sponge part for debugging

if you don't want the newline at the end, leave out the \n

Add: modified method, based on the request to remove the last comma before adding the brackets:

 awk <"$FILE" -vORS= -vq=\" 'BEGIN{print "["} {print sep q $0 q; sep=","} END{print "]\n"}' | sponge "$FILE"

(In awk an uninitialized variable in string context is guaranteed to yield an empty string, but if you prefer to be explicit add -vsep= to the options or ;sep="" to the BEGIN block to initialize it.)
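
For experimenting without sponge, the modified method can be wrapped in a function that reads standard input. This is only a sketch: to_json_array_awk is a made-up name, and it still assumes the input contains no double quotes.

```shell
# to_json_array_awk: read lines on stdin, write ["...","..."] plus a newline.
# Hypothetical helper name. sep starts out empty (uninitialized) and becomes
# "," after the first record, so no comma is printed before the first element
# or after the last one.
to_json_array_awk() {
    awk -v ORS= -v q='"' '
        BEGIN { print "[" }
        { print sep q $0 q; sep = "," }
        END { print "]\n" }
    '
}

result=$(printf 'The quick brown fox\njumps over\nthe lazy dog.\n' | to_json_array_awk)
empty=$(printf '' | to_json_array_awk)
printf '%s\n%s\n' "$result" "$empty"
```

Because the brackets come from the BEGIN and END blocks rather than from the first and last lines, empty input still produces [] here.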

edited Jul 6 at 2:13

answered Jul 5 at 6:39

dave_thompson_085


got unexpected newline or end of string, hmm let's see
– Chris
Jul 5 at 11:44

@Chris: I fixed the "unexpected newline or end of string" problem.
– G-Man
Jul 5 at 16:06

dave_thompson_085: Your solution behaves the same as the first command in my answer: it includes a comma after the last line of input ("the lazy dog."), before the ], which Chris doesn't want. Can you remove that?
– G-Man
Jul 5 at 16:06

@G-Man: tnx and done
– dave_thompson_085
Jul 6 at 2:13
