Fixing malformed CSV with incorrect new line chars using sed or perl only

I have a comma-delimited CSV file, but for some reason our system inserts newline characters at random locations in the file, which breaks the entire file. I can get the number of columns in the file.

How do I solve this with sed and/or perl in a one-liner? I know it's solvable with awk, but this is for learning purposes. If using perl, I don't want to use the built-in CSV functions. Is it solvable? I've been on this problem for several days and can't seem to find a solution :(

Sample malformed input (lots of randomly inserted \n)

policyID,statecode,county,Point longitude,Some Thing Here,point_granularity
119736,FL,CLAY COUNTY,-81.711777,“Residential Lot”,1
448094,FL,CLAY COUNTY,-81.707664,“Residen
tial Lot”,3
206893,FL,CLAY COUNTY,-81.7
00455,“Residen
tial Lot”,1
333743,FL,CLAY COUNTY,-81.707703,“Residential Lot”,
3
172534,FL,CLAY COUNTY,-81.702675,“Residential Lot”,1
785275,FL,CLAY COUNTY,-81.707703,“Residential Lot”,3
995932,FL,CLAY COUNTY,-81.713882,
“Residential Lot”,1
223488,FL,CLAY COUNTY,-81.707146,“Residential Lot”,1
4335
12,FL,CLAY COUNTY,-81.704613,
“Residential Lot”,1

Required output

policyID,statecode,county,Point longitude,Some Thing Here,point_granularity
119736,FL,CLAY COUNTY,-81.711777,“Residential Lot”,1
448094,FL,CLAY COUNTY,-81.707664,“Residential Lot”,3
206893,FL,CLAY COUNTY,-81.700455,“Residential Lot”,1
333743,FL,CLAY COUNTY,-81.707703,“Residential Lot”,3
172534,FL,CLAY COUNTY,-81.702675,“Residential Lot”,1
785275,FL,CLAY COUNTY,-81.707703,“Residential Lot”,3
995932,FL,CLAY COUNTY,-81.713882,“Residential Lot”,1
223488,FL,CLAY COUNTY,-81.707146,“Residential Lot”,1
433512,FL,CLAY COUNTY,-81.704613,“Residential Lot”,1

edited Apr 2 at 20:31 by Jeff Schaller
asked Apr 2 at 6:28 by Harry McKenzie

Find why the newlines are inserted and fix that. That would be the best resolution to the issue.
– Kusalananda, Apr 2 at 6:33

Guess: Newlines are inserted when some internal buffer is full and written out.
– dirkt, Apr 2 at 6:36

For the actual question: Your lines start and end with digits, so removing all newlines that are not between digits would be a start, but this won't fix the file completely. To detect the other newlines, I guess one would need to add knowledge about what the fields look like.
– dirkt, Apr 2 at 6:38

How would you fix it in awk, and why can't you do something similar in Perl or sed?
– muru, Apr 2 at 6:56

"on this problem for several days": please add at least one of those attempts to the question.
– Sundeep, Apr 2 at 7:50

2 Answers

accepted

$ awk -F, '{ while (NF < 6 || $NF == "") { brokenline=$0; getline; $0 = brokenline $0 }; print }' file.csv
policyID,statecode,county,Point longitude,Some Thing Here,point_granularity
119736,FL,CLAY COUNTY,-81.711777,“Residential Lot”,1
448094,FL,CLAY COUNTY,-81.707664,“Residential Lot”,3
206893,FL,CLAY COUNTY,-81.700455,“Residential Lot”,1
333743,FL,CLAY COUNTY,-81.707703,“Residential Lot”,3
172534,FL,CLAY COUNTY,-81.702675,“Residential Lot”,1
785275,FL,CLAY COUNTY,-81.707703,“Residential Lot”,3
995932,FL,CLAY COUNTY,-81.713882,“Residential Lot”,1
223488,FL,CLAY COUNTY,-81.707146,“Residential Lot”,1
433512,FL,CLAY COUNTY,-81.704613,“Residential Lot”,1

The awk code appends the next line of input to the current line for as long as the current line has fewer than six fields or its last field is empty (one line in the sample is broken just after the last field separator).

A Perl workalike:

perl -ne 'chomp;while (tr/,/,/ < 5 || /,$/) { $_ .= readline; chomp } print "$_\n"' file.csv

answered Apr 2 at 7:41 by Kusalananda

Thank you for your response! I was trying to make it work with any type of delimiter like so: perl -ne 'chomp;while (tr/quotemeta('"$d2"')/quotemeta('"$d2"')/ < '"$columns"'-1 || /quotemeta('"$d2"')$/) { $_ .=readline; chomp } print "$_\n"' where $d2, for example, is a bash variable with the value ||, but it doesn't work :(
– Harry McKenzie, Apr 2 at 12:44

@HarryMcKenzie I would use the awk solution for that, as it's easier to just change the option-argument for the -F flag.
– Kusalananda, Apr 2 at 12:46

Yeah, but I need to use perl :(
– Harry McKenzie, Apr 2 at 12:47

@HarryMcKenzie I don't quite see why.
– Kusalananda, Apr 2 at 12:47

It's OK... it just sucks that our system only has a limited set of commands and awk is not one of them. They don't want to install it for some reason, so we have to live with sed and perl only.
– Harry McKenzie, Apr 2 at 13:04

As Kusalananda said, there are 6 fields on each line, so you can try this with GNU sed.

sed -E ':A;h;s/^/,/;s/((,[^,]+){6})(.*)/\3/;/./{g;N;s/\n//;bA};g' infile

answered Apr 2 at 10:22 by ctac_

But unfortunately it doesn't work. You didn't test it :p
– Harry McKenzie, Apr 2 at 13:10

Is your sed GNU sed? On OpenBSD, I must change the code like this: sed -Ee ':A;h;s/^/,/;s/((,[^,]+){6})(.*)/\3/;/./{g;N;s/\n//;bA' -e '};g' infile
– ctac_, Apr 2 at 13:41

I'm using GNU sed. But thank you very much for helping me, it works now! Thank you :)
– Harry McKenzie, Apr 4 at 11:17
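
The sed one-liner is dense, so here is a sketch of the same script written one command per line with comments. It is wrapped in a shell function whose name, fix_csv_sed, is made up for this example. Ending the label and the branch with a newline rather than a semicolon or brace should also sidestep the parsing difference mentioned for OpenBSD, though that is an assumption and was only reasoned through for GNU sed. The script treats a record as complete once it holds six non-empty comma-separated fields, so it would over-join records that legitimately contain empty fields.

```shell
# Hypothetical helper: the sed answer, one command per line.
fix_csv_sed() {
  sed -E '
    :A
    # Save the current record in the hold space.
    h
    # Prepend a comma so every field looks like ",value".
    s/^/,/
    # If six non-empty fields are present, delete them; only
    # leftover text (normally nothing) stays in the pattern space.
    s/((,[^,]+){6})(.*)/\3/
    # Anything left means the record is incomplete: restore it,
    # append the next line, drop the newline and test again.
    /./{
      g
      N
      s/\n//
      bA
    }
    # Complete record: restore the saved copy for printing.
    g
  ' "$@"
}
```

It would be called as fix_csv_sed infile.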
