Fixing malformed CSV with incorrect new line chars using sed or perl only
I have a comma-delimited CSV file, but for some reason our system inserts newline characters at random locations in the file, which breaks the entire file. I can get the number of columns in the file.
How do I solve this with sed and/or perl in a one-liner? I know it's solvable with awk, but this is for learning purposes. If using perl, I don't want to use the built-in CSV functions. Is it solvable? I've been on this problem for several days and can't seem to find a solution :(
Sample malformed input (lots of randomly inserted \n)
policyID,statecode,county,Point longitude,Some Thing Here,point_granularity
119736,FL,CLAY COUNTY,-81.711777,“Residential Lot”,1
448094,FL,CLAY COUNTY,-81.707664,“Residen
tial Lot”,3
206893,FL,CLAY COUNTY,-81.7
00455,“Residen
tial Lot”,1
333743,FL,CLAY COUNTY,-81.707703,“Residential Lot”,
3
172534,FL,CLAY COUNTY,-81.702675,“Residential Lot”,1
785275,FL,CLAY COUNTY,-81.707703,“Residential Lot”,3
995932,FL,CLAY COUNTY,-81.713882,
“Residential Lot”,1
223488,FL,CLAY COUNTY,-81.707146,“Residential Lot”,1
4335
12,FL,CLAY COUNTY,-81.704613,
“Residential Lot”,1
Required output
policyID,statecode,county,Point longitude,Some Thing Here,point_granularity
119736,FL,CLAY COUNTY,-81.711777,“Residential Lot”,1
448094,FL,CLAY COUNTY,-81.707664,“Residential Lot”,3
206893,FL,CLAY COUNTY,-81.700455,“Residential Lot”,1
333743,FL,CLAY COUNTY,-81.707703,“Residential Lot”,3
172534,FL,CLAY COUNTY,-81.702675,“Residential Lot”,1
785275,FL,CLAY COUNTY,-81.707703,“Residential Lot”,3
995932,FL,CLAY COUNTY,-81.713882,“Residential Lot”,1
223488,FL,CLAY COUNTY,-81.707146,“Residential Lot”,1
433512,FL,CLAY COUNTY,-81.704613,“Residential Lot”,1
text-processing awk sed perl csv
edited Apr 2 at 20:31 by Jeff Schaller
asked Apr 2 at 6:28 by Harry McKenzie
Find why the newlines are inserted and fix that. That would be the best resolution to the issue.
– Kusalananda Apr 2 at 6:33
Guess: Newlines are inserted when some internal buffer is full and written out.
– dirkt Apr 2 at 6:36
For the actual question: Your lines start and end with digits, so removing all newlines that are not between digits would be a start, but this won't fix the file completely. To detect the other newlines, I guess one would need to add knowledge about what the fields look like.
– dirkt Apr 2 at 6:38
How would you fix it in awk, and why can't you do something similar in Perl or sed?
– muru Apr 2 at 6:56
"on this problem for several days": please add at least one of those attempts to the question.
– Sundeep Apr 2 at 7:50
2 Answers
$ awk -F, '{ while (NF < 6 || $NF == "") { brokenline = $0; getline; $0 = brokenline $0 }; print }' file.csv
policyID,statecode,county,Point longitude,Some Thing Here,point_granularity
119736,FL,CLAY COUNTY,-81.711777,“Residential Lot”,1
448094,FL,CLAY COUNTY,-81.707664,“Residential Lot”,3
206893,FL,CLAY COUNTY,-81.700455,“Residential Lot”,1
333743,FL,CLAY COUNTY,-81.707703,“Residential Lot”,3
172534,FL,CLAY COUNTY,-81.702675,“Residential Lot”,1
785275,FL,CLAY COUNTY,-81.707703,“Residential Lot”,3
995932,FL,CLAY COUNTY,-81.713882,“Residential Lot”,1
223488,FL,CLAY COUNTY,-81.707146,“Residential Lot”,1
433512,FL,CLAY COUNTY,-81.704613,“Residential Lot”,1
The awk code appends the next line of input to the current line for as long as there are fewer than six fields in the current line or the last field is empty (one line in the sample is broken just after the last field separator).
A Perl workalike:
perl -ne 'chomp; while (tr/,/,/ < 5 || /,$/) { $_ .= readline; chomp } print "$_\n"' file.csv
answered Apr 2 at 7:41 by Kusalananda (accepted answer)
Thank you for your response! I was trying to make it work with any type of delimiter, like so: perl -ne 'chomp;while (tr/quotemeta('"$d2"')/quotemeta('"$d2"')/ < '"$columns"'-1 || /quotemeta('"$d2"')$/) { $_ .=readline; chomp } print "$_\n"' where $d2 is, for example, a bash variable with the value ||, but it doesn't work :(
– Harry McKenzie Apr 2 at 12:44
@HarryMcKenzie I would use the awk solution for that, as it's easier to just change the option-argument for the -F flag.
– Kusalananda Apr 2 at 12:46
Yeah, but I need to use perl :(
– Harry McKenzie Apr 2 at 12:47
@HarryMcKenzie I don't quite see why.
– Kusalananda Apr 2 at 12:47
It's OK... it just sucks that our system only has a limited set of commands and awk is not part of it. They don't want to install it for some reason, so we have to live with sed and perl only.
– Harry McKenzie Apr 2 at 13:04
As Kusalananda said, there are six fields on each line, so you can try this with GNU sed:
sed -E ':A;h;s/^/,/;s/((,[^,]+){6})(.*)/\3/;/./{g;N;s/\n//;bA};g' infile
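For readers trying to follow the one-liner, here is the same idea spread over several lines with comments. This is a sketch, not the answerer's own formatting; it assumes a sed that accepts -E and records of exactly six non-empty fields, and the function name fix_csv_sed is made up. Writing each command on its own line also avoids the question of what may follow a branch label on a single line.

```shell
# Sketch: join lines until six non-empty comma-separated fields are present.
# fix_csv_sed is an illustrative name.
fix_csv_sed() {
  sed -E '
    :A
    # keep an untouched copy of the (possibly joined) line in the hold space
    h
    # prepend a comma so that every field has the form ",value"
    s/^/,/
    # strip six non-empty fields; whatever follows them is kept
    s/((,[^,]+){6})(.*)/\3/
    # anything left over means the record is still incomplete
    /./{
      g
      N
      s/\n//
      bA
    }
    # the record is complete: restore the saved copy and let sed print it
    g
  ' "$@"
}
```

It would be called as `fix_csv_sed infile`.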
answered Apr 2 at 10:22 by ctac_
But unfortunately it doesn't work. You didn't test it :p
– Harry McKenzie Apr 2 at 13:10
Is your sed GNU sed? On OpenBSD, I have to change the code like this: sed -Ee ':A;h;s/^/,/;s/((,[^,]+){6})(.*)/\3/;/./{g;N;s/\n//;bA' -e '};g' infile
– ctac_ Apr 2 at 13:41
I'm using GNU sed. But thank you very much for helping me, it works now! Thank you :)
– Harry McKenzie Apr 4 at 11:17