Linux - how do I ignore special characters between " "?
My file: (1 sample line)
MMP,"01_janitorial,02_cleaning_tools",1,,CUBIC_INCH,"(14) tray capacity, 6" upright with 3" spacing, mounts on 48"W x 24"D, taupe epoxy, fits MetroMax i
& MetroMax Q shelf, NSF",CLEANING
I need to read this into a PostgreSQL table with 7 columns.
Breakdown of columns:
MMP
"01_janitorial,02_cleaning_tools"
1
(empty)
CUBIC_INCH
"(14) tray capacity, 6" upright with 3" spacing, mounts on 48"W x 24"D, taupe epoxy, fits MetroMax i & MetroMax Q shelf, NSF"
CLEANING
The file is basically comma-delimited, but I need to ignore the commas, carriage returns (if present), and double quotes when the text is inside a quoted field, as in columns 2 and 6.
I can load it with a PostgreSQL COPY command, or convert the file using awk, perl, sed, or whatever, and then load it.
shell-script postgresql
If possible, fix the process that generates that file. For the CSV format, a field that contains the quote character should double it:
123,"a string ""with double"" quotes, and commas",456
– glenn jackman
Jan 3 at 21:33
This is an export file from Akeneo (a COTS product), so I cannot change the process that generates the file.
– J.Turck
Jan 3 at 21:37
Definitely raise an issue with the software provider of the product that produced that output - there is no way to parse that file without hitting loads of edge cases that cause incorrect parsing. In the meantime I would write a parser in something like Python to handle the edge cases as you encounter them. If you can get them to fix it upstream then you can just drop it and use a real CSV parser.
– Michael Daffin
Jan 3 at 21:57
Kindly provide sample input and output
– Praveen Kumar BS
Jan 4 at 3:09
This quick-and-dirty sed script will fix up the quotes-inside-quotes so that the input file can be processed by a CSV parser:
sed -e 's/"/""/g; s/,""/,"/g; s/"",/",/g; s/^""|""$/"/g;'
It first converts all "s to "", then changes them back to just " if they are next to commas or the start or end of the line. It doesn't fix any other potential problems in the input file (and if they get something like this wrong, that's quite likely).
– cas
Jan 4 at 4:01
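One caveat on that last sed one-liner, noted here as an aside rather than something from the thread: in sed's default basic regular expressions a bare | is a literal character, so the ^""|""$ alternation will not touch a quote sitting at the very start or end of a line. Running the same script in extended-regex mode should behave as described (my adjustment, only tried against the sample line):
sed -E -e 's/"/""/g; s/,""/,"/g; s/"",/",/g; s/^""|""$/"/g;'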
4 Answers
Simply using -F, is often not enough to parse a CSV file, especially if, as described, the delimiter can be part of a quoted string. You can work around some of this by using FPAT to define a field with an expression rather than defining a character for your field delimiter, but awk will still go line by line, so you will have to preemptively consume the line breaks within your data.
Once done, you can do something such as:
awk 'BEGIN { FPAT = "([^,]+)|(\"[^\"]+\")" } /* normal processing here */' /path/to/file
That expression will define as a field either "anything that is not a comma" or "a double quote, one or more of anything that is not a double quote, followed by a double quote".
This will, however, explode if any of your quoted data themselves contain double quotes.
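For illustration only, a minimal sketch of that approach (mine, not part of the answer): it assumes GNU awk, since FPAT is a gawk extension, and assumes the stray quotes inside field 6 have already been doubled so that every quoted field is well formed:
gawk 'BEGIN { FPAT = "([^,]+)|(\"[^\"]+\")" }   # a field is a run of non-commas, or a quoted string
      { printf "f2=%s  f6=%s\n", $2, $6 }' /path/to/file
Note that with this pattern a completely empty field, such as the fourth column in the sample, is skipped rather than counted, so the field numbering shifts; handling empty fields needs a slightly different pattern.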
line 6 has double quotes and commas between the double quotes: "(14) tray capacity, 6" upright with 3" spacing, mounts on 48"W x 24"D, taupe epoxy, fits MetroMax i & MetroMax Q shelf, NSF"
– J.Turck
Jan 3 at 21:30
Is it possible to generate your data file using a delimiter which is not present in the data themselves (e.g. |)?
– DopeGhoti
Jan 3 at 21:35
Unfortunately no. It is an export from a COTS package - Akeneo.
– J.Turck
Jan 3 at 21:38
As was said, the file was generated incorrectly. Nevertheless, you may try to work around it by splitting not only on the , delimiter but also on ", and ,". Of course, a custom script will be needed, and there is no guarantee you won't meet something like that in your 6th field.
Alternatively, you can strip the first five fields, assuming the 6th field is the only one that is messed up, then from the result cut off the last field and its comma. What remains will be the content of the 6th field, as sketched below.
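A rough sketch of that second idea, added here by me rather than taken from the answer; it only holds under fairly strong assumptions: each record is already on a single line, field 2 is the only other quoted field and contains no embedded quotes, and fields 1, 3, 4, 5 and 7 contain no commas of their own:
# drop fields 1-5, then the trailing ",field7", then field 6's surrounding quotes
sed -E -e 's/^[^,]*,"[^"]*",[^,]*,[^,]*,[^,]*,//' \
       -e 's/,[^,]*$//' \
       -e 's/^"//' -e 's/"$//' /path/to/file
What is left on each output line is the raw content of the 6th field.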
The solution is going to be very specific to your data file since the quotes aren't properly escaped. Since there is only one trouble column, it's quite doable though. Here ya go:
#!/bin/bash
while IFS='' read -r line || [[ -n "$line" ]]; do
    echo "Line: $line"
    # grabbing the first field is easy ..
    f1=$(echo "$line" | cut -d, -f1 )
    # now remove the first field from the line
    line=$(echo "$line" | sed "s/$f1,//" )
    echo "Line is now: $line"
    # to grab the second field use the double quote as a delimiter
    f2=$(echo "$line" | cut -d'"' -f2 )
    # now remove the second field (and its quotes) from the line
    line=$(echo "$line" | sed "s/\"$f2\",//" )
    echo "Line is now: $line"
    # fields 3,4,5 are trivial .. just repeat the same pattern as 1 and then remove them
    f3=$(echo "$line" | cut -d, -f1 )
    line=$(echo "$line" | sed "s/$f3,//" )
    echo "Line is now: $line"
    f4=$(echo "$line" | cut -d, -f1 )
    line=$(echo "$line" | sed "s/$f4,//" )
    echo "Line is now: $line"
    f5=$(echo "$line" | cut -d, -f1 )
    line=$(echo "$line" | sed "s/$f5,//" )
    # here is the "trick" ... reverse the string, then you can cut field 7 first!
    line=$(echo "$line" | rev)
    echo "Line is now: $line"
    f7=$(echo "$line" | cut -d, -f1 )
    # now remove field 7 from the reversed string, then reverse field 7 back
    line=$(echo "$line" | sed "s/$f7,//" )
    f7=$(echo "$f7" | rev)
    # now we can reverse the remaining string, which is field 6, back to normal
    line=$(echo "$line" | rev)
    # and then remove the leading quote
    line=$(echo "$line" | cut --complement -c 1)
    # and then remove the trailing quote
    line=$(echo "$line" | sed 's/"$//' )
    echo "Line is now: $line"
    # and then double up all the remaining quotes so field 6 is valid CSV
    f6=$(echo "$line" | sed 's/"/""/g' )
    echo "f1 = $f1"
    echo "f2 = $f2"
    echo "f3 = $f3"
    echo "f4 = $f4"
    echo "f5 = $f5"
    echo "f6 = $f6"
    echo "f7 = $f7"
    echo "$f1,\"$f2\",$f3,$f4,$f5,\"$f6\",$f7" >> fixed.txt
done < "$1"
I made it echo lots of output to show you how it works; you can remove all the echo statements to make it fast once you understand it. It appends the fixed line to fixed.txt.
Here is an example run and output:
[root@alpha ~]# ./fixit.sh test.txt
Line: MMP,"01_janitorial,02_cleaning_tools",1,,CUBIC_INCH,"(14) tray capacity, 6" upright with 3" spacing, mounts on 48"W x 24"D, taupe epoxy, fits MetroMax i & MetroMax Q shelf, NSF",CLEANING
Line is now: "01_janitorial,02_cleaning_tools",1,,CUBIC_INCH,"(14) tray capacity, 6" upright with 3" spacing, mounts on 48"W x 24"D, taupe epoxy, fits MetroMax i & MetroMax Q shelf, NSF",CLEANING
Line is now: 1,,CUBIC_INCH,"(14) tray capacity, 6" upright with 3" spacing, mounts on 48"W x 24"D, taupe epoxy, fits MetroMax i & MetroMax Q shelf, NSF",CLEANING
Line is now: ,CUBIC_INCH,"(14) tray capacity, 6" upright with 3" spacing, mounts on 48"W x 24"D, taupe epoxy, fits MetroMax i & MetroMax Q shelf, NSF",CLEANING
Line is now: CUBIC_INCH,"(14) tray capacity, 6" upright with 3" spacing, mounts on 48"W x 24"D, taupe epoxy, fits MetroMax i & MetroMax Q shelf, NSF",CLEANING
Line is now: GNINAELC,"FSN ,flehs Q xaMorteM & i xaMorteM stif ,yxope epuat ,D"42 x W"84 no stnuom ,gnicaps "3 htiw thgirpu "6 ,yticapac yart )41("
Line is now: (14) tray capacity, 6" upright with 3" spacing, mounts on 48"W x 24"D, taupe epoxy, fits MetroMax i & MetroMax Q shelf, NSF
f1 = MMP
f2 = 01_janitorial,02_cleaning_tools
f3 = 1
f4 =
f5 = CUBIC_INCH
f6 = (14) tray capacity, 6"" upright with 3"" spacing, mounts on 48""W x 24""D, taupe epoxy, fits MetroMax i & MetroMax Q shelf, NSF
f7 = CLEANING
If you need to escape the quotes in some other manner, that should be pretty obvious given the above.
Wow, @alfreema, that's a lot of effort. I really appreciate it. The file I gave as my example is a shortened version of reality. My example showed 7 columns; I actually have 155. I believe I have most of it sorted, with the exception of what I call 'soft returns' needing to be removed while leaving the EOL character. The application allows formatting in certain fields, and the users add tabs, bullets, and returns to format the data. When I read the file, each 'soft return' is read as an EOL.
– J.Turck
Jan 4 at 18:30
• Sturdy construction: 300lb. (136kg), 400lb. (181kg), and$ 500lb. (227kg) capacity models available.",,,,,,,,,,,,0,DAY,24,INCH,,,Metro,"Fast drying of trays, pans, lids, pots and all pot sink items Promotes food safety by eliminating moisture. Interchangeable shelves adapt to all environments. Efficient organized drying area. Superior air circulation. Mobile unit allows for easy floor clean-up and$ transportability.",MTR2448XEA,,Cutting Board & Tray Drying Rack Group,570-825-2741,Customer,Service,PA,Wilkes-Barre,18705,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,48,INCH,,,^M$
– J.Turck
Jan 4 at 18:38
I want to remove the $ and leave the ^M.
– J.Turck
Jan 4 at 18:38
Are the soft returns literally $, or are you just using that as a placeholder for the example?
– alfreema
Jan 4 at 19:16
When I use :set list (inside the text file with vi) they show as $. My example is a cut and paste.
– J.Turck
Jan 5 at 2:08
I got the final product by removing the carriage returns within a quoted field with the following script:
$ cat remove_cr.awk
#!/usr/bin/awk -f
{
    record = record $0
    # If number of quotes is odd, continue reading record.
    if ( gsub( /"/, "&", record ) % 2 ) {
        record = record " "
        next
    }
    print record
    record = ""
}
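For completeness, a usage sketch of my own rather than part of the answer (the database, table and file names below are placeholders, and it assumes the remaining embedded quotes get doubled in a separate step so that COPY's CSV mode accepts the file):
# join records whose quote count is unbalanced, then load the result
awk -f remove_cr.awk export.csv > joined.csv
psql -d mydb -c "\copy products FROM 'joined.csv' WITH (FORMAT csv)"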