Changing field delimiter and quoting character
I would like to modify the content of 2 different files. How can I obtain the expected output by using a generic script in unix?
1st file: " inside quoted string, " and , (separator) inside quoted string
example:
"20181115","12345643","This is a "test"","","657","This is a "TEST"","","aaaa"
"20181115","12345632","This is an "example" of the file, a "sample" aaaa","123","",""TEST"","",""
expected output:
~20181115~;~12345643~;~This is a "test"~;~~;~657~;~This is a "TEST"~;~~;~aaaa~
~20181115~;~12345632~;~This is an "example" of the file, a "sample" aaaa~;~123~;~~;~"TEST"~;~~;~~
2nd file: | (separator) inside quoted string and multiple " inside the string
example:
"098789"|"Hello world!"| 12,7|"Cities I want to visit Rome| London"|15.11.2018|"Yes"
"032425"|"Travel in ""New York"", USA"| 113,3||15.11.2018|"Yes"
expected output:
~098789~;~Hello world!~; 12,7;~Cities I want to visit Rome| London~;15.11.2018;~Yes~
~032425~;~Travel in /"New York/", USA~; 113,3;;15.11.2018;~Yes~
text-processing awk sed
Unfortunately, the input data is not standard CSV. A double-quoted string at the very start or end of a double-quoted field should look like ""test"", i.e., "This is a ""test""". Therefore, the simple solution of using a CSV parser will not work.
– Kusalananda
2 days ago
edited yesterday by Mikel
asked 2 days ago by Mathew Linton
1 Answer
Try simple sed substitutions for your first problem:
sed 's/","/~;~/g; s/^"\|"$/~/g' file
and a more involved awk script for your second:
awk -F" '$1=$1; for (i=2; i<=NF; i+=2) gsub (" 1' OFS="~" file
It first replaces all | within double quotes with an unusual placeholder not commonly found in text files, then does the desired replacements, then reverses the placeholder substitution.
Please be aware that both are tailored to your problems and thus not generally applicable to other, even similar, problems without adaptation.
Output if applied to the sample in your question (Ubuntu, mawk 1.3.3 Nov 1996):
~098789~;~Hello world!~; 12,7;~Cities I want to visit Rome| London~;15.11.2018;~Yes~
~032425~;~Travel in /"New York/", USA~; 113,3;;15.11.2018;~Yes~
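The awk command as preserved above is incomplete, so here is a minimal sketch of the same three-step idea (hide the quoted pipes, convert, restore), together with a portable form of the sed step. It is not the answerer's original script. The function names are hypothetical, the placeholder is assumed to be the \001 control character, and a doubled quote "" is assumed to always be an escaped quote, never an empty quoted field.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical, and both converters are
# tailored to the sample data from the question.

# First file: "," between fields becomes ~;~ and the outer quotes become ~.
# Two anchored substitutions are used so that no alternation operator is
# needed, which keeps the command portable across sed implementations.
convert_first() {
    sed 's/","/~;~/g; s/^"/~/; s/"$/~/'
}

# Second file: split on the double quote, so even-numbered fields are the
# quoted parts. Assumes the data never contains the \001 character and that
# "" is always an escaped quote, never an empty quoted field.
convert_second() {
    awk 'BEGIN { FS = "\""; OFS = "~"; ph = "\001" }
    {
        for (i = 2; i <= NF; i += 2)
            gsub(/[|]/, ph, $i)   # hide pipes inside quoted fields
        $1 = $1                   # rebuild the record; each quote becomes ~
        gsub(/[|]/, ";")          # remaining pipes are field separators
        gsub(ph, "|")             # restore the hidden pipes
        gsub(/~~/, "/\"")         # a doubled quote ("") becomes /"
        print
    }'
}
```

Usage would be along the lines of convert_first < file1 and convert_second < file2. The pipe is written as the bracket expression [|] so that every awk reads it as a literal character.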
I have a question regarding the awk (for the second case): the separator is added after each letter and not after each word.
– Mathew Linton
yesterday
;~;|;0;|;9;|;8;|;7;|;8;|;9;|;~;|;~;|;H;|;e;|;l;|;l;|;o;|; ;|;w;|;o;|;r;|;l;|;d;|;!;|;~;|; ;1;2;,;7;|;~;|;C;|;i;|;t;|;i;|;e;|;s;|; ;|;I;|; ;|;w;|;a;|;n;|;t;|; ;|;t;|;o;|; ;|;v;|;i;|;s;|;i;|;t;|; ;|;R;|;o;|;m;|;e;|;|;|; ;|;L;|;o;|;n;|;d;|;o;|;n;|;~;|;1;5;.;1;1;.;2;0;1;8;|;~;|;Y;|;e;|;s;|;~; ;~;|;0;|;3;|;2;|;4;|;2;|;5;|;~;|;~;|;T;|;r;|;a;|;v;|;e;|;l;|; ;|;i;|;n;|; ;|;~;~;|;N;|;e;|;w;|; ;|;Y;|;o;|;r;|;k;|;~;~;|;,;|; ;|;U;|;S;|;A;|;~;|; ;1;1;3;,;3;|;|;1;5;.;1;1;.;2;0;1;8;|;~;|;Y;|;e;|;s;|;~;
– Mathew Linton
yesterday
Not for me; see the output that I added to the answer. Did you copy the proposal one-to-one, char by char? What is your awk version? Is there something strange with your input?
– RudiC
yesterday
The version is GNU Awk 3.1.7.
– Mathew Linton
yesterday
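The symptom reported in these comments, a separator after every character, is what gsub produces when its pattern can match the empty string. One possible cause, which the thread does not confirm, is that a bare | reached awk as the regular expression and was read as an alternation of two empty patterns. Writing the pipe as a bracket expression avoids that ambiguity. A small sketch, with a helper name made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: turn every literal pipe into a semicolon.
# [|] is a bracket expression, so it cannot be read as alternation.
pipes_to_semicolons() {
    awk '{ gsub(/[|]/, ";"); print }'
}

printf '%s\n' 'a|b|c' | pipes_to_semicolons   # prints a;b;c
```

Running the same check with the awk on the affected machine is a quick way to see whether the pipe pattern is the part that behaves differently there.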
answered yesterday by RudiC (edited yesterday)