Changing field delimiter and quoting character

I would like to modify the content of two different files. How can I obtain the expected output using a generic script on Unix?



1st file:
" inside quoted string,
" and , (separator) inside quoted string



example:



"20181115","12345643","This is a "test"","","657","This is a "TEST"","","aaaa"
"20181115","12345632","This is an "example" of the file, a "sample" aaaa","123","",""TEST"","",""


expected output:



~20181115~;~12345643~;~This is a "test"~;~~;~657~;~This is a "TEST"~;~~;~aaaa~
~20181115~;~12345632~;~This is an "example" of the file, a "sample" aaaa~;~123~;~~;~"TEST"~;~~;~~


2nd file:
| (separator) inside quoted string and multiple " inside the string



example:



"098789"|"Hello world!"| 12,7|"Cities I want to visit Rome| London"|15.11.2018|"Yes"
"032425"|"Travel in ""New York"", USA"| 113,3||15.11.2018|"Yes"


expected output:



~098789~;~Hello world!~; 12,7;~Cities I want to visit Rome| London~;15.11.2018;~Yes~
~032425~;~Travel in /"New York/", USA~; 113,3;;15.11.2018;~Yes~









  • Unfortunately, the input data is not standard CSV. A double-quoted string at the very start or end of a double-quoted field should look like ""test"", i.e., "This is a ""test""". Therefore, the simple solution of using a CSV parser will not work.
    – Kusalananda
    2 days ago














text-processing awk sed






Question asked 2 days ago by Mathew Linton; edited yesterday by Mikel.







1 Answer
Try simple sed substitutions for your first problem:



sed 's/","/~;~/g; s/^"/~/; s/"$/~/' file


and a more involved awk script for your second:



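awk -F'"' '{for (i=2; i<=NF; i+=2) gsub(/\|/, "\001", $i); $1=$1; gsub(/~~/, "/\""); gsub(/\|/, ";"); gsub(/\001/, "|")} 1' OFS="~" file


It first replaces every | within double quotes with an unusual placeholder not commonly found in text files (the control character \001 here, though any character absent from the data would do), then does the desired replacements (" becomes ~, a doubled "" becomes /", and each remaining | becomes ;), then reverses the placeholder substitution.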
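The placeholder technique described above can be sketched step by step as follows. This is a hedged reconstruction rather than necessarily the exact command from the answer: the control character \001 as placeholder, the variable names, and the function name convert_pipe_file are assumptions chosen for illustration. It relies on the fact that splitting on " leaves quoted text in the even-numbered fields.

```shell
#!/bin/sh
# Sample lines taken from the second file in the question.
input='"098789"|"Hello world!"| 12,7|"Cities I want to visit Rome| London"|15.11.2018|"Yes"
"032425"|"Travel in ""New York"", USA"| 113,3||15.11.2018|"Yes"'

# Hypothetical helper: reads the |-separated format on stdin,
# writes the ~ / ; format on stdout.
convert_pipe_file() {
    awk -F'"' -v OFS='~' '
    {
        # Step 1: hide every | inside quotes (even fields) behind \001.
        for (i = 2; i <= NF; i += 2)
            gsub(/\|/, "\001", $i)
        # Step 2: rebuild the record so that each " becomes ~.
        $1 = $1
        # Step 3: a doubled quote is now ~~; turn it into /".
        gsub(/~~/, "/\"")
        # Step 4: every | still visible is a real separator.
        gsub(/\|/, ";")
        # Step 5: restore the hidden | characters.
        gsub(/\001/, "|")
        print
    }'
}

result=$(printf '%s\n' "$input" | convert_pipe_file)
printf '%s\n' "$result"
```

Step 3 would also rewrite an empty quoted field (""), which the sample does not contain, so the sketch shares the limitation that it is fitted to this particular input.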



Please be aware that both are tailored to your problems and thus not generally applicable to other, even similar, problems without adaptation.



Output of the awk command when applied to the second sample in your question (Ubuntu, mawk 1.3.3 Nov 1996):



~098789~;~Hello world!~; 12,7;~Cities I want to visit Rome| London~;15.11.2018;~Yes~
~032425~;~Travel in /"New York/", USA~; 113,3;;15.11.2018;~Yes~
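The output above covers only the second file. For the first file, a quick check of the sed approach against the sample from the question could look like the sketch below. POSIX basic regular expressions have no | alternation (GNU sed accepts \| as an extension), so the two anchored substitutions are written as separate expressions here. The function name convert_comma_file and the variable names are assumptions made for this example.

```shell
#!/bin/sh
# Sample lines taken from the first file in the question.
input='"20181115","12345643","This is a "test"","","657","This is a "TEST"","","aaaa"
"20181115","12345632","This is an "example" of the file, a "sample" aaaa","123","",""TEST"","",""'

# Hypothetical helper: reads the comma-separated format on stdin.
convert_comma_file() {
    # 1) every "," between two fields becomes ~;~
    # 2) the quote that opens the line becomes ~
    # 3) the quote that closes the line becomes ~
    sed -e 's/","/~;~/g' -e 's/^"/~/' -e 's/"$/~/'
}

result=$(printf '%s\n' "$input" | convert_comma_file)
printf '%s\n' "$result"
```

Quotes inside a field are left untouched because only the exact sequence "," is treated as a field boundary. The flip side is that a field whose text itself contains "," would be split in the wrong place.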





  • I have a question regarding the awk command for the second case: the separator is added after each letter and not after each word.
    – Mathew Linton
    yesterday











  • ;~;|;0;|;9;|;8;|;7;|;8;|;9;|;~;|;~;|;H;|;e;|;l;|;l;|;o;|; ;|;w;|;o;|;r;|;l;|;d;|;!;|;~;|; ;1;2;,;7;|;~;|;C;|;i;|;t;|;i;|;e;|;s;|; ;|;I;|; ;|;w;|;a;|;n;|;t;|; ;|;t;|;o;|; ;|;v;|;i;|;s;|;i;|;t;|; ;|;R;|;o;|;m;|;e;|;|;|; ;|;L;|;o;|;n;|;d;|;o;|;n;|;~;|;1;5;.;1;1;.;2;0;1;8;|;~;|;Y;|;e;|;s;|;~; ;~;|;0;|;3;|;2;|;4;|;2;|;5;|;~;|;~;|;T;|;r;|;a;|;v;|;e;|;l;|; ;|;i;|;n;|; ;|;~;~;|;N;|;e;|;w;|; ;|;Y;|;o;|;r;|;k;|;~;~;|;,;|; ;|;U;|;S;|;A;|;~;|; ;1;1;3;,;3;|;|;1;5;.;1;1;.;2;0;1;8;|;~;|;Y;|;e;|;s;|;~;
    – Mathew Linton
    yesterday










  • Not for me; see the output that I added to the answer. Did you copy the proposal one-to-one, character by character? What is your awk version? Is there something strange in your input?
    – RudiC
    yesterday










  • the version is GNU Awk 3.1.7
    – Mathew Linton
    yesterday
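One way to narrow this down, offered as a guess rather than a diagnosis, is to check how many fields awk sees when the record is split on the double quote. For the first sample line, nine fields are expected (eight quote characters). A count close to the length of the line would suggest that the field separator ended up empty, which in gawk splits the record into single characters; this could happen if the shell consumed the quote after -F. The variable names below are assumptions made for this example.

```shell
#!/bin/sh
# First sample line of the second file in the question.
line='"098789"|"Hello world!"| 12,7|"Cities I want to visit Rome| London"|15.11.2018|"Yes"'

# Field count when splitting on the double quote; the quote is wrapped
# in single quotes so that the shell passes it to awk unchanged.
nf_quote=$(printf '%s\n' "$line" | awk -F'"' '{ print NF }')

printf 'fields seen by awk: %s\n' "$nf_quote"
```

Running the same check with the awk binary that produced the unexpected output should show whether the separator was received as intended.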











Answer posted yesterday by RudiC; edited yesterday.










