Cut the SUBSTRINGS to a specific length in a CSV file

The name of the pictureThe name of the pictureThe name of the pictureClash Royale CLAN TAG#URR8PPP











up vote
1
down vote

favorite












I have a file like below,
cat Test.csv



"pav",12345,"ABCD,EF;xyz23;15rtg",,
"xyz",,"C4DEF;x23yu;rtg",,


After modification :



cat Test.csv

"pav",12345,"AB;xy;15",,
"xyz",,"C4;x2;rt",,


The 3rd field containing substrings delimited with ";" has to be replaced with their substrings







share|improve this question






















  • Only 5 columns ?and only third column should be replaced?or it's can appear in different columns too?
    – Î±Ò“sнιη
    Oct 22 '17 at 7:51











  • @αғsнιη The data has 5 columns, with the 4th and 5th being empty, it seems.
    – Kusalananda
    Oct 22 '17 at 7:52










  • ah, yes, edited my comment
    – Î±Ò“sнιη
    Oct 22 '17 at 7:53














up vote
1
down vote

favorite












I have a file like below,
cat Test.csv



"pav",12345,"ABCD,EF;xyz23;15rtg",,
"xyz",,"C4DEF;x23yu;rtg",,


After modification :



cat Test.csv

"pav",12345,"AB;xy;15",,
"xyz",,"C4;x2;rt",,


The 3rd field containing substrings delimited with ";" has to be replaced with their substrings







share|improve this question






















  • Only 5 columns ?and only third column should be replaced?or it's can appear in different columns too?
    – Î±Ò“sнιη
    Oct 22 '17 at 7:51











  • @αғsнιη The data has 5 columns, with the 4th and 5th being empty, it seems.
    – Kusalananda
    Oct 22 '17 at 7:52










  • ah, yes, edited my comment
    – Î±Ò“sнιη
    Oct 22 '17 at 7:53












up vote
1
down vote

favorite









up vote
1
down vote

favorite











I have a file like below,
cat Test.csv



"pav",12345,"ABCD,EF;xyz23;15rtg",,
"xyz",,"C4DEF;x23yu;rtg",,


After modification :



cat Test.csv

"pav",12345,"AB;xy;15",,
"xyz",,"C4;x2;rt",,


The 3rd field containing substrings delimited with ";" has to be replaced with their substrings







share|improve this question














I have a file like below,
cat Test.csv



"pav",12345,"ABCD,EF;xyz23;15rtg",,
"xyz",,"C4DEF;x23yu;rtg",,


After modification :



cat Test.csv

"pav",12345,"AB;xy;15",,
"xyz",,"C4;x2;rt",,


The 3rd field containing substrings delimited with ";" has to be replaced with their substrings









share|improve this question













share|improve this question




share|improve this question








edited Oct 22 '17 at 10:37









RomanPerekhrest

22.5k12145




22.5k12145










asked Oct 22 '17 at 6:56









Pavan

61




61











  • Only 5 columns ?and only third column should be replaced?or it's can appear in different columns too?
    – Î±Ò“sнιη
    Oct 22 '17 at 7:51











  • @αғsнιη The data has 5 columns, with the 4th and 5th being empty, it seems.
    – Kusalananda
    Oct 22 '17 at 7:52










  • ah, yes, edited my comment
    – Î±Ò“sнιη
    Oct 22 '17 at 7:53
















  • Only 5 columns ?and only third column should be replaced?or it's can appear in different columns too?
    – Î±Ò“sнιη
    Oct 22 '17 at 7:51











  • @αғsнιη The data has 5 columns, with the 4th and 5th being empty, it seems.
    – Kusalananda
    Oct 22 '17 at 7:52










  • ah, yes, edited my comment
    – Î±Ò“sнιη
    Oct 22 '17 at 7:53















Only 5 columns ?and only third column should be replaced?or it's can appear in different columns too?
– Î±Ò“sнιη
Oct 22 '17 at 7:51





Only 5 columns ?and only third column should be replaced?or it's can appear in different columns too?
– Î±Ò“sнιη
Oct 22 '17 at 7:51













@αғsнιη The data has 5 columns, with the 4th and 5th being empty, it seems.
– Kusalananda
Oct 22 '17 at 7:52




@αғsнιη The data has 5 columns, with the 4th and 5th being empty, it seems.
– Kusalananda
Oct 22 '17 at 7:52












ah, yes, edited my comment
– Î±Ò“sнιη
Oct 22 '17 at 7:53




ah, yes, edited my comment
– Î±Ò“sнιη
Oct 22 '17 at 7:53










4 Answers
4






active

oldest

votes

















up vote
0
down vote













The following is using csvkit, because parsing CSV data that contains commas in quoted fields with awk directly is error prone.



This will get column three on the correct format:



csvcut -c 3 file.csv |
sed -r 's/^"|"$//g' |
awk -F';' -vOFS=';' ' for (i=1; i<=NF; ++i) $i = substr($i, 0, 2) printf(""%s"n", $0) ' >tmp-3rd


For the given input, this produces



"AB;xy;15"
"C4;x2;rt"



  • csvcut will cut out the third column.


  • sed will remove any double quotes from the data, if they appear first or last on the line.

  • The awk program will go through the ;-delimited fields and cut them down to a length of two characters per field. It prints out the data with double quotes around it.

  • The output is written to the file tmp-3rd.

Then it's just a matter of reassembling this with the original data (this is assuming bash or any other shell that can do process substitutions with <(...)):



paste -d, <( csvcut -c 1,2 file.csv ) tmp-3rd <( csvcut -c 4,5 file.csv ) | csvformat



  • paste will put the columns together with commas in-between.

  • The first process substitution produces the first two columns from the original file, and the second produces the last two columns. In the middle, we provide the modified third column.

  • As an optional step, we pass the data through csvformat which will quote or unquote fields as needed.

The output will be



pav,12345,AB;xy;15,,
xyz,,C4;x2;rt,,



Bypassing the need for the temporary file:



paste -d, 
<( csvcut -c 1,2 file.csv )
<( csvcut -c 3 file.csv | sed -r 's/^"|"$//g' |
awk -F';' -vOFS=';' ' for (i=1; i<=NF; ++i) $i = substr($i, 0, 2) printf(""%s"n", $0) ' )
<( csvcut -c 4,5 file.csv ) | csvformat





share|improve this answer




















  • I doesn't seem to have a csvkit , will try to use thae AWK as the rest had concquered some how, Thanks :)
    – Pavan
    Oct 22 '17 at 8:18











  • @Pavan pip install --user csvkit will install the commands in $HOME/.local/bin.
    – Kusalananda
    Oct 22 '17 at 8:29










  • I tried with this which works fine , but what i actually wasnt to is, the third field in above example has to be replaced with the modified one in he file could you help me : awk -F',"' 'print $5 "t" ' test.csv|awk -F';' -vOFS=';' ' for (i=1; i<=NF; ++i) $i = substr($i, 0, 25) printf(""%s"n", $0) '
    – Pavan
    Oct 23 '17 at 15:10











  • @Pavan I don't understand. Please update the question with the appropriate information.
    – Kusalananda
    Oct 23 '17 at 15:14










  • I mean now i am able to see the output on the screen as expected, But my actual need is to replace that in the actual file. awk -F',"' 'print $3 "t" ' test.csv --> helping me to print the 3rd field eg. "ABCD,EF;xyz23;15rtg" and awk -F';' -vOFS=';' ' for (i=1; i<=NF; ++i) $i = substr($i, 0, 2) printf(""%s"n", $0) ' --> helping me to limit the substrings to 2 char. the desired output is seen on the screen, but i want that to be updated in the file and the file wold have like 50k lines and the action should happen to all lines
    – Pavan
    Oct 23 '17 at 15:22

















up vote
0
down vote













With perl



Assuming ; is only in third field



$ perl -pe 's/"K[^;"]*;[^"]*(?=")/$&=~s|([^;]2)[^;]+|$1|gr/e' ip.txt
"pav",12345,"AB;xy;15",,
"xyz",,"C4;x2;rt",,



  • "K to match " before string of interest and (?=") to match " after string of interest. But the " themselves not part of captured string as these are lookarounds


  • [^;"]*;[^"]* match any non ; or " characters followed by ; followed by non " characters


  • $&=~s|([^;]2)[^;]+|$1|gr to perform another substitution on the matched string


  • e modifier allows to use Perl code in substitution section



To restrict only for 3rd field



$ cat ip.txt 
"pav",12345,"ABCD,EF;xyz23;15rtg",,
"xyz",,"C4DEF;x23yu;rtg",,
"foo;12,23;good",124,253
12,5232,"xyz","ijk;5545;62"

$ perl -pe 's/^("[^"]*",|[^,]*,)2"K[^;"]*;[^"]*(?=")/$&=~s|([^;]2)[^;]+|$1|gr/e' ip.txt
"pav",12345,"AB;xy;15",,
"xyz",,"C4;x2;rt",,
"foo;12,23;good",124,253
12,5232,"xyz","ijk;5545;62"





share|improve this answer



























    up vote
    0
    down vote













    Accurate and robust Python 3.x solution (based on csv.reader object):



    parse_csv.py script:



    import csv, sys
    with open(sys.argv[1]) as f:
    reader = csv.reader(f)
    for l in reader:
    l = [s if ';' not in s else ';'.join(_[:2] for _ in s.split(';')) for s in l]
    print(','.join(i if not i or i.isnumeric() else '""'.format(i) for i in l))



    Usage:



    python3 parse_csv.py Test.csv


    The output:



    "pav",12345,"AB;xy;15",,
    "xyz",,"C4;x2;rt",,



    Python's csv module provides robust and flexible support for csv data.






    share|improve this answer





























      up vote
      0
      down vote













      Complex GNU AWK solution (parsing csv data):



      awk -v FPAT='"[^"]+"|[^",]+|,,' ' 
      for (i=1;i<=NF;i++)
      if ($i~/^".*;./)
      len=split($i,a,";"); v=substr(a[1],1,3);
      for (j=2;j<=len;j++) v= v";"substr(a[j],1,2);
      v=v"42"
      printf "%s%s",(v? v: ($i~/^,,/? (i==NF? ",":""):$i )),
      (i==NF? ORS:OFS); v=""

      ' OFS=',' Test.csv


      • FPAT='"[^"]+"|[^",]+|,,' - complex regex pattern defining field value


      • if ($i~/^".*;./) ... - if the current field $i contains ; character(s)


      • len=split($i,a,";") - split the field value $i into array a by separator ;. len is assigned with number of elements/chunks created


      • v=substr(a[1],1,3); - capturing the first chunk of the needed length including leading " char, for ex. "AB will be extracted from "ABCD,EF


      • for (j=2;j<=len;j++) ... - iterating through remaining chunks/items


      • v=v"42" - add trailing double quote " to the processed sequence v. 43 is ASCII octal code representing the double quote char ".


      • ($i~/^,,/? (i==NF? ",":""):$i ) - each empty field ,, is recreated with single comma , and common delimiter (also ,). This is to avoid redundant comma cluttering like "pav",,,


      • (i==NF? ORS:OFS) - on encountering the last field i==NF - print output record separator ORS, otherwise - print output filed separator OFS



      The output:



      "pav",12345,"AB;xy;15",,
      "xyz",,"C4;x2;rt",,





      share|improve this answer






















      • Thanks Roman, Works perfect. Could you help me in understanding bit better, like the 042 and arrays look bit complex.
        – Pavan
        Oct 22 '17 at 8:25










      • @Pavan, welcome, see my explanation
        – RomanPerekhrest
        Oct 22 '17 at 10:30










      Your Answer







      StackExchange.ready(function()
      var channelOptions =
      tags: "".split(" "),
      id: "106"
      ;
      initTagRenderer("".split(" "), "".split(" "), channelOptions);

      StackExchange.using("externalEditor", function()
      // Have to fire editor after snippets, if snippets enabled
      if (StackExchange.settings.snippets.snippetsEnabled)
      StackExchange.using("snippets", function()
      createEditor();
      );

      else
      createEditor();

      );

      function createEditor()
      StackExchange.prepareEditor(
      heartbeatType: 'answer',
      convertImagesToLinks: false,
      noModals: false,
      showLowRepImageUploadWarning: true,
      reputationToPostImages: null,
      bindNavPrevention: true,
      postfix: "",
      onDemand: true,
      discardSelector: ".discard-answer"
      ,immediatelyShowMarkdownHelp:true
      );



      );













       

      draft saved


      draft discarded


















      StackExchange.ready(
      function ()
      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2funix.stackexchange.com%2fquestions%2f399655%2fcut-the-substrings-to-a-specific-length-in-a-csv-file%23new-answer', 'question_page');

      );

      Post as a guest






























      4 Answers
      4






      active

      oldest

      votes








      4 Answers
      4






      active

      oldest

      votes









      active

      oldest

      votes






      active

      oldest

      votes








      up vote
      0
      down vote













      The following is using csvkit, because parsing CSV data that contains commas in quoted fields with awk directly is error prone.



      This will get column three on the correct format:



      csvcut -c 3 file.csv |
      sed -r 's/^"|"$//g' |
      awk -F';' -vOFS=';' ' for (i=1; i<=NF; ++i) $i = substr($i, 0, 2) printf(""%s"n", $0) ' >tmp-3rd


      For the given input, this produces



      "AB;xy;15"
      "C4;x2;rt"



      • csvcut will cut out the third column.


      • sed will remove any double quotes from the data, if they appear first or last on the line.

      • The awk program will go through the ;-delimited fields and cut them down to a length of two characters per field. It prints out the data with double quotes around it.

      • The output is written to the file tmp-3rd.

      Then it's just a matter of reassembling this with the original data (this is assuming bash or any other shell that can do process substitutions with <(...)):



      paste -d, <( csvcut -c 1,2 file.csv ) tmp-3rd <( csvcut -c 4,5 file.csv ) | csvformat



      • paste will put the columns together with commas in-between.

      • The first process substitution produces the first two columns from the original file, and the second produces the last two columns. In the middle, we provide the modified third column.

      • As an optional step, we pass the data through csvformat which will quote or unquote fields as needed.

      The output will be



      pav,12345,AB;xy;15,,
      xyz,,C4;x2;rt,,



      Bypassing the need for the temporary file:



      paste -d, 
      <( csvcut -c 1,2 file.csv )
      <( csvcut -c 3 file.csv | sed -r 's/^"|"$//g' |
      awk -F';' -vOFS=';' ' for (i=1; i<=NF; ++i) $i = substr($i, 0, 2) printf(""%s"n", $0) ' )
      <( csvcut -c 4,5 file.csv ) | csvformat





      share|improve this answer




















      • I doesn't seem to have a csvkit , will try to use thae AWK as the rest had concquered some how, Thanks :)
        – Pavan
        Oct 22 '17 at 8:18











      • @Pavan pip install --user csvkit will install the commands in $HOME/.local/bin.
        – Kusalananda
        Oct 22 '17 at 8:29










      • I tried with this which works fine , but what i actually wasnt to is, the third field in above example has to be replaced with the modified one in he file could you help me : awk -F',"' 'print $5 "t" ' test.csv|awk -F';' -vOFS=';' ' for (i=1; i<=NF; ++i) $i = substr($i, 0, 25) printf(""%s"n", $0) '
        – Pavan
        Oct 23 '17 at 15:10











      • @Pavan I don't understand. Please update the question with the appropriate information.
        – Kusalananda
        Oct 23 '17 at 15:14










      • I mean now i am able to see the output on the screen as expected, But my actual need is to replace that in the actual file. awk -F',"' 'print $3 "t" ' test.csv --> helping me to print the 3rd field eg. "ABCD,EF;xyz23;15rtg" and awk -F';' -vOFS=';' ' for (i=1; i<=NF; ++i) $i = substr($i, 0, 2) printf(""%s"n", $0) ' --> helping me to limit the substrings to 2 char. the desired output is seen on the screen, but i want that to be updated in the file and the file wold have like 50k lines and the action should happen to all lines
        – Pavan
        Oct 23 '17 at 15:22














      up vote
      0
      down vote













      The following is using csvkit, because parsing CSV data that contains commas in quoted fields with awk directly is error prone.



      This will get column three on the correct format:



      csvcut -c 3 file.csv |
      sed -r 's/^"|"$//g' |
      awk -F';' -vOFS=';' ' for (i=1; i<=NF; ++i) $i = substr($i, 0, 2) printf(""%s"n", $0) ' >tmp-3rd


      For the given input, this produces



      "AB;xy;15"
      "C4;x2;rt"



      • csvcut will cut out the third column.


      • sed will remove any double quotes from the data, if they appear first or last on the line.

      • The awk program will go through the ;-delimited fields and cut them down to a length of two characters per field. It prints out the data with double quotes around it.

      • The output is written to the file tmp-3rd.

      Then it's just a matter of reassembling this with the original data (this is assuming bash or any other shell that can do process substitutions with <(...)):



      paste -d, <( csvcut -c 1,2 file.csv ) tmp-3rd <( csvcut -c 4,5 file.csv ) | csvformat



      • paste will put the columns together with commas in-between.

      • The first process substitution produces the first two columns from the original file, and the second produces the last two columns. In the middle, we provide the modified third column.

      • As an optional step, we pass the data through csvformat which will quote or unquote fields as needed.

      The output will be



      pav,12345,AB;xy;15,,
      xyz,,C4;x2;rt,,



      Bypassing the need for the temporary file:



      paste -d, 
      <( csvcut -c 1,2 file.csv )
      <( csvcut -c 3 file.csv | sed -r 's/^"|"$//g' |
      awk -F';' -vOFS=';' ' for (i=1; i<=NF; ++i) $i = substr($i, 0, 2) printf(""%s"n", $0) ' )
      <( csvcut -c 4,5 file.csv ) | csvformat





      share|improve this answer




















      • I doesn't seem to have a csvkit , will try to use thae AWK as the rest had concquered some how, Thanks :)
        – Pavan
        Oct 22 '17 at 8:18











      • @Pavan pip install --user csvkit will install the commands in $HOME/.local/bin.
        – Kusalananda
        Oct 22 '17 at 8:29










      • I tried with this which works fine , but what i actually wasnt to is, the third field in above example has to be replaced with the modified one in he file could you help me : awk -F',"' 'print $5 "t" ' test.csv|awk -F';' -vOFS=';' ' for (i=1; i<=NF; ++i) $i = substr($i, 0, 25) printf(""%s"n", $0) '
        – Pavan
        Oct 23 '17 at 15:10











      • @Pavan I don't understand. Please update the question with the appropriate information.
        – Kusalananda
        Oct 23 '17 at 15:14










      • I mean now i am able to see the output on the screen as expected, But my actual need is to replace that in the actual file. awk -F',"' 'print $3 "t" ' test.csv --> helping me to print the 3rd field eg. "ABCD,EF;xyz23;15rtg" and awk -F';' -vOFS=';' ' for (i=1; i<=NF; ++i) $i = substr($i, 0, 2) printf(""%s"n", $0) ' --> helping me to limit the substrings to 2 char. the desired output is seen on the screen, but i want that to be updated in the file and the file wold have like 50k lines and the action should happen to all lines
        – Pavan
        Oct 23 '17 at 15:22












      up vote
      0
      down vote










      up vote
      0
      down vote









      The following is using csvkit, because parsing CSV data that contains commas in quoted fields with awk directly is error prone.



      This will get column three on the correct format:



      csvcut -c 3 file.csv |
      sed -r 's/^"|"$//g' |
      awk -F';' -vOFS=';' ' for (i=1; i<=NF; ++i) $i = substr($i, 0, 2) printf(""%s"n", $0) ' >tmp-3rd


      For the given input, this produces



      "AB;xy;15"
      "C4;x2;rt"



      • csvcut will cut out the third column.


      • sed will remove any double quotes from the data, if they appear first or last on the line.

      • The awk program will go through the ;-delimited fields and cut them down to a length of two characters per field. It prints out the data with double quotes around it.

      • The output is written to the file tmp-3rd.

      Then it's just a matter of reassembling this with the original data (this is assuming bash or any other shell that can do process substitutions with <(...)):



      paste -d, <( csvcut -c 1,2 file.csv ) tmp-3rd <( csvcut -c 4,5 file.csv ) | csvformat



      • paste will put the columns together with commas in-between.

      • The first process substitution produces the first two columns from the original file, and the second produces the last two columns. In the middle, we provide the modified third column.

      • As an optional step, we pass the data through csvformat which will quote or unquote fields as needed.

      The output will be



      pav,12345,AB;xy;15,,
      xyz,,C4;x2;rt,,



      Bypassing the need for the temporary file:



      paste -d, 
      <( csvcut -c 1,2 file.csv )
      <( csvcut -c 3 file.csv | sed -r 's/^"|"$//g' |
      awk -F';' -vOFS=';' ' for (i=1; i<=NF; ++i) $i = substr($i, 0, 2) printf(""%s"n", $0) ' )
      <( csvcut -c 4,5 file.csv ) | csvformat





      share|improve this answer












      The following is using csvkit, because parsing CSV data that contains commas in quoted fields with awk directly is error prone.



      This will get column three on the correct format:



      csvcut -c 3 file.csv |
      sed -r 's/^"|"$//g' |
      awk -F';' -vOFS=';' ' for (i=1; i<=NF; ++i) $i = substr($i, 0, 2) printf(""%s"n", $0) ' >tmp-3rd


      For the given input, this produces



      "AB;xy;15"
      "C4;x2;rt"



      • csvcut will cut out the third column.


      • sed will remove any double quotes from the data, if they appear first or last on the line.

      • The awk program will go through the ;-delimited fields and cut them down to a length of two characters per field. It prints out the data with double quotes around it.

      • The output is written to the file tmp-3rd.

      Then it's just a matter of reassembling this with the original data (this is assuming bash or any other shell that can do process substitutions with <(...)):



      paste -d, <( csvcut -c 1,2 file.csv ) tmp-3rd <( csvcut -c 4,5 file.csv ) | csvformat



      • paste will put the columns together with commas in-between.

      • The first process substitution produces the first two columns from the original file, and the second produces the last two columns. In the middle, we provide the modified third column.

      • As an optional step, we pass the data through csvformat which will quote or unquote fields as needed.

      The output will be



      pav,12345,AB;xy;15,,
      xyz,,C4;x2;rt,,



      Bypassing the need for the temporary file:



      paste -d, 
      <( csvcut -c 1,2 file.csv )
      <( csvcut -c 3 file.csv | sed -r 's/^"|"$//g' |
      awk -F';' -vOFS=';' ' for (i=1; i<=NF; ++i) $i = substr($i, 0, 2) printf(""%s"n", $0) ' )
      <( csvcut -c 4,5 file.csv ) | csvformat






      share|improve this answer












      share|improve this answer



      share|improve this answer










      answered Oct 22 '17 at 7:51









      Kusalananda

      105k14209326




      105k14209326











      • I doesn't seem to have a csvkit , will try to use thae AWK as the rest had concquered some how, Thanks :)
        – Pavan
        Oct 22 '17 at 8:18











      • @Pavan pip install --user csvkit will install the commands in $HOME/.local/bin.
        – Kusalananda
        Oct 22 '17 at 8:29










      • I tried with this which works fine , but what i actually wasnt to is, the third field in above example has to be replaced with the modified one in he file could you help me : awk -F',"' 'print $5 "t" ' test.csv|awk -F';' -vOFS=';' ' for (i=1; i<=NF; ++i) $i = substr($i, 0, 25) printf(""%s"n", $0) '
        – Pavan
        Oct 23 '17 at 15:10











      • @Pavan I don't understand. Please update the question with the appropriate information.
        – Kusalananda
        Oct 23 '17 at 15:14










      • I mean now i am able to see the output on the screen as expected, But my actual need is to replace that in the actual file. awk -F',"' 'print $3 "t" ' test.csv --> helping me to print the 3rd field eg. "ABCD,EF;xyz23;15rtg" and awk -F';' -vOFS=';' ' for (i=1; i<=NF; ++i) $i = substr($i, 0, 2) printf(""%s"n", $0) ' --> helping me to limit the substrings to 2 char. the desired output is seen on the screen, but i want that to be updated in the file and the file wold have like 50k lines and the action should happen to all lines
        – Pavan
        Oct 23 '17 at 15:22
















      • I doesn't seem to have a csvkit , will try to use thae AWK as the rest had concquered some how, Thanks :)
        – Pavan
        Oct 22 '17 at 8:18











      • @Pavan pip install --user csvkit will install the commands in $HOME/.local/bin.
        – Kusalananda
        Oct 22 '17 at 8:29










      • I tried with this which works fine , but what i actually wasnt to is, the third field in above example has to be replaced with the modified one in he file could you help me : awk -F',"' 'print $5 "t" ' test.csv|awk -F';' -vOFS=';' ' for (i=1; i<=NF; ++i) $i = substr($i, 0, 25) printf(""%s"n", $0) '
        – Pavan
        Oct 23 '17 at 15:10











      • @Pavan I don't understand. Please update the question with the appropriate information.
        – Kusalananda
        Oct 23 '17 at 15:14










      • I mean now i am able to see the output on the screen as expected, But my actual need is to replace that in the actual file. awk -F',"' 'print $3 "t" ' test.csv --> helping me to print the 3rd field eg. "ABCD,EF;xyz23;15rtg" and awk -F';' -vOFS=';' ' for (i=1; i<=NF; ++i) $i = substr($i, 0, 2) printf(""%s"n", $0) ' --> helping me to limit the substrings to 2 char. the desired output is seen on the screen, but i want that to be updated in the file and the file wold have like 50k lines and the action should happen to all lines
        – Pavan
        Oct 23 '17 at 15:22















      I doesn't seem to have a csvkit , will try to use thae AWK as the rest had concquered some how, Thanks :)
      – Pavan
      Oct 22 '17 at 8:18





      I doesn't seem to have a csvkit , will try to use thae AWK as the rest had concquered some how, Thanks :)
      – Pavan
      Oct 22 '17 at 8:18













      @Pavan pip install --user csvkit will install the commands in $HOME/.local/bin.
      – Kusalananda
      Oct 22 '17 at 8:29




      @Pavan pip install --user csvkit will install the commands in $HOME/.local/bin.
      – Kusalananda
      Oct 22 '17 at 8:29












      I tried with this which works fine , but what i actually wasnt to is, the third field in above example has to be replaced with the modified one in he file could you help me : awk -F',"' 'print $5 "t" ' test.csv|awk -F';' -vOFS=';' ' for (i=1; i<=NF; ++i) $i = substr($i, 0, 25) printf(""%s"n", $0) '
      – Pavan
      Oct 23 '17 at 15:10





      I tried with this which works fine , but what i actually wasnt to is, the third field in above example has to be replaced with the modified one in he file could you help me : awk -F',"' 'print $5 "t" ' test.csv|awk -F';' -vOFS=';' ' for (i=1; i<=NF; ++i) $i = substr($i, 0, 25) printf(""%s"n", $0) '
      – Pavan
      Oct 23 '17 at 15:10













      @Pavan I don't understand. Please update the question with the appropriate information.
      – Kusalananda
      Oct 23 '17 at 15:14




      @Pavan I don't understand. Please update the question with the appropriate information.
      – Kusalananda
      Oct 23 '17 at 15:14












      I mean now i am able to see the output on the screen as expected, But my actual need is to replace that in the actual file. awk -F',"' 'print $3 "t" ' test.csv --> helping me to print the 3rd field eg. "ABCD,EF;xyz23;15rtg" and awk -F';' -vOFS=';' ' for (i=1; i<=NF; ++i) $i = substr($i, 0, 2) printf(""%s"n", $0) ' --> helping me to limit the substrings to 2 char. the desired output is seen on the screen, but i want that to be updated in the file and the file wold have like 50k lines and the action should happen to all lines
      – Pavan
      Oct 23 '17 at 15:22




      I mean now i am able to see the output on the screen as expected, But my actual need is to replace that in the actual file. awk -F',"' 'print $3 "t" ' test.csv --> helping me to print the 3rd field eg. "ABCD,EF;xyz23;15rtg" and awk -F';' -vOFS=';' ' for (i=1; i<=NF; ++i) $i = substr($i, 0, 2) printf(""%s"n", $0) ' --> helping me to limit the substrings to 2 char. the desired output is seen on the screen, but i want that to be updated in the file and the file wold have like 50k lines and the action should happen to all lines
      – Pavan
      Oct 23 '17 at 15:22












      up vote
      0
      down vote













      With perl



      Assuming ; is only in third field



      $ perl -pe 's/"K[^;"]*;[^"]*(?=")/$&=~s|([^;]2)[^;]+|$1|gr/e' ip.txt
      "pav",12345,"AB;xy;15",,
      "xyz",,"C4;x2;rt",,



      • "K to match " before string of interest and (?=") to match " after string of interest. But the " themselves not part of captured string as these are lookarounds


      • [^;"]*;[^"]* match any non ; or " characters followed by ; followed by non " characters


      • $&=~s|([^;]2)[^;]+|$1|gr to perform another substitution on the matched string


      • e modifier allows to use Perl code in substitution section



      To restrict only for 3rd field



      $ cat ip.txt 
      "pav",12345,"ABCD,EF;xyz23;15rtg",,
      "xyz",,"C4DEF;x23yu;rtg",,
      "foo;12,23;good",124,253
      12,5232,"xyz","ijk;5545;62"

      $ perl -pe 's/^("[^"]*",|[^,]*,)2"K[^;"]*;[^"]*(?=")/$&=~s|([^;]2)[^;]+|$1|gr/e' ip.txt
      "pav",12345,"AB;xy;15",,
      "xyz",,"C4;x2;rt",,
      "foo;12,23;good",124,253
      12,5232,"xyz","ijk;5545;62"





      share|improve this answer
























        up vote
        0
        down vote













        With perl



        Assuming ; is only in third field



        $ perl -pe 's/"K[^;"]*;[^"]*(?=")/$&=~s|([^;]2)[^;]+|$1|gr/e' ip.txt
        "pav",12345,"AB;xy;15",,
        "xyz",,"C4;x2;rt",,



        • "K to match " before string of interest and (?=") to match " after string of interest. But the " themselves not part of captured string as these are lookarounds


        • [^;"]*;[^"]* match any non ; or " characters followed by ; followed by non " characters


        • $&=~s|([^;]2)[^;]+|$1|gr to perform another substitution on the matched string


        • e modifier allows to use Perl code in substitution section



        To restrict only for 3rd field



        $ cat ip.txt 
        "pav",12345,"ABCD,EF;xyz23;15rtg",,
        "xyz",,"C4DEF;x23yu;rtg",,
        "foo;12,23;good",124,253
        12,5232,"xyz","ijk;5545;62"

        $ perl -pe 's/^("[^"]*",|[^,]*,)2"K[^;"]*;[^"]*(?=")/$&=~s|([^;]2)[^;]+|$1|gr/e' ip.txt
        "pav",12345,"AB;xy;15",,
        "xyz",,"C4;x2;rt",,
        "foo;12,23;good",124,253
        12,5232,"xyz","ijk;5545;62"





        share|improve this answer






















          up vote
          0
          down vote










          up vote
          0
          down vote









          With perl



          Assuming ; is only in third field



          $ perl -pe 's/"K[^;"]*;[^"]*(?=")/$&=~s|([^;]2)[^;]+|$1|gr/e' ip.txt
          "pav",12345,"AB;xy;15",,
          "xyz",,"C4;x2;rt",,



          • "K to match " before string of interest and (?=") to match " after string of interest. But the " themselves not part of captured string as these are lookarounds


          • [^;"]*;[^"]* match any non ; or " characters followed by ; followed by non " characters


          • $&=~s|([^;]2)[^;]+|$1|gr to perform another substitution on the matched string


          • e modifier allows to use Perl code in substitution section



          To restrict only for 3rd field



          $ cat ip.txt 
          "pav",12345,"ABCD,EF;xyz23;15rtg",,
          "xyz",,"C4DEF;x23yu;rtg",,
          "foo;12,23;good",124,253
          12,5232,"xyz","ijk;5545;62"

          $ perl -pe 's/^("[^"]*",|[^,]*,)2"K[^;"]*;[^"]*(?=")/$&=~s|([^;]2)[^;]+|$1|gr/e' ip.txt
          "pav",12345,"AB;xy;15",,
          "xyz",,"C4;x2;rt",,
          "foo;12,23;good",124,253
          12,5232,"xyz","ijk;5545;62"





          share|improve this answer












          With perl



          Assuming ; is only in third field



          $ perl -pe 's/"K[^;"]*;[^"]*(?=")/$&=~s|([^;]2)[^;]+|$1|gr/e' ip.txt
          "pav",12345,"AB;xy;15",,
          "xyz",,"C4;x2;rt",,



          • "K to match " before string of interest and (?=") to match " after string of interest. But the " themselves not part of captured string as these are lookarounds


          • [^;"]*;[^"]* match any non ; or " characters followed by ; followed by non " characters


          • $&=~s|([^;]2)[^;]+|$1|gr to perform another substitution on the matched string


          • e modifier allows to use Perl code in substitution section



          To restrict only for 3rd field



          $ cat ip.txt 
          "pav",12345,"ABCD,EF;xyz23;15rtg",,
          "xyz",,"C4DEF;x23yu;rtg",,
          "foo;12,23;good",124,253
          12,5232,"xyz","ijk;5545;62"

          $ perl -pe 's/^("[^"]*",|[^,]*,)2"K[^;"]*;[^"]*(?=")/$&=~s|([^;]2)[^;]+|$1|gr/e' ip.txt
          "pav",12345,"AB;xy;15",,
          "xyz",,"C4;x2;rt",,
          "foo;12,23;good",124,253
          12,5232,"xyz","ijk;5545;62"






          share|improve this answer












          share|improve this answer



          share|improve this answer










          answered Oct 22 '17 at 8:46









          Sundeep

          6,9611826




          6,9611826




















              up vote
              0
              down vote













              Accurate and robust Python 3.x solution (based on csv.reader object):



              parse_csv.py script:



              import csv, sys
              with open(sys.argv[1]) as f:
              reader = csv.reader(f)
              for l in reader:
              l = [s if ';' not in s else ';'.join(_[:2] for _ in s.split(';')) for s in l]
              print(','.join(i if not i or i.isnumeric() else '""'.format(i) for i in l))



              Usage:



              python3 parse_csv.py Test.csv


              The output:



              "pav",12345,"AB;xy;15",,
              "xyz",,"C4;x2;rt",,



              Python's csv module provides robust and flexible support for csv data.






              share|improve this answer


























                up vote
                0
                down vote













                Accurate and robust Python 3.x solution (based on csv.reader object):



                parse_csv.py script:



                import csv, sys
                with open(sys.argv[1]) as f:
                reader = csv.reader(f)
                for l in reader:
                l = [s if ';' not in s else ';'.join(_[:2] for _ in s.split(';')) for s in l]
                print(','.join(i if not i or i.isnumeric() else '""'.format(i) for i in l))



                Usage:



                python3 parse_csv.py Test.csv


                The output:



                "pav",12345,"AB;xy;15",,
                "xyz",,"C4;x2;rt",,



                Python's csv module provides robust and flexible support for csv data.






                share|improve this answer
























                  up vote
                  0
                  down vote










                  up vote
                  0
                  down vote









                  Accurate and robust Python 3.x solution (based on csv.reader object):



                  parse_csv.py script:



                  import csv, sys
                  with open(sys.argv[1]) as f:
                  reader = csv.reader(f)
                  for l in reader:
                  l = [s if ';' not in s else ';'.join(_[:2] for _ in s.split(';')) for s in l]
                  print(','.join(i if not i or i.isnumeric() else '""'.format(i) for i in l))



                  Usage:



                  python3 parse_csv.py Test.csv


                  The output:



                  "pav",12345,"AB;xy;15",,
                  "xyz",,"C4;x2;rt",,



                  Python's csv module provides robust and flexible support for csv data.






                  share|improve this answer














                  Accurate and robust Python 3.x solution (based on csv.reader object):



                  parse_csv.py script:



                  import csv, sys
                  with open(sys.argv[1]) as f:
                  reader = csv.reader(f)
                  for l in reader:
                  l = [s if ';' not in s else ';'.join(_[:2] for _ in s.split(';')) for s in l]
                  print(','.join(i if not i or i.isnumeric() else '""'.format(i) for i in l))



                  Usage:



                  python3 parse_csv.py Test.csv


                  The output:



                  "pav",12345,"AB;xy;15",,
                  "xyz",,"C4;x2;rt",,



                  Python's csv module provides robust and flexible support for csv data.







                  share|improve this answer














                  share|improve this answer



                  share|improve this answer








                  edited Oct 22 '17 at 10:32

























                  answered Oct 22 '17 at 10:13









                  RomanPerekhrest

                  22.5k12145




                  22.5k12145




















                      up vote
                      0
                      down vote













                      Complex GNU AWK solution (parsing csv data):



                      awk -v FPAT='"[^"]+"|[^",]+|,,' ' 
                      for (i=1;i<=NF;i++)
                      if ($i~/^".*;./)
                      len=split($i,a,";"); v=substr(a[1],1,3);
                      for (j=2;j<=len;j++) v= v";"substr(a[j],1,2);
                      v=v"42"
                      printf "%s%s",(v? v: ($i~/^,,/? (i==NF? ",":""):$i )),
                      (i==NF? ORS:OFS); v=""

                      ' OFS=',' Test.csv


                      • FPAT='"[^"]+"|[^",]+|,,' - complex regex pattern defining field value


                      • if ($i~/^".*;./) ... - if the current field $i contains ; character(s)


                      • len=split($i,a,";") - split the field value $i into array a by separator ;. len is assigned with number of elements/chunks created


                      • v=substr(a[1],1,3); - capturing the first chunk of the needed length including leading " char, for ex. "AB will be extracted from "ABCD,EF


                      • for (j=2;j<=len;j++) ... - iterating through remaining chunks/items


                      • v=v"42" - add trailing double quote " to the processed sequence v. 43 is ASCII octal code representing the double quote char ".


                      • ($i~/^,,/? (i==NF? ",":""):$i ) - each empty field ,, is recreated with single comma , and common delimiter (also ,). This is to avoid redundant comma cluttering like "pav",,,


                      • (i==NF? ORS:OFS) - on encountering the last field i==NF - print output record separator ORS, otherwise - print output filed separator OFS



                      The output:



                      "pav",12345,"AB;xy;15",,
                      "xyz",,"C4;x2;rt",,





                      share|improve this answer






















                      • Thanks Roman, Works perfect. Could you help me in understanding bit better, like the 042 and arrays look bit complex.
                        – Pavan
                        Oct 22 '17 at 8:25










                      • @Pavan, welcome, see my explanation
                        – RomanPerekhrest
                        Oct 22 '17 at 10:30














                      up vote
                      0
                      down vote













                      Complex GNU AWK solution (parsing csv data):



                      awk -v FPAT='"[^"]+"|[^",]+|,,' ' 
                      for (i=1;i<=NF;i++)
                      if ($i~/^".*;./)
                      len=split($i,a,";"); v=substr(a[1],1,3);
                      for (j=2;j<=len;j++) v= v";"substr(a[j],1,2);
                      v=v"42"
                      printf "%s%s",(v? v: ($i~/^,,/? (i==NF? ",":""):$i )),
                      (i==NF? ORS:OFS); v=""

                      ' OFS=',' Test.csv


                      • FPAT='"[^"]+"|[^",]+|,,' - complex regex pattern defining field value


                      • if ($i~/^".*;./) ... - if the current field $i contains ; character(s)


                      • len=split($i,a,";") - split the field value $i into array a by separator ;. len is assigned with number of elements/chunks created


                      • v=substr(a[1],1,3); - capturing the first chunk of the needed length including leading " char, for ex. "AB will be extracted from "ABCD,EF


                      • for (j=2;j<=len;j++) ... - iterating through remaining chunks/items


                      • v=v"42" - add trailing double quote " to the processed sequence v. 43 is ASCII octal code representing the double quote char ".


                      • ($i~/^,,/? (i==NF? ",":""):$i ) - each empty field ,, is recreated with single comma , and common delimiter (also ,). This is to avoid redundant comma cluttering like "pav",,,


                      • (i==NF? ORS:OFS) - on encountering the last field i==NF - print output record separator ORS, otherwise - print output filed separator OFS



                      The output:



                      "pav",12345,"AB;xy;15",,
                      "xyz",,"C4;x2;rt",,





                      share|improve this answer






















                      • Thanks Roman, Works perfect. Could you help me in understanding bit better, like the 042 and arrays look bit complex.
                        – Pavan
                        Oct 22 '17 at 8:25










                      • @Pavan, welcome, see my explanation
                        – RomanPerekhrest
                        Oct 22 '17 at 10:30












                      up vote
                      0
                      down vote










                      up vote
                      0
                      down vote









                      Complex GNU AWK solution (parsing csv data):



                      awk -v FPAT='"[^"]+"|[^",]+|,,' ' 
                      for (i=1;i<=NF;i++)
                      if ($i~/^".*;./)
                      len=split($i,a,";"); v=substr(a[1],1,3);
                      for (j=2;j<=len;j++) v= v";"substr(a[j],1,2);
                      v=v"42"
                      printf "%s%s",(v? v: ($i~/^,,/? (i==NF? ",":""):$i )),
                      (i==NF? ORS:OFS); v=""

                      ' OFS=',' Test.csv


                      • FPAT='"[^"]+"|[^",]+|,,' - complex regex pattern defining field value


                      • if ($i~/^".*;./) ... - if the current field $i contains ; character(s)


                      • len=split($i,a,";") - split the field value $i into array a by separator ;. len is assigned with number of elements/chunks created


                      • v=substr(a[1],1,3); - capturing the first chunk of the needed length including leading " char, for ex. "AB will be extracted from "ABCD,EF


                      • for (j=2;j<=len;j++) ... - iterating through remaining chunks/items


                      • v=v"42" - add trailing double quote " to the processed sequence v. 43 is ASCII octal code representing the double quote char ".


                      • ($i~/^,,/? (i==NF? ",":""):$i ) - each empty field ,, is recreated with single comma , and common delimiter (also ,). This is to avoid redundant comma cluttering like "pav",,,


                      • (i==NF? ORS:OFS) - on encountering the last field i==NF - print output record separator ORS, otherwise - print output filed separator OFS



                      The output:



                      "pav",12345,"AB;xy;15",,
                      "xyz",,"C4;x2;rt",,





                      share|improve this answer














                      Complex GNU AWK solution (parsing csv data):



                      awk -v FPAT='"[^"]+"|[^",]+|,,' ' 
                      for (i=1;i<=NF;i++)
                      if ($i~/^".*;./)
                      len=split($i,a,";"); v=substr(a[1],1,3);
                      for (j=2;j<=len;j++) v= v";"substr(a[j],1,2);
                      v=v"42"
                      printf "%s%s",(v? v: ($i~/^,,/? (i==NF? ",":""):$i )),
                      (i==NF? ORS:OFS); v=""

                      ' OFS=',' Test.csv


                      • FPAT='"[^"]+"|[^",]+|,,' - complex regex pattern defining field value


                      • if ($i~/^".*;./) ... - if the current field $i contains ; character(s)


                      • len=split($i,a,";") - split the field value $i into array a by separator ;. len is assigned with number of elements/chunks created


                      • v=substr(a[1],1,3); - capturing the first chunk of the needed length including leading " char, for ex. "AB will be extracted from "ABCD,EF


                      • for (j=2;j<=len;j++) ... - iterating through remaining chunks/items


                      • v=v"42" - add trailing double quote " to the processed sequence v. 43 is ASCII octal code representing the double quote char ".


                      • ($i~/^,,/? (i==NF? ",":""):$i ) - each empty field ,, is recreated with single comma , and common delimiter (also ,). This is to avoid redundant comma cluttering like "pav",,,


                      • (i==NF? ORS:OFS) - on encountering the last field i==NF - print output record separator ORS, otherwise - print output filed separator OFS



                      The output:



                      "pav",12345,"AB;xy;15",,
                      "xyz",,"C4;x2;rt",,






                      share|improve this answer














                      share|improve this answer



                      share|improve this answer








                      edited Oct 22 '17 at 10:38

























                      answered Oct 22 '17 at 7:45









                      RomanPerekhrest

                      22.5k12145




                      22.5k12145











                      • Thanks Roman, Works perfect. Could you help me in understanding bit better, like the 042 and arrays look bit complex.
                        – Pavan
                        Oct 22 '17 at 8:25










                      • @Pavan, welcome, see my explanation
                        – RomanPerekhrest
                        Oct 22 '17 at 10:30
















                      • Thanks Roman, Works perfect. Could you help me in understanding bit better, like the 042 and arrays look bit complex.
                        – Pavan
                        Oct 22 '17 at 8:25










                      • @Pavan, welcome, see my explanation
                        – RomanPerekhrest
                        Oct 22 '17 at 10:30















                      Thanks Roman, Works perfect. Could you help me in understanding bit better, like the 042 and arrays look bit complex.
                      – Pavan
                      Oct 22 '17 at 8:25




                      Thanks Roman, Works perfect. Could you help me in understanding bit better, like the 042 and arrays look bit complex.
                      – Pavan
                      Oct 22 '17 at 8:25












                      @Pavan, welcome, see my explanation
                      – RomanPerekhrest
                      Oct 22 '17 at 10:30




                      @Pavan, welcome, see my explanation
                      – RomanPerekhrest
                      Oct 22 '17 at 10:30

















                       

                      draft saved


                      draft discarded















































                       


                      draft saved


                      draft discarded














                      StackExchange.ready(
                      function ()
                      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2funix.stackexchange.com%2fquestions%2f399655%2fcut-the-substrings-to-a-specific-length-in-a-csv-file%23new-answer', 'question_page');

                      );

                      Post as a guest













































































                      Popular posts from this blog

                      How to check contact read email or not when send email to Individual?

                      Bahrain

                      Postfix configuration issue with fips on centos 7; mailgun relay