Field separator part of a column - incorrect parsing unix

I want to check the number of columns in a CSV file before processing it. The problem is that the delimiter (comma) also occurs in the text of some fields, so I cannot parse the file correctly and I get too many columns.

For example:

~new file: 12345~,~125.5~,,,~ example (45), case (20)~,,

7 columns:

  1. ~new file: 12345~
  2. ~125.5~
  3. empty
  4. empty
  5. ~ example (45), case (20)~
  6. empty
  7. empty

The problem is the comma inside ~ example (45), case (20)~ in the 5th column.

I tried to replace the delimiter , with ; using sed, but I needed more than one pass.

I would like a general rule that handles such cases with a more efficient approach.










  • How do you know which commas are part of the data and which are field separators? Proper CSV uses double quotes to surround such fields.
    – roaima
    1 hour ago

  • This is a txt file extracted from an application, and the separator was set to comma when the extract was done.
    – Mathew Linton
    57 mins ago

  • @MathewLinton Thank you, Mat. Are columns 3, 4, 6 and 7 empty in all rows of your data? If not (and I assume not), then the column command is more than enough. If yes, then you don't have seven columns, and the column command is more than enough ;-)
    – Goro
    24 mins ago

  • It looks like the ~ character is the quoting character (so ~hello, world~ is one field). Is that correct?
    – roaima
    4 mins ago

  • @Goro In your example you remove every ','. In the 5th column I want to keep the ',' since it is part of the column value ~ example (45), case (20)~, and I don't want to alter the data.
    – Mathew Linton
    1 min ago














text-processing awk sed csv






edited 9 mins ago by Kusalananda






asked 1 hour ago by Mathew Linton




3 Answers
I assume , is the column delimiter. I would just run the column command:

echo "~new file: 12345~,~125.5~,,,~ example (45), case (20)~,," | column -t -s','

or

column -t -s',' file

Output:

~new file: 12345~ ~125.5~ ~ example (45) case (20)~






answered 1 hour ago by Goro











  • I think you misunderstood the question. Here ~ example (45), case (20)~ is a single column, but column is splitting it into two columns. As I see it, each non-empty field has the form ,~...~, and a comma can also occur within a field, as in ,~something with comma, and rest~,. The ~ is used in place of the quote character ".
    – sddgob
    1 hour ago

  • No, we have 7 columns. This is an example of the output that can be used: ~new file: 12345~;~125.5~;;;~ example (45), case(20)~;;
    – Mathew Linton
    56 mins ago

  • I amended the question to list each column value.
    – Mathew Linton
    42 mins ago
















Using awk you would do:

awk -F, '{ gsub(/~[^~]*~/,""); print NF }' infile

For an input like:

~new file: 12345~,~125.5~,,,~ example (45), case (20)~,,
,~125.5~,,,~ example (45), case (20)~

it will return:

7
5

In gsub(/~[^~]*~/,""), we replace every match that starts at a ~ and runs to the next ~ (like ~...~) with the empty string; see below:

awk -F, '{ gsub(/~[^~]*~/,""); print $0 }' infile
,,,,,,
,,,,

This assumes that there is no inner ~, as in ,~some~thing~, in your input.

Then print NF prints the number of fields according to the field separator specified with -F.
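For readers more at home in Python, the same strip-then-count idea might be sketched roughly as follows. This is only an illustration, not part of the original answer: count_fields is a hypothetical helper name, and it relies on the same assumption that no ~ occurs inside a quoted field.

```python
import re

# Matches one tilde-quoted field: a ~, anything that is not a ~, then a ~.
QUOTED = re.compile(r"~[^~]*~")


def count_fields(line: str) -> int:
    """Count comma-separated fields, ignoring commas inside ~...~.

    Hypothetical helper; assumes quoted fields never contain a ~.
    Unlike awk's NF, an empty line counts as one (empty) field here.
    """
    # Drop the quoted spans so only the separator commas remain.
    stripped = QUOTED.sub("", line.rstrip("\n"))
    return stripped.count(",") + 1


if __name__ == "__main__":
    sample = [
        "~new file: 12345~,~125.5~,,,~ example (45), case (20)~,,",
        ",~125.5~,,,~ example (45), case (20)~",
    ]
    for row in sample:
        print(count_fields(row))
```

As with the awk version, this only counts fields; it does not return the field values, so it suits a pre-processing check rather than real parsing.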







edited 33 mins ago, community wiki (3 revs, αғsнιη)





















This looks like a CSV file that uses commas as field delimiters and a tilde as the quoting character.

Using a proper CSV parser, like the one provided by the Text::CSV Perl module:

perl -MText::CSV -e 'print scalar(@{Text::CSV->new({quote_char=>"~"})->getline(*STDIN)})' <file.csv

This would read the first line of the CSV file file.csv and print the number of columns in it. We instantiate a parser that understands that the quote character is a tilde, and then read the first line with this parser. The getline() method of the parser reads a line from the given filehandle and returns a reference to an array of data, one item per parsed column. The print scalar(...) is a fairly common way to print the length of an array in Perl.

Another way, using the CSVKit command line CSV parser toolkit:

csvstat -n -q '~' <file.csv | wc -l

or equivalently, using long options,

csvstat --names --quotechar '~' <file.csv | wc -l

This would likewise read the first line of the input file and return a listing of the headers (the first line of a CSV file usually contains column headers), one per line. The wc -l counts the number of lines returned.

When you later parse the CSV file, I suggest that you use one of these approaches, or look for a proper parser in the programming language that you are most used to. awk and sed may be used on simple CSV data, but in this case your data uses some CSV format features that these tools have difficulty coping with unless great care is taken.
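As one example of "a proper parser in the language you are most used to", here is a minimal sketch using Python's standard csv module. It is not from the original answer: column_counts is a hypothetical helper name, and the sketch assumes the file really does use , as the delimiter and ~ as the quote character.

```python
import csv
import io


def column_counts(stream, delimiter=",", quotechar="~"):
    """Return the number of parsed columns for every row in the stream.

    Hypothetical helper; the delimiter and quote character are
    assumptions taken from the sample data in the question.
    """
    reader = csv.reader(stream, delimiter=delimiter, quotechar=quotechar)
    return [len(row) for row in reader]


if __name__ == "__main__":
    # In-memory stand-in for file.csv, using the rows from the thread.
    data = io.StringIO(
        "~new file: 12345~,~125.5~,,,~ example (45), case (20)~,,\n"
        ",~125.5~,,,~ example (45), case (20)~\n"
    )
    print(column_counts(data))
```

For a real file, open it with newline="" (as the csv module documentation recommends) and pass the file object in place of the StringIO. Checking every row rather than only the first also catches rows whose column count differs from the rest.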







edited 5 mins ago
answered 19 mins ago by Kusalananda



















