Extracting columns from a text file with no delimiters

I have a large text file which is basically a stream of data all pretty much compressed together for each row. I've been asked to look into the failure of certain data in some columns. The data is not delimited in any way. I do however have a list of "column" lengths and comments on whether there's relevant data in each "column".



I'd use Excel, but the limit of Excel to delimit by columns is restricted to 1000 characters per row, and each row goes well beyond this. A number of these fields have strings of 30 spaces that act as filler and there's at least a good 15 or so of these... I'm hoping to parse these designated "empty" fields out.



What I need is a way that I can feed my file in and with an array that I can provide which has the column lengths and maybe a marker like "X" to ignore the respective columns I want to ignore, have it spit out a new file with delimiters, which I can then feed back into Excel for analysis.



For example if I had a file with a row like aaaaaabbbbbccccdddddeeeffffff and I feed this file in with an array of [6 5 4X 5 3X 6] it would spit out a file with aaaaaa^bbbbb^ddddd^ffffff in that row.



Is there a way this can be done with grep, awk or sed?



Thanks in advance.









edited Oct 20 '17 at 1:56
asked Oct 20 '17 at 1:32
Eliseo d'Annunzio

  • do you want ^ to be the exact delimiter in resulting rows?
    – RomanPerekhrest
    Oct 20 '17 at 7:34










  • It was an arbitrary character I used but that would be fine!
    – Eliseo d'Annunzio
    Oct 20 '17 at 7:36
















5 Answers
accepted

If you have GNU awk, you can specify explicit field widths, e.g.

$ printf 'aaaaaabbbbbccccdddddeeeffffff\n' |
    gawk -v FIELDWIDTHS="6 5 4 5 3 6" -v OFS="^" '{print $1, $2, $4, $6}'
aaaaaa^bbbbb^ddddd^ffffff

Starting with version 4.2, you can skip characters using an n:m syntax (skip n characters, then read a field of width m), e.g.

$ printf 'aaaaaabbbbbccccdddddeeeffffff\n' |
    gawk -v FIELDWIDTHS="6 5 4:5 3:6" -v OFS="^" '{$1=$1} 1'
aaaaaa^bbbbb^ddddd^ffffff

(the $1=$1 just forces awk to rebuild $0 from the fixed-width fields, joined with OFS).

See for example The GNU Awk User's Guide: 4.6 Reading Fixed-Width Data







edited Oct 20 '17 at 12:07
answered Oct 20 '17 at 1:59
steeldriver

  • This is closer to what I had in mind... Thanks!
    – Eliseo d'Annunzio
    Oct 20 '17 at 7:38
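A note on the gawk answer above: FIELDWIDTHS is a GNU extension, so for other awks the same slicing can be sketched with substr(). This is only a minimal sketch, not part of the original answer. The function name split_fixed and the way the spec is passed are assumptions, and the X suffix follows the example in the question.

```shell
# split_fixed SPEC [FILE...]   (hypothetical helper, for illustration only)
# SPEC is a space-separated list of column widths; a trailing X or x
# marks a column that should be dropped from the output.
split_fixed() {
    spec=$1
    shift
    awk -v spec="$spec" -v OFS='^' '
        BEGIN { n = split(spec, w, " ") }
        {
            pos = 1; out = ""; sep = ""
            for (i = 1; i <= n; i++) {
                width = w[i] + 0                  # "4X" + 0 evaluates to 4
                if (w[i] !~ /[Xx]$/) {            # keep only unmarked columns
                    out = out sep substr($0, pos, width)
                    sep = OFS
                }
                pos += width                      # always advance past the column
            }
            print out
        }
    ' "$@"
}

printf 'aaaaaabbbbbccccdddddeeeffffff\n' | split_fixed '6 5 4X 5 3X 6'
# prints aaaaaa^bbbbb^ddddd^ffffff
```

With no FILE arguments the function reads standard input, so it can sit at the end of a pipeline or be called as split_fixed '6 5 4X 5 3X 6' input.txt.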
















Short cut command approach:

Sample input.txt contents:

aaaaaabbbbbccccdddddeeeffffff
wwwwwwddddd111133333xxxaaaaaa
ffffff00000sssszzzzz000rrrrrr

The job:

cut -c 1-6,7-11,16-20,24-29 --output-delimiter='^' input.txt

  • -c - select by character position
  • 1-6,7-11,16-20,24-29 - the ranges of character positions to keep (positions 12-15 and 21-23 are skipped); adjust as needed
  • --output-delimiter='^' - the output field delimiter (a GNU cut option); change it to whatever you want

The output:

aaaaaa^bbbbb^ddddd^ffffff
wwwwww^ddddd^33333^aaaaaa
ffffff^00000^zzzzz^rrrrrr






edited Oct 20 '17 at 8:47
answered Oct 20 '17 at 7:46
RomanPerekhrest

  • Fencepost error. The numbers -c 1-6,7-12,17-22,26-31 don't match the output, for example with those numbers the first output line would be: aaaaaa^bbbbbc^ddddee^ffff.
    – agc
    Oct 20 '17 at 8:46

  • @agc, yes, forgot to edit. Thanks
    – RomanPerekhrest
    Oct 20 '17 at 8:48
















Hard to say without seeing your exact input and desired output, but...

sed -e "$(
  printf '%d\n' 6 5 4 5 3 6 |
    awk '
      {
        # running total: end position of each column
        f[NR] = f[NR-1] + $1
      }
      END {
        # rightmost position first, so earlier positions do not shift
        for (i=NR; i>0; i--)
          printf "s/./&^/%d\n", f[i]
      }
    '
)" infile.txt | cut -d^ -f1,2,4,6

Untested. No bugs, I promise. ;)

Okay, I tested. I was missing the end brace for END. No other bugs. Works perfectly on example input. Output is:

aaaaaa^bbbbb^ddddd^ffffff






answered Oct 20 '17 at 2:02
Wildcard
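A note on the sed/awk answer above: printing the generated sed script by itself may make it clearer why the loop runs backwards. The sketch below is an illustration only. The helper name gen_sed_script is made up, and the widths are the ones from the question.

```shell
# gen_sed_script WIDTH...   (hypothetical helper, for illustration only)
# Prints one sed command per width, rightmost boundary first, so that
# each inserted "^" leaves the character positions to its left untouched.
gen_sed_script() {
    printf '%d\n' "$@" |
    awk '
        { edge[NR] = edge[NR-1] + $1 }             # cumulative end position of each column
        END {
            for (i = NR; i > 0; i--)
                printf "s/./&^/%d\n", edge[i]      # append ^ after the edge[i]-th character
        }
    '
}

gen_sed_script 6 5 4 5 3 6
# prints:
# s/./&^/29
# s/./&^/23
# s/./&^/20
# s/./&^/15
# s/./&^/11
# s/./&^/6
```

Every column then ends in a "^", including the ones to be discarded, which is why the answer pipes the result through cut -d^ -f1,2,4,6. This presumably only works if "^" never occurs in the data itself.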

With sed, one could write (using _ as delimiter, and relying on the shell's brace expansion):

sed "$(echo s/./\&_/{29,23,20,15,11,6}\;)"

But this means summing the column widths into absolute positions. To use the widths directly, we need some ugly escaping for the command substitution:

sed -E "s/./&_/6;$(echo s/.*_\(.\)\{{5,4,5,3,6}\}/\&_/\;)"






answered Oct 20 '17 at 8:43
Philippos
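A note on the sed answer above: if the brace expansion and its escaping feel fragile, printf can generate the same kind of script, because it reuses its format string for every argument. This is a sketch under assumptions rather than part of the original answer. The function name widths_to_sed is hypothetical, and the approach only works if "_" never appears in the data.

```shell
# widths_to_sed WIDTH...   (hypothetical helper, for illustration only)
# Builds a sed -E script that appends "_" after each column, using the
# widths directly. Assumes the data itself never contains "_".
widths_to_sed() {
    first=$1
    shift
    printf 's/./&_/%d\n' "$first"                         # first column: count from the line start
    [ "$#" -eq 0 ] || printf 's/.*_(.){%d}/&_/\n' "$@"    # later columns: count from the last "_"
}

printf 'aaaaaabbbbbccccdddddeeeffffff\n' | sed -E "$(widths_to_sed 6 5 4 5 3 6)"
# prints aaaaaa_bbbbb_cccc_ddddd_eee_ffffff_
```

As in the answer, every column gets a delimiter, so unwanted columns still have to be dropped afterwards, for example with cut -d_ -f1,2,4,6.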

Improved version of RomanPerekhrest's cut answer, with a column array parser that understands X suffixes marking the columns to skip.

Load array $n, and make a function to parse the array into ranges for cut -c:

n=(6 5 4X 5 3X 6)
# g is the width without any X suffix, i and h are the first and last
# character of the column; only columns without an X are printed.
col_array() { j=$(h=0;
              for f in "$@"; do
                  g=${f/[Xx]};
                  i=$((h+1));
                  h=$((h+g));
                  [ "$g" = "$f" ] && echo -n "$i-$h,"
              done;) ;
              echo "${j%,}"; }

The file input.txt contains:

aaaaaabbbbbccccdddddeeeffffff
wwwwwwddddd111133333xxxaaaaaa
ffffff00000sssszzzzz000rrrrrr

Use col_array() with cut:

cut -c "$(col_array "${n[@]}")" --output-delimiter='^' input.txt

Output:

aaaaaa^bbbbb^ddddd^ffffff
wwwwww^ddddd^33333^aaaaaa
ffffff^00000^zzzzz^rrrrrr

There's no strict need for an array, since col_array() parses its parameters:

cut -c "$(col_array 3 5X 7)" --output-delimiter='^' input.txt

Output:

aaa^bbbcccc
www^ddd1111
fff^000ssss






edited Oct 20 '17 at 9:06
answered Oct 20 '17 at 8:41
agc
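A note combining this answer with the gawk one: the same X-suffixed list can also be turned into a FIELDWIDTHS string in the skip:width form that gawk 4.2 and later accept. This is a sketch only. The function name spec_to_fieldwidths is an assumption, and skipped columns at the very end of the list are simply dropped.

```shell
# spec_to_fieldwidths SPEC...   (hypothetical helper, for illustration only)
# Turns widths such as "6 5 4X 5 3X 6" into a gawk FIELDWIDTHS string,
# folding each X-suffixed width into a skip prefix on the next kept column.
spec_to_fieldwidths() {
    skip=0
    out=
    for f in "$@"; do
        case $f in
            *[Xx])
                skip=$((skip + ${f%[Xx]}))     # accumulate characters to skip
                ;;
            *)
                if [ "$skip" -gt 0 ]; then
                    out="$out $skip:$f"        # skip, then read a field of width f
                else
                    out="$out $f"
                fi
                skip=0
                ;;
        esac
    done
    echo "${out# }"                            # strip the leading space
}

spec_to_fieldwidths 6 5 4X 5 3X 6
# prints 6 5 4:5 3:6
```

The result could then be passed along the lines of gawk -v FIELDWIDTHS="$(spec_to_fieldwidths 6 5 4X 5 3X 6)" -v OFS='^' '{$1=$1} 1' input.txt, which needs gawk 4.2 or newer.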
