Extracting columns from a text file with no delimiters

I have a large text file which is basically a stream of data all pretty much compressed together for each row. I've been asked to look into the failure of certain data in some columns. The data is not delimited in any way. I do however have a list of "column" lengths and comments on whether there's relevant data in each "column".

I'd use Excel, but Excel can only split rows of up to 1000 characters into columns, and each row goes well beyond this. A number of these fields are strings of 30 spaces that act as filler, and there are at least 15 or so of them. I'm hoping to strip these designated "empty" fields out.

What I need is a way to feed in my file along with an array that I provide, holding the column lengths and maybe a marker like "X" on the columns I want to ignore, and have it spit out a new delimited file that I can then feed back into Excel for analysis.

For example if I had a file with a row like aaaaaabbbbbccccdddddeeeffffff and I feed this file in with an array of [6 5 4X 5 3X 6] it would spit out a file with aaaaaa^bbbbb^ddddd^ffffff in that row.

Is there a way this can be done with grep, awk or sed?

Thanks in advance.

edited Oct 20 '17 at 1:56

asked Oct 20 '17 at 1:32

Eliseo d'Annunzio

Do you want ^ to be the exact delimiter in the resulting rows?
– RomanPerekhrest
Oct 20 '17 at 7:34

It was an arbitrary character I used but that would be fine!
– Eliseo d'Annunzio
Oct 20 '17 at 7:36

5 Answers

accepted

If you have GNU awk, you can specify explicit field widths with FIELDWIDTHS, e.g.

$ printf 'aaaaaabbbbbccccdddddeeeffffff\n' |
    gawk -v FIELDWIDTHS="6 5 4 5 3 6" -v OFS="^" '{print $1, $2, $4, $6}'
aaaaaa^bbbbb^ddddd^ffffff

Starting with version 4.2, you can skip characters using an n:m syntax (skip n characters, then read a field of m characters), e.g.

$ printf 'aaaaaabbbbbccccdddddeeeffffff\n' |
    gawk -v FIELDWIDTHS="6 5 4:5 3:6" -v OFS="^" '{$1=$1} 1'
aaaaaa^bbbbb^ddddd^ffffff

(the $1=$1 just forces $0 to be rebuilt from the fields, joined with OFS).

See for example The GNU Awk User's Guide: 4.6 Reading Fixed-Width Data
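
Where GNU awk is not available, roughly the same result can be sketched with POSIX awk and substr(). This is only an illustration, not part of the answer above: the helper name fixed_to_delim is made up, and the spec format (space-separated widths, with an X suffix on columns to drop) is taken from the question.

```shell
# Hypothetical helper (name invented for this sketch).
# Usage: fixed_to_delim "SPEC" [DELIM] < input
# SPEC is a space-separated list of widths; an X suffix marks a column to drop.
fixed_to_delim() {
  awk -v spec="$1" -v delim="${2:-^}" '
    BEGIN { n = split(spec, w, " ") }
    {
      pos = 1; out = ""; first = 1
      for (i = 1; i <= n; i++) {
        width = w[i] + 0               # "4X" + 0 evaluates to 4
        if (w[i] !~ /[Xx]$/) {         # keep only columns without the X marker
          out = out (first ? "" : delim) substr($0, pos, width)
          first = 0
        }
        pos += width                   # advance past kept and skipped columns alike
      }
      print out
    }'
}

printf 'aaaaaabbbbbccccdddddeeeffffff\n' | fixed_to_delim "6 5 4X 5 3X 6"
# prints aaaaaa^bbbbb^ddddd^ffffff
```

A second argument changes the delimiter, for example fixed_to_delim "6 5 4X 5 3X 6" "," for comma-separated output.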

edited Oct 20 '17 at 12:07

answered Oct 20 '17 at 1:59

steeldriver

This is closer to what I had in mind... Thanks!
– Eliseo d'Annunzio
Oct 20 '17 at 7:38

Short cut command approach:

Sample input.txt contents:

aaaaaabbbbbccccdddddeeeffffff
wwwwwwddddd111133333xxxaaaaaa
ffffff00000sssszzzzz000rrrrrr

The job:

cut -c 1-6,7-11,16-20,24-29 --output-delimiter=^ input.txt

-c - select by character position

1-6,7-11,16-20,24-29 - the character ranges to keep, one per wanted column; adjust as needed

--output-delimiter=^ - the string placed between the selected ranges; change it to whatever you want

The output:

aaaaaa^bbbbb^ddddd^ffffff
wwwwww^ddddd^33333^aaaaaa
ffffff^00000^zzzzz^rrrrrr

edited Oct 20 '17 at 8:47

answered Oct 20 '17 at 7:46

RomanPerekhrest

Fencepost error. The numbers -c 1-6,7-12,17-22,26-31 don't match the output, for example with those numbers the first output line would be: aaaaaa^bbbbbc^ddddee^ffff.
– agc
Oct 20 '17 at 8:46

@agc, yes, forgot to edit. Thanks
– RomanPerekhrest
Oct 20 '17 at 8:48

Hard to say without seeing your exact input and desired output, but...

sed -e "$(
  printf '%d\n' 6 5 4 5 3 6 |
    awk '
      {
        f[NR] = f[NR-1] + $1   # running total: end position of each column
      }
      END {
        for (i=NR; i>0; i--) {
          printf "s/./&^/%d\n", f[i]
        }
      }
    '
)" infile.txt | cut -d^ -f1,2,4,6

Untested. No bugs, I promise. ;)

Okay, I tested. I was missing the end brace for END. No other bugs. Works perfectly on example input. Output is:

aaaaaa^bbbbb^ddddd^ffffff
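
To see why this works, it can help to print the sed script that the command substitution generates. The sketch below wraps the same printf and awk pipeline in a function; the name gen_sed_script is invented for this illustration.

```shell
# Print the sed commands that the command substitution generates.
# The cumulative end positions are emitted in descending order, so each
# inserted "^" does not shift the positions that are still to be processed.
gen_sed_script() {
  printf '%d\n' "$@" |
    awk '{ f[NR] = f[NR-1] + $1 }
         END { for (i = NR; i > 0; i--) printf "s/./&^/%d\n", f[i] }'
}

gen_sed_script 6 5 4 5 3 6
# prints:
# s/./&^/29
# s/./&^/23
# s/./&^/20
# s/./&^/15
# s/./&^/11
# s/./&^/6
```

Each command appends ^ to the Nth character of the line, so every column, including the last, ends with a ^. The trailing cut -d^ -f1,2,4,6 then keeps only the wanted fields.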

answered Oct 20 '17 at 2:02

Wildcard

With sed, one could write (using _ as delimiter):

sed "$(echo s/./\&_/{29,23,20,15,11,6}\;)"

But this means summing up the absolute positions from the column widths. To use the widths directly, we need some ugly escaping for the command substitution:

sed -E "s/./&_/6;$(echo s/.*_\(.\)\{{5,4,5,3,6}\}/\&_/\;)"
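
If the absolute positions are preferred, the summing does not have to be done by hand. The following is a minimal sketch in plain POSIX shell; the function name abs_positions is hypothetical and not part of the answer.

```shell
# Hypothetical helper: turn column widths into absolute end positions,
# largest first, which is the order the first sed command needs.
abs_positions() {
  total=0
  list=""
  for w in "$@"; do
    total=$((total + w))
    list="$total${list:+,$list}"   # prepend, so the largest position comes first
  done
  printf '%s\n' "$list"
}

abs_positions 6 5 4 5 3 6   # prints 29,23,20,15,11,6
```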

answered Oct 20 '17 at 8:43

Philippos

Improved version of RomanPerekhrest's cut answer, with a column array parser that accepts X suffixes to mark the columns to skip.

Load the array n, and define a function that parses the array into ranges for cut -c:

n=(6 5 4X 5 3X 6)
col_array() { j=$(h=0;
 for f in "$@"; do
 g=${f/[Xx]};    # width with any X marker stripped
 i=$((h+1));
 h=$((h+g));
 [ "$g" = "$f" ] && echo -n "$i-$h,"    # emit a range only for columns without X
 done;) ;
 echo "${j%,}"; }

The file input.txt contains:

aaaaaabbbbbccccdddddeeeffffff
wwwwwwddddd111133333xxxaaaaaa
ffffff00000sssszzzzz000rrrrrr

Use col_array() with cut:

cut -c $(col_array "${n[@]}") --output-delimiter=^ input.txt

Output:

aaaaaa^bbbbb^ddddd^ffffff
wwwwww^ddddd^33333^aaaaaa
ffffff^00000^zzzzz^rrrrrr

There's no strict need for an array, since col_array() parses its parameters:

cut -c $(col_array 3 5X 7) --output-delimiter=^ input.txt

Output:

aaa^bbbcccc
www^ddd1111
fff^000ssss

edited Oct 20 '17 at 9:06

answered Oct 20 '17 at 8:41

agc
