Extracting columns from a text file with no delimiters
I have a large text file which is basically a stream of data, with each row packed together without any separators. I've been asked to look into the failure of certain data in some columns. The data is not delimited in any way. I do, however, have a list of "column" lengths and comments on whether there's relevant data in each "column".
I'd use Excel, but Excel's delimit-by-columns feature is restricted to 1000 characters per row, and each row goes well beyond this. A number of these fields are strings of 30 spaces that act as filler, and there are at least 15 or so of these. I'm hoping to parse these designated "empty" fields out.
What I need is a way to feed in my file together with an array that holds the column lengths, plus maybe a marker like "X" on the columns I want to ignore, and have it spit out a new file with delimiters, which I can then feed back into Excel for analysis.
For example, if I had a file with a row like aaaaaabbbbbccccdddddeeeffffff and I fed this file in with an array of [6 5 4X 5 3X 6], it would spit out a file with aaaaaa^bbbbb^ddddd^ffffff in that row.
Is there a way this can be done with grep, awk or sed?
Thanks in advance.
text-processing
edited Oct 20 '17 at 1:56
asked Oct 20 '17 at 1:32
Eliseo d'Annunzio
Do you want ^ to be the exact delimiter in resulting rows?
– RomanPerekhrest Oct 20 '17 at 7:34
It was an arbitrary character I used but that would be fine!
– Eliseo d'Annunzio Oct 20 '17 at 7:36
5 Answers
accepted
If you have GNU awk, you can specify explicit field widths, e.g.
$ printf 'aaaaaabbbbbccccdddddeeeffffff\n' |
    gawk -v FIELDWIDTHS="6 5 4 5 3 6" -v OFS="^" '{print $1, $2, $4, $6}'
aaaaaa^bbbbb^ddddd^ffffff
Starting with version 4.2, you can skip characters using an n:m syntax, e.g.
$ printf 'aaaaaabbbbbccccdddddeeeffffff\n' |
    gawk -v FIELDWIDTHS="6 5 4:5 3:6" -v OFS="^" '{$1=$1} 1'
aaaaaa^bbbbb^ddddd^ffffff
(the $1=$1 just forces re-evaluation of $0 with the specified field widths, so the record is rebuilt with OFS between the fields).
See for example The GNU Awk User's Guide: 4.6 Reading Fixed-Width Data
edited Oct 20 '17 at 12:07
answered Oct 20 '17 at 1:59
steeldriver
This is closer to what I had in mind... Thanks!
– Eliseo d'Annunzio Oct 20 '17 at 7:38
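If GNU awk is not available, the same idea can be approximated with plain awk and substr(). The following is only a sketch: the function name fixed_to_delimited and its argument layout are made up for illustration, and it assumes the width list is given as one space-separated string in the style of the question, with a trailing X (or x) marking a column to drop.

```shell
# fixed_to_delimited SPEC [DELIM]   (hypothetical helper, not from the answers)
# SPEC is a width list such as "6 5 4X 5 3X 6"; DELIM defaults to ^.
# Reads fixed-width rows on stdin and writes delimited rows to stdout.
fixed_to_delimited() {
    awk -v spec="$1" -v delim="${2:-^}" '
    BEGIN {
        n = split(spec, w, " ")
        for (i = 1; i <= n; i++) {
            skip[i] = (w[i] ~ /[Xx]$/)   # 1 if this column is to be dropped
            sub(/[Xx]$/, "", w[i])       # keep only the numeric width
        }
    }
    {
        out = ""; pos = 1; first = 1
        for (i = 1; i <= n; i++) {
            if (!skip[i]) {
                out = out (first ? "" : delim) substr($0, pos, w[i])
                first = 0
            }
            pos += w[i]                  # advance past kept and dropped columns alike
        }
        print out
    }'
}

# Demo with the row from the question; prints aaaaaa^bbbbb^ddddd^ffffff
printf 'aaaaaabbbbbccccdddddeeeffffff\n' | fixed_to_delimited "6 5 4X 5 3X 6"
```

For a real file, something like fixed_to_delimited "6 5 4X 5 3X 6" < input.txt > output.txt should do; pick a delimiter that cannot occur in the data.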
Short cut command approach:
Sample input.txt contents:
aaaaaabbbbbccccdddddeeeffffff
wwwwwwddddd111133333xxxaaaaaa
ffffff00000sssszzzzz000rrrrrr
The job:
cut -c 1-6,7-11,16-20,24-29 --output-delimiter=^ input.txt
-c - to select only characters
1-6,7-11,16-20,24-29 - consecutive ranges of character positions, flexibly adjustable
--output-delimiter=^ - output field delimiter, you can adjust it to whatever you want
The output:
aaaaaa^bbbbb^ddddd^ffffff
wwwwww^ddddd^33333^aaaaaa
ffffff^00000^zzzzz^rrrrrr
edited Oct 20 '17 at 8:47
answered Oct 20 '17 at 7:46
RomanPerekhrest
Fencepost error. The numbers -c 1-6,7-12,17-22,26-31 don't match the output; for example, with those numbers the first output line would be: aaaaaa^bbbbbc^ddddee^ffff.
– agc Oct 20 '17 at 8:46
@agc, yes, forgot to edit. Thanks
– RomanPerekhrest Oct 20 '17 at 8:48
Hard to say without seeing your exact input and desired output, but...
sed -e "$(
  printf '%d\n' 6 5 4 5 3 6 |
    awk '
      { f[NR] = f[NR-1] + $1 }
      END {
        for (i=NR; i>0; i--)
          printf "s/./&^/%d\n", f[i]
      }
    '
)" infile.txt | cut -d^ -f1,2,4,6
Untested. No bugs, I promise. ;)
Okay, I tested. I was missing the end brace for END. No other bugs. Works perfectly on example input. Output is:
aaaaaa^bbbbb^ddddd^ffffff
answered Oct 20 '17 at 2:02
Wildcard
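To see what the command substitution in the sed answer above hands to sed, the generator can be pulled out into a function of its own. This is a sketch only; the name boundary_script is made up here. The loop runs from the last column back to the first because every inserted ^ is itself a character: working right to left keeps the positions further left valid.

```shell
# boundary_script WIDTH...   (hypothetical helper name)
# Prints one sed "s" command per column boundary, rightmost boundary first.
boundary_script() {
    printf '%d\n' "$@" |
    awk '
        { last[NR] = last[NR-1] + $1 }   # running total = last position of column NR
        END {
            for (i = NR; i > 0; i--)
                printf "s/./&^/%d\n", last[i]
        }'
}

boundary_script 6 5 4 5 3 6
# prints:
# s/./&^/29
# s/./&^/23
# s/./&^/20
# s/./&^/15
# s/./&^/11
# s/./&^/6
```

Each command appends ^ to the Nth character of the line, so every column (including the last) ends up followed by a delimiter, and the cut at the end of the original pipeline picks fields 1, 2, 4 and 6.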
With sed, one could write (using _ as delimiter):
sed "$(echo s/./\&_/{29,23,20,15,11,6}\;)"
But this means summing up the absolute positions from the column widths. To use the widths directly, we need some ugly escaping for the command substitution:
sed -E "s/./&_/6;$(echo s/.*_\(.\)\{{5,4,5,3,6}\}/\&_/\;)"
answered Oct 20 '17 at 8:43
Philippos
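The absolute positions for the first sed command above do not have to be summed by hand. A minimal sketch, assuming plain POSIX shell arithmetic; the function name cumulative_positions is invented for this example, and it uses global variables since POSIX sh has no local.

```shell
# cumulative_positions WIDTH...   (hypothetical helper name)
# Prints the end position of every column, largest first, comma-separated.
cumulative_positions() {
    total=0
    list=
    for width in "$@"; do
        total=$((total + width))
        list="$total${list:+,$list}"   # prepend so the largest position comes first
    done
    printf '%s\n' "$list"
}

cumulative_positions 6 5 4 5 3 6    # prints 29,23,20,15,11,6
```

The list comes out largest first because the substitutions have to be applied from right to left.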
Improved version of RomanPerekhrest's cut answer, with a column array parser, including X suffixes to mark the columns to skip.
Load array $n, and make a function to parse the array into numbers for cut -c:
n=(6 5 4X 5 3X 6)
col_array() { j=$(h=0;
  for f in $@; do
    g=${f/[Xx]};
    i=$((h+1));
    h=$((h+g));
    [ $g = $f ] && echo -n $i-$h,
  done;) ;
  echo ${j%,}; }
The file input.txt contains:
aaaaaabbbbbccccdddddeeeffffff
wwwwwwddddd111133333xxxaaaaaa
ffffff00000sssszzzzz000rrrrrr
Use col_array() with cut:
cut -c $(col_array ${n[@]}) --output-delimiter=^ input.txt
Output:
aaaaaa^bbbbb^ddddd^ffffff
wwwwww^ddddd^33333^aaaaaa
ffffff^00000^zzzzz^rrrrrr
There's no strict need for an array, since col_array() parses its parameters:
cut -c $(col_array 3 5X 7) --output-delimiter=^ input.txt
Output:
aaa^bbbcccc
www^ddd1111
fff^000ssss
edited Oct 20 '17 at 9:06
answered Oct 20 '17 at 8:41
agc
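col_array() above relies on bash features (arrays and the ${f/[Xx]} substitution). Where only a plain POSIX sh is available, roughly the same range list can be built with suffix removal instead. This is a sketch under that assumption; the name col_ranges is made up here, and the variables are global because POSIX sh has no local.

```shell
# col_ranges SPEC...   (hypothetical helper name)
# Prints a cut -c range list for the columns that carry no X/x suffix.
col_ranges() {
    end=0
    ranges=
    for field in "$@"; do
        width=${field%[Xx]}            # width with any trailing X or x removed
        start=$((end + 1))
        end=$((end + width))
        # An unchanged field had no suffix, so the column is kept.
        if [ "$width" = "$field" ]; then
            ranges="${ranges:+$ranges,}$start-$end"
        fi
    done
    printf '%s\n' "$ranges"
}

col_ranges 6 5 4X 5 3X 6    # prints 1-6,7-11,16-20,24-29
col_ranges 3 5X 7           # prints 1-3,9-15
```

The result plugs into cut the same way, e.g. cut -c "$(col_ranges 6 5 4X 5 3X 6)" --output-delimiter=^ input.txt; the --output-delimiter option itself is a GNU cut extension.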