Cut the SUBSTRINGS to a specific length in a CSV file
I have a file like the one below:
    $ cat Test.csv
    "pav",12345,"ABCD,EF;xyz23;15rtg",,
    "xyz",,"C4DEF;x23yu;rtg",,
After modification it should look like this:
    $ cat Test.csv
    "pav",12345,"AB;xy;15",,
    "xyz",,"C4;x2;rt",,
The 3rd field contains substrings delimited with ";". Each of these substrings has to be cut down to a fixed length (two characters in this example).
Tags: text-processing, awk, sed
edited Oct 22 '17 at 10:37 by RomanPerekhrest
asked Oct 22 '17 at 6:56 by Pavan
Are there only 5 columns, and should only the third column be modified, or can such values appear in other columns too?
– αғsнιη
Oct 22 '17 at 7:51
@αғsнιη The data has 5 columns, with the 4th and 5th being empty, it seems.
– Kusalananda
Oct 22 '17 at 7:52
Ah, yes, edited my comment.
– αғsнιη
Oct 22 '17 at 7:53
4 Answers
The following uses csvkit, because parsing CSV data that contains commas in quoted fields directly with awk is error-prone.
This will get column three into the correct format:
    csvcut -c 3 file.csv |
    sed -r 's/^"|"$//g' |
    awk -F';' -vOFS=';' '{ for (i=1; i<=NF; ++i) { $i = substr($i, 1, 2) } printf("\"%s\"\n", $0) }' >tmp-3rd
For the given input, this produces
    "AB;xy;15"
    "C4;x2;rt"
- csvcut will cut out the third column.
- sed will remove any double quotes from the data, if they appear first or last on the line.
- The awk program will go through the ;-delimited fields and cut them down to a length of two characters per field. It prints out the data with double quotes around it. Note that substr() counts positions from 1; a start position of 0 returns only one character in some implementations, such as GNU awk.
- The output is written to the file tmp-3rd.
Then it's just a matter of reassembling this with the original data (this assumes bash or any other shell that can do process substitutions with <(...)):
    paste -d, <( csvcut -c 1,2 file.csv ) tmp-3rd <( csvcut -c 4,5 file.csv ) | csvformat
- paste will put the columns together with commas in between.
- The first process substitution produces the first two columns from the original file, and the second produces the last two columns. In the middle, we provide the modified third column.
- As an optional step, we pass the data through csvformat, which will quote or unquote fields as needed.
The output will be
    pav,12345,AB;xy;15,,
    xyz,,C4;x2;rt,,
Bypassing the need for the temporary file:
    paste -d, \
        <( csvcut -c 1,2 file.csv ) \
        <( csvcut -c 3 file.csv | sed -r 's/^"|"$//g' |
           awk -F';' -vOFS=';' '{ for (i=1; i<=NF; ++i) { $i = substr($i, 1, 2) } printf("\"%s\"\n", $0) }' ) \
        <( csvcut -c 4,5 file.csv ) | csvformat
answered Oct 22 '17 at 7:51 by Kusalananda
I don't seem to have csvkit, so I will try to use the AWK approach, as I have managed the rest somehow. Thanks :)
– Pavan
Oct 22 '17 at 8:18
@Pavan pip install --user csvkit will install the commands in $HOME/.local/bin.
– Kusalananda
Oct 22 '17 at 8:29
I tried this, which works fine, but what I actually want is for the third field in the above example to be replaced with the modified one in the file. Could you help me? awk -F',"' '{ print $5 "\t" }' test.csv | awk -F';' -vOFS=';' '{ for (i=1; i<=NF; ++i) { $i = substr($i, 0, 25) } printf("\"%s\"\n", $0) }'
– Pavan
Oct 23 '17 at 15:10
@Pavan I don't understand. Please update the question with the appropriate information.
– Kusalananda
Oct 23 '17 at 15:14
I mean that I can now see the expected output on the screen, but my actual need is to replace it in the actual file. awk -F',"' '{ print $3 "\t" }' test.csv helps me print the 3rd field, e.g. "ABCD,EF;xyz23;15rtg", and awk -F';' -vOFS=';' '{ for (i=1; i<=NF; ++i) { $i = substr($i, 0, 2) } printf("\"%s\"\n", $0) }' helps me limit the substrings to 2 characters. The desired output is shown on the screen, but I want it to be written back to the file. The file will have about 50k lines, and the change should be applied to all lines.
– Pavan
Oct 23 '17 at 15:22
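For the follow-up request in these comments (writing the result back to the file itself rather than to the screen), one common pattern is to write to a temporary file and then replace the original with it. The sketch below is a hypothetical helper and not code from any answer in this thread: `rewrite_in_place` and its parameters are made-up names, and the quoting rule (quote every non-empty field that is not purely numeric) is an assumption taken from the sample data.

```python
import csv
import os
import tempfile


def rewrite_in_place(path, width=2, column=2):
    """Shorten the ;-separated pieces in one column and overwrite `path`."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the original so that the final
    # os.replace() is a rename within the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", newline="") as dst, open(path, newline="") as src:
            for row in csv.reader(src):
                if len(row) > column:
                    pieces = row[column].split(";")
                    row[column] = ";".join(p[:width] for p in pieces)
                # Assumption: quote every non-empty, non-numeric field,
                # as in the sample data; embedded quotes are doubled.
                dst.write(",".join(
                    f if not f or f.isnumeric()
                    else '"{}"'.format(f.replace('"', '""'))
                    for f in row) + "\n")
        os.replace(tmp_path, path)
    except BaseException:
        # Leave the original untouched and clean up the partial copy.
        os.unlink(tmp_path)
        raise
```

Calling `rewrite_in_place("Test.csv")` would process the file row by row, so a file of around 50k lines should not be a problem; keeping a backup copy before the first run is still advisable.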
With perl
Assuming ; is only in the third field:
    $ perl -pe 's/"\K[^;"]*;[^"]*(?=")/$&=~s|([^;]{2})[^;]+|$1|gr/e' ip.txt
    "pav",12345,"AB;xy;15",,
    "xyz",,"C4;x2;rt",,
- "\K matches the " before the string of interest and (?=") matches the " after the string of interest. The " characters themselves are not part of the matched string, because \K discards what has been matched so far and (?=") is a lookahead
- [^;"]*;[^"]* matches any characters other than ; or ", followed by ;, followed by characters other than "
- $&=~s|([^;]{2})[^;]+|$1|gr performs another substitution on the matched string, keeping only the first two characters of each ;-separated piece
- the e modifier allows Perl code to be used in the replacement section
To restrict this to the 3rd field only:
    $ cat ip.txt
    "pav",12345,"ABCD,EF;xyz23;15rtg",,
    "xyz",,"C4DEF;x23yu;rtg",,
    "foo;12,23;good",124,253
    12,5232,"xyz","ijk;5545;62"
    $ perl -pe 's/^("[^"]*",|[^,]*,){2}"\K[^;"]*;[^"]*(?=")/$&=~s|([^;]{2})[^;]+|$1|gr/e' ip.txt
    "pav",12345,"AB;xy;15",,
    "xyz",,"C4;x2;rt",,
    "foo;12,23;good",124,253
    12,5232,"xyz","ijk;5545;62"
answered Oct 22 '17 at 8:46 by Sundeep
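The idea of this answer (match the quoted third field, then run a second substitution on just that match) can be sketched in Python as well. Python's `re` module has no `\K`, so this sketch keeps the leading part in a capture group instead; `THIRD_FIELD` and `shorten_line` are illustrative names, not anything from the answer.

```python
import re

# Two leading fields (quoted or unquoted) and the opening quote of the third
# field go into group 1; the third field's content, which must contain at
# least one ";", goes into group 2. The closing quote is only looked at.
THIRD_FIELD = re.compile(r'^((?:"[^"]*",|[^,]*,){2}")([^;"]*;[^"]*)(?=")')


def shorten_line(line, width=2):
    """Cut each ;-separated piece of a quoted third field to `width` chars."""
    def shorten(match):
        pieces = match.group(2).split(";")
        return match.group(1) + ";".join(p[:width] for p in pieces)
    return THIRD_FIELD.sub(shorten, line)


for line in ['"pav",12345,"ABCD,EF;xyz23;15rtg",,',
             '"foo;12,23;good",124,253']:
    print(shorten_line(line))
```

Lines whose third field is not a quoted string containing ";" do not match the pattern and are returned unchanged, which mirrors the behaviour shown for the last two lines of ip.txt.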
Python 3.x solution (based on the csv.reader object):
parse_csv.py script:
    import csv, sys

    with open(sys.argv[1], newline='') as f:
        reader = csv.reader(f)
        for l in reader:
            # shorten each ;-separated piece in any field that contains ';'
            l = [s if ';' not in s else ';'.join(_[:2] for _ in s.split(';')) for s in l]
            # re-quote everything except empty and purely numeric fields
            print(','.join(i if not i or i.isnumeric() else '"{}"'.format(i) for i in l))
Usage:
    python3 parse_csv.py Test.csv
The output:
    "pav",12345,"AB;xy;15",,
    "xyz",,"C4;x2;rt",,
Python's csv module provides robust and flexible support for CSV data.
edited Oct 22 '17 at 10:32
answered Oct 22 '17 at 10:13 by RomanPerekhrest
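To see why a real CSV parser matters for this data, the following small sketch compares a naive split on commas with `csv.reader` on the first sample line. The variable names are only illustrative.

```python
import csv

line = '"pav",12345,"ABCD,EF;xyz23;15rtg",,'

# Naive approach: the comma inside the quoted third field splits it in two.
naive = line.split(",")

# csv.reader accepts any iterable of lines and honours the quoting.
parsed = next(csv.reader([line]))

print(len(naive), naive[2])
print(len(parsed), parsed[2])
```

The naive split yields six pieces with a broken third field, while `csv.reader` yields the expected five fields with the third one intact.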
Complex GNU AWK solution (parsing CSV data):
    awk -v FPAT='"[^"]+"|[^",]+|,,' '{
        for (i=1;i<=NF;i++) {
            if ($i~/^".*;./) {
                len=split($i,a,";"); v=substr(a[1],1,3);
                for (j=2;j<=len;j++) v= v";"substr(a[j],1,2);
                v=v"\042"
            }
            printf "%s%s",(v? v: ($i~/^,,/? (i==NF? ",":""):$i )),
                          (i==NF? ORS:OFS); v=""
        }
    }' OFS=',' Test.csv
- FPAT='"[^"]+"|[^",]+|,,' - complex regex pattern defining a field value
- if ($i~/^".*;./) ... - if the current field $i contains ; character(s)
- len=split($i,a,";") - split the field value $i into array a by separator ;. len is assigned the number of elements/chunks created
- v=substr(a[1],1,3); - capture the first chunk at the needed length, including the leading " char; for example, "AB will be extracted from "ABCD,EF
- for (j=2;j<=len;j++) ... - iterate through the remaining chunks/items
- v=v"\042" - add the trailing double quote " to the processed sequence v. \042 is the ASCII octal code representing the double quote char "
- ($i~/^,,/? (i==NF? ",":""):$i ) - each empty field ,, is recreated with a single comma , and the common delimiter (also ,). This avoids redundant commas like "pav",,,
- (i==NF? ORS:OFS) - on encountering the last field (i==NF), print the output record separator ORS; otherwise print the output field separator OFS
The output:
    "pav",12345,"AB;xy;15",,
    "xyz",,"C4;x2;rt",,
Thanks Roman, works perfectly. Could you help me understand it a bit better? The 042 and the arrays look a bit complex.
– Pavan
Oct 22 '17 at 8:25
@Pavan, welcome, see my explanation
– RomanPerekhrest
Oct 22 '17 at 10:30
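As a further aid for the question about the 042 and the arrays, the two central steps of the AWK answer can be imitated in Python. This is only an approximation for illustration: `re.findall` plays the role of FPAT, `split_fields` and `shorten_quoted` are made-up names, and gawk's regex matching rules differ from Python's in general, although not for this pattern on the sample lines.

```python
import re

# Same pattern as the FPAT value: a quoted field, or a run of characters
# that are neither quotes nor commas, or a literal ",," for empty fields.
FIELD = re.compile(r'"[^"]+"|[^",]+|,,')


def split_fields(line):
    """Return the pieces that the FPAT-style pattern treats as fields."""
    return FIELD.findall(line)


def shorten_quoted(field, width=2):
    """Mirror the split/substr steps for one quoted field."""
    chunks = field.split(";")  # like split($i, a, ";")
    # The first chunk keeps the opening quote plus `width` characters,
    # every later chunk keeps `width` characters.
    kept = [chunks[0][:width + 1]] + [c[:width] for c in chunks[1:]]
    # "\042" is the octal escape for the double quote character.
    return ";".join(kept) + "\042"


print(split_fields('"xyz",,"C4DEF;x23yu;rtg",,'))
print(shorten_quoted('"ABCD,EF;xyz23;15rtg"'))
print("\042" == '"')
```

The first call shows that the second sample line is seen as four pieces, two of which are the literal `,,`, which is why the AWK code needs the special case for empty fields.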
Complex GNU AWK
solution (parsing csv
data):
awk -v FPAT='"[^"]+"|[^",]+|,,' '
for (i=1;i<=NF;i++)
if ($i~/^".*;./)
len=split($i,a,";"); v=substr(a[1],1,3);
for (j=2;j<=len;j++) v= v";"substr(a[j],1,2);
v=v"42"
printf "%s%s",(v? v: ($i~/^,,/? (i==NF? ",":""):$i )),
(i==NF? ORS:OFS); v=""
' OFS=',' Test.csv
FPAT='"[^"]+"|[^",]+|,,'
- complex regex pattern defining field valueif ($i~/^".*;./) ...
- if the current field$i
contains;
character(s)len=split($i,a,";")
- split the field value$i
into arraya
by separator;
.len
is assigned with number of elements/chunks createdv=substr(a[1],1,3);
- capturing the first chunk of the needed length including leading"
char, for ex."AB
will be extracted from"ABCD,EF
for (j=2;j<=len;j++) ...
- iterating through remaining chunks/itemsv=v"42"
- add trailing double quote"
to the processed sequencev
.43
is ASCII octal code representing the double quote char"
.($i~/^,,/? (i==NF? ",":""):$i )
- each empty field,,
is recreated with single comma,
and common delimiter (also,
). This is to avoid redundant comma cluttering like"pav",,,
(i==NF? ORS:OFS)
- on encountering the last fieldi==NF
- print output record separatorORS
, otherwise - print output filed separatorOFS
The output:
"pav",12345,"AB;xy;15",,
"xyz",,"C4;x2;rt",,
edited Oct 22 '17 at 10:38
answered Oct 22 '17 at 7:45
RomanPerekhrest