Cut the SUBSTRINGS to a specific length in a CSV file

I have a file like the one below:

    cat Test.csv
    "pav",12345,"ABCD,EF;xyz23;15rtg",,
    "xyz",,"C4DEF;x23yu;rtg",,

After modification:

    cat Test.csv
    "pav",12345,"AB;xy;15",,
    "xyz",,"C4;x2;rt",,

The 3rd field contains substrings delimited by ";". Each of these substrings has to be cut down to its first two characters.

Tags: text-processing awk sed
edited Oct 22 '17 at 10:37 by RomanPerekhrest
asked Oct 22 '17 at 6:56 by Pavan
Are there only 5 columns, and should only the third column be changed, or can such values appear in other columns too?
– Ã±ÃÂsýù÷ Oct 22 '17 at 7:51
@ñÃÂsýù÷ The data has 5 columns, with the 4th and 5th being empty, it seems.
– Kusalananda Oct 22 '17 at 7:52
ah, yes, edited my comment
– Ã±ÃÂsýù÷ Oct 22 '17 at 7:53
4 Answers
The following uses csvkit, because parsing CSV data that contains commas in quoted fields directly with awk is error-prone.

This will get column three into the correct format:

    csvcut -c 3 file.csv |
    sed -r 's/^"|"$//g' |
    awk -F';' -vOFS=';' '{ for (i=1; i<=NF; ++i) $i = substr($i, 1, 2); printf("\"%s\"\n", $0) }' >tmp-3rd

For the given input, this produces

    "AB;xy;15"
    "C4;x2;rt"

- csvcut will cut out the third column.
- sed will remove any double quotes from the data, if they appear first or last on the line.
- The awk program will go through the ;-delimited fields and cut them down to a length of two characters per field. It prints out the data with double quotes around it.
- The output is written to the file tmp-3rd.

Then it's just a matter of reassembling this with the original data (this assumes bash or any other shell that supports process substitution with <(...)):

    paste -d, <( csvcut -c 1,2 file.csv ) tmp-3rd <( csvcut -c 4,5 file.csv ) | csvformat

- paste will put the columns together with commas in between.
- The first process substitution produces the first two columns from the original file, and the second produces the last two columns. In the middle, we provide the modified third column.
- As an optional step, we pass the data through csvformat, which will quote or unquote fields as needed.

The output will be

    pav,12345,AB;xy;15,,
    xyz,,C4;x2;rt,,

Bypassing the need for the temporary file:

    paste -d, \
        <( csvcut -c 1,2 file.csv ) \
        <( csvcut -c 3 file.csv | sed -r 's/^"|"$//g' |
           awk -F';' -vOFS=';' '{ for (i=1; i<=NF; ++i) $i = substr($i, 1, 2); printf("\"%s\"\n", $0) }' ) \
        <( csvcut -c 4,5 file.csv ) | csvformat
answered Oct 22 '17 at 7:51 by Kusalananda
I don't seem to have csvkit, so I will try to use AWK. Thanks :)
– Pavan Oct 22 '17 at 8:18
@Pavan pip install --user csvkit will install the commands in $HOME/.local/bin.
– Kusalananda Oct 22 '17 at 8:29
I tried this, which works fine, but what I actually want is for the third field in the above example to be replaced with the modified one in the file. Could you help me? awk -F',"' '{print $5 "\t" }' test.csv | awk -F';' -vOFS=';' '{ for (i=1; i<=NF; ++i) $i = substr($i, 0, 25); printf("\"%s\"\n", $0) }'
– Pavan Oct 23 '17 at 15:10
@Pavan I don't understand. Please update the question with the appropriate information.
– Kusalananda Oct 23 '17 at 15:14
I mean that I can now see the expected output on the screen, but my actual need is to replace it in the actual file. awk -F',"' '{print $3 "\t" }' test.csv prints the 3rd field, e.g. "ABCD,EF;xyz23;15rtg", and awk -F';' -vOFS=';' '{ for (i=1; i<=NF; ++i) $i = substr($i, 0, 2); printf("\"%s\"\n", $0) }' limits the substrings to 2 characters. The desired output is seen on the screen, but I want it to be updated in the file. The file would have around 50k lines, and the action should happen to all lines.
– Pavan Oct 23 '17 at 15:22
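For the follow-up request in the comments (updating the file itself instead of printing to the screen), one possible approach is sketched below in Python. It is not part of the answer above: the helper names truncate_parts, format_field and rewrite_in_place are hypothetical, and the quoting rule (quote every field that is neither empty nor numeric) is an assumption read off the sample data in the question. The result goes to a temporary file in the same directory, which is then moved over the original, so a failure midway should not leave a half-written file behind.

```python
import csv
import os
import tempfile


def truncate_parts(field, width=2):
    """Cut every ';'-separated part of field down to at most width characters."""
    return ";".join(part[:width] for part in field.split(";"))


def format_field(value):
    """Assumed quoting rule: leave empty and numeric fields bare, quote the rest."""
    if value == "" or value.isnumeric():
        return value
    return '"{}"'.format(value.replace('"', '""'))


def rewrite_in_place(path, column=2, width=2):
    """Truncate the parts of one zero-based column in every row of the CSV file."""
    directory = os.path.dirname(os.path.abspath(path))
    with open(path, newline="") as src, tempfile.NamedTemporaryFile(
        "w", dir=directory, delete=False, newline=""
    ) as dst:
        for row in csv.reader(src):
            if len(row) > column:
                row[column] = truncate_parts(row[column], width)
            dst.write(",".join(format_field(v) for v in row) + "\n")
    # Both files are closed here; swap the finished copy in for the original.
    os.replace(dst.name, path)
```

Calling rewrite_in_place("Test.csv") on the sample file from the question should leave it in the desired form. Rows are processed one at a time, so a file of around 50k lines should not need special handling.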
With perl

Assuming ; occurs only in the third field:

    $ perl -pe 's/"\K[^;"]*;[^"]*(?=")/$&=~s|([^;]{2})[^;]+|$1|gr/e' ip.txt
    "pav",12345,"AB;xy;15",,
    "xyz",,"C4;x2;rt",,

- "\K matches the " before the string of interest and (?=") matches the " after it. The " characters themselves are not part of the matched string, because these act as lookarounds
- [^;"]*;[^"]* matches any characters other than ; or ", followed by ;, followed by characters other than "
- $&=~s|([^;]{2})[^;]+|$1|gr performs another substitution on the matched string
- the e modifier allows Perl code to be used in the replacement section

To restrict this to the 3rd field only:

    $ cat ip.txt
    "pav",12345,"ABCD,EF;xyz23;15rtg",,
    "xyz",,"C4DEF;x23yu;rtg",,
    "foo;12,23;good",124,253
    12,5232,"xyz","ijk;5545;62"

    $ perl -pe 's/^("[^"]*",|[^,]*,){2}"\K[^;"]*;[^"]*(?=")/$&=~s|([^;]{2})[^;]+|$1|gr/e' ip.txt
    "pav",12345,"AB;xy;15",,
    "xyz",,"C4;x2;rt",,
    "foo;12,23;good",124,253
    12,5232,"xyz","ijk;5545;62"
answered Oct 22 '17 at 8:46 by Sundeep
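The nested substitution used in the perl answer (match the third field, then rewrite the match with code) can also be tried with a replacement callback in Python. This is only a sketch under assumptions, not the answerer's method: Python's re module has no \K, so the text before the field is kept in a capture group and put back by the callback, and the names THIRD_FIELD and shorten_third_field are hypothetical.

```python
import re

# Group 1: two leading fields (quoted or bare) plus the opening quote of the
# third field. Group 2: the third field's content, which must contain a ';'.
THIRD_FIELD = re.compile(r'^((?:"[^"]*",|[^,]*,){2}")([^;"]*;[^"]*)(?=")')


def shorten_third_field(line, width=2):
    """Cut the ';'-separated parts of a quoted third field down to width characters."""
    def shorten(match):
        parts = match.group(2).split(";")
        return match.group(1) + ";".join(part[:width] for part in parts)

    # Lines whose third field is not a quoted, ';'-delimited value stay unchanged.
    return THIRD_FIELD.sub(shorten, line, count=1)
```

Like the second perl command, this leaves a line untouched when the ; appears in a field other than the third.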
Accurate and robust Python 3.x solution (based on the csv.reader object):

parse_csv.py script:

    import csv, sys

    with open(sys.argv[1]) as f:
        reader = csv.reader(f)
        for l in reader:
            # cut each ';'-separated chunk down to its first two characters
            l = [s if ';' not in s else ';'.join(_[:2] for _ in s.split(';')) for s in l]
            # re-quote every field that is neither empty nor numeric
            print(','.join(i if not i or i.isnumeric() else '"{}"'.format(i) for i in l))

Usage:

    python3 parse_csv.py Test.csv

The output:

    "pav",12345,"AB;xy;15",,
    "xyz",,"C4;x2;rt",,

Python's csv module provides robust and flexible support for CSV data.
edited Oct 22 '17 at 10:32
answered Oct 22 '17 at 10:13 by RomanPerekhrest
Complex GNU AWK solution (parsing csv data):

    awk -v FPAT='"[^"]+"|[^",]+|,,' '{
        for (i=1;i<=NF;i++) {
            if ($i~/^".*;./) {
                len=split($i,a,";"); v=substr(a[1],1,3);
                for (j=2;j<=len;j++) v= v";"substr(a[j],1,2);
                v=v"\042"
            }
            printf "%s%s",(v? v: ($i~/^,,/? (i==NF? ",":""):$i )),
                          (i==NF? ORS:OFS); v=""
        }
    }' OFS=',' Test.csv

- FPAT='"[^"]+"|[^",]+|,,' - complex regex pattern defining a field value
- if ($i~/^".*;./) ... - if the current field $i contains ; character(s)
- len=split($i,a,";") - split the field value $i into array a by separator ;. len is assigned the number of elements/chunks created
- v=substr(a[1],1,3); - capture the first chunk at the needed length including the leading " char, e.g. "AB will be extracted from "ABCD,EF
- for (j=2;j<=len;j++) ... - iterate through the remaining chunks/items
- v=v"\042" - add a trailing double quote " to the processed sequence v. 042 is the ASCII octal code representing the double quote char "
- ($i~/^,,/? (i==NF? ",":""):$i ) - each empty field ,, is recreated with a single comma , and the common delimiter (also ,). This avoids redundant comma cluttering like "pav",,,
- (i==NF? ORS:OFS) - on encountering the last field (i==NF), print the output record separator ORS; otherwise print the output field separator OFS

The output:

    "pav",12345,"AB;xy;15",,
    "xyz",,"C4;x2;rt",,
Thanks Roman, works perfectly. Could you help me understand it a bit better? The 042 and the arrays look a bit complex.
– Pavan Oct 22 '17 at 8:25
@Pavan, welcome, see my explanation
– RomanPerekhrest Oct 22 '17 at 10:30
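To see how the FPAT pattern carves a line into fields, the same regular expression can be tried outside awk. The Python sketch below is only an approximation of what gawk does (gawk picks the leftmost-longest match while Python's re picks the first matching alternative, which should make no difference here because each alternative starts with a different kind of character). split_like_fpat is a hypothetical helper name.

```python
import re

# The same pattern as in FPAT: a quoted value, a bare value, or an empty ",," pair.
FIELD = re.compile(r'"[^"]+"|[^",]+|,,')


def split_like_fpat(line):
    """Return the pieces of line that the FPAT pattern would treat as fields."""
    return FIELD.findall(line)


for line in ('"pav",12345,"ABCD,EF;xyz23;15rtg",,',
             '"xyz",,"C4DEF;x23yu;rtg",,'):
    print(split_like_fpat(line))

# "\042" is the octal escape for the double quote, in Python as in awk.
print("\042" == '"')
```

Both sample lines split into four fields, with each ,, pair counted as one field; this is why the awk code treats ,, specially when printing. The single commas between ordinary fields match none of the alternatives and are skipped.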
add a comment |Â
up vote
0
down vote
Complex GNU AWK solution (parsing csv data):
awk -v FPAT='"[^"]+"|[^",]+|,,' '
for (i=1;i<=NF;i++)
if ($i~/^".*;./)
len=split($i,a,";"); v=substr(a[1],1,3);
for (j=2;j<=len;j++) v= v";"substr(a[j],1,2);
v=v"42"
printf "%s%s",(v? v: ($i~/^,,/? (i==NF? ",":""):$i )),
(i==NF? ORS:OFS); v=""
' OFS=',' Test.csv
FPAT='"[^"]+"|[^",]+|,,'- complex regex pattern defining field valueif ($i~/^".*;./) ...- if the current field$icontains;character(s)len=split($i,a,";")- split the field value$iinto arrayaby separator;.lenis assigned with number of elements/chunks createdv=substr(a[1],1,3);- capturing the first chunk of the needed length including leading"char, for ex."ABwill be extracted from"ABCD,EFfor (j=2;j<=len;j++) ...- iterating through remaining chunks/itemsv=v"42"- add trailing double quote"to the processed sequencev.43is ASCII octal code representing the double quote char".($i~/^,,/? (i==NF? ",":""):$i )- each empty field,,is recreated with single comma,and common delimiter (also,). This is to avoid redundant comma cluttering like"pav",,,(i==NF? ORS:OFS)- on encountering the last fieldi==NF- print output record separatorORS, otherwise - print output filed separatorOFS
The output:
"pav",12345,"AB;xy;15",,
"xyz",,"C4;x2;rt",,
Thanks Roman, works perfectly. Could you help me understand it a bit better? The 042 and the arrays look a bit complex.
- Pavan
Oct 22 '17 at 8:25
@Pavan, you're welcome, see my explanation
- RomanPerekhrest
Oct 22 '17 at 10:30
edited Oct 22 '17 at 10:38
answered Oct 22 '17 at 7:45
RomanPerekhrest
22.5k12145