Separating run-on text in a script
I have the following CSV input:
XiaoLi,6705462234,lxiao@unc.edu,NC764
NatkinPook,8044344528,wnatkin@vcu.edu,VA22345
EliziMoe,5208534566,emoe@ncsu.edu,AZ85282
MaTa,4345667345,mta@yahoo.com,TX91030
DianaCheng,5203456789,dcheng@asu.edu,WY4587
JacksonFive,5206564573,jfive@ncsu.edu,AZ85483
AdiSrikanthReddy,6578904566,sadi1@asu.edu,WS67854
I would like it to output the following:
Xiao Li 6705462234 lxiao@unc.edu NC 764
Natkin Pook 8044344528 wnatkin@vcu.edu VA 22345
Elizi Moe 5208534566 emoe@ncsu.edu AZ 85282
Ma Ta 4345667345 mta@yahoo.com TX 91030
Diana Cheng 5203456789 dcheng@asu.edu WY 4587
Jackson Five 5206564573 jfive@ncsu.edu AZ 85483
Adi SrikanthReddy 6578904566 sadi1@asu.edu WS 67854
(FirstName LastName PhoneNumber UserID@Email State Zip)
This is what I have so far:
awk -F "," '{ print $1, $4, $3, $6 }' data3
I am having trouble separating the first name from the last name, and the state and ZIP code are also running together. How can I separate them in these two cases?
I'd like to use awk. Is there a way to use something like [A-Z] to split them at the uppercase letter?
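With -F "," each sample row splits into only four fields (name, phone, e-mail, state+ZIP), so $6 in the attempt above is always empty and the state and ZIP both sit in $4. A minimal sketch to confirm this; the function names count_fields and print_fields are hypothetical helpers, not part of any tool:

```shell
#!/bin/sh
# Sketch: check how awk -F, splits one of the question's rows.

# Print the number of comma-separated fields on each input line.
count_fields() {
  awk -F, '{ print NF }'
}

# Print the four fields separated by spaces; the name and the
# state/ZIP are still run together at this point.
print_fields() {
  awk -F, '{ print $1, $2, $3, $4 }'
}

row='XiaoLi,6705462234,lxiao@unc.edu,NC764'
printf '%s\n' "$row" | count_fields   # prints 4
printf '%s\n' "$row" | print_fields   # prints XiaoLi 6705462234 lxiao@unc.edu NC764
```

So the remaining work is only the two splits inside $1 and $4.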
shell-script awk
edited Mar 23 at 1:53 by user1404316
asked Mar 22 at 22:46 by kittensfurdays
Some of the "zip codes" in your last column don't appear to be ZIP codes. Also, how should it handle a name like "AdiSrikanthReddy", which has three capital letters? And I hope those aren't real phone numbers and email addresses.
Jesse_b, Mar 22 at 23:04
2 Answers
(accepted answer)
I see that user steeldriver's answer has already been accepted, but I thought I'd offer what I think is a shorter, simpler, and easier-to-read option. At the very least, it demonstrates some other features of awk (and the OP can always change his/her mind). Note that gensub() is a GNU awk extension, so this needs gawk:
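gawk '{
  # turn the commas into spaces
  gsub(",", " ")
  # split "NC764" into "NC 764" (uppercase letter followed by a digit)
  $0 = gensub("([[:upper:]])([[:digit:]])", "\\1 \\2", "g")
  # split "XiaoLi" into "Xiao Li"; with "g" every such change is split,
  # so "AdiSrikanthReddy" becomes "Adi Srikanth Reddy" (use 1 instead
  # of "g" to split only at the first one)
  $0 = gensub("([[:lower:]])([[:upper:]])", "\\1 \\2", "g")
  print
}' file.csv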
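Because gensub() is only available in gawk, here is a sketch of the same two capture-group substitutions using sed -E instead (a swapped-in tool; GNU sed and BSD/macOS sed both accept -E). Leaving off the g flag makes each substitution apply only to the first match on the line, which keeps "AdiSrikanthReddy" as "Adi SrikanthReddy", as the question asks. The function name split_runon is hypothetical:

```shell
#!/bin/sh
# Sketch: gensub-free version of the same idea using sed -E.
# Each s command without the g flag changes only the first match on
# the line, so only the first lower->upper pair in the name is split.
split_runon() {
  sed -E \
    -e 's/([[:lower:]])([[:upper:]])/\1 \2/' \
    -e 's/([[:upper:]])([[:digit:]])/\1 \2/' \
    -e 's/,/ /g'
}

printf '%s\n' \
  'XiaoLi,6705462234,lxiao@unc.edu,NC764' \
  'AdiSrikanthReddy,6578904566,sadi1@asu.edu,WS67854' | split_runon
```

This relies on the sample data having no digits in the names and no capitals in the e-mail addresses, so the first uppercase-digit pair on each line is always the state/ZIP boundary.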
answered Mar 23 at 1:51 by user1404316
At least with gawk (GNU awk) and mawk, you could use the match function to find the index of a lowercase-uppercase or uppercase-digit transition, and then use substr to cut-and-shut the string:
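awk -F, '
  # position of the first lowercase letter followed by an uppercase one
  { c = match($1, /[a-z][A-Z]/) }
  c > 0 { $1 = sprintf("%s %s", substr($1, 1, c), substr($1, c + 1)) }
  # position of the first uppercase letter followed by a digit
  { c = match($4, /[A-Z][0-9]/) }
  c > 0 { $4 = sprintf("%s %s", substr($4, 1, c), substr($4, c + 1)) }
  # assigning to a field rebuilds $0 with OFS (a space); 1 prints it
  1' file.csv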
Xiao Li 6705462234 lxiao@unc.edu NC 764
Natkin Pook 8044344528 wnatkin@vcu.edu VA 22345
Elizi Moe 5208534566 emoe@ncsu.edu AZ 85282
Ma Ta 4345667345 mta@yahoo.com TX 91030
Diana Cheng 5203456789 dcheng@asu.edu WY 4587
Jackson Five 5206564573 jfive@ncsu.edu AZ 85483
Adi SrikanthReddy 6578904566 sadi1@asu.edu WS 67854
If your $4 really is a US state abbreviation followed by a ZIP code, then AFAIK the format is fixed (the state is always two letters), so you could skip the second match and just do:
awk -F, '
  { c = match($1, /[a-z][A-Z]/) }
  c > 0 { $1 = sprintf("%s %s", substr($1, 1, c), substr($1, c + 1)) }
  # the state is always the first two characters of $4
  { $4 = sprintf("%s %s", substr($4, 1, 2), substr($4, 3)) }
  1' file.csv
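Since the comment on the question points out that some last-column values are not valid ZIP codes, one possible extension of this fixed-width version is to warn about them on stderr while still splitting. This is a sketch assuming a valid ZIP is exactly five digits (ZIP+4 such as 12345-6789 is not handled); the function name split_state_zip is hypothetical:

```shell
#!/bin/sh
# Sketch: fixed two-letter state prefix, plus a warning on stderr when
# the rest of the last field is not exactly five digits.
# Assumption: a valid ZIP is five digits; ZIP+4 is not handled.
split_state_zip() {
  awk -F, '
    {
      # split the name at the first lowercase->uppercase pair
      c = match($1, /[a-z][A-Z]/)
      if (c > 0)
        $1 = substr($1, 1, c) " " substr($1, c + 1)
      # state = first two characters, zip = everything after them
      state = substr($4, 1, 2)
      zip = substr($4, 3)
      # five [0-9] classes instead of [0-9]{5}: some older mawk builds lack {n}
      if (zip !~ /^[0-9][0-9][0-9][0-9][0-9]$/)
        printf("line %d: \"%s\" is not a 5-digit ZIP\n", NR, zip) > "/dev/stderr"
      # assigning to a field rebuilds $0 with OFS (a space)
      $4 = state " " zip
      print
    }'
}

printf '%s\n' \
  'XiaoLi,6705462234,lxiao@unc.edu,NC764' \
  'MaTa,4345667345,mta@yahoo.com,TX91030' | split_state_zip
```

On the question's sample, the rows ending in NC764 and WY4587 would trigger the warning.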
It's a little tidier if you have a regex engine that supports zero-length assertions, such as Perl's:
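perl -F, -ane '
  # insert a space at the first lower->upper or upper->digit boundary of each field (s///r needs perl 5.14+)
  print join " ", map { s/(?<=[[:lower:]])(?=[[:upper:]])|(?<=[[:upper:]])(?=[[:digit:]])/ /r } @F
' file.csv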
Xiao Li 6705462234 lxiao@unc.edu NC 764
Natkin Pook 8044344528 wnatkin@vcu.edu VA 22345
Elizi Moe 5208534566 emoe@ncsu.edu AZ 85282
Ma Ta 4345667345 mta@yahoo.com TX 91030
Diana Cheng 5203456789 dcheng@asu.edu WY 4587
Jackson Five 5206564573 jfive@ncsu.edu AZ 85483
Adi SrikanthReddy 6578904566 sadi1@asu.edu WS 67854
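For awks without lookaround support, the Perl approach of applying one substitution to every field can be approximated in POSIX awk by combining both transitions into one alternation and reusing the match()/substr() idiom from the first awk version. A sketch; split_first and split_all_fields are hypothetical names:

```shell
#!/bin/sh
# Sketch: POSIX-awk approximation of the Perl map over @F.
# For each field, find the first lower->upper or upper->digit pair
# with match() and insert a space between its two characters.
split_all_fields() {
  awk -F, '
    function split_first(s,    c) {
      c = match(s, /[a-z][A-Z]|[A-Z][0-9]/)
      if (c > 0)
        s = substr(s, 1, c) " " substr(s, c + 1)
      return s
    }
    {
      # assigning to each field also rebuilds $0 with OFS (a space)
      for (i = 1; i <= NF; i++)
        $i = split_first($i)
      print
    }'
}

printf '%s\n' 'JacksonFive,5206564573,jfive@ncsu.edu,AZ85483' | split_all_fields
```

Like the Perl version, only the first boundary in each field is split, so "AdiSrikanthReddy" stays "Adi SrikanthReddy".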
answered Mar 23 at 0:50 by steeldriver