Separating run-on text in a script

I have the following csv input:

XiaoLi,6705462234,lxiao@unc.edu,NC764
NatkinPook,8044344528,wnatkin@vcu.edu,VA22345
EliziMoe,5208534566,emoe@ncsu.edu,AZ85282
MaTa,4345667345,mta@yahoo.com,TX91030
DianaCheng,5203456789,dcheng@asu.edu,WY4587
JacksonFive,5206564573,jfive@ncsu.edu,AZ85483
AdiSrikanthReddy,6578904566,sadi1@asu.edu,WS67854

I would like it to output the following:

Xiao Li 6705462234 lxiao@unc.edu NC 764
Natkin Pook 8044344528 wnatkin@vcu.edu VA 22345
Elizi Moe 5208534566 emoe@ncsu.edu AZ 85282
Ma Ta 4345667345 mta@yahoo.com TX 91030
Diana Cheng 5203456789 dcheng@asu.edu WY 4587
Jackson Five 5206564573 jfive@ncsu.edu AZ 85483
Adi SrikanthReddy 6578904566 sadi1@asu.edu WS 67854

(FirstName LastName PhoneNumber UserID@Email State Zip)

This is what I have so far:

 awk -F "," '{ print $1, $4, $3, $6 }' data3

I am having trouble separating the first name from the last name, and the state and zip code also run together. How can I separate these two cases?

I want to use awk. Is there a way to use something like [A-Z] to split them at the uppercase letter?

edited Mar 23 at 1:53 by user1404316

asked Mar 22 at 22:46 by kittensfurdays

Some of the "zip codes" in your last column don't appear to be zip codes. Also how should it handle a name like "AdiSrikanthReddy" where there are three capital letters? Also I hope those aren't real phone numbers and email addresses.
– Jesse_b, Mar 22 at 23:04

2 Answers
accepted

I see that user steeldriver's answer has already been accepted, but I thought to offer what I think is a shorter, simpler, and easier to read option. At the very least, it demonstrates some other features of awk (and the OP can always change his/her mind):

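awk '{
 # gensub is a gawk extension; "g" splits at every transition,
 # so AdiSrikanthReddy becomes Adi Srikanth Reddy
 gsub(","," ")                                           # commas -> spaces
 $0=gensub("([[:upper:]])([[:digit:]])","\\1 \\2","g")   # NC764 -> NC 764
 $0=gensub("([[:lower:]])([[:upper:]])","\\1 \\2","g")   # XiaoLi -> Xiao Li
 print
 }' file.csv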
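For an awk without gensub (it is a gawk extension), roughly the same result can be sketched with match and substr in a loop. This is only a sketch: the names split_case and split_at are made up for illustration, and it assumes the name is always field 1 and the state/zip is always field 4, as in the sample data.

```shell
# Hypothetical helper: reads CSV on stdin, writes space-separated fields.
# Uses only POSIX awk features (no gensub).
split_case() {
  awk -F, '
    # Insert a space at every position where the two-character
    # pattern re matches (for example lowercase followed by uppercase).
    function split_at(s, re,    out) {
      out = ""
      while (match(s, re)) {
        out = out substr(s, 1, RSTART) " "   # keep the first matched char
        s = substr(s, RSTART + 1)            # continue after it
      }
      return out s
    }
    {
      $1 = split_at($1, "[a-z][A-Z]")   # name: split at lower -> upper
      $4 = split_at($4, "[A-Z][0-9]")   # state/zip: split at upper -> digit
      print                             # $0 was rebuilt with spaces
    }
  '
}

# Example call on one record from the question
printf '%s\n' 'AdiSrikanthReddy,6578904566,sadi1@asu.edu,WS67854' | split_case
```

Like the "g" flag above, the loop splits at every transition, so the last record comes out as "Adi Srikanth Reddy" rather than "Adi SrikanthReddy". Replacing the while with a single if would split at the first transition only.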

answered Mar 23 at 1:51 by user1404316


At least with gawk (GNU awk) and mawk, you could use the match function to find the index of a lowercase-uppercase or uppercase-digit transition, and then use substr to cut-and-shut the string:

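awk -F, '
 { c = match($1,/[a-z][A-Z]/) }    # index of first lower -> upper transition
 c>0 { $1 = sprintf("%s %s", substr($1,1,c), substr($1,c+1)) }
 { c = match($4,/[A-Z][0-9]/) }    # index of first upper -> digit transition
 c>0 { $4 = sprintf("%s %s", substr($4,1,c), substr($4,c+1)) }
 1' file.csv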
Xiao Li 6705462234 lxiao@unc.edu NC 764
Natkin Pook 8044344528 wnatkin@vcu.edu VA 22345
Elizi Moe 5208534566 emoe@ncsu.edu AZ 85282
Ma Ta 4345667345 mta@yahoo.com TX 91030
Diana Cheng 5203456789 dcheng@asu.edu WY 4587
Jackson Five 5206564573 jfive@ncsu.edu AZ 85483
Adi SrikanthReddy 6578904566 sadi1@asu.edu WS 67854

If your $4 really is a US zipcode, then AFAIK the format is fixed and you could skip the second match and just do

awk -F, '
 { c = match($1,/[a-z][A-Z]/) }
 c>0 { $1 = sprintf("%s %s", substr($1,1,c), substr($1,c+1)) }
 { $4 = sprintf("%s %s", substr($4,1,2), substr($4,3)) }    # state is always 2 chars
 1' file.csv
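Since some of the sample values (NC764, WY4587) are not five-digit zip codes, it may be worth checking the field before cutting it at a fixed position. The following is a sketch under that assumption, not part of the original answer; the function name split_state_zip and the warning text are invented for illustration.

```shell
# Hypothetical helper: split field 4 after its two-letter state prefix,
# but only when the field looks like two capitals followed by digits.
split_state_zip() {
  awk -F, '
    {
      if ($4 ~ /^[A-Z][A-Z][0-9]+$/)
        $4 = substr($4, 1, 2) " " substr($4, 3)
      else
        printf "line %d: unexpected state/zip field: %s\n", NR, $4 > "/dev/stderr"
      $1 = $1    # force $0 to be rebuilt with the output separator
      print
    }
  '
}

# Example call on one record from the question
printf '%s\n' 'MaTa,4345667345,mta@yahoo.com,TX91030' | split_state_zip
```

Records that fail the check are still printed, with commas replaced by spaces, so nothing is silently dropped; the warning goes to standard error.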

It's a little tidier if you have a regex engine that allows zero-length assertions - such as Perl:

perl -F, -ane '
 print join " ", map { s/(?<=[[:lower:]])(?=[[:upper:]])|(?<=[[:upper:]])(?=[[:digit:]])/ /r } @F
' file.csv
Xiao Li 6705462234 lxiao@unc.edu NC 764
Natkin Pook 8044344528 wnatkin@vcu.edu VA 22345
Elizi Moe 5208534566 emoe@ncsu.edu AZ 85282
Ma Ta 4345667345 mta@yahoo.com TX 91030
Diana Cheng 5203456789 dcheng@asu.edu WY 4587
Jackson Five 5206564573 jfive@ncsu.edu AZ 85483
Adi SrikanthReddy 6578904566 sadi1@asu.edu WS 67854

answered Mar 23 at 0:50 by steeldriver
