Separating runon text in script
























I have the following csv input:



XiaoLi,6705462234,lxiao@unc.edu,NC764
NatkinPook,8044344528,wnatkin@vcu.edu,VA22345
EliziMoe,5208534566,emoe@ncsu.edu,AZ85282
MaTa,4345667345,mta@yahoo.com,TX91030
DianaCheng,5203456789,dcheng@asu.edu,WY4587
JacksonFive,5206564573,jfive@ncsu.edu,AZ85483
AdiSrikanthReddy,6578904566,sadi1@asu.edu,WS67854


I would like it to output the following:



Xiao Li 6705462234 lxiao@unc.edu NC 764
Natkin Pook 8044344528 wnatkin@vcu.edu VA 22345
Elizi Moe 5208534566 emoe@ncsu.edu AZ 85282
Ma Ta 4345667345 mta@yahoo.com TX 91030
Diana Cheng 5203456789 dcheng@asu.edu WY 4587
Jackson Five 5206564573 jfive@ncsu.edu AZ 85483
Adi SrikanthReddy 6578904566 sadi1@asu.edu WS 67854


(FirstName LastName PhoneNumber UserID@Email State Zip)



This is what I have so far



 awk -F "," '{ print $1, $4, $3, $6 }' data3


I am having trouble separating Firstname and Lastname from each other, and State and zipcode are also running together. How would I be able to separate these two cases?



I'd like to use awk. Is there a way I can use something like [A-Z] to separate them at the uppercase letter?











    Some of the "zip codes" in your last column don't appear to be zip codes. Also how should it handle a name like "AdiSrikanthReddy" where there are three capital letters? Also I hope those aren't real phone numbers and email addresses.
    – Jesse_b
    Mar 22 at 23:04























edited Mar 23 at 1:53 by user1404316

asked Mar 22 at 22:46 by kittensfurdays


















2 Answers



          accepted


















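I see that steeldriver's answer has already been accepted, but I'd like to offer what I think is a shorter, simpler, and easier-to-read option. At the very least, it demonstrates some other features of awk (and the OP can always change their mind). Note that gensub is specific to gawk (GNU awk):

awk '{
gsub(","," ")
$0=gensub("([[:upper:]])([[:digit:]])","\\1 \\2","g")
$0=gensub("([[:lower:]])([[:upper:]])","\\1 \\2","g")
print
}' file.csv

Because of the "g" flag, a name with more than two capitals, such as AdiSrikanthReddy, is split at every lowercase-to-uppercase boundary (Adi Srikanth Reddy).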
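Since gensub is a gawk extension, here is a rough sketch of the same "split at every boundary" idea using only match and substr, which other awk implementations also provide. The names split_all and split_at are made up for this illustration, and the bracket ranges assume plain ASCII letters and digits:

```shell
# split_all: hypothetical helper; reads comma-separated lines on stdin.
split_all() {
  awk '
  # Insert a space inside every two-character match of re, between its characters.
  function split_at(s, re,    out) {
    out = ""
    while (match(s, re)) {
      out = out substr(s, 1, RSTART) " "   # keep text up to the first matched char
      s = substr(s, RSTART + 1)            # continue after that char
    }
    return out s
  }
  {
    gsub(/,/, " ")
    $0 = split_at($0, "[A-Z][0-9]")   # uppercase followed by a digit
    $0 = split_at($0, "[a-z][A-Z]")   # lowercase followed by uppercase
    print
  }'
}
```

For example, `printf 'AdiSrikanthReddy,6578904566,sadi1@asu.edu,WS67854\n' | split_all` prints `Adi Srikanth Reddy 6578904566 sadi1@asu.edu WS 67854`, so names with several capitals are split at each one.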
















answered Mar 23 at 1:51 by user1404316


































At least with gawk (GNU awk) and mawk, you could use the match function to find the index of a lowercase-uppercase or uppercase-digit transition, and then use substr to cut-and-shut the string:

awk -F, '
{ c = match($1,/[a-z][A-Z]/) }
c>0 { $1 = sprintf("%s %s", substr($1,1,c), substr($1,c+1)) }
{ c = match($4,/[A-Z][0-9]/) }
c>0 { $4 = sprintf("%s %s", substr($4,1,c), substr($4,c+1)) }
1' file.csv
Xiao Li 6705462234 lxiao@unc.edu NC 764
Natkin Pook 8044344528 wnatkin@vcu.edu VA 22345
Elizi Moe 5208534566 emoe@ncsu.edu AZ 85282
Ma Ta 4345667345 mta@yahoo.com TX 91030
Diana Cheng 5203456789 dcheng@asu.edu WY 4587
Jackson Five 5206564573 jfive@ncsu.edu AZ 85483
Adi SrikanthReddy 6578904566 sadi1@asu.edu WS 67854
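One caveat with the match/substr script above: awk only rejoins the fields with spaces when a field is assigned, so a line on which neither pattern matches would be printed with its commas intact. A minimal sketch that forces the rebuild with `$1 = $1` first; the function name split_first is just a label for this example:

```shell
# split_first: hypothetical helper; reads comma-separated lines on stdin
# and splits only the first transition in the name and state/zip fields.
split_first() {
  awk -F, '
  {
    $1 = $1   # assigning any field rebuilds $0 with OFS (a single space)
    if (c = match($1, /[a-z][A-Z]/)) {
      $1 = substr($1, 1, c) " " substr($1, c + 1)
    }
    if (c = match($4, /[A-Z][0-9]/)) {
      $4 = substr($4, 1, c) " " substr($4, c + 1)
    }
    print
  }'
}
```

With this version, every output line is space-separated whether or not it contained a transition to split.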


If your $4 really is a two-letter state code followed by a US ZIP code, then AFAIK the format is fixed, so you could skip the second match and just do:

awk -F, '
{ c = match($1,/[a-z][A-Z]/) }
c>0 { $1 = sprintf("%s %s", substr($1,1,c), substr($1,c+1)) }
{ $4 = sprintf("%s %s", substr($4,1,2), substr($4,3)) }
1' file.csv
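The fixed-offset split trusts that every $4 starts with exactly two letters. If that is in doubt, one option is to check the shape of the field first and report anything unexpected. This is only a sketch: the name split_state_zip, the "two uppercase letters then digits" rule, and the warning text are all assumptions for illustration, and writing to "/dev/stderr" relies on the awk or the system providing that path:

```shell
# split_state_zip: hypothetical helper; reads comma-separated lines on stdin.
split_state_zip() {
  awk -F, '
  {
    $1 = $1   # rebuild $0 with spaces even if nothing else changes
    if (c = match($1, /[a-z][A-Z]/)) {
      $1 = substr($1, 1, c) " " substr($1, c + 1)
    }
    if ($4 ~ /^[A-Z][A-Z][0-9]+$/) {
      # two-letter state code followed by digits: split at the fixed offset
      $4 = substr($4, 1, 2) " " substr($4, 3)
    } else {
      # leave the field untouched and flag the line for manual review
      print "line " NR ": unexpected state/zip field: " $4 > "/dev/stderr"
    }
    print
  }'
}
```

Lines that pass the check come out as in the fixed-offset version; the others are still printed, with the last field left as it was.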


It's a little tidier if you have a regex engine that allows zero-length assertions, such as Perl:

perl -F, -ane '
print join " ", map { s/(?<=[[:lower:]])(?=[[:upper:]])|(?<=[[:upper:]])(?=[[:digit:]])/ /r } @F
' file.csv
Xiao Li 6705462234 lxiao@unc.edu NC 764
Natkin Pook 8044344528 wnatkin@vcu.edu VA 22345
Elizi Moe 5208534566 emoe@ncsu.edu AZ 85282
Ma Ta 4345667345 mta@yahoo.com TX 91030
Diana Cheng 5203456789 dcheng@asu.edu WY 4587
Jackson Five 5206564573 jfive@ncsu.edu AZ 85483
Adi SrikanthReddy 6578904566 sadi1@asu.edu WS 67854
















answered Mar 23 at 0:50 by steeldriver






















                       
