Extract a number in each line and add another number to it

I have a file whose content is



(bookmarks
("Cover"
"#01.djvu" )
("Title page"
"#all_24223_to_00243.cpc0002.djvu" )
("Preface"
"#all_24223_to_00243.cpc0004.djvu" )
...


I want to change its content to be



(bookmarks
("Cover"
"#2" )
("Title page"
"#3" )
("Preface"
"#5" )
...


by keeping the number just before .djvu, removing leading zeros, and adding one to it. I wonder how you would do that using awk?



Thanks.









asked Mar 16 at 1:58 by Tim











  • Thanks. How does perl handle the testing for whether a line matches the pattern (if no, print the line, and if yes, replace, add and print)?
    – Tim
    Mar 16 at 2:15











  • Thanks. I still don't understand how -p makes it print the lines without a match, and print the lines with a match after replacement. That is, why is there no explicit test of whether each line has a match? In awk, I remember such a test is needed.
    – Tim
    Mar 16 at 2:40











  • Additionally, if I store a number in a shell variable, and want to add it to the extracted number of each line, how can I pass the shell variable into the perl command to be added to $1?
    – Tim
    Mar 16 at 3:05
















3 Answers
That's more a job for perl:

perl -pe 's/"#\K.*?(\d+)\.djvu(?=")/$1+1/ge' <file


With a variable:

INCR=1 perl -pe 's/"#\K.*?(\d+)\.djvu(?=")/$1+$ENV{INCR}/ge' <file


Or:

perl -spe 's/"#\K.*?(\d+)\.djvu(?=")/$1+$incr/ge' -- -incr=1 <file






answered Mar 16 at 9:14 by Stéphane Chazelas (edited Mar 16 at 9:24)











  • Could also use a positive lookahead to avoid adding the quote back in the replacement section.
    – Sundeep
    Mar 16 at 9:17

  • @Sundeep, true, that makes it a bit shorter and more pleasant to the eye. I've added it in. Thanks.
    – Stéphane Chazelas
    Mar 16 at 9:20

















Here's a GNU awk solution:

awk '/^ *\(/{print}!/^ *\(/{split($1,aa,"[0-9]+",bb);printf "\"#%s\" )\n", bb[length(bb)]+1}'

or the identical program, spread over a few lines for readability:

awk '/^ *\(/ { print }
!/^ *\(/ { split( $1, aa, "[0-9]+", bb )
printf "\"#%s\" )\n", bb[length(bb)]+1 }'

  1. /^ *\(/ and !/^ *\(/ are two pattern rules: the first covers lines beginning with optional spaces and an open parenthesis, the second covers lines that don't.

  2. split( $1, aa, "[0-9]+", bb ): for lines that don't, split the first field into two arrays. aa holds the pieces of the field delimited by the regex "[0-9]+", and bb holds the delimiters that matched the regex. The final element of bb is what interests you.

  3. printf "\"#%s\" )\n" formats the output line, waiting for a single value...

  4. bb[length(bb)]+1: one plus the value of the last element of bb.
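The four-argument split() used above is a GNU awk extension. For an awk without it, the same idea can be sketched with match() and substr(), which are standard. This is only an illustrative alternative, not the answer's method; the function name bump_pages and the awk variable incr are made up here.

```shell
# Hypothetical helper: reads lines on stdin, adds $1 (default 1) to the
# number that sits just before .djvu inside a quoted "#..." target.
bump_pages() {
  awk -v incr="${1:-1}" '
    match($0, /"#[^"]*[0-9]+\.djvu"/) {
      head = substr($0, 1, RSTART - 1)            # text before the target
      tail = substr($0, RSTART + RLENGTH)         # text after the target
      name = substr($0, RSTART + 2, RLENGTH - 8)  # between "# and .djvu"
      match(name, /[0-9]+$/)                      # trailing run of digits
      $0 = head "\"#" (substr(name, RSTART) + incr) "\"" tail
    }
    { print }                                     # print every line
  '
}

printf '%s\n' '("Preface"' '"#all_24223_to_00243.cpc0004.djvu" )' | bump_pages
```

Lines without a quoted .djvu target are printed unchanged, so no separate rule for the parenthesised lines is needed; the example prints ("Preface" followed by "#5" ).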







answered Mar 16 at 8:33 by user1404316 (edited Mar 16 at 13:13 by Stéphane Chazelas)




















gawk '{
    sub(/#.*.djvu/, "#" $1 + 1 ".djvu")
    print
}' FPAT='[0-9]+.djvu' input.txt

The idea is as follows:

  • Extract the run of digits together with the .djvu extension from the djvu filename, using the [0-9]+.djvu pattern (FPAT). Example: if the original filename is #all_24223_to_00243.cpc0002.djvu, the extracted part is 0002.djvu.

  • Replace the old djvu filename (matched by #.*.djvu) with the extracted number increased by 1. Example: take the whole line $0 and replace #all_24223_to_00243.cpc0002.djvu inside it with 0002.djvu + 1 (this expression evaluates to the plain number 3 because of how string-to-number conversion works in gawk). Add the # sign and the .djvu extension to it. Result: #3.djvu.

This solution works only for lines with a single djvu filename, as in your sample input.
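The conversion rule used in the second bullet can be checked on its own. A small sketch (the helper name to_page is made up for illustration); it relies only on ordinary awk string-to-number conversion, so it should behave the same outside gawk:

```shell
# A string that starts with digits converts to the number formed by that
# leading prefix; the remainder (".djvu") is ignored in arithmetic.
to_page() {
  printf '%s\n' "$1" | awk '{ print $1 + 1 }'
}

to_page 0002.djvu
to_page 01.djvu
```

The two calls print 3 and 2, which is why the leading zeros disappear without any explicit stripping.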



Input

(bookmarks
("Cover"
"#01.djvu" )
("Title page"
"#all_24223_to_00243.cpc0002.djvu" )
("Preface"
"#all_24223_to_00243.cpc0004.djvu" )

Output

(bookmarks
("Cover"
"#2.djvu" )
("Title page"
"#3.djvu" )
("Preface"
"#5.djvu" )






answered Mar 17 at 10:05 by MiniMax (edited Mar 17 at 10:34)






















                       
