$1 not working with sed

I have a bunch of files that contain XML tags like:



<h> PIDAT <h> O



I need to delete everything that comes after the first <h> on that line, so that I get this:



<h>



For that I'm using



sed -i -e 's/(^<.*?>).+/$1/' *.conll



But it seems that sed is not recognizing the $1. (As I understand it, $1 should keep only what is in the group and delete everything else.) Is there a way I can achieve this? I'd really appreciate it if you could point me in the right direction.



PS: I tested these expressions in a regex app and they worked, but they do not work from the command line.







      asked Jul 5 at 5:16









      Carolina Cárdenas





















          1 Answer
          accepted










          sed backreferences have the form \1, \2, etc.; $1 is Perl syntax. Also, if using basic regular expressions (BRE, sed's default), you need to escape the parentheses forming a group, \(...\), as well as ? and + (\? and \+, which are GNU extensions in BRE). Or you can use extended regular expressions (ERE) with the -E option, where ( ), ? and + need no escaping.
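As a small illustration of the point above, the sketch below runs the sample line from the question through three substitutions. It only prints results and edits no files; the variable names are made up for the example, and -E is assumed to be available (it is in current GNU and BSD sed).

```sh
line='<h> PIDAT <h> O'

# In a sed replacement, $1 is not a backreference: it is the two
# literal characters "$" and "1".
literal=$(printf '%s\n' "$line" | sed -E 's/^(<h>).*/$1/')

# BRE (the default): group written as \( ... \), backreference is \1.
bre=$(printf '%s\n' "$line" | sed 's/^\(<h>\).*/\1/')

# ERE (-E): group written as ( ... ), backreference is still \1.
ere=$(printf '%s\n' "$line" | sed -E 's/^(<h>).*/\1/')

printf '%s\n' "$literal" "$bre" "$ere"
```

The first result is the literal text $1, while the other two are <h>.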



          Note that sed regexes are greedy, so <.*> will match <h> PIDAT <h> in that line, instead of stopping at the first >. And .*? does not make sense (.* already can match nothing, so making it optional via ? is unnecessary).
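The greedy behaviour described above can be checked directly. This is only a sketch on the sample line from the question, with an illustrative variable name:

```sh
line='<h> PIDAT <h> O'

# <.*> is greedy: the group extends to the LAST ">" on the line,
# so the second <h> is captured as well.
greedy=$(printf '%s\n' "$line" | sed -E 's/^(<.*>).*/\1/')

printf '%s\n' "$greedy"
```

Only the trailing " O" is dropped; the text between the two tags survives.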



          This might work:



          sed -i -Ee 's/^(<[^>]*>).*/\1/' *.conll


          [^>] matches any character except >, so <[^>]*> will match <h> but not <h> PIDAT <h>.
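Before editing files in place, it may be worth previewing the substitution on a few lines. In the sketch below only the first sample line comes from the question; the other two are hypothetical, added to show a tag with an attribute and a line that does not start with a tag.

```sh
# Without -i, sed only prints the result; no file is modified.
out=$(printf '%s\n' \
        '<h> PIDAT <h> O' \
        '<doc id="1"> trailing text' \
        'no tag here' |
      sed -E 's/^(<[^>]*>).*/\1/')

printf '%s\n' "$out"
```

Lines that begin with a tag are cut back to that tag, and lines that do not begin with < pass through unchanged.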






          answered Jul 5 at 5:27









          muru








          • Another possibility is that the OP might be thinking that .*? is a non-greedy match. Your solution, of course, has that covered.
            – John1024
            Jul 5 at 6:01






          • @John1024 ah, yes, I forgot PCRE uses *? for non-greedy match. That makes sense now.
            – muru
            Jul 5 at 6:02










          • Impressive. Thank you for your explanation! I got an error with your line of code: "\1 not defined in the RE", but after a quick Google search I realized it was because the () were not escaped, which was also covered in your answer ;) Anyway, thanks again!!
            – Carolina Cárdenas
            Jul 5 at 6:03






          • @CarolinaCárdenas did you get that error when using the -E option?
            – muru
            Jul 5 at 6:04










          • @muru yes. Exactly.
            – Carolina Cárdenas
            Jul 5 at 6:35
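One possible explanation for that error, offered as a guess because the thread does not say which sed was in use: BSD sed (the one shipped with macOS) requires a backup-suffix argument after -i. In sed -i -Ee ..., the -Ee would then be consumed as that suffix, -E would never take effect, and the unescaped ( ) would be read as literal characters, leaving \1 undefined. Putting -E first and attaching the suffix directly to -i avoids the ambiguity; the form below is accepted by both GNU and BSD sed. The scratch directory and file name are made up for the example.

```sh
# Scratch copy so the example never touches real data.
dir=$(mktemp -d)
printf '%s\n' '<h> PIDAT <h> O' > "$dir/sample.conll"

# -E comes first and the backup suffix is glued to -i, so neither
# GNU nor BSD sed can mistake another option for the suffix.
sed -E -i.bak -e 's/^(<[^>]*>).*/\1/' "$dir/sample.conll"

cat "$dir/sample.conll"        # edited file
cat "$dir/sample.conll.bak"    # untouched original
```

The .bak copies can be deleted once the result has been checked.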











