$1 not working with sed
I have a bunch of files that contain XML tags like:
<h> PIDAT <h> O
I need to delete everything that comes after the first <h> in that line, so that I get this:
<h>
For that, I'm using:
sed -i -e 's/(^<.*?>).+/$1/' *.conll
But it seems that sed is not recognizing the $1. (As I understand it, $1 should delete everything that is not contained in the group.) Is there a way I can achieve this? I'd really appreciate it if you could point me in the right direction.
PS: I tested those expressions in a regex app and they worked, but they don't work from the command line.
sed regular-expression xml
asked Jul 5 at 5:16
Carolina Cárdenas
31
1 Answer
sed backreferences have the form \1, \2, etc.; $1 is more Perl-like. Also, if you use basic regular expressions (BRE), you need to escape the parentheses \(...\) that form a group, as well as ? and + (written \? and \+, which GNU sed supports as an extension). Or you can use extended regular expressions with the -E option.
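A small sketch of the point above, run on the sample line from the question (the variable name line is just for illustration). It uses the [^>]* pattern explained further down so that only the first tag is captured, and shows that the group syntax changes between BRE and ERE while the backreference stays \1, and that $1 is simply copied into the output:

```shell
line='<h> PIDAT <h> O'

# BRE (default): the group parentheses are escaped, and \1 refers back to them
printf '%s\n' "$line" | sed 's/^\(<[^>]*>\).*/\1/'     # <h>

# ERE (-E): bare parentheses form the group; the backreference is still \1
printf '%s\n' "$line" | sed -E 's/^(<[^>]*>).*/\1/'    # <h>

# $1 has no special meaning in a sed replacement, so it is inserted literally
printf '%s\n' "$line" | sed -E 's/^(<[^>]*>).*/$1/'    # $1
```

The single quotes matter as well: they keep the shell from expanding $1 or interpreting \1 before sed sees the script.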
Note that sed regexes are greedy, so <.*> will match <h> PIDAT <h> in that line instead of stopping at the first >. And .*? does not make sense in sed (.* can already match nothing, so making it optional via ? is unnecessary).
This might work:
sed -i -Ee 's/^(<[^>]*>).*/\1/' *.conll
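One possible explanation for the "\1 not defined in the RE" error reported in the comments even with -E (an assumption about the asker's platform, not confirmed in the thread): that message comes from BSD/macOS sed, whose -i option requires a backup-suffix argument. In sed -i -Ee '...', the -Ee is then taken as the suffix, -E never takes effect, the parentheses are literal, and \1 has no group to refer to. A sketch that should behave the same with GNU and BSD sed attaches the suffix directly to -i (sample.conll is a hypothetical file created only for this demo):

```shell
# Scratch directory with a hypothetical sample file
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
printf '%s\n' '<h> PIDAT <h> O' > sample.conll

# -i.bak (suffix attached) is accepted by both GNU and BSD sed;
# the original content is kept in sample.conll.bak
sed -i.bak -E 's/^(<[^>]*>).*/\1/' *.conll

cat sample.conll        # <h>
```

If you do not want backup files, GNU sed accepts a bare -i while BSD sed needs -i '' (an empty suffix as a separate argument); there is no single spelling of that form that works for both.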
[^>] matches every character except >, so <[^>]*> will match <h> but not <h> PIDAT <h>.
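A quick way to see the difference described above is to run both patterns on the sample line. This is a sketch using -E; the last line is an extra case, not from the question, showing that a line without a leading tag is left alone because the ^< anchor does not match:

```shell
line='<h> PIDAT <h> O'

# Greedy: <.*> extends to the last '>' on the line
printf '%s\n' "$line" | sed -E 's/^(<.*>).*/\1/'       # <h> PIDAT <h>

# Negated class: <[^>]*> cannot cross a '>', so it ends at the first one
printf '%s\n' "$line" | sed -E 's/^(<[^>]*>).*/\1/'    # <h>

# Lines that do not start with '<' fail the ^ anchor and pass through unchanged
printf '%s\n' 'PIDAT O' | sed -E 's/^(<[^>]*>).*/\1/'  # PIDAT O
```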
answered Jul 5 at 5:27
muru
33.1k576139
Another possibility is that the OP might be thinking that .*? is a non-greedy match. Your solution, of course, has that covered.
– John1024
Jul 5 at 6:01
@John1024 ah, yes, I forgot PCRE uses *? for non-greedy match. That makes sense now.
– muru
Jul 5 at 6:02
Impressive. Thank you for your explanation! I got an error with your line of code: "\1 not defined in the RE", but after a quick Google search, I realized it was because it didn't escape the (), but that was also given in your answer ;) Anyway, thanks again!!
– Carolina Cárdenas
Jul 5 at 6:03
@CarolinaCárdenas did you get that error when using the -E option?
– muru
Jul 5 at 6:04
@muru yes. Exactly.
– Carolina Cárdenas
Jul 5 at 6:35