Using sed to get specific text from file

Not sure why I'm not getting this. I've been searching and testing my command for a couple hours and I'm not getting anywhere.
The text is:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?><result expand="changes,testResults,metadata,logEntries,plan,vcsRevisions,artifacts,comments,labels,jiraIssues" key="EP-ED-JOB1-174" state="Failed" lifeCycleState="Finished" number="174" ....
And I just want to pull out the state="Failed" part; it could also be state="Successful".
I've tried a million variations of this:
sed '/state=".*"/p' htmlResponse.txt
But parens, escaped slashes, etc. all seem to match the entire chunk of text. What's wrong with my regex?
sed regular-expression
asked Oct 16 '17 at 15:31
Justin
You need to use capture groups around what you want and use substitution to print only that portion. To avoid the greedy-match issue, in this case you can use [^"]* instead of .* ... but really, you should use an XML parser instead of regex.
– Sundeep
Oct 16 '17 at 15:39
If I do sed -n '/state="[^"]*/p' htmlResponse.html it still gives me back everything.
– Justin
Oct 16 '17 at 15:42
Use xmllint instead. Use the right tools for the right job.
– Valentin B
Oct 16 '17 at 15:48
3 Answers
accepted
Putting aside the obligatory "you should really be using a proper XML parser because regexes aren't powerful enough to parse XML" comment, I see two problems in your sed line:
- ".*" will match from the first " to the last, since . matches "
- The sed command /.../p prints the whole line if it matches the regex.
Here are two things I'd suggest for quick-and-dirty HTML-scraping shell scripts:
- Use "[^"]*" to match "quote, any number of non-quote characters, end quote"
- It's a lot easier to use grep -o to pull out the bits of a file that match a regex
So that would make your command more like:
grep -o 'state="[^"]*"'
Or, if you really must use sed:
sed -n 's/.*\(state="[^"]*"\).*/\1/p'
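To see the greedy and bounded patterns side by side, here is a minimal sketch. The sample line is a hypothetical, shortened stand-in for the response in the question (closed with /> so it stands alone), and the variable names are only for illustration. It assumes a grep that supports -o, which GNU and BSD grep do.

```shell
# Hypothetical sample line, shortened from the question's response.
line='<result key="EP-ED-JOB1-174" state="Failed" lifeCycleState="Finished" number="174"/>'

# Greedy: ".*" runs on to the last double quote on the line.
greedy=$(printf '%s\n' "$line" | grep -o 'state=".*"')

# Bounded: [^"]* stops at the first closing quote.
bounded=$(printf '%s\n' "$line" | grep -o 'state="[^"]*"')

# Same result with sed: capture the attribute and replace the whole line with it.
with_sed=$(printf '%s\n' "$line" | sed -n 's/.*\(state="[^"]*"\).*/\1/p')

printf '%s\n' "$greedy" "$bounded" "$with_sed"
```

The first line printed runs from state="Failed" through number="174", while the other two are just state="Failed".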
edited Oct 16 '17 at 15:46
answered Oct 16 '17 at 15:41
wwoods
Thanks! I went with grep as the command just looks easier to type and understand.
– Justin
Oct 16 '17 at 16:16
The right way is to use XML parsers like xmlstarlet:
printf 'state="%s"\n' $(xmlstarlet sel -t -v "//result/@state" -n htmlResponse.txt)
The output:
state="Failed"
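If xmlstarlet isn't installed, the xmllint tool mentioned in the comments can do the same lookup. This is only a sketch under a few assumptions: the temporary file holds a hypothetical, well-formed stand-in for the truncated response in the question, and the installed xmllint is recent enough to have the --xpath option.

```shell
# Hypothetical, well-formed stand-in for the truncated response in the question.
tmp=$(mktemp)
printf '%s\n' '<?xml version="1.0" encoding="UTF-8" standalone="yes"?><result key="EP-ED-JOB1-174" state="Failed" lifeCycleState="Finished" number="174"/>' > "$tmp"

if command -v xmllint >/dev/null 2>&1; then
    # string(...) makes XPath return the attribute value as plain text.
    state=$(xmllint --xpath 'string(/result/@state)' "$tmp")
    printf 'state="%s"\n' "$state"
else
    echo 'xmllint not found; skipping' >&2
fi

rm -f "$tmp"
```

Unlike the regex approaches, this keeps working if the attributes are reordered or the element is split across several lines.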
answered Oct 16 '17 at 15:59
RomanPerekhrest
You likely want to match the whole line and print just the matching group:
sed -r 's/.*state="([^"]*)".*/\1/' htmlResponse.txt
That actually just pulls out the Failed or Successful (without including the state= part that precedes it), which I suspect is what you want. But if you do need that, you can add it back easily, or use a slightly different regex, as in wwoods's answer.
However, as Sundeep mentions, it is not at all robust to parse HTML (or XML) with a regular expression. It's one thing to use grep or sed to search for things interactively, but if this is part of a script that needs to carry out an important task and actually work, you should parse the XML properly.
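As a rough sketch of "adding it back", and of what happens on lines that don't contain the attribute at all: the two-line input below is hypothetical, and -E is used as the portable spelling of GNU sed's -r. Without -n and the p flag, sed would echo the non-matching first line unchanged.

```shell
# Hypothetical two-line input: only the second line carries a state attribute.
input='<?xml version="1.0"?>
<result key="EP-ED-JOB1-174" state="Successful" number="175"/>'

# Value only; -n plus the p flag suppresses lines that did not match.
value=$(printf '%s\n' "$input" | sed -E -n 's/.*state="([^"]*)".*/\1/p')

# Attribute name added back in the replacement text.
attr=$(printf '%s\n' "$input" | sed -E -n 's/.*state="([^"]*)".*/state="\1"/p')

printf '%s\n' "$value" "$attr"
```

This prints Successful and then state="Successful".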
answered Oct 16 '17 at 15:42
Eliah Kagan