sed - if condition met, use next pattern
I have a number of plain text files with similar but slightly different structures, and I need to extract a particular line from each of them.
This line of text doesn't follow any particular pattern (i.e. its content is always different) and is not always in the same place in the file, though it is usually close to the beginning.
These files are press releases (originally in PDF, converted to text on the fly with `pdftotext`), and the line I need to extract is the subject, which I then need to use as the filename.
If I just run `sed -n '1p'` on these files to extract the very first line, I sometimes get the result I want, but more often not.
A sample of the different results I get:
Title of the press release # correct result
# wrong, here the first line is empty
29.9.2016 # wrong, here the first line contains the date
PRESS RELEASE # also wrong, I would need to scan further down
These are pretty much all of the cases. What gives me hope is that, since these files have very similar structure and contain a title close to the beginning, if I keep scanning down sooner or later I will find what I'm looking for.
Is there any way to tell sed, in a single sed command, to keep trying successive lines until none of a set of conditions is met?
In my case I would need to tell sed to:
- check that the line is not empty
- check that the line doesn't contain a date
- check that the line doesn't contain the words "Press Release"
If none of the conditions is met, output the line; if any of them is met, skip to the next line.
Is this something that sed would be able to do?
shell-script shell sed
asked Aug 7 at 12:47 by zool (edited Aug 7 at 12:54)
1 Answer
To find the first line that is not empty (and does not contain only whitespace), does not consist solely of digits and dots, and does not contain the string `PRESS RELEASE` (capitalized):
sed '/^[[:blank:]]*$/d; /^[0-9.]*$/d; /PRESS RELEASE/d; q' file
If dates can have `-` and spaces in them, and if `PRESS RELEASE` could also be written `press release`, `Press Release` or `Press release` (or `pRESS Release` or some other combination):
sed -E '/^[[:blank:]]*$/d; /^[0-9. -]*$/d; /[Pp](RESS|ress) [Rr](ELEASE|elease)/d; q' file
or, with GNU `sed`, for case-insensitive matching of `press release`:
sed '/^[[:blank:]]*$/d; /^[0-9. -]*$/d; /press release/Id; q' file
Each time one of the patterns matches, the `d` command deletes that line and a new cycle starts with the next line of input. If none of the patterns match, the `q` command makes `sed` exit, but the current line is printed first.
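To see the `d`/`q` cycle in action, here is a minimal sketch that feeds the first command above a made-up input combining the problem cases from the question. The sample text and the `title` variable are illustrative assumptions, not taken from real press releases.

```shell
# Made-up input: a blank line, a date, the banner, then the title and body.
title=$(printf '\n29.9.2016\nPRESS RELEASE\nTitle of the press release\nBody text\n' |
    sed '/^[[:blank:]]*$/d; /^[0-9.]*$/d; /PRESS RELEASE/d; q')

# The first three lines are deleted one cycle at a time; the fourth
# matches none of the patterns, reaches q, is printed, and sed exits.
printf '%s\n' "$title"
```

Because `q` ends the script at the first surviving line, the body text after the title is never processed.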
answered Aug 7 at 13:00 by Kusalananda (edited Aug 7 at 13:41)
Thanks, this is very helpful. Is the `/[Pp](RESS|ress) [Rr](ELEASE|elease)/d` bit really necessary? Isn't there a flag to tell sed to match in a case-insensitive manner? – zool, Aug 7 at 13:36
@zool With GNU `sed`, you could use `/press release/Id` (that's a capital `I`, lowercase `d`). Since I don't know what `sed` you are using, I kept to standard `sed` constructs. – Kusalananda, Aug 7 at 13:40
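If GNU `sed` is not available, one portable way to match every mix of upper and lower case is to spell out each letter as a bracket expression, which needs neither `-E` nor the `I` flag. A small sketch, using made-up input and a hypothetical `first_title` helper name:

```shell
# first_title is a hypothetical helper: it reads text on stdin and prints
# the first line that is not blank, not a date, and not the banner.
first_title() {
    sed '/^[[:blank:]]*$/d; /^[0-9. -]*$/d; /[Pp][Rr][Ee][Ss][Ss] [Rr][Ee][Ll][Ee][Aa][Ss][Ee]/d; q'
}

# Made-up input with an oddly capitalized banner and a dashed date.
result=$(printf 'pReSs ReLeAsE\n7-8-2018\nSome title\nBody\n' | first_title)
printf '%s\n' "$result"
```

The pattern is longer to read, but it uses only basic regular expression syntax.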
I am indeed on macOS, where the sed implementation doesn't support the `I` switch, but I installed `gnu-sed` via Homebrew and now I'm good to go. Thanks a lot! – zool, Aug 7 at 14:25
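For a script that has to run both on macOS and on systems where `sed` is already GNU sed, a rough sketch like the following could pick an implementation at run time. It assumes that Homebrew's `gnu-sed` package provides the command under the name `gsed`, and it treats any `sed` whose `--version` output mentions GNU as GNU sed. The names `gnu_sed`, `extract_title` and `picked`, and the sample input, are illustrative.

```shell
# Prefer gsed (assumed Homebrew name for GNU sed), then a sed that
# identifies itself as GNU; otherwise remember that none was found.
if command -v gsed >/dev/null 2>&1; then
    gnu_sed=gsed
elif sed --version 2>/dev/null | grep -q GNU; then
    gnu_sed=sed
else
    gnu_sed=
fi

# Hypothetical helper: use the I flag when GNU sed is available,
# otherwise fall back to the -E pattern from the answer.
extract_title() {
    if [ -n "$gnu_sed" ]; then
        "$gnu_sed" '/^[[:blank:]]*$/d; /^[0-9. -]*$/d; /press release/Id; q'
    else
        sed -E '/^[[:blank:]]*$/d; /^[0-9. -]*$/d; /[Pp](RESS|ress) [Rr](ELEASE|elease)/d; q'
    fi
}

picked=$(printf '\nPress release\n29.9.2016\nAnnual results published\n' | extract_title)
printf '%s\n' "$picked"
```

Both branches skip the same three lines of this input, so the result does not depend on which `sed` is found.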