extract text between 2 different matches

I am trying to extract the text between the first occurrence of two specific characters (_ and -). For example, I need to get the number 5 from the string below:
MQSeriesRuntime_5-U200491-7.5.0-4.x86_64
I tried awk's field separator (awk -F), but that gives me the entire text after the _.
linux awk sed
edited Nov 20 at 22:32
Rui F Ribeiro
asked Jul 11 '17 at 17:36
MO12
 6 Answers
You just need to be creative about your field separator:
$ awk 'BEGIN { FS = "-|_" } { print $2 }' input
5
The trick with FS is that it is not just a literal string: when it is longer than one character, awk treats it as a regular expression.
To explain a little more fully, as you request below:
An awk script may define a code block named BEGIN, which executes before any of the incoming data are processed.
I use this code block to define the field separator (FS) as a regular expression matching either a hyphen (-) or an underscore (_).
The next code block, { print $2 }, prints the second field (i.e. the second string of characters as delimited by the separator defined above, /-|_/), which is the 5 you seek. A code block with no prefix executes for every record that awk reads.
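To see how the regular-expression separator carves up the line, here is a small sketch that prints every field with its index. The variable names (line, fields, second) are only for illustration and are not part of the answer above.

```shell
line='MQSeriesRuntime_5-U200491-7.5.0-4.x86_64'

# Split on either "-" or "_" and list each field with its number.
fields=$(printf '%s\n' "$line" |
  awk -F'-|_' '{ for (i = 1; i <= NF; i++) print i ": " $i }')
printf '%s\n' "$fields"

# The value asked for is the second field.
second=$(printf '%s\n' "$line" | awk 'BEGIN { FS = "-|_" } { print $2 }')
printf '%s\n' "$second"
```

Note that the trailing x86_64 also contains an underscore, so the line splits into six fields rather than five; that does not matter here because only $2 is used.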
edited Jul 11 '17 at 17:56
answered Jul 11 '17 at 17:42
DopeGhoti
That worked, thanks much! can you please explain the BEGIN and part please
– MO12
Jul 11 '17 at 17:43
Or just use the -F flag, which is shorter: awk -F '_|-' '{print $2}' input.txt
– Wildcard
Jul 11 '17 at 20:57
I keep forgetting -F is a thing, because my awkings have been complex enough to generally need BEGIN blocks anyhow (:
– DopeGhoti
Jul 11 '17 at 21:04
Given the "first matches" requirements this is a really neat solution.
– Thorbjørn Ravn Andersen
Jul 12 '17 at 3:24
By using the -F option, you can achieve a slightly shorter solution.
$ awk -F'-|_' '{print $2}' input
5
edited Jul 12 '17 at 3:30


heemayl
answered Jul 11 '17 at 19:55


steve
sed alternative approach:
$ sed 's/^[^_-]*_\([^_-]*\)-.*/\1/' file
5
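The same substitution can also be written with extended regular expressions, which avoids the backslashes around the capture group. This is only a sketch, assuming a sed that supports -E (GNU and BSD sed do); -n together with the p flag prints a line only when the substitution matched. The variable names are illustrative.

```shell
line='MQSeriesRuntime_5-U200491-7.5.0-4.x86_64'

# Capture what lies between the first "_" and the next "-".
value=$(printf '%s\n' "$line" | sed -nE 's/^[^_-]*_([^_-]*)-.*/\1/p')
printf '%s\n' "$value"

# A line without the "_...-" pattern produces no output at all.
nomatch=$(printf '%s\n' 'no separators here' | sed -nE 's/^[^_-]*_([^_-]*)-.*/\1/p')
```

Without -n and p, sed would pass non-matching lines through unchanged, which may or may not be what you want.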
answered Jul 11 '17 at 20:54


RomanPerekhrest
You fit two smileys into the regex. Nice. [^_-]
– Wildcard
Jul 11 '17 at 21:09
@Wildcard, that's funny [^_-]
– RomanPerekhrest
Jul 11 '17 at 21:12
Python
Using a here-string (<<<) to feed the desired string to the Python interpreter's stdin, we can call re.split() and take the second item of the list that results from splitting at the two separators.
$ python -c 'import re,sys; print(re.split("-|_",sys.stdin.readline())[1])' <<< "MQSeriesRuntime_5-U200491-7.5.0-4.x86_64"
5
Alternatively, we could give the string as a command-line argument and operate on sys.argv[1]:
$ python3 -c 'import re,sys; print(re.split("-|_",sys.argv[1])[1])' "MQSeriesRuntime_5-U200491-7.5.0-4.x86_64"
5
This works with Python 2 and 3. If we want to process a file and extract the value from each line in this manner, we can do the following:
$ cat input.txt
MQSeriesRuntime_5-U200491-7.5.0-4.x86_64
MQSeriesRuntime_2-U200491-7.5.0-4.x86_64
MQSeriesRuntime_6-U200491-7.5.0-4.x86_64
$ python3 -c 'import re,sys; print("\n".join(map(lambda x: re.split("-|_",x)[1], sys.stdin.readlines())))' < input.txt
5
2
6
edited Jul 12 '17 at 2:54
answered Jul 12 '17 at 2:47


Sergiy Kolodyazhnyy
Inspired by https://stackoverflow.com/a/2957781/53897:
echo MQSeriesRuntime_5-U200491-7.5.0-4.x86_64 | perl -n -e '/_([^-]+)/ && print $1'
answered Jul 12 '17 at 3:22
Thorbjørn Ravn Andersen
You could use cut with _ as the delimiter to get the second column, then cut again with - as the delimiter to get the first column.
echo MQSeriesRuntime_5-U200491-7.5.0-4.x86_64 | cut -d"_" -f2 | cut -d"-" -f1
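Breaking the pipeline into its two steps makes the intermediate result visible. This is just an illustrative sketch; the variable names are made up for the example.

```shell
line='MQSeriesRuntime_5-U200491-7.5.0-4.x86_64'

# Step 1: second "_"-delimited column. Because x86_64 contains another "_",
# this stops before the trailing "64".
after_underscore=$(printf '%s\n' "$line" | cut -d"_" -f2)
printf '%s\n' "$after_underscore"

# Step 2: first "-"-delimited column of that intermediate result.
value=$(printf '%s\n' "$after_underscore" | cut -d"-" -f1)
printf '%s\n' "$value"
```

The first step yields 5-U200491-7.5.0-4.x86 and the second step reduces that to 5.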
answered Jul 15 '17 at 22:48
GiannakopoulosJ