How does storing the regular expression in a shell variable avoid problems with quoting characters that are special to the shell?
From Bash Manual
Storing the regular expression in a shell variable is often a useful way
to avoid problems with quoting characters that are special to the
shell. It is sometimes difficult to specify a regular expression
literally without using quotes, or to keep track of the quoting used
by regular expressions while paying attention to the shell's quote
removal. Using a shell variable to store the pattern decreases these
problems. For example, the following are equivalent:
pattern='[[:space:]]*(a)?b'
[[ $line =~ $pattern ]]
and
[[ $line =~ [[:space:]]*(a)?b ]]
If you want to match a character that's special to the regular
expression grammar, it has to be quoted to remove its special meaning.
This means that in the pattern 'xxx.txt', the '.' matches any
character in the string (its usual regular expression meaning), but in
the pattern "xxx.txt" it can only match a literal '.'. Shell
programmers should take special care with backslashes, since
backslashes are used both by the shell and regular expressions to
remove the special meaning from the following character. The following
two sets of commands are not equivalent:
pattern='\.'
[[ . =~ $pattern ]]
[[ . =~ \. ]]
[[ . =~ "$pattern" ]]
[[ . =~ '\.' ]]
The first two matches will succeed, but the second two will not,
because in the second two the backslash will be part of the pattern to
be matched. In the first two examples, the backslash removes the
special meaning from '.', so the literal '.' matches. If the string in
the first examples were anything other than '.', say 'a', the pattern
would not match, because the quoted '.' in the pattern loses its
special meaning of matching any single character.
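The behaviour described in the quoted passage can be checked with a short script. This is only a sketch, assuming bash 3.2 or later; the variable names r1 to r4 are made up for the illustration and do not come from the manual.

```bash
#!/usr/bin/env bash
# Sketch of the manual's backslash example (r1..r4 are illustrative names).
pattern='\.'

# Unquoted expansion: the backslash escapes the dot for the regex engine,
# so only a literal dot matches.
[[ . =~ $pattern ]] && r1=match || r1=nomatch
[[ a =~ $pattern ]] && r2=match || r2=nomatch

# Quoted expansion: every character is matched literally, so the subject
# must contain a backslash followed by a dot.
[[ . =~ "$pattern" ]] && r3=match || r3=nomatch
[[ '\.' =~ "$pattern" ]] && r4=match || r4=nomatch

printf '%s\n' "$r1" "$r2" "$r3" "$r4"
```

On such a bash this should print match, nomatch, nomatch, match, in line with the manual's explanation.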
How is storing the regular expression in a shell variable a useful way to avoid problems with quoting characters that are special to the shell?
The given examples don't seem to explain that.
In the given examples, the regex literal used in one method and the value of the shell variable pattern used in the other method are the same.
Thanks.
bash regular-expression
edited Jul 27 '17 at 13:29
asked Jul 27 '17 at 1:51
Tim
You said it's from the Bash Manual. Where exactly can this be read? I can't find it, even if I try man bash | grep -A2 -B2 "regular expression".
– John Goofy Jul 27 '17 at 13:03
@JohnGoofy, the man page is a stripped down version of the manual. You may want to look at the info page instead. Tim has also added a link to the online manual in his answer.
– Stéphane Chazelas Jul 27 '17 at 13:54
2 Answers
[[ ... ]] tokenisation clashes with regular expressions (more on that in my answer to your follow-up question) and \ is overloaded as a shell quoting operator and a regexp operator (with some interference between the two in bash), and even when there's no apparent reason for a clash, the behaviour can be surprising. Rules can be confusing.
Who can tell what these will do without trying it (on all possible input) with any given version of bash?
[[ $a = a|b ]]
[[ $a =~ a|b ]]
[[ $a =~ a&b ]]
[[ $a =~ (a|b) ]]
[[ $a =~ ([)}]*) ]]
[[ $a =~ [/(] ]]
[[ $a =~ s+ ]]
[[ $a =~ ( ) ]]
[[ $a =~ [ ] ]]
[[ $a =~ ([ ]) ]]
You can't simply quote the regexps either: since bash 3.2 (and unless bash 3.1 compatibility has been enabled), quoting a regexp removes the special meaning of the RE operators. For instance,
[[ $a =~ 'a|b' ]]
matches only if $a contains a literal a|b.
Storing the regexp in a variable avoids all those problems and also makes the code compatible with ksh93 and zsh (provided you limit yourself to POSIX EREs):
regexp='a|b'
[[ $a =~ $regexp ]] # $regexp should *not* be quoted.
There's no ambiguity in the parsing/tokenising of that shell command, and the regexp that is used is the one stored in the variable without any transformation.
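As a rough illustration of the difference, the following sketch compares the unquoted variable with the quoted literal. It assumes bash 3.2 or later without 3.1 compatibility; the sample inputs and the results array are made up for the example.

```bash
#!/usr/bin/env bash
regexp='a|b'   # ERE alternation: matches anything containing a or b
results=()

for a in a b 'a|b' c; do
  # Unquoted expansion: the value is used as a regular expression.
  [[ $a =~ $regexp ]] && unquoted=yes || unquoted=no
  # Quoted literal: matched as the plain string a|b.
  [[ $a =~ 'a|b' ]] && quoted=yes || quoted=no
  results+=("$a unquoted=$unquoted quoted=$quoted")
done

printf '%s\n' "${results[@]}"
```

Only the input a|b should satisfy the quoted form, while the unquoted variable keeps | as the OR operator.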
edited Jul 28 '17 at 11:25
answered Jul 27 '17 at 13:43
Stéphane Chazelas
[[ $a =~ a|b ]] works with | being interpreted as OR in regex. In the approach of using a variable, the same regex is assigned to the variable regexp='a|b'. So the example doesn't seem to show that the variable approach avoids problems with quoting characters. However, the variable approach does make a difference when =~ for regex is replaced with = for globbing.
– Tim Jul 27 '17 at 21:25
[[ $a = a|b ]] results in syntax error in conditional expression: unexpected token '|', while regexp='a|b'; [[ $a = $regexp ]] doesn't. Why the difference? Does parameter expansion of regexp delay the appearance of a|b in the conditional expression, so that the delay avoids some interpretation step which reports an error on [[ $a = a|b ]]? What interpretation step is that?
– Tim Jul 27 '17 at 21:31
The only way to match an explicit string is to quote it:
[[ $var =~ 'quux' ]]
This holds even if the string contains special characters (special to the shell [a]): the quoted text is matched literally, without the shell expanding or interpreting it [b]:
$ var='^abcd'
$ [[ $var =~ '^ab' ]] && echo yes || echo no
yes
If the special characters should instead be interpreted as part of a regular expression, they must be left unquoted:
$ var='abcd'
$ [[ $var =~ ^ab ]] && echo yes || echo no
yes
But unquoted strings create new problems, like with spaces:
$ var='ab cd'
$ [[ $var =~ ^ab cd ]] && echo yes || echo no
bash: syntax error in conditional expression
bash: syntax error near `cd'
To solve it, we still need to quote the characters that are special to the shell:
$ var='ab cd'
$ [[ $var =~ ^"ab cd" ]] && echo yes || echo no
yes
$ [[ $var =~ ^ab\ cd ]] && echo yes || echo no
yes
Other examples:
[[ "a b" =~ ^a\ b$ ]] && echo yes
[[ "a|b" =~ ^a\|b$ ]] && echo yes
[[ "a&b" =~ ^a\&b$ ]] && echo yes
Storing the regexp inside a variable avoids all those quoting problems.
$ regex='^a b$'
$ [[ "a b" =~ $regex ]] && echo yes
yes
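A slightly larger sketch of the same idea, assuming bash 3.2 or later; the variable names and sample strings are illustrative only. Each pattern is quoted once at assignment, so the [[ ... ]] command itself never contains a bare space, | or &:

```bash
#!/usr/bin/env bash
# Patterns are single-quoted once; the shell never sees a bare metacharacter.
re_space='^a b$'
re_pipe='^a\|b$'   # this backslash is for the regex engine: a literal |
re_amp='^a&b$'

[[ "a b" =~ $re_space ]] && s=yes || s=no
[[ "a|b" =~ $re_pipe ]] && p=yes || p=no
[[ "a&b" =~ $re_amp ]] && m=yes || m=no
[[ "ab" =~ $re_space ]] && n=yes || n=no   # no space in the subject

echo "space=$s pipe=$p amp=$m nospace=$n"
```

The variables are expanded unquoted on purpose, so that their contents are treated as regular expressions rather than literal strings.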
[a] List of shell special characters: | & ; ( ) < > space tab newline.
[b] This is true since bash version bash-3.2-alpha (under "3. New Features in Bash" heading):
f. Quoting the string argument to the [[ command's =~ operator now forces
string matching, as with the other pattern-matching operators.
Copy of extended description from bash FAQ:
E14) Why does quoting the pattern argument to the regular expression matching conditional operator (=~) cause regexp matching to stop working?
In versions of bash prior to bash-3.2, the effect of quoting the regular expression argument to the [[ command's =~ operator was not specified. The practical effect was that double-quoting the pattern argument required backslashes to quote special pattern characters, which interfered with the backslash processing performed by double-quoted word expansion and was inconsistent with how the == shell pattern matching operator treated quoted characters.
In bash-3.2, the shell was changed to internally quote characters in single-
and double-quoted string arguments to the =~ operator, which suppresses the
special meaning of the characters special to regular expression processing
(`.', `[', `\', `(', `)', `*', `+', `?', `{', `|', `^', and `$') and forces
them to be matched literally. This is consistent with how the `==' pattern
matching operator treats quoted portions of its pattern argument.
Since the treatment of quoted string arguments was changed, several issues
have arisen, chief among them the problem of white space in pattern arguments
and the differing treatment of quoted strings between bash-3.1 and bash-3.2.
Both problems may be solved by using a shell variable to hold the pattern.
Since word splitting is not performed when expanding shell variables in all
operands of the [[ command, this allows users to quote patterns as they wish
when assigning the variable, then expand the values to a single string that
may contain whitespace. The first problem may be solved by using backslashes
or any other quoting mechanism to escape the white space in the patterns.
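The point about quoting the pattern as one wishes at assignment time can be sketched as follows. This assumes bash 3.2 or later, and the variable names are hypothetical. All three assignments should store the same string, which is then expanded unquoted as a single regular expression:

```bash
#!/usr/bin/env bash
p1='^ab cd$'     # single quotes
p2="^ab cd\$"    # double quotes, with $ escaped
p3=^ab\ cd\$     # backslashes only

# The three variables hold the identical string ^ab cd$.
[[ $p1 == "$p2" && $p2 == "$p3" ]] && same=yes || same=no

# No word splitting happens inside [[ ]], so the space stays in the pattern.
[[ 'ab cd' =~ $p1 ]] && m1=yes || m1=no
[[ 'ab cd' =~ $p3 ]] && m3=yes || m3=no
[[ 'abcd' =~ $p1 ]] && m4=yes || m4=no

echo "same=$same m1=$m1 m3=$m3 m4=$m4"
```

Whichever quoting style is used in the assignment, the [[ ... ]] command only ever sees the resulting value.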
Related questions:
Using a variable in a regex
up vote
3
down vote
up vote
3
down vote
The only way to match an explicit string is to quote it:
[[ $var =~ 'quux' ]]
Even if the string contains special characters (special to the shell[a])
without the shell expanding or interpreting them[b]:
$ var='^abcd'
$ [[ $var =~ '^ab' ]] && echo yes || echo no
yes
If we need to actually allow (shell) special characters and allow the shell to interpret them as a regular expression they should be un-quoted.
$ var='abcd'
$ [[ $var =~ ^ab ]] && echo yes || echo no
yes
But unquoted strings create new problems, like with spaces:
$ var='ab cd'
$ [[ $var =~ ^ab cd ]] && echo yes || echo no
bash: syntax error in conditional expression
bash: syntax error near `cd'
To solve it, we need to still quote special characters:
$ var='ab cd'
$ [[ $var =~ ^"ab cd" ]] && echo yes || echo no
yes
$ [[ $var =~ ^ab cd ]] && echo yes || echo no
yes
Other examples:
[[ "a b" =~ ^a b$ ]] && echo yes
[[ "a|b" =~ ^a|b$ ]] && echo yes
[[ "a&b" =~ ^a&b$ ]] && echo yes
Storing the regexp inside a variable avoids all those quoting problems.
$ regex='^a b$'
$ [[ "a b" =~ $regex ]] && echo yes
yes
[a] List of shell special characters: | & ; ( ) < > space tab newline.
[b] This is true since bash version bash-3.2-alpha (under the "3. New Features in Bash" heading):
f. Quoting the string argument to the [[ command's =~ operator now forces
string matching, as with the other pattern-matching operators.
Copy of extended description from bash FAQ:
E14) Why does quoting the pattern argument to the regular expression matching conditional operator (=~) cause regexp matching to stop working?
In versions of bash prior to bash-3.2, the effect of quoting the regular expression argument to the [[ command's =~ operator was not specified. The practical effect was that double-quoting the pattern argument required backslashes to quote special pattern characters, which interfered with the backslash processing performed by double-quoted word expansion and was inconsistent with how the == shell pattern matching operator treated quoted characters.
In bash-3.2, the shell was changed to internally quote characters in single-
and double-quoted string arguments to the =~ operator, which suppresses the
special meaning of the characters special to regular expression processing
(`.', `[', `\', `(', `)', `*', `+', `?', `{', `|', `^', and `$') and forces
them to be matched literally. This is consistent with how the `==' pattern
matching operator treats quoted portions of its pattern argument.
Since the treatment of quoted string arguments was changed, several issues
have arisen, chief among them the problem of white space in pattern arguments
and the differing treatment of quoted strings between bash-3.1 and bash-3.2.
Both problems may be solved by using a shell variable to hold the pattern.
Since word splitting is not performed when expanding shell variables in all
operands of the [[ command, this allows users to quote patterns as they wish
when assigning the variable, then expand the values to a single string that
may contain whitespace. The first problem may be solved by using backslashes
or any other quoting mechanism to escape the white space in the patterns.
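A small sketch of the variable approach the FAQ recommends, with a made-up pattern and test strings: the pattern contains a space, parentheses and |, yet needs no quoting beyond the single quotes at assignment, and the unquoted expansion stays one pattern argument because [[ does not word-split it.

```shell
#!/usr/bin/env bash
# Illustrative pattern: "ab" or "xy", exactly one space, then "cd".
regex='^(ab|xy) cd$'

for candidate in 'ab cd' 'xy cd' 'ab  cd'; do
    # $regex is unquoted so it is treated as a regex; the embedded space
    # does not split it into two words inside [[ ]].
    if [[ $candidate =~ $regex ]]; then
        echo "'$candidate' matches"
    else
        echo "'$candidate' does not match"
    fi
done
```

The first two candidates match; the third does not, because it contains two spaces where the pattern allows one.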
Related questions:
Using a variable in a regex
edited Sep 19 at 20:44
answered Jun 2 at 12:30
Isaac
You said it's from the Bash Manual. Where exactly can this be found? I can't find it, even if I try
man bash | grep -A2 -B2 "regular expression"
– John Goofy
Jul 27 '17 at 13:03
@JohnGoofy, the man page is a stripped down version of the manual. You may want to look at the info page instead. Tim has also added a link to the online manual in his answer.
– Stéphane Chazelas
Jul 27 '17 at 13:54