Bash regex and IFS split

I have the following problem: I want to extract text that is inside brackets from a string (with or without the brackets). My string looks like this:



STR="[1] [2][345] [678 9] foo bar"



I initially wanted to use bash regex and BASH_REMATCH. I ended up using the following code:



regex='\[([^\]]*)\](.*)'
MATCHES=()
STR="[1] [2][345] [678 9] foo bar"
while [[ -n $STR && $STR =~ $regex ]];
do
    MATCHES+=("${BASH_REMATCH[1]}")
    STR=${BASH_REMATCH[2]}
    echo -e "matches: ${BASH_REMATCH[1]} -> ${BASH_REMATCH[2]}"
done


This kind of worked, but it would only capture one character inside the brackets, so [345] would result in 3.



I could not figure out why that was happening, so I ended up using grep with PCRE after all. My current solution is:



regex="\[[^\]]*?\]"
if [[ $(grep -o '\[.*\]' <<< $STR) ]];
then
    MATCHES=$(grep -oP "$regex" <<< $STR)
else
    echo "No special flags provided."
    exit 0
fi


I then proceed to a for loop:



for arg in $MATCHES;
do
echo $arg
done


The problem is that it does not separate the fields the way I want. I used hexdump to find out the actual delimiter:



hexdump -C <<< $MATCHES



which, to my surprise, showed that the delimiter is hex 0a, the LF. That was not an issue, as I know that the for loop uses IFS for splitting. I then set IFS to LF with IFS=$'\n'. To my (once again) surprise, hexdump showed the value of IFS as 0a0a, so that did not work. I then set IFS='', and (for my third surprise) hexdump showed the value as 0a. But that did not work either; the for loop did not change its behavior. Perhaps the scope of IFS was not set correctly by my script?
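A small sketch of where the "extra" 0a comes from: a here-string (<<<) always appends a newline to the text it feeds to a command, so hexdump -C <<< "$IFS" shows one byte more than IFS actually holds. The second half shows the newline-only splitting that was presumably intended; set -f is added because an unquoted [678 9] is also a glob pattern. The variable names follow the question, and the grep pattern is a plain BRE rather than the question's PCRE (an assumption made to keep the sketch portable).

```bash
#!/usr/bin/env bash
# A here-string (<<<) appends a newline, which is where the "extra" 0a came from.
IFS=$'\n'
printf '%q\n' "$IFS"    # prints $'\n': IFS holds exactly one newline
echo "${#IFS}"          # prints 1

IFS=''
echo "${#IFS}"          # prints 0: IFS is empty; hexdump only saw the here-string's newline

# Split the grep output on newlines only; set -f stops [678 9] being expanded as a glob.
STR='[1] [2][345] [678 9] foo bar'
MATCHES=$(grep -o '\[[^]]*]' <<< "$STR")
set -f
IFS=$'\n'
count=0
for arg in $MATCHES; do
    printf '%s\n' "$arg"
    count=$((count + 1))
done
set +f
unset IFS               # restore the default splitting behaviour
```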



My questions are the following:



1) Why did the original bash-only regex approach not work? Why was it only capturing a single character? regex101 dot com showed the expected behavior, but then again, it does not provide a bash regex mode.



2) Why does setting IFS not work as I expected? It adds an "extra" LF, even when I set it to empty.



3) Why does IFS not seem to affect the for loop?



4) Is there a simpler way to tackle the original problem (extracting [foo] [bar] [foo bar] from strings like [foo] [bar] 1 asdf[foo bar]) in a way that lets me loop over each bracket pair?




Bonus question!



B) I am confused as to when I should enclose a variable or expression in quotes or double-quotes. I have read a bit about globbing and parameter expansion and I am now looking for something more in-depth. Any recommendations?
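As a small illustration of why quoting matters for this exact data (a sketch; the temporary directory and the file named 1 are only there to make the globbing visible), compare an unquoted and a quoted expansion of the same values:

```bash
#!/usr/bin/env bash
# Work in a scratch directory that contains a file named "1".
cd "$(mktemp -d)" || exit 1
touch 1

var='[678 9]'
printf '<%s>\n' $var      # unquoted: word splitting gives <[678> and <9]>
printf '<%s>\n' "$var"    # quoted: a single argument, <[678 9]>

var='[1]'
printf '<%s>\n' $var      # unquoted: [1] is a glob pattern that matches the file, so <1>
printf '<%s>\n' "$var"    # quoted: <[1]>
```

Unquoted expansions go through both word splitting (controlled by IFS) and filename globbing, which is why strings containing brackets are a particularly bad fit for them.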










  • See Security implications of forgetting to quote a variable in bash/POSIX shells and all the questions it links to for your bonus question.

    – Stéphane Chazelas
    Feb 27 at 14:01











  • Thank you very much, that was an interesting read. I will keep those points in mind when writing scripts from now on.

    – Nikolaos Paschos
    Mar 1 at 8:36















bash grep regular-expression wildcards






edited Feb 27 at 15:11
Jeff Schaller

asked Feb 27 at 11:16
Nikolaos Paschos












1 Answer

To match any non-empty string of characters that does not contain a ], use [^]]+.
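Applied to the loop from the question, this might look like the following sketch (the lower-case variable names are my own choice). Here \[ matches a literal [, and a ] outside a bracket expression is already literal in an ERE, so it needs no backslash:

```bash
#!/usr/bin/env bash
str='[1] [2][345] [678 9] foo bar'
# \[ : a literal '['
# ([^]]+) : capture one or more characters that are not ']'
# ] : a literal ']'
# (.*) : capture the rest of the string for the next iteration
regex='\[([^]]+)](.*)'
matches=()
while [[ $str =~ $regex ]]; do
    matches+=("${BASH_REMATCH[1]}")
    str=${BASH_REMATCH[2]}
done
printf '%s\n' "${matches[@]}"    # prints 1, 2, 345 and 678 9, one per line
```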



Using [^\]]* would match a single character that is not a \ (a backslash has no special meaning inside a bracket expression), followed by zero or more ]. This is why you managed to parse out the 1 and the 2 but not the other strings.
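To see this directly, one can test the bracket part of the pattern against single bracketed strings. This is a sketch, assuming the question's pattern contained [^\]]* with the brackets escaped, as the asker describes in the comment below; the closing ] is left unescaped here because \] outside a bracket expression is not defined by POSIX:

```bash
#!/usr/bin/env bash
# Inside [...] the backslash is literal, so [^\] means "any one character except
# a backslash", and the following ]* means "zero or more ]".
bad='\[([^\]]*)](.*)'

if [[ '[1]' =~ $bad ]]; then
    echo "[1] -> '${BASH_REMATCH[1]}'"    # one character between the brackets: captures 1
fi

if [[ '[345]' =~ $bad ]]; then
    echo "[345] -> '${BASH_REMATCH[1]}'"
else
    echo "[345] -> no match"              # three characters: the pattern cannot match
fi
```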



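The IFS variable does not come into play in your first piece of code. Variables inside [[ ... ]] do not need double quoting; in fact, the regular expression on the right-hand side of =~ must be left unquoted, since a quoted right-hand side is matched as a literal string.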
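A minimal sketch of both points (the variable names are illustrative): unquoted variables are safe inside [[ ]], but quoting the regex turns it into a plain substring match.

```bash
#!/usr/bin/env bash
str='[1] foo'
regex='\[([^]]+)]'

# Unquoted $str and $regex are fine inside [[ ]]: no word splitting or globbing happens there.
if [[ $str =~ $regex ]]; then
    echo "unquoted regex: matched '${BASH_REMATCH[1]}'"    # prints: unquoted regex: matched '1'
fi

# Quoting the right-hand side of =~ makes bash match it as a literal string instead.
if [[ $str =~ "$regex" ]]; then
    echo "quoted regex: matched"
else
    echo "quoted regex: no match"    # str does not contain the literal text \[([^]]+)]
fi
```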



To print the separate elements of an array, use



printf '%s\n' "${MATCHES[@]}"


or



for elem in "${MATCHES[@]}"; do
    printf '%s\n' "$elem"
done


Just $MATCHES would expand to only the first element of the array (and would apply word splitting and filename globbing to the value).
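For the grep-based variant in the question, one way to end up with a real array (so that "${MATCHES[@]}" works as described above) is to read grep's one-match-per-line output with mapfile. This sketch assumes bash 4 or later (mapfile, also spelled readarray, does not exist in bash 3.2) and uses a plain BRE instead of PCRE, so it does not need grep -P:

```bash
#!/usr/bin/env bash
STR='[1] [2][345] [678 9] foo bar'

# grep -o prints every match on its own line; mapfile -t stores one line per
# array element and strips the trailing newlines.
mapfile -t MATCHES < <(grep -o '\[[^]]*]' <<< "$STR")

if (( ${#MATCHES[@]} == 0 )); then
    echo "No special flags provided."
    exit 0
fi

for arg in "${MATCHES[@]}"; do
    printf '%s\n' "$arg"    # [1], [2], [345] and [678 9], one per line
done
```

Because each element is expanded with quotes, IFS and globbing never touch the matched text.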






  • I see, my main problem was that I was trying to escape the brackets (hence the \[ and \]). After removing the brackets and following your comment I got it working in a flash! Thank you for the help!

    – Nikolaos Paschos
    Feb 27 at 13:54










answered Feb 27 at 11:49
Kusalananda











