
Question

Or, an introductory guide to robust filename handling and other string passing in shell scripts.

I wrote a shell script which works well most of the time. But it chokes on some inputs (e.g. on some file names).

I encountered a problem such as the following:

I have a file name containing a space hello world, and it was treated as two separate files hello and world.

I have an input line with two consecutive spaces and they shrank to one in the input.

Leading and trailing whitespace disappears from input lines.

Sometimes, when the input contains one of the characters [*?, they are
replaced by some text which is actually the names of files.

There is an apostrophe ' (or a double quote ") in the input and things got weird after that point.

There is a backslash in the input (or: I am using Cygwin and some of my file names have Windows-style separators).

What is going on and how do I fix it?

shellcheck helps you to improve the quality of your programs. — Sep 23 '16 at 6:33
Besides the protective techniques described in the answers, and although it is probably obvious to most readers, I think it may be worth commenting that when files are intended to be processed using command-line tools, it is good practice to avoid fancy characters in the names in the first place, if possible. — Nov 10 '16 at 13:30
There are now also tools to rewrite shell scripts with proper quoting. — Jan 20 '17 at 19:06
@bli No, that makes only bugs take longer to turn up. It's hiding bugs today. And now, you don't know all filenames later used with your code. — Aug 7 at 6:16

Community♦ 1 · Accepted Answer · 2017-05-23 12:40:03Z

Always use double quotes around variable substitutions and command substitutions: `"$foo"`, `"$(foo)"`

If you use $foo unquoted, your script will choke on input or parameters (or command output, with $(foo)) containing whitespace or [*?.

There, you can stop reading. Well, ok, here are a few more:

read: To read input line by line with the read builtin, use while IFS= read -r line; do …

Plain read treats backslashes and whitespace specially.

xargs: Avoid xargs. If you must use xargs, make that xargs -0. Instead of find … | xargs, prefer find … -exec ….
xargs treats whitespace and the characters "' specially.

This answer applies to Bourne/POSIX-style shells (sh, ash, dash, bash, ksh, mksh, yash…). Zsh users should skip it and read the end of When is double-quoting necessary? instead. If you want the whole nitty-gritty, read the standard or your shell's manual.

Note that the explanations below contain a few approximations (statements that are true in most conditions but can be affected by the surrounding context or by configuration).

Why do I need to write `"$foo"`? What happens without the quotes?

$foo does not mean “take the value of the variable foo”. It means something much more complex:

First, take the value of the variable.

Field splitting: treat that value as a whitespace-separated list of fields, and build the resulting list. For example, if the variable contains foo * bar then the result of this step is the 3-element list foo, *, bar.

Filename generation: treat each field as a glob, i.e. as a wildcard pattern, and replace it by the list of file names that match this pattern. If the pattern doesn't match any files, it is left unmodified. In our example, this results in the list containing foo, followed by the list of files in the current directory, and finally bar. If the current directory is empty, the result is foo, *, bar.
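
To see both steps in action, here is a minimal sketch. The helper count_args and the file names are made up for illustration, and it assumes the default IFS and a scratch directory holding exactly two files. It counts how many arguments the same value turns into with and without quotes.

```sh
# Hypothetical helper: print the number of arguments it received.
count_args() { echo "$#"; }

# Work in an empty scratch directory so the glob result is predictable.
tmpdir=$(mktemp -d) || exit 1
cd "$tmpdir" || exit 1
touch a.txt b.txt

foo='foo * bar'
unquoted=$(count_args $foo)    # split into foo, *, bar; then * becomes a.txt b.txt
quoted=$(count_args "$foo")    # stays one argument, the literal string

echo "unquoted: $unquoted argument(s), quoted: $quoted argument(s)"

cd / && rm -rf "$tmpdir"
```

With the two files present, the unquoted form yields four arguments and the quoted form yields one.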

Note that the result is a list of strings. There are two contexts in shell syntax: list context and string context. Field splitting and filename generation only happen in list context, but that's most of the time. Double quotes delimit a string context: the whole double-quoted string is a single string, not to be split. (Exception: "$@" to expand to the list of positional parameters, e.g. "$@" is equivalent to "$1" "$2" "$3" if there are three positional parameters. See What is the difference between $* and $@?)

The same happens to command substitution with $(foo) or with `foo`. On a side note, don't use `foo`: its quoting rules are weird and non-portable, and all modern shells support $(foo) which is absolutely equivalent except for having intuitive quoting rules.

The output of arithmetic substitution also undergoes the same expansions, but that isn't normally a concern as it only contains non-expandable characters (assuming IFS doesn't contain digits or -).

See When is double-quoting necessary? for more details about the cases when you can leave out the quotes.

Unless you mean for all this rigmarole to happen, just remember to always use double quotes around variable and command substitutions. Do take care: leaving out the quotes can lead not just to errors but to security holes.

How do I process a list of file names?

If you write myfiles="file1 file2", with spaces to separate the files, this can't work with file names containing spaces. Unix file names can contain any character other than / (which is always a directory separator) and null bytes (which you can't use in shell scripts with most shells).

Same problem with myfiles=*.txt; … process $myfiles. When you do this, the variable myfiles contains the 5-character string *.txt, and it's when you write $myfiles that the wildcard is expanded. This example will actually work, until you change your script to be myfiles="$someprefix*.txt"; … process $myfiles. If someprefix is set to final report, this won't work.

To process a list of any kind (such as file names), put it in an array. This requires mksh, ksh93, yash or bash (or zsh, which doesn't have all these quoting issues); a plain POSIX shell (such as ash or dash) doesn't have array variables.

myfiles=("$someprefix"*.txt)
process "${myfiles[@]}"

Ksh88 has array variables with a different assignment syntax set -A myfiles "$someprefix"*.txt (see assignation variable under different ksh environment if you need ksh88/bash portability). Bourne/POSIX-style shells have a single array, the array of positional parameters "$@", which you set with set and which is local to a function:

set -- "$someprefix"*.txt
process -- "$@"
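
As a rough sketch of the positional-parameter approach, the following runs in any POSIX shell. The process function here is only a stand-in that prints its arguments, and the file names are invented for the example.

```sh
# Scratch directory with file names containing spaces (illustrative names).
tmpdir=$(mktemp -d) || exit 1
touch "$tmpdir/final report 1.txt" "$tmpdir/final report 2.txt"
someprefix="$tmpdir/final report"

# Hypothetical stand-in for "process": print one line per file argument.
process() {
  [ "$1" = "--" ] && shift
  for f in "$@"; do
    printf 'processing: %s\n' "$f"
  done
}

set -- "$someprefix"*.txt   # the glob is expanded here, one file per parameter
nfiles=$#
process -- "$@"

rm -rf "$tmpdir"
```

The prefix is quoted and the wildcard is not, so the space in the prefix is preserved while *.txt is still expanded.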

What about file names that begin with `-`?

On a related note, keep in mind that file names can begin with a - (dash/minus), which most commands interpret as denoting an option. If you have a file name that begins with a variable part, be sure to pass -- before it, as in the snippet above. This indicates to the command that it has reached the end of options, so anything after that is a file name even if it starts with -.

Alternatively, you can make sure that your file names begin with a character other than -. Absolute file names begin with /, and you can add ./ at the beginning of relative names. The following snippet turns the content of the variable f into a “safe” way of referring to the same file that's guaranteed not to start with -.

case "$f" in -*) f=./$f;; esac

On a final note on this topic, beware that some commands interpret - as meaning standard input or standard output, even after --. If you need to refer to an actual file named -, or if you're calling such a program and you don't want it to read from stdin or write to stdout, make sure to rewrite - as above. See What is the difference between "du -sh *" and "du -sh ./*"? for further discussion.
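
The same idea can be wrapped in a small helper. This is only a sketch: make_safe and safe_f are names invented here, not part of any standard tool.

```sh
# Hypothetical helper: store a dash-safe spelling of "$1" in the variable safe_f.
make_safe() {
  case "$1" in
    -*) safe_f=./$1 ;;   # covers "-rf" as well as a file literally named "-"
    *)  safe_f=$1 ;;     # absolute names and ordinary relative names are left alone
  esac
}

for name in -rf - notes.txt /etc/passwd; do
  make_safe "$name"
  printf '%s -> %s\n' "$name" "$safe_f"
done
```

Setting a variable rather than printing the result avoids a command substitution, which would strip trailing newlines from the name.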

How do I store a command in a variable?

“Command” can mean three things: a command name (the name as an executable, with or without full path, or the name of a function, builtin or alias), a command name with arguments, or a piece of shell code. There are accordingly different ways of storing them in a variable.

If you have a command name, just store it and use the variable with double quotes as usual.

command_path="$1"
…
"$command_path" --option --message="hello world"

If you have a command with arguments, the problem is the same as with a list of file names above: this is a list of strings, not a string. You can't just stuff the arguments into a single string with spaces in between, because if you do that you can't tell the difference between spaces that are part of arguments and spaces that separate arguments. If your shell has arrays, you can use them.

cmd=(/path/to/executable --option --message="hello world" --)
cmd=("${cmd[@]}" "$file1" "$file2")
"${cmd[@]}"

What if you're using a shell without arrays? You can still use the positional parameters, if you don't mind modifying them.

set -- /path/to/executable --option --message="hello world" --
set -- "$@" "$file1" "$file2"
"$@"
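
Here is a runnable sketch of that pattern. It uses printf as a stand-in for /path/to/executable so the arguments become visible, and the file names are invented.

```sh
# Build the command as a list: one positional parameter per word.
set -- printf '<%s>\n' --message="hello world"
file1="my file.txt"
file2="-other file.txt"
set -- "$@" "$file1" "$file2"

# Run it; each argument arrives intact, spaces and all.
output=$("$@")
printf '%s\n' "$output"
```

Each of the three arguments after the format string is printed on its own line inside angle brackets, which shows that none of them was split.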

What if you need to store a complex shell command, e.g. with redirections, pipes, etc.? Or if you don't want to modify the positional parameters? Then you can build a string containing the command, and use the eval builtin.

code='/path/to/executable --option --message="hello world" -- /path/to/file1 | grep "interesting stuff"'
eval "$code"

Note the nested quotes in the definition of code: the single quotes '…' delimit a string literal, so that the value of the variable code is the string /path/to/executable --option --message="hello world" -- /path/to/file1 | grep "interesting stuff". The eval builtin tells the shell to parse the string passed as an argument as if it appeared in the script, so at that point the quotes and pipe are parsed, etc.

Using eval is tricky. Think carefully about what gets parsed when. In particular, you can't just stuff a file name into the code: you need to quote it, just like you would if it was in a source code file. There's no direct way to do that. Something like code="$code $filename" breaks if the file name contains any shell special character (spaces, $, ;, |, <, >, etc.). code="$code \"$filename\"" still breaks on "$\`. Even code="$code '$filename'" breaks if the file name contains a '. There are two solutions.

Add a layer of quotes around the file name. The easiest way to do that is to add single quotes around it, and replace single quotes by '\''.
```
quoted_filename=$(printf %s. "$filename" | sed "s/'/'\\\\''/g")
code="$code '${quoted_filename%.}'"
```

Keep the variable expansion inside the code, so that it's looked up when the code is evaluated, not when the code fragment is built. This is simpler but only works if the variable is still around with the same value at the time the code is executed, not e.g. if the code is built in a loop.
```
code="$code \"\$filename\""
```
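
Both solutions can be checked with a deliberately awkward file name. This is a sketch only: the name is invented, and printf stands in for whatever command the code would really run.

```sh
# A name containing a single quote, a dollar sign and backquotes.
filename="it's \$5 \`odd\` name.txt"

# Solution 1: wrap in single quotes, turning each ' into '\''.
# The trailing "." protects any trailing newlines from command substitution.
quoted_filename=$(printf %s. "$filename" | sed "s/'/'\\\\''/g")
code1="printf '%s\n' '${quoted_filename%.}'"
result1=$(eval "$code1")

# Solution 2: leave the expansion in the code; it happens at eval time.
code2="printf '%s\n' \"\$filename\""
result2=$(eval "$code2")

printf 'code1: %s\ncode2: %s\n' "$code1" "$code2"
```

In both cases the evaluated command receives the original name unchanged. The second form depends on filename still holding the same value when eval runs.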

Finally, do you really need a variable containing code? The most natural way to give a name to a code block is to define a function.

What's up with `read`?

Without -r, read allows continuation lines; the following is a single logical line of input:

hello \
world

read splits the input line into fields delimited by characters in $IFS (without -r, backslash also escapes those). For example, if the input is a line containing three words, then read first second third sets first to the first word of input, second to the second word and third to the third word. If there are more words, the last variable contains everything that's left after setting the preceding ones. Leading and trailing whitespace are trimmed.

Setting IFS to the empty string avoids any trimming. See Why is `while IFS= read` used so often, instead of `IFS=; while read..`? for a longer explanation.
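
A small sketch of the difference, using an invented input line with leading and trailing spaces and one backslash:

```sh
input='  two  spaces \and a backslash  '

# Plain read: trims whitespace at both ends and removes the backslash.
plain=$(printf '%s\n' "$input" | { read line; printf '[%s]' "$line"; })

# IFS= read -r: the line comes through unchanged.
safe=$(printf '%s\n' "$input" | { IFS= read -r line; printf '[%s]' "$line"; })

printf 'plain: %s\nsafe:  %s\n' "$plain" "$safe"
```

The brackets make the trimmed whitespace visible when the two results are printed side by side.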

What's wrong with `xargs`?

The input format of xargs is whitespace-separated strings which can optionally be single- or double-quoted. No standard tool outputs this format.

The input to xargs -L1 or xargs -l is almost a list of lines, but not quite: if there is a space at the end of a line, the following line is a continuation line.

You can use xargs -0 where applicable (and where available: GNU (Linux, Cygwin), BusyBox, BSD, OSX, but it isn't in POSIX). That's safe, because null bytes can't appear in most data, in particular in file names. To produce a null-separated list of file names, use find … -print0 (or you can use find … -exec … as explained below).
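
As an illustration, assuming a find that supports -print0 and an xargs that supports -0, the following sketch passes two awkward, made-up file names through the pipe and counts how many arguments arrive at the other end.

```sh
# Scratch directory with an apostrophe and a double space in the names.
tmpdir=$(mktemp -d) || exit 1
touch "$tmpdir/it's here.txt" "$tmpdir/two  spaces.txt"

# Each NUL-terminated name becomes exactly one argument of the inner sh.
# "xargs-sh" only fills $0 of that inner shell.
nul_count=$(find "$tmpdir" -type f -print0 | xargs -0 sh -c 'echo "$#"' xargs-sh)

echo "arguments received: $nul_count"
rm -rf "$tmpdir"
```

Without -print0 and -0, the apostrophe in the first name would be read by xargs as the start of a quoted string.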

How do I process files found by `find`?

find … -exec some_command a_parameter another_parameter {} +

some_command needs to be an external command; it can't be a shell function or alias. If you need to invoke a shell to process the files, call sh explicitly.

find … -exec sh -c '
 for x do
 … # process the file "$x"
 done
' find-sh {} +
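
A filled-in sketch of this pattern follows. The directory contents are invented, and the per-file processing is just a printf.

```sh
# Scratch directory with names that would trip up naive loops.
tmpdir=$(mktemp -d) || exit 1
touch "$tmpdir/hello world.txt" "$tmpdir/-dash.txt" "$tmpdir/plain.txt"

# find hands the names to the inner sh as separate arguments;
# "find-sh" becomes $0 and the files become "$@".
count=$(find "$tmpdir" -type f -name '*.txt' -exec sh -c '
  for x do
    printf "%s\n" "$x"
  done
' find-sh {} + | wc -l | tr -d ' ')

echo "files processed: $count"
rm -rf "$tmpdir"
```

Counting output lines is only valid here because none of the example names contains a newline.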

I have some other question

Browse the quoting tag on this site, or shell or shell-script. (Click on “learn more…” to see some general tips and a hand-selected list of common questions.) If you've searched and you can't find an answer, ask away.

+1 Nicely done. I have just one addition regarding "The input format of xargs is whitespace-separated strings which can optionally be single- or double-quoted. No standard tool outputs this format": GNU's ls --quoting-style=shell-always provides this format and often seems useful. — May 24 '14 at 5:00
@John1024 It's a GNU feature only, so I'll stick with “no standard tool”. — May 24 '14 at 5:02
You also need quotes around $(( ... )) (also $[...] in some shells) except in zsh (even in sh emulation) and mksh. — May 24 '14 at 6:39
Note that xargs -0 is not POSIX. Except with FreeBSD xargs, you generally want xargs -r0 instead of xargs -0. — May 24 '14 at 6:41
Another nice (GNU-only) feature is xargs -d "\n" so that you can run e.g. locate PATTERN1 |xargs -d "\n" grep PATTERN2 to search for file names matching PATTERN1 with content matching PATTERN2. Without GNU, you can do it e.g. like locate PATTERN1 |perl -pne 's/\n/\0/' |xargs -0 grep PATTERN2 — Jan 16 '15 at 4:29

score 20 · Answer 2 · 2016-05-25 14:21:43Z


While Gilles' answer is excellent, I take issue with his main point

Always use double quotes around variable substitutions and command
substitutions: "$foo", "$(foo)"

When you are starting out with a Bash-like shell that does word splitting, yes of
course the safe advice is always use quotes. However word splitting is not
always performed

§ Word Splitting

These commands can be run without error

foo=$bar
bar=$(a command)
logfile=$logdir/foo-$(date +%Y%m%d)
PATH=/usr/local/bin:$PATH ./myscript
case $foo in bar) echo bar ;; baz) echo baz ;; esac

I am not encouraging users to adopt this behavior, but if someone firmly
understands when word splitting occurs then they should be able to decide for
themselves when to use quotes.
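
A quick way to convince yourself, as a sketch with an invented value containing runs of spaces and a wildcard:

```sh
bar='hello   world *'

foo=$bar                       # plain assignment: no splitting, no globbing
stamp=$(printf '%s' "$bar")    # the substitution itself needs no outer quotes here

# The word after "case" is not split or globbed either.
case $foo in
  'hello   world *') matched=yes ;;
  *)                 matched=no ;;
esac

printf 'foo=[%s] stamp=[%s] matched=%s\n' "$foo" "$stamp" "$matched"
```

Note that "$bar" is still quoted inside the command substitution, because there it is an ordinary command argument.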

edited May 25 '16 at 14:21

answered May 24 '14 at 8:05

Steven Penny


14

As I mention in my answer, see unix.stackexchange.com/questions/68694/… for details. Do notice the question: “Why does my shell script choke?”. The most common problem (from years of experience on this site and elsewhere) is missing double quotes. “Always use double quotes” is easier to remember than “always use double quotes, except for these cases where they aren't necessary”.
– Gilles
May 24 '14 at 8:25

11

Rules are difficult to understand for beginners. For instance, foo=$bar is OK, but export foo=$bar or env foo=$var are not (at least in some shells). Advice for beginners: always quote your variables unless you know what you're doing and have a good reason not to.
– Stéphane Chazelas
May 24 '14 at 17:05

5

@StevenPenny Is it really more correct? Are there reasonable cases where quotes would break the script? In situations where in half the cases quotes must be used, and in the other half quotes may be used optionally, a recommendation "always use quotes, just in case" is the one that should be taught, since it's true, simple and less risky. Teaching such lists of exceptions to beginners is well known to be ineffective (lacking context, they won't remember them) and counterproductive, as they'll confuse needed/unneeded quotes, breaking their scripts and demotivating them to learn further.
– Peteris
May 25 '14 at 8:20

6

My $0.02 would be that recommending to quote everything is good advice. Mistakenly quoting something that doesn't need it is harmless, mistakenly failing to quote something that does need it is harmful. So, for the majority of shell script authors who will never understand the intricacies of when exactly word splitting occurs, quoting everything is much safer than trying to quote only where necessary.
– godlygeek
May 25 '14 at 17:03

5

@Peteris and godlygeek: "Are there reasonable cases where quotes would break the script?" It depends on your definition of "reasonable". If a script sets criteria="-type f", then find . $criteria works but find . "$criteria" doesn't.
– G-Man
Aug 28 '14 at 19:06


score 17 · Answer 3 · 2016-10-18 05:42:19Z

As far as I know, there are only two cases in which it is necessary to double-quote expansions, and those cases involve the two special shell parameters "$@" and "$*" - which are specified to expand differently when enclosed in double-quotes. In all other cases (excluding, perhaps, shell-specific array implementations) the behavior of an expansion is a configurable thing - there are options for that.

This is not to say, of course, that double-quoting should be avoided - to the contrary, it is probably the most convenient and robust method of delimiting an expansion which the shell has to offer. But, I think, as alternatives have already been expertly expounded, this is an excellent place to discuss what happens when the shell expands a value.

The shell, in its heart and soul (for those that have such), is a command-interpreter - it is a parser, like a big, interactive, sed. If your shell statement is choking on whitespace or similar then it is very likely because you have not fully understood the shell's interpretation process - especially how and why it translates an input statement to an actionable command. The shell's job is to:

accept input

interpret and split it correctly into tokenized input words
- input words are the shell syntax items such as $word or echo $words 3 4* 5
- words are always split on whitespace - that's just syntax - but only the literal whitespace characters served to the shell in its input file

expand those if necessary into multiple fields
- fields result from word expansions - they make up the final executable command
- excepting "$@", $IFS field-splitting, and pathname expansion an input word must always evaluate to a single field.

and then to execute the resulting command
- in most cases this involves passing on the results of its interpretation in some form or another

People often say the shell is a glue, and, if this is true, then what it is sticking is lists of arguments - or fields - to one process or another when it execs them. Most shells do not handle the NUL byte well - if at all - and this is because they're already splitting on it. The shell has to exec a lot and it must do this with a NUL delimited array of arguments that it hands to the system kernel at exec time. If you were to intermingle the shell's delimiter with its delimited data then the shell would probably screw it up. Its internal data structures - like most programs - rely on that delimiter. zsh, notably, does not screw this up.

And that's where $IFS comes in. $IFS is an always present - and likewise settable - shell parameter that defines how the shell should split shell expansions from word to field - specifically on what values those fields should delimit. $IFS splits shell expansions on delimiters other than NUL - or, in other words the shell substitutes bytes resulting from an expansion that match those in the value of $IFS with NUL in its internal data-arrays. When you look at it like that you might begin to see that every field-split shell expansion is an $IFS-delimited data array.

It's important to understand that $IFS only delimits expansions that are not already otherwise delimited - which you can do with "double-quotes". When you quote an expansion you delimit it at the head and at least to the tail of its value. In those cases $IFS does not apply as there are no fields to separate. In fact, a double-quoted expansion exhibits identical field-splitting behavior to an unquoted expansion when IFS= is set to an empty value.

Unless quoted, $IFS is itself an $IFS delimited shell expansion. It defaults to a specified value of <space><tab><newline> - all three of which exhibit special properties when contained within $IFS. Whereas any other value for $IFS is specified to evaluate to a single field per expansion occurrence, $IFS whitespace - any of those three - is specified to elide to a single field per expansion sequence and leading/trailing sequences are elided entirely. This is probably easiest to understand via example.

slashes=///// spaces=' '
IFS=/; printf '<%s>' $slashes$spaces
<><><><><>< >
IFS=' '; printf '<%s>' $slashes$spaces
</////>
IFS=; printf '<%s>' $slashes$spaces
<///// >
unset IFS; printf '<%s>' "$slashes$spaces"
<///// >

But that's just $IFS - just the word-splitting or whitespace as asked, so what of the special characters?

The shell - by default - will also expand certain unquoted tokens (such as ?*[ as noted elsewhere here) into multiple fields when they occur in a list. This is called pathname expansion, or globbing. It is an incredibly useful tool, and, as it occurs after field-splitting in the shell's parse-order it is not affected by $IFS - fields generated by a pathname expansion are delimited on the head/tail of the filenames themselves regardless of whether their contents contain any characters currently in $IFS. This behavior is set to on by default - but it is very easily configured otherwise.

set -f

That instructs the shell not to glob. Pathname expansion will not occur at least until that setting is somehow undone - such as if the current shell is replaced with another new shell process or....

set +f

...is issued to the shell. Double-quotes - as they also do for $IFS field-splitting - render this global setting unnecessary per expansion. So:

echo "*" *

...if pathname expansion is currently enabled will likely produce very different results per argument - as the first will expand only to its literal value (the single asterisk character, which is to say, not at all) and the second only to the same if the current working directory contains no filenames which might match (and it matches nearly all of them). However if you do:

set -f; echo "*" *

...the results for both arguments are identical - the * does not expand in that case.
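
Combining set -f with a custom $IFS gives a controlled way to split a value into fields. The sketch below uses an example path and assumes that IFS is currently set and that globbing was enabled beforehand, since it restores both unconditionally.

```sh
path=/usr/local/bin

old_ifs=$IFS            # assumes IFS is set (it is by default)
set -f                  # no pathname expansion on the split fields
IFS=/
set -- $path            # fields: "", usr, local, bin
IFS=$old_ifs
set +f                  # assumes globbing was on before

ncomponents=$#
for c in "$@"; do
  printf '<%s>' "$c"
done
echo
```

The first field is empty because the value starts with the delimiter, and / is not $IFS whitespace, so it is not elided.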

I actually agree with @StéphaneChazelas that it (mostly) confuses things more than helping...but I found it helpful, personally, so I upvoted. I now have a better idea (and some examples) of how IFS actually works. What I don't get is why it would ever be a good idea to set IFS to something other than default. — Jan 9 '16 at 4:12
@Wildcard - it's a field delimiter. if you have a value in a variable that you want to expand to multiple fields you split it on $IFS. cd /usr/bin; set -f; IFS=/; for path_component in $PWD; do echo $path_component; done prints \n then usr\n then bin\n. The first echo is empty because / is a null field. The path_components can have newlines or spaces or whatever - wouldn't matter because the components were split on / and not the default value. people do it w/ awk all the time, anyway. your shell does it too — Jan 9 '16 at 6:40

score 0 · Answer 4 · 2018-02-28 13:35:13Z

I had a large video project with spaces in filenames and spaces in directory names. While find -type f -print0 | xargs -0 works for several purposes and across different shells, I find that using a custom IFS (internal field separator) gives you more flexibility if you're using bash. The snippet below uses bash and sets IFS to just a newline, provided there aren't newlines in your filenames:

(IFS=$'\n'; for i in $(find -type f -print) ; do
 echo ">>>$i<<<"
done)

Note the use of parens to isolate the redefinition of IFS. I've read other posts about how to recover IFS, but this is just easier.

More, setting IFS to newline lets you set shell variables beforehand and easily print them out. For instance, I can grow a variable V incrementally using newlines as separators:

V=""
V="./Ralphie's Camcorder/STREAM/00123.MTS,04:58,05:52,-vf yadif"
V="$V"$'\n'"./Ralphie's Camcorder/STREAM/00111.MTS,00:00,59:59,-vf yadif"
V="$V"$'\n'"next item goes here..."

and correspondingly:

(IFS=$'\n'; for v in $V ; do
 echo ">>>$v<<<"
done)

Now I can "list" the setting of V with echo "$V" using double quotes to output the newlines. (Credit to this thread for the $'\n' explanation.)
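
For shells without $'\n', and to guard against wildcard characters in the items, a variant along the following lines should work. It is a sketch with made-up items, it assumes IFS is set and globbing is on beforehand, and it still cannot cope with items that contain newlines.

```sh
# Two newline-separated items; the second contains a wildcard character.
V="./Ralphie's Camcorder/STREAM/00123.MTS,04:58,05:52,-vf yadif
./some dir/*.MTS,00:00,59:59,-vf yadif"

n=0
old_ifs=$IFS          # assumes IFS is currently set (it is by default)
IFS='
'
set -f                # keep the * in the second item from being globbed
for v in $V; do
  n=$((n + 1))
  echo ">>>$v<<<"
done
set +f                # assumes globbing was enabled before
IFS=$old_ifs
```

Saving and restoring IFS by hand lets the counter n survive the loop, which the parenthesized subshell form would not allow.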

But then you'll still have problems with filenames containing newline or glob characters. See also: Why is looping over find's output bad practice?. If using zsh, you can use IFS=$'\0' and use -print0 (zsh doesn't do globbing upon expansions so glob characters are not a problem there). — Feb 28 at 14:10
This works with file names containing spaces, but it doesn't work against potentially hostile file names or accidental “nonsensical” file names. You can easily fix the issue of file names containing wildcard characters by adding set -f. On the other hand, your approach fundamentally fails with file names containing newlines. When dealing with data other than file names, it also fails with empty items. — Mar 1 at 18:52
Right, my caveat is that it won't work with newlines in filenames. However, I believe we have to draw the line just shy of madness ;-) — Mar 8 at 18:01
And I'm not sure why this received a downvote. This is a perfectly reasonable method for iterating over filenames with spaces. Using -print0 requires xargs, and there are things that are difficult using that chain. I'm sorry someone does not agree with my answer, but that's no reason to downvote it. — Mar 8 at 18:03

Communityâ™¦ 1 · Accepted Answer · 2017-05-23 12:40:03Z

Always use double quotes around variable substitutions and command substitutions: `"$foo"`, `"$(foo)"`

If you use $foo unquoted, your script will choke on input or parameters (or command output, with $(foo)) containing whitespace or [*?.

There, you can stop reading. Well, ok, here are a few more:

read Ã¢Â€Â” To read input line by line with the read builtin, use while IFS= read -r line; do Ã¢Â€Â¦

Plain read treats backslashes and whitespace specially.

xargs Ã¢Â€Â” Avoid xargs. If you must use xargs, make that xargs -0. Instead of find Ã¢Â€Â¦ | xargs, prefer find Ã¢Â€Â¦Ã‚Â -exec Ã¢Â€Â¦.
xargs treats whitespace and the characters "' specially.

This answer applies to Bourne/POSIX-style shells (sh, ash, dash, bash, ksh, mksh, yashÃ¢Â€Â¦). Zsh users should skip it and read the end of When is double-quoting necessary? instead. If you want the whole nitty-gritty, read the standard or your shell's manual.

Note that the explanations below contains a few approximations (statements that are true in most conditions but can be affected by the surrounding context or by configuration).

Why do I need to write `"$foo"`? What happens without the quotes?

$foo does not mean Ã¢Â€Âœtake the value of the variable fooÃ¢Â€Â. It means something much more complex:

First, take the value of the variable.

Field splitting: treat that value as a whitespace-separated list of fields, and build the resulting list. For example, if the variable contains foo * bar Ã¢Â€Â‹ then the result of this step is the 3-element list foo, *, bar.

Filename generation: treat each field as a glob, i.e. as a wildcard pattern, and replace it by the list of file names that match this pattern. If the pattern doesn't match any files, it is left unmodified. In our example, this results in the list containing foo, following by the list of files in the current directory, and finally bar. If the current directory is empty, the result is foo, *, bar.

Note that the result is a list of strings. There are two contexts in shell syntax: list context and string context. Field splitting and filename generation only happen in list context, but that's most of the time. Double quotes delimit a string context: the whole double-quoted string is a single string, not to be split. (Exception: "$@" to expand to the list of positional parameters, e.g. "$@" is equivalent to "$1" "$2" "$3" if there are three positional parameters. See What is the difference between $* and $@?)

The same happens to command substitution with $(foo) or with `foo`. On a side note, don't use `foo`: its quoting rules are weird and non-portable, and all modern shells support $(foo) which is absolutely equivalent except for having intuitive quoting rules.

The output of arithmetic substitution also undergoes the same expansions, but that isn't normally a concern as it only contains non-expandable characters (assuming IFS doesn't contain digits or -).

See When is double-quoting necessary? for more details about the cases when you can leave out the quotes.

Unless you mean for all this rigmarole to happen, just remember to always use double quotes around variable and command substitutions. Do take care: leaving out the quotes can lead not just to errors but to security holes.

How do I process a list of file names?

If you write myfiles="file1 file2", with spaces to separate the files, this can't work with file names containing spaces. Unix file names can contain any character other than / (which is always a directory separator) and null bytes (which you can't use in shell scripts with most shells).

Same problem with myfiles=*.txt; Ã¢Â€Â¦ process $myfiles. When you do this, the variable myfiles contains the 5-character string *.txt, and it's when you write $myfiles that the wildcard is expanded. This example will actually work, until you change your script to be myfiles="$someprefix*.txt"; Ã¢Â€Â¦ process $myfiles. If someprefix is set to final report, this won't work.

To process a list of any kind (such as file names), put it in an array. This requires mksh, ksh93, yash or bash (or zsh, which doesn't have all these quoting issues); a plain POSIX shell (such as ash or dash) doesn't have array variables.

myfiles=("$someprefix"*.txt)
process "$myfiles[@]"

Ksh88 has array variables with a different assignment syntax set -A myfiles "someprefix"*.txt (see assignation variable under different ksh environment if you need ksh88/bash portability). Bourne/POSIX-style shells have a single one array, the array of positional parameters "$@" which you set with set and which is local to a function:

set -- "$someprefix"*.txt
process -- "$@"

What about file names that begin with `-`?

On a related note, keep in mind that file names can begin with a - (dash/minus), which most commands interpret as denoting an option. If you have a file name that begins with a variable part, be sure to pass -- before it, as in the snippet above. This indicates to the command that it has reached the end of options, so anything after that is a file name even if it starts with -.

Alternatively, you can make sure that your file names begin with a character other than -. Absolute file names begin with /, and you can add ./ at the beginning of relative names. The following snippet turns the content of the variable f into a Ã¢Â€ÂœsafeÃ¢Â€Â way of refering to the same file that's guaranteed not to start with -.

case "$f" in -*) "f=./$f";; esac

On a final note on this topic, beware that some commands interpret - as meaning standard input or standard output, even after --. If you need to refer to an actual file named -, or if you're calling such a program and you don't want it to read from stdin or write to stdout, make sure to rewrite - as above. See What is the difference between "du -sh *" and "du -sh ./*"? for further discussion.

How do I store a command in a variable?

Ã¢Â€ÂœCommandÃ¢Â€Â can mean three things: a command name (the name as an executable, with or without full path, or the name of a function, builtin or alias), a command name with arguments, or a piece of shell code. There are accordingly different ways of storing them in a variable.

If you have a command name, just store it and use the variable with double quotes as usual.

command_path="$1"
…
"$command_path" --option --message="hello world"

If you have a command with arguments, the problem is the same as with a list of file names above: this is a list of strings, not a string. You can't just stuff the arguments into a single string with spaces in between, because if you do that you can't tell the difference between spaces that are part of arguments and spaces that separate arguments. If your shell has arrays, you can use them.

cmd=(/path/to/executable --option --message="hello world" --)
cmd=("${cmd[@]}" "$file1" "$file2")
"${cmd[@]}"

What if you're using a shell without arrays? You can still use the positional parameters, if you don't mind modifying them.

set -- /path/to/executable --option --message="hello world" --
set -- "$@" "$file1" "$file2"
"$@"
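If modifying the script's own parameters is a concern, one possible arrangement is to build the command inside a function, where the positional parameters are local. In this sketch run_with_files is a hypothetical name, and printf stands in for /path/to/executable so that the example can run anywhere.

```sh
# run_with_files is a hypothetical wrapper; printf '[%s]\n' stands in for
# the real executable and simply prints each argument it receives.
run_with_files() {
  file1=$1 file2=$2
  set -- printf '[%s]\n' --message="hello world" --
  set -- "$@" "$file1" "$file2"   # append arguments without re-splitting
  "$@"                            # run the command, one word per element
}

output=$(run_with_files "my file.txt" "other file.txt")
printf '%s\n' "$output"
```

Each bracketed line of output corresponds to exactly one argument, so the spaces inside "hello world" and the file names are visibly preserved.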

What if you need to store a complex shell command, e.g. with redirections, pipes, etc.? Or if you don't want to modify the positional parameters? Then you can build a string containing the command, and use the eval builtin.

code='/path/to/executable --option --message="hello world" -- /path/to/file1 | grep "interesting stuff"'
eval "$code"

Note the nested quotes in the definition of code: the single quotes '…' delimit a string literal, so that the value of the variable code is the string /path/to/executable --option --message="hello world" -- /path/to/file1 | grep "interesting stuff". The eval builtin tells the shell to parse the string passed as an argument as if it appeared in the script, so at that point the quotes and pipe are parsed, etc.

Using eval is tricky. Think carefully about what gets parsed when. In particular, you can't just stuff a file name into the code: you need to quote it, just like you would if it was in a source code file. There's no direct way to do that. Something like code="$code $filename" breaks if the file name contains any shell special character (spaces, $, ;, |, <, >, etc.). code="$code \"$filename\"" still breaks on "$\`. Even code="$code '$filename'" breaks if the file name contains a '. There are two solutions.

Add a layer of quotes around the file name. The easiest way to do that is to add single quotes around it, and replace single quotes by '\''.
```
quoted_filename=$(printf %s. "$filename" | sed "s/'/'\\\\''/g")
code="$code '${quoted_filename%.}'"
```

Keep the variable expansion inside the code, so that it's looked up when the code is evaluated, not when the code fragment is built. This is simpler but only works if the variable is still around with the same value at the time the code is executed, not e.g. if the code is built in a loop.
```
code="$code \"\$filename\""
```
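To see the first solution (adding a layer of quotes) at work, here is a small round-trip check. The file name is invented for the test, and set -- is used as the evaluated command only because it makes the resulting argument count easy to inspect.

```sh
# An invented, deliberately awkward file name: apostrophe, spaces, $ and ;.
filename="it's a \$HOME; file"

# Wrap in single quotes, rewriting each embedded ' as '\''.
# The trailing . protects any trailing newlines from $(...) and is removed again.
quoted_filename=$(printf %s. "$filename" | sed "s/'/'\\\\''/g")
code="set -- '${quoted_filename%.}'"

eval "$code"      # the shell re-parses the string, quotes included
nargs=$#
roundtrip=$1
printf 'eval saw %s argument(s): %s\n' "$nargs" "$roundtrip"
```

If the quoting layer is right, eval sees exactly one argument and it is identical to the original file name.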

Finally, do you really need a variable containing code? The most natural way to give a name to a code block is to define a function.
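For comparison, the pipeline from the eval example could be sketched as a function. The name find_interesting, the temporary file and its contents are assumptions made for the illustration, and grep alone stands in for the original executable.

```sh
# Set up an invented input file whose name contains a space.
tmpdir=$(mktemp -d) || exit 1
printf '%s\n' "boring" "some interesting stuff" > "$tmpdir/my log.txt"

# find_interesting is a hypothetical name for the stored "code".
# "$@" forwards the caller's file names intact, so no extra quoting layer is needed.
find_interesting() {
  grep -- "interesting stuff" "$@"
}

hits=$(find_interesting "$tmpdir/my log.txt")
printf '%s\n' "$hits"
rm -rf "$tmpdir"
```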

What's up with `read`?

Without -r, read allows continuation lines; the following is a single logical line of input:

hello \
world

read splits the input line into fields delimited by characters in $IFS (without -r, backslash also escapes those). For example, if the input is a line containing three words, then read first second third sets first to the first word of input, second to the second word and third to the third word. If there are more words, the last variable contains everything that's left after setting the preceding ones. Leading and trailing whitespace are trimmed.

Setting IFS to the empty string avoids any trimming. See Why is `while IFS= read` used so often, instead of `IFS=; while read..`? for a longer explanation.
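The difference between a plain read and IFS= read -r can be checked with a short experiment. The sample line, with its surrounding spaces and Windows-style backslashes, is made up for the demonstration.

```sh
# An invented input line with leading/trailing spaces and backslashes.
input='  C:\temp\new file  '

# Plain read: IFS whitespace is trimmed and backslashes act as escapes.
plain=$(printf '%s\n' "$input" | { read line; printf '%s' "$line"; })

# IFS= read -r: nothing is trimmed and backslashes are kept literally.
exact=$(printf '%s\n' "$input" | { IFS= read -r line; printf '%s' "$line"; })

printf 'plain: <%s>\nexact: <%s>\n' "$plain" "$exact"
```

Only the second form returns the line exactly as it was written.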

What's wrong with `xargs`?

The input format of xargs is whitespace-separated strings which can optionally be single- or double-quoted. No standard tool outputs this format.

The input to xargs -L1 or xargs -l is almost a list of lines, but not quite: if there is a space at the end of a line, the following line is a continuation line.

You can use xargs -0 where applicable (and where available: GNU (Linux, Cygwin), BusyBox, BSD, OSX, but it isn't in POSIX). That's safe, because null bytes can't appear in most data, in particular in file names. To produce a null-separated list of file names, use find … -print0 (or you can use find … -exec … as explained below).
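A minimal sketch of the null-separated pipeline follows, assuming a find that supports -print0 and an xargs that supports -0 (neither is in POSIX, as noted above). The file names are invented, and the inner sh -c 'echo "$#"' merely reports how many arguments xargs handed over.

```sh
tmpdir=$(mktemp -d) || exit 1
touch "$tmpdir/hello world.txt" "$tmpdir/it's here.txt" "$tmpdir/plain.txt"

# -print0 ends each name with a null byte; xargs -0 splits only on those,
# so spaces and apostrophes inside names are passed through untouched.
count=$(find "$tmpdir" -type f -print0 | xargs -0 sh -c 'echo "$#"' sh)
echo "xargs -0 passed $count arguments"
rm -rf "$tmpdir"
```

With only three short names, xargs runs the command once, so the count equals the number of files.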

How do I process files found by `find`?

find … -exec some_command a_parameter another_parameter {} +

some_command needs to be an external command, it can't be a shell function or alias. If you need to invoke a shell to process the files, call sh explicitly.

find … -exec sh -c '
 for x do
 … # process the file "$x"
 done
' find-sh {} +
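Here is one concrete instance of that pattern, run against a throwaway directory. The awkward file names are invented for the test, and printing the base name is just a placeholder for real processing.

```sh
tmpdir=$(mktemp -d) || exit 1
touch "$tmpdir/a b.txt" "$tmpdir/-dash.txt" "$tmpdir/star*.txt"

# find-sh becomes $0 of the inner shell; the found paths become "$@".
listing=$(find "$tmpdir" -type f -exec sh -c '
  for x do
    # Each "$x" is one complete path, whatever characters it contains.
    printf "<%s>\n" "${x##*/}"
  done
' find-sh {} + | sort)
printf '%s\n' "$listing"
rm -rf "$tmpdir"
```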

I have some other question

Browse the quoting tag on this site, or shell or shell-script. (Click on “learn more…” to see some general tips and a hand-selected list of common questions.) If you've searched and you can't find an answer, ask away.

+1 Nicely done. I have just one addition regarding "The input format of xargs is whitespace-separated strings which can optionally be single- or double-quoted. No standard tool outputs this format": GNU's ls --quoting-style=shell-always provides this format and often seems useful. — May 24 '14 at 5:00
@John1024 It's a GNU feature only, so I'll stick with “no standard tool”. — May 24 '14 at 5:02
You also need quotes around $(( ... )) (also $[...] in some shells) except in zsh (even in sh emulation) and mksh. — May 24 '14 at 6:39
Note that xargs -0 is not POSIX. Except with FreeBSD xargs, you generally want xargs -r0 instead of xargs -0. — May 24 '14 at 6:41
Another nice (GNU-only) feature is xargs -d "\n" so that you can run e.g. locate PATTERN1 |xargs -d "\n" grep PATTERN2 to search for file names matching PATTERN1 with content matching PATTERN2. Without GNU, you can do it e.g. like locate PATTERN1 |perl -pne 's/\n/\0/' |xargs -0 grep PATTERN2 — Jan 16 '15 at 4:29

score 20 · Answer 6 · 2016-05-25 14:21:43Z

While Gilles' answer is excellent, I take issue with his main point:

Always use double quotes around variable substitutions and command
substitutions: "$foo", "$(foo)"

When you are starting out with a Bash-like shell that does word splitting, yes, of
course the safe advice is to always use quotes. However, word splitting is not
always performed:

§ Word Splitting

These commands can be run without error

foo=$bar
bar=$(a command)
logfile=$logdir/foo-$(date +%Y%m%d)
PATH=/usr/local/bin:$PATH ./myscript
case $foo in bar) echo bar ;; baz) echo baz ;; esac

I am not encouraging users to adopt this behavior, but if someone firmly
understands when word splitting occurs then they should be able to decide for
themselves when to use quotes.
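The distinction can be observed directly. In this sketch the value of bar and the count_args helper are made up for the illustration, and the default IFS is assumed.

```sh
bar='two  words'               # invented value containing two spaces

foo=$bar                       # assignment: the value is copied unsplit
case $foo in
  'two  words') matched=yes ;; # case word: not split either
  *) matched=no ;;
esac

# Helper that only reports how many arguments it received.
count_args() { echo "$#"; }
unquoted=$(count_args $bar)    # command argument: subject to word splitting
quoted=$(count_args "$bar")    # quoted: always a single argument

printf 'matched=%s unquoted=%s quoted=%s\n' "$matched" "$unquoted" "$quoted"
```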

edited May 25 '16 at 14:21

answered May 24 '14 at 8:05

Steven Penny

As I mention in my answer, see unix.stackexchange.com/questions/68694/… for details. Do notice the question: “Why does my shell script choke?”. The most common problem (from years of experience on this site and elsewhere) is missing double quotes. “Always use double quotes” is easier to remember than “always use double quotes, except for these cases where they aren't necessary”.
– Gilles
May 24 '14 at 8:25

Rules are difficult to understand for beginners. For instance, foo=$bar is OK, but export foo=$bar or env foo=$var are not (at least in some shells). A piece of advice for beginners: always quote your variables unless you know what you're doing and have a good reason not to.
– Stéphane Chazelas
May 24 '14 at 17:05

@StevenPenny Is it really more correct? Are there reasonable cases where quotes would break the script? In situations where in half the cases quotes must be used, and in the other half quotes may be used optionally, a recommendation of "always use quotes, just in case" is the one that should be taught, since it's true, simple and less risky. Teaching such lists of exceptions to beginners is well known to be ineffective (lacking context, they won't remember them) and counterproductive, as they'll confuse needed/unneeded quotes, breaking their scripts and demotivating them to learn further.
– Peteris
May 25 '14 at 8:20

My $0.02 would be that recommending to quote everything is good advice. Mistakenly quoting something that doesn't need it is harmless, mistakenly failing to quote something that does need it is harmful. So, for the majority of shell script authors who will never understand the intricacies of when exactly word splitting occurs, quoting everything is much safer than trying to quote only where necessary.
– godlygeek
May 25 '14 at 17:03

@Peteris and godlygeek: "Are there reasonable cases where quotes would break the script?" It depends on your definition of "reasonable". If a script sets criteria="-type f", then find . $criteria works but find . "$criteria" doesn't.
– G-Man
Aug 28 '14 at 19:06

score 17 · Answer 7 · 2016-10-18 05:42:19Z

As far as I know, there are only two cases in which it is necessary to double-quote expansions, and those cases involve the two special shell parameters "$@" and "$*" - which are specified to expand differently when enclosed in double-quotes. In all other cases (excluding, perhaps, shell-specific array implementations) the behavior of an expansion is a configurable thing - there are options for that.

This is not to say, of course, that double-quoting should be avoided - to the contrary, it is probably the most convenient and robust method of delimiting an expansion which the shell has to offer. But, I think, as alternatives have already been expertly expounded, this is an excellent place to discuss what happens when the shell expands a value.

The shell, in its heart and soul (for those that have such), is a command-interpreter - it is a parser, like a big, interactive, sed. If your shell statement is choking on whitespace or similar then it is very likely because you have not fully understood the shell's interpretation process - especially how and why it translates an input statement to an actionable command. The shell's job is to:

accept input

interpret and split it correctly into tokenized input words
- input words are the shell syntax items such as $word or echo $words 3 4* 5
- words are always split on whitespace - that's just syntax - but only the literal whitespace characters served to the shell in its input file

expand those if necessary into multiple fields
- fields result from word expansions - they make up the final executable command
- excepting "$@", $IFS field-splitting, and pathname expansion an input word must always evaluate to a single field.

and then to execute the resulting command
- in most cases this involves passing on the results of its interpretation in some form or another

People often say the shell is a glue, and, if this is true, then what it is sticking is lists of arguments - or fields - to one process or another when it execs them. Most shells do not handle the NUL byte well - if at all - and this is because they're already splitting on it. The shell has to exec a lot and it must do this with a NUL delimited array of arguments that it hands to the system kernel at exec time. If you were to intermingle the shell's delimiter with its delimited data then the shell would probably screw it up. Its internal data structures - like most programs - rely on that delimiter. zsh, notably, does not screw this up.

And that's where $IFS comes in. $IFS is an always present - and likewise settable - shell parameter that defines how the shell should split shell expansions from word to field - specifically on what values those fields should delimit. $IFS splits shell expansions on delimiters other than NUL - or, in other words the shell substitutes bytes resulting from an expansion that match those in the value of $IFS with NUL in its internal data-arrays. When you look at it like that you might begin to see that every field-split shell expansion is an $IFS-delimited data array.

It's important to understand that $IFS only delimits expansions that are not already otherwise delimited - which you can do with double-quotes ("). When you quote an expansion you delimit it at the head and at least to the tail of its value. In those cases $IFS does not apply as there are no fields to separate. In fact, a double-quoted expansion exhibits identical field-splitting behavior to an unquoted expansion when IFS= is set to an empty value.

Unless quoted, $IFS is itself an $IFS delimited shell expansion. It defaults to a specified value of <space><tab><newline> - all three of which exhibit special properties when contained within $IFS. Whereas any other value for $IFS is specified to evaluate to a single field per expansion occurrence, $IFS whitespace - any of those three - is specified to elide to a single field per expansion sequence and leading/trailing sequences are elided entirely. This is probably easiest to understand via example.

slashes=///// spaces=' '
IFS=/; printf '<%s>' $slashes$spaces
<><><><><>< >
IFS=' '; printf '<%s>' $slashes$spaces
</////>
IFS=; printf '<%s>' $slashes$spaces
<///// >
unset IFS; printf '<%s>' "$slashes$spaces"
<///// >

But that's just $IFS - just the word-splitting or whitespace as asked, so what of the special characters?

The shell - by default - will also expand certain unquoted tokens (such as ?*[ as noted elsewhere here) into multiple fields when they occur in a list. This is called pathname expansion, or globbing. It is an incredibly useful tool, and, as it occurs after field-splitting in the shell's parse-order it is not affected by $IFS - fields generated by a pathname expansion are delimited on the head/tail of the filenames themselves regardless of whether their contents contain any characters currently in $IFS. This behavior is set to on by default - but it is very easily configured otherwise.

set -f

That instructs the shell not to glob. Pathname expansion will not occur at least until that setting is somehow undone - such as if the current shell is replaced with another new shell process or....

set +f

...is issued to the shell. Double-quotes - as they also do for $IFS field-splitting - render this global setting unnecessary per expansion. So:

echo "*" *

...if pathname expansion is currently enabled will likely produce very different results per argument - as the first will expand only to its literal value (the single asterisk character, which is to say, not at all) and the second only to the same if the current working directory contains no filenames which might match (and it matches nearly all of them). However if you do:

set -f; echo "*" *

...the results for both arguments are identical - the * does not expand in that case.
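Putting set -f and a custom $IFS together gives a controlled way to split a string into fields. In the sketch below split_path is a hypothetical helper name and the colon-separated value is invented; the function body is a subshell so that neither setting leaks into the caller.

```sh
# split_path is a hypothetical helper: one output line per colon-separated item.
split_path() (
  set -f        # no pathname expansion on the unquoted expansion below
  IFS=:         # split on colons only, so spaces stay inside a field
  for dir in $1; do
    printf '<%s>\n' "$dir"
  done
)

parts=$(split_path '/usr/local bin:/opt/*:/tmp')
printf '%s\n' "$parts"
```

Because globbing is off and only : delimits, the space and the * each survive inside their own field.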

I actually agree with @StéphaneChazelas that it (mostly) confuses things more than helping...but I found it helpful, personally, so I upvoted. I now have a better idea (and some examples) of how IFS actually works. What I don't get is why it would ever be a good idea to set IFS to something other than default. — Jan 9 '16 at 4:12
@Wildcard - it's a field delimiter. if you have a value in a variable that you want to expand to multiple fields you split it on $IFS. cd /usr/bin; set -f; IFS=/; for path_component in $PWD; do echo $path_component; done prints \n then usr\n then bin\n. The first echo is empty because / is a null field. The path_components can have newlines or spaces or whatever - wouldn't matter because the components were split on / and not the default value. people do it w/ awk all the time, anyway. your shell does it too — Jan 9 '16 at 6:40

score 0 · Answer 8 · 2018-02-28 13:35:13Z

I had a large video project with spaces in filenames and spaces in directory names. While find -type f -print0 | xargs -0 works for several purposes and across different shells, I find that using a custom IFS (input field separator) gives you more flexibility if you're using bash. The snippet below uses bash and sets IFS to just a newline; provided there aren't newlines in your filenames:

(IFS=$'\n'; for i in $(find -type f -print) ; do
 echo ">>>$i<<<"
done)

Note the use of parens to isolate the redefinition of IFS. I've read other posts about how to recover IFS, but this is just easier.

More, setting IFS to newline lets you set shell variables beforehand and easily print them out. For instance, I can grow a variable V incrementally using newlines as separators:

V=""
V="./Ralphie's Camcorder/STREAM/00123.MTS,04:58,05:52,-vf yadif"
V="$V"$'\n'"./Ralphie's Camcorder/STREAM/00111.MTS,00:00,59:59,-vf yadif"
V="$V"$'\n'"next item goes here..."

and correspondingly:

(IFS=$'\n'; for v in $V ; do
 echo ">>>$v<<<"
done)

Now I can "list" the setting of V with echo "$V" using double quotes to output the newlines. (Credit to this thread for the $'\n' explanation.)
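A variant of the loop that also switches off globbing might look like the following sketch. The second entry, with its * character, is invented to show the effect, and a literal newline in the variable nl replaces $'\n' so that the snippet does not depend on bash. It still cannot cope with items that themselves contain newlines.

```sh
nl='
'
V="./Ralphie's Camcorder/STREAM/00123.MTS,04:58,05:52,-vf yadif"
V="$V$nl./clips/*.MTS,00:00,59:59,-vf yadif"   # invented entry with a glob character

# The command substitution is a subshell, so set -f and IFS stay isolated.
listing=$(
  set -f            # keep * ? [ literal while the list is split
  IFS=$nl           # split on newlines only
  for v in $V; do
    printf '>>>%s<<<\n' "$v"
  done
)
printf '%s\n' "$listing"
```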

But then you'll still have problems with filenames containing newline or glob characters. See also: Why is looping over find's output bad practice?. If using zsh, you can use IFS=$'\0' and use -print0 (zsh doesn't do globbing upon expansions so glob characters are not a problem there). — Feb 28 at 14:10
This works with file names containing spaces, but it doesn't work against potentially hostile file names or accidental “nonsensical” file names. You can easily fix the issue of file names containing wildcard characters by adding set -f. On the other hand, your approach fundamentally fails with file names containing newlines. When dealing with data other than file names, it also fails with empty items. — Mar 1 at 18:52
Right, my caveat is that it won't work with newlines in filenames. However, I believe we have to draw the line just shy of madness ;-) — Mar 8 at 18:01
And I'm not sure why this received a downvote. This is a perfectly reasonable method for iterating over filenames with spaces. Using -print0 requires xargs, and there are things that are difficult using that chain. I'm sorry someone does not agree with my answer, but that's no reason to downvote it. — Mar 8 at 18:03

Why does my shell script choke on whitespace or other special characters?
