Xargs to extract filename
I would like to find all the .html files in a folder and append [file](./file.html) to another file called index.md. I tried the following command:

ls | awk "/\.html$/" | xargs -0 -I @@ -L 1 sh -c 'echo "[${@@%.*}](./@@)" >> index.md'

But it can't substitute @@ inside the command. What am I doing wrong?
Note: file names can contain valid characters such as spaces.

Clarification: index.md would have one line of the form [file](./file.html) per file, where file is the actual file name in the folder.
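For example, with two hypothetical files notes.html and my page.html in the folder, the desired index.md would contain:

[notes](./notes.html)
[my page](./my page.html)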
Tags: awk, find, xargs, echo
xargs -0 implies NUL-terminated strings on the xargs stdin, but awk does not print them. ${...} needs a variable name. Both points are addressed in @RoVo's answer.
– weirdan
Sep 3 at 11:24
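A minimal sketch of a NUL-delimited variant along those lines (assuming GNU awk for printf "%s\0" and GNU xargs; note it still parses ls, which the answers below advise against):

ls | awk '/\.html$/ { printf "%s\0", $0 }' |
  xargs -0 -I @@ sh -c 'f=$1; echo "[${f%.*}](./$f)" >> index.md' sh @@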
Would you please clarify what the content of index.md will look like?
– Goro
Sep 3 at 11:32

@Goro I had appended the clarification at the end of the question, but unfortunately it has been edited out!
– Nikhil
Sep 3 at 11:39

@Nikhil Would you please include it again? Thanks!
– Goro
Sep 3 at 11:40

@Goro Isn't it appropriate to justify the accepted answer?
– Nikhil
Sep 3 at 14:37
asked Sep 3 at 11:14 by Nikhil; edited Sep 5 at 2:18 by Rui F Ribeiro
3 Answers
Accepted answer, 11 votes (answered Sep 3 at 11:23 by RoVo, edited Sep 3 at 19:24)
Do not parse ls. You don't need xargs for this; you can use find -exec. Try this:
find . -maxdepth 1 -type f -name "*.html" -exec \
    sh -c 'f=$(basename "$1"); echo "[${f%.*}]($1)" >> index.md' sh {} \;
If you want to use xargs, use this very similar version:
find . -maxdepth 1 -type f -name "*.html" -print0 |
    xargs -0 -I{} sh -c 'f=$(basename "$1"); echo "[${f%.*}]($1)" >> index.md' sh {}
Another way, without running xargs or -exec:
find . -maxdepth 1 -type f -name "*.html" -printf '[%f](./%f)\n' |
    sed 's/\.html]/]/' > index.md
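For a rough idea of what that last pipeline produces, assuming a single hypothetical file notes.html in the folder:

find . -maxdepth 1 -type f -name "*.html" -printf '[%f](./%f)\n'
# [notes.html](./notes.html)
find . -maxdepth 1 -type f -name "*.html" -printf '[%f](./%f)\n' | sed 's/\.html]/]/'
# [notes](./notes.html)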
Is that an extra sh argument in the first command, or is that intentional?
– Toby Speight
Sep 3 at 15:26

This is taken from this answer. See comments there and man sh -> -c for documentation of why this is needed.
– RoVo
Sep 3 at 15:27

Ah, thanks – I had missed that "If there are arguments after the command_string, the first argument is assigned to $0 and any remaining arguments are assigned to the positional parameters."
– Toby Speight
Sep 3 at 15:40
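A quick sketch of how those arguments are assigned (the strings here are made up for illustration):

sh -c 'echo "zeroth: $0 / first: $1"' sh ./page.html
# prints: zeroth: sh / first: ./page.html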
Add '-type f' to avoid strangeness with directories matching "*.html".
– abligh
Sep 3 at 17:29

Thanks, edited.
– RoVo
Sep 3 at 19:24
16 votes (answered Sep 3 at 11:24 by Stéphane Chazelas, edited Sep 4 at 9:40)
Just do:
for f in *.html; do printf '%s\n' "[${f%.*}](./$f)"; done > index.md
Use set -o nullglob (zsh, yash) or shopt -s nullglob (bash) for *.html to expand to nothing instead of *.html (or report an error in zsh) when there's no html file. With zsh, you can also use *.html(N), or in ksh93 ~(N)*.html.
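A minimal bash sketch combining the loop above with nullglob, so that an empty folder yields an empty index.md instead of a stray [*](./*.html) line:

shopt -s nullglob
for f in *.html; do printf '%s\n' "[${f%.*}](./$f)"; done > index.md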
Or with one printf call with zsh:

files=(*.html)
rootnames=($files:r)
printf '[%s](./%s)\n' ${rootnames:^files} > index.md
Note that, depending on which markdown syntax you're using, you may have to HTML-encode the title part and URI-encode the URI part if the file names contain some problematic characters. Not doing so could even end up introducing a form of XSS vulnerability depending on context. With ksh93, you can do it with:
for f in *.html; do
  title=${ printf %H "${f%.*}"; }
  title=${title//$'\n'/"<br/>"}
  uri=${ printf '%#H' "$f"; }
  uri=${uri//$'\n'/%0A}
  printf '%s\n' "[$title]($uri)"
done > index.md
Where %H¹ does the HTML encoding and %#H the URI encoding, but we still need to address newline characters separately.
Or with perl:

perl -MURI::Encode=uri_encode -MHTML::Entities -CLSA -le '
  for (<*.html>) {
    $uri = uri_encode("./$_");
    s/\.html\z//;
    $_ = encode_entities $_;
    s:\n:<br/>:g;
    print "[$_]($uri)"
  }'
Using <br/> for newline characters. You may want to use ␤ instead, or more generally decide on some form of alternative representation for non-printable characters.
There are a few things wrong in your code:

- parsing the output of ls
- using a $ meant to be literal inside double quotes
- using awk for something that grep can do (not wrong per se, but overkill)
- using xargs -0 when the input is not NUL-delimited
- -I conflicts with -L 1. -L 1 is to run one command per line of input but with each word in the line passed as a separate argument, while -I @@ runs one command for each line of input with the full line (minus the trailing blanks, and quoting still processed) used to replace @@.
- using the replacement string (@@ here) inside the code argument of sh (command injection vulnerability)
- in sh, the var in ${var%.*} is a variable name; it won't work with arbitrary text (see the sketch after this list)
- using echo for arbitrary data.
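A small sketch of the ${var%.*} point, with a made-up file name:

f='my page.html'
printf '%s\n' "${f%.*}"   # -> my page
# "${my page.html%.*}" would be a syntax error: the part before % must be a variable name,
# not arbitrary text, which is why the value has to be stored in a variable first.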
If you wanted to use xargs -0, you'd need something like:

printf '%s\0' * | grep -z '\.html$' | xargs -r0 sh -c '
  for file do
    printf "%s\n" "[${file%.*}](./$file)"
  done' sh > file.md
- Replacing ls with printf '%s\0' * to get a NUL-delimited output
- awk with grep -z (a GNU extension) to process that NUL-delimited output
- xargs -r0 (GNU extensions) without any -n/-L/-I, because while we're at spawning a sh, we might as well have it process as many files as possible
- have xargs pass the words as extra arguments to sh (which become the positional parameters inside the inline code), not inside the code argument
- which means we can more easily store them in variables (here with for file do, which loops over the positional parameters by default; see the sketch after this list) so we can use the ${param%pattern} parameter expansion operator
- use printf instead of echo.
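A tiny sketch of the for file do idiom mentioned above, with made-up arguments standing in for what xargs would pass:

set -- one.html 'two pages.html'   # simulate the positional parameters
for file do
  printf '%s\n' "[${file%.*}](./$file)"
done
# [one](./one.html)
# [two pages](./two pages.html)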
It goes without saying that it makes little sense to use that instead of running the for loop directly over the *.html files as in the top example.
¹ It doesn't seem to work properly for multibyte characters in my version of ksh93 though (ksh93u+ on a GNU system).
That overwrites index.md though, which OP's code did not.
– weirdan
Sep 3 at 11:27

I think this is still what OP wants. OP uses >> because he uses it inside the loop, while this answer redirects after the loop; and a second run of the same script doesn't make too much sense to me.
– RoVo
Sep 3 at 11:28

@StéphaneChazelas Thanks for the answer. But for f in *.html; do printf '%s\n' "[${f%.*}](./$f)"; done >> index.md appends [*](./*.html) when no html file exists.
– Nikhil
Sep 3 at 12:48

@Nikhil, see edit.
– Stéphane Chazelas
Sep 3 at 13:02
3 votes (answered Sep 3 at 20:46 by Ole Tange, edited Sep 4 at 7:10)
Do you really need xargs?

ls *.html | perl -pe 's/\.html\n//;$_="[$_](./$_.html)\n"'
(If you have more than 100000 files):

printf "%s\n" *.html | perl -pe 's/\.html\n//;$_="[$_](./$_.html)\n"'
or (slower, but shorter):

for f in *.html; do echo "[${f%.*}](./$f)"; done
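With a hypothetical file notes.html in the folder, all three variants print the same line:

[notes](./notes.html)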
Note that with ls *.html, if any of those html files are of type directory, ls will list their content. More generally, when you use ls with a shell wildcard, you want to use ls -d -- *.html (which also addresses the issues with file names starting with -).
– Stéphane Chazelas
Sep 4 at 7:18
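A small illustration of that point, with a hypothetical directory named sub.html:

ls sub.html        # lists the contents of the sub.html directory
ls -d -- sub.html  # lists sub.html itself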
The first two approaches assume file names don't contain newline characters (anyway, I suppose those would have to be encoded somehow in the markdown syntax). The third one assumes file names don't contain backslash characters. More generally, echo can't be used for arbitrary data.
– Stéphane Chazelas
Sep 4 at 7:20
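A short sketch of why echo is unreliable here, with a hypothetical file name containing a backslash sequence (echo's handling of backslashes varies between shells):

f='a\nb.html'
echo "[${f%.*}](./$f)"          # some shells expand \n here
printf '%s\n' "[${f%.*}](./$f)" # printf %s always prints it literally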