How can I call an external command from within Miller (mlr)’s DSL?
Suppose I have the following CSV:
$ cat test.csv
id,domain
1,foo.com
2,bar.com
Using mlr put, I can easily map any function over a field in the CSV, as long as I can define it in the Miller DSL. For example, mlr --csv put '$id = $id + 1' increments the id by 1 for each record.
But what if I can’t define the function in Miller’s DSL, perhaps because it is not pure? Suppose I wanted to map each domain in the CSV to an IP address. I’d like to do something like mlr --csv put '$ip = shell("nslookup $domain")'. Is there an easy way to do this?
Currently I am extracting the input field into a separate file, rewriting it in a separate shell script, and adding the result back in with mlr join. However, this is pretty messy, because my CSV is full of quoted commas and newlines, which I need to carefully handle myself rather than relying on Miller.
shell csv miller
edited Feb 15 at 11:54
asked Jan 29 at 8:12 by sjy
1 Answer
Calling external commands from the Miller DSL
The Miller DSL reference deals with calling external commands in the section on redirected-output statements:
The print, dump, tee, emitf, emit, and emitp keywords all allow you to redirect output to one or more files or pipe-to commands.
I couldn’t find this in the documentation (other than by inference from the examples), but the syntax for using these statements with a pipe-to command seems to be statement | quoted-shell-command, unquoted-mlr-expression. For example:
$ mlr --csv put 'tee | "tr [a-z] [A-Z]", $*' test.csv
id,domain
1,foo.com
2,bar.com
ID,DOMAIN
1,FOO.COM
2,BAR.COM
Note that the piped output appears after Miller’s output (in this case, the unchanged input, as tee does not affect the stream and put emits it). By suppressing put’s output with -q, and extracting a single field with print $domain rather than tee $*, we can get a list of IP addresses:
$ mlr --csv put -q 'print | "xargs dig +short", $domain' test.csv
23.23.86.44
104.27.138.186
104.27.139.186
Miller didn’t do much for us here; we still had to use xargs to convert stdin into an argument (because dig does not accept domains on stdin). Moreover, dig returned more than one line for bar.com, so the output no longer matches the input one-to-one. Since mlr adheres to the Unix philosophy, it would have been easier just to join a pipe to the end of mlr --csv --headerless-csv-output cut -f domain if this was all I needed.
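The plain pipeline described above might look like the following sketch. The names domain_lookups and LOOKUP are hypothetical, introduced here only so the lookup command can be swapped out (for example, for echo when trying it offline); by default it runs dig +short as in the examples above.

```shell
# Hypothetical helper: print one lookup result line per domain in the CSV.
# LOOKUP is an assumed override for the lookup command (default: dig +short).
domain_lookups() {
  mlr --csv --headerless-csv-output cut -f domain "$@" |
    xargs -n 1 ${LOOKUP:-dig +short}   # unquoted on purpose: splits into command + flags
}

# Usage: domain_lookups test.csv
```

As with the print example, the result is a bare list of addresses with nothing tying each one back to its domain.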
Joining output from external commands to your input
What I really wanted to do was assign the result of calling an external command to an in-stream variable in the Miller DSL, and as far as I can tell, this is not possible. However, by swapping xargs for GNU parallel, we can use the --tag option to keep track of the argument we gave dig, and benefit from flexible, concurrent I/O:
$ mlr --csv --headerless-csv-output cut -f domain test.csv | parallel --tag dig +short
foo.com 23.23.86.44
bar.com 104.27.139.186
bar.com 104.27.138.186
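If GNU parallel is not installed, the tagging itself can be approximated with a plain shell loop. This is a minimal sketch, sequential rather than concurrent; tag_lookups is a hypothetical name, and it assumes the domains contain no backslashes (awk -v would interpret them as escapes).

```shell
# Hypothetical helper: read one domain per line on stdin, run the given
# command on each, and prefix every output line with "<domain><TAB>".
tag_lookups() {
  while IFS= read -r domain; do
    # </dev/null keeps the command from consuming the loop's input
    "$@" "$domain" </dev/null | awk -v d="$domain" '{ print d "\t" $0 }'
  done
}

# Usage: mlr --csv --headerless-csv-output cut -f domain test.csv | tag_lookups dig +short
```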
Since we are dealing with CSV, parallel can actually handle this on its own, although we need to access fields by position ({2}) rather than name (domain):
$ < test.csv parallel -C "," --skip-first-line --tagstring {2} dig +short {2}
foo.com 23.23.86.44
bar.com 104.27.139.186
bar.com 104.27.138.186
This is a tab-separated list of (domain, ip) pairs, so we can convert it back to CSV with a header using mlr --t2c --implicit-csv-header label domain,ip. Then, since both our output and our original test.csv have a domain field, we can use mlr join to produce a single output table, and mlr nest to implode the multiple values for bar.com:
$ mlr --csv cut -f domain test.csv |
  parallel --skip-first-line --tag dig +short |
  mlr --t2c --implicit-csv-header label domain,ip |
  mlr --c2p --barred join -f test.csv -j domain then \
  nest --implode --values --across-records -f ip
+---------+----+-------------------------------+
| domain  | id | ip                            |
+---------+----+-------------------------------+
| foo.com | 1  | 23.23.86.44                   |
| bar.com | 2  | 104.27.138.186;104.27.139.186 |
+---------+----+-------------------------------+
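For the in-stream assignment the question asks about, it may be worth checking your version's function list: Miller 6 documents a system function that runs a command string and returns its stdout minus the final newline. The following is a minimal sketch, assuming your mlr provides system and the ENV map. The names add_ip_column and LOOKUP are hypothetical, and because the field value is pasted into a shell command line, this is only reasonable on trusted input.

```shell
# Hypothetical helper: add an "ip" field by running a lookup command per record.
# Assumes the Miller DSL provides system() and ENV; "." concatenates strings.
# LOOKUP is an assumed override for the lookup command (default: dig +short).
add_ip_column() {
  LOOKUP="${LOOKUP:-dig +short}" \
    mlr --csv put '$ip = system(ENV["LOOKUP"] . " " . $domain)' "$@"
}

# Usage: add_ip_column test.csv
```

Unlike the parallel pipeline, this runs one lookup at a time, and a multi-line result (such as the two addresses for bar.com) ends up as a single field containing a newline rather than as separate records.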
edited Feb 15 at 15:19
answered Feb 15 at 13:28 by sjy