How can I call an external command from within Miller (mlr)’s DSL?

Suppose I have the following CSV:

$ cat test.csv
id,domain
1,foo.com
2,bar.com

Using mlr put, I can easily map any function over a field in the CSV, as long as I can define it in the Miller DSL. For example, mlr --csv put '$id = $id + 1' test.csv will increment the id by 1 for each record.

But what if I can’t define the function in Miller’s DSL, perhaps because it is not pure? Suppose I wanted to map each domain in the CSV to an IP address. I’d like to do something like mlr --csv put '$ip = shell("nslookup $domain")'. Is there an easy way to do this?

Currently I am extracting the input field into a separate file, processing it in a separate shell script, and adding the result back in with mlr join. However, this is pretty messy, because my CSV is full of quoted commas and embedded newlines, which I then have to handle carefully myself rather than relying on Miller.

edited Feb 15 at 11:54

asked Jan 29 at 8:12

sjy

1064

shell csv miller

Calling external commands from the Miller DSL

The Miller DSL reference deals with calling external commands in the section on redirected-output statements:

The print, dump, tee, emitf, emit, and emitp keywords all allow you to redirect output to one or more files or pipe-to commands.

I couldn’t find this in the documentation (other than by inference from the examples), but the syntax for using these statements with a pipe-to command seems to be statement | quoted-shell-command, unquoted-mlr-expression. For example:

$ mlr --csv put 'tee | "tr a-z A-Z", $*' test.csv
id,domain
1,foo.com
2,bar.com
ID,DOMAIN
1,FOO.COM
2,BAR.COM

Note that the piped output appears after Miller’s output (in this case, the unchanged input, as tee does not affect the stream and put emits it). By suppressing put’s output with -q, and extracting a single field with print $domain rather than tee $*, we can get a list of IP addresses:

$ mlr --csv put -q 'print | "xargs dig +short", $domain' test.csv
23.23.86.44
104.27.138.186
104.27.139.186

Miller didn’t do much for us here: we still had to use xargs to turn stdin into arguments (because dig does not accept domains on stdin). Moreover, dig printed two addresses for bar.com, so the output no longer matches the input one-to-one. Since mlr adheres to the Unix philosophy, if this were all I needed, it would have been easier just to append a pipe to mlr --csv --headerless-csv-output cut -f domain test.csv.
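If a strict one-line-per-domain mapping is what matters, a plain shell loop can collapse each domain’s answers onto a single line. This is a minimal sketch, assuming a POSIX shell with dig and paste on PATH; lookup_domains is a hypothetical helper name, not part of Miller or dig. Unlike the parallel approach, it performs one lookup at a time.

```shell
# lookup_domains: hypothetical helper, not part of Miller or dig.
# Reads one domain per line on stdin and prints "domain<TAB>ip1;ip2;..."
# so that every input domain produces exactly one output line.
# Assumes a POSIX shell with dig and paste available on PATH.
lookup_domains() {
  # "|| [ -n ... ]" also handles a final line with no trailing newline.
  while IFS= read -r domain || [ -n "$domain" ]; do
    # Skip blank lines instead of querying an empty name.
    [ -n "$domain" ] || continue
    # dig +short may print several lines (e.g. multiple A records);
    # paste -s joins them into one ';'-separated field.
    # </dev/null keeps dig from consuming the loop's stdin.
    ips=$(dig +short "$domain" </dev/null | paste -sd ';' -)
    printf '%s\t%s\n' "$domain" "$ips"
  done
}
```

Because each domain now yields exactly one tab-separated line, the result can be labelled and joined back onto the original file without a nest step, for example:

```shell
mlr --csv --headerless-csv-output cut -f domain test.csv |
  lookup_domains |
  mlr --t2c --implicit-csv-header label domain,ip |
  mlr --c2p join -f test.csv -j domain
```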

Joining output from external commands to your input

What I really wanted to do was assign the result of calling an external command to an in-stream variable in the Miller DSL, and as far as I can tell, this is not possible. However, by swapping xargs for GNU parallel, we can use the --tag option to keep track of the argument we gave dig, and benefit from flexible, concurrent I/O:

$ mlr --csv --headerless-csv-output cut -f domain test.csv | parallel --tag dig +short
foo.com 23.23.86.44
bar.com 104.27.139.186
bar.com 104.27.138.186

Since this particular CSV is simple (no quoted commas or embedded newlines), parallel can actually split it on its own, although we then need to access fields by position ({2}) rather than by name (domain):

$ < test.csv parallel -C "," --skip-first-line --tagstring {2} dig +short {2}
foo.com 23.23.86.44
bar.com 104.27.139.186
bar.com 104.27.138.186

This is a tab-separated list of (domain, ip) pairs, so we can convert it back to CSV with a header using mlr --t2c --implicit-csv-header label domain,ip. Then, since both our output and our original test.csv have a domain field, we can use mlr join to produce a single output table, and mlr nest to implode the multiple values for bar.com:

$ mlr --csv cut -f domain test.csv |
 parallel --skip-first-line --tag dig +short |
 mlr --t2c --implicit-csv-header label domain,ip |
 mlr --c2p --barred join -f test.csv -j domain then \
 nest --implode --values --across-records -f ip
+---------+----+-------------------------------+
| domain  | id | ip                            |
+---------+----+-------------------------------+
| foo.com | 1  | 23.23.86.44                   |
| bar.com | 2  | 104.27.138.186;104.27.139.186 |
+---------+----+-------------------------------+

edited Feb 15 at 15:19

answered Feb 15 at 13:28

sjy

1064
