Running multiple copies of the same file in parallel with different inputs using a shell script

Suppose I have a file "Analysis.C" that takes a data file as input. The data files are named "a.00001.txt" through "a.01000.txt". One way to loop over all of them is to write a shell script that uses sed to change the input file name in "Analysis.C" on each iteration from 00001 to 01000. However, this processes only one input file at a time.

What I want is to run multiple instances of "Analysis.C" in parallel, each taking a different input and all executing at the same time (the constraint, I suppose, is the number of cores I can spare on my PC). How do I do that?

asked Aug 31 at 5:54

Diptanil Roy

linux shell-script shell gnu-parallel

2 Answers

Accepted answer

With GNU Parallel you can do this:

parallel analysis.C ::: *.txt

Or, if you have so many .txt files that the expanded command line would be too long:

printf '%s\0' *.txt | parallel -0 analysis.C

By default it runs one job per CPU thread. This can be adjusted with -j, for example -j20 for 20 jobs in parallel.
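As a rough sketch of how the job count and per-input output files might fit together: the snippet below creates three stand-in input files in a scratch directory and uses `wc -c` as a placeholder for the real analysis program (both are assumptions for illustration, not part of the original setup). `{}` and `{.}` are GNU Parallel's replacement strings for the input and for the input without its extension. The check on `parallel --version` is there because the moreutils package ships a different program under the same name.

```shell
# Scratch directory with three stand-in input files (hypothetical data).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
for i in 00001 00002 00003; do
  echo "data $i" > "a.$i.txt"
done

# Only GNU Parallel understands ':::' and '{.}'.
if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
  # At most 2 jobs at a time; a.00001.txt -> a.00001.out, and so on.
  # 'wc -c' is a placeholder for the real analysis command.
  parallel -j2 'wc -c < {} > {.}.out' ::: a.*.txt
  ls a.*.out
else
  echo "GNU Parallel not found; nothing was run" >&2
fi
```

Writing each job's result to its own file keeps the outputs apart even when many jobs finish at nearly the same time.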

Unlike the moreutils parallel solution, you can post-process the output: GNU Parallel serializes it, so you will never see output from two jobs mixed together.
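A small sketch of what that serialization looks like in practice, assuming GNU Parallel is installed: each job below prints two lines, and because output is grouped per job, the two lines for a given argument stay adjacent in the combined log, although the order of the jobs themselves may vary. The arguments `a b c` and the file name `combined.log` are made up for illustration.

```shell
logdir=$(mktemp -d)

if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
  # Each job writes two lines; GNU Parallel prints a job's output
  # as one block once that job has finished.
  parallel 'echo "begin {}"; echo "end {}"' ::: a b c > "$logdir/combined.log"

  # Safe to post-process: here, count the jobs that ran to completion.
  grep -c '^end ' "$logdir/combined.log"
else
  echo "GNU Parallel not found; nothing was run" >&2
fi
```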

GNU Parallel is a general parallelizer that makes it easy to run jobs in parallel on the same machine or on multiple machines you have ssh access to.

If you have 32 different jobs you want to run on 4 CPUs, a straightforward way to parallelize is to run 8 jobs on each CPU:

[Figure: simple scheduling]

GNU Parallel instead spawns a new process when one finishes, keeping the CPUs active and thus saving time:

[Figure: GNU Parallel scheduling]
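To make the difference concrete, here is a scaled-down arithmetic sketch: 8 jobs on 2 CPUs rather than 32 on 4, with job durations that are invented purely for illustration. It compares handing each CPU a fixed block of jobs up front with starting each job on whichever CPU becomes free first. It only adds up numbers and does not run GNU Parallel itself.

```shell
# Hypothetical job durations in seconds (not from the answer): 8 jobs, 2 CPUs.
durations="4 4 4 4 1 1 1 1"

# Static split: the first 4 jobs go to CPU 0, the last 4 to CPU 1.
n=0; static0=0; static1=0
for d in $durations; do
  if [ "$n" -lt 4 ]; then
    static0=$((static0 + d))
  else
    static1=$((static1 + d))
  fi
  n=$((n + 1))
done
if [ "$static0" -gt "$static1" ]; then static_total=$static0; else static_total=$static1; fi

# Dynamic: each job starts on whichever CPU becomes free first.
free0=0; free1=0
for d in $durations; do
  if [ "$free0" -le "$free1" ]; then
    free0=$((free0 + d))
  else
    free1=$((free1 + d))
  fi
done
if [ "$free0" -gt "$free1" ]; then dynamic_total=$free0; else dynamic_total=$free1; fi

echo "static split finishes after ${static_total}s"
echo "dynamic scheduling finishes after ${dynamic_total}s"
```

With these made-up durations the static split takes 16 s, because one CPU receives all the long jobs, while the dynamic assignment takes 10 s. How large the gap is in practice depends entirely on how uneven the real job durations are.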

Installation

For security reasons you should install GNU Parallel with your package manager, but if GNU Parallel is not packaged for your distribution, you can do a personal installation, which does not require root access. It can be done in 10 seconds by doing this:

(wget -O - pi.dk/3 || curl pi.dk/3/ || fetch -o - http://pi.dk/3) | bash

For other installation options see http://git.savannah.gnu.org/cgit/parallel.git/tree/README

Learn more

See more examples: http://www.gnu.org/software/parallel/man.html

Watch the intro videos: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1

Walk through the tutorial: http://www.gnu.org/software/parallel/parallel_tutorial.html

Read the book: https://doi.org/10.5281/zenodo.1146014

answered Aug 31 at 7:10

Ole Tange

See the parallel command (from the moreutils package in many distros). From the man page:

parallel runs the specified command, passing it a single one of the specified
arguments. This is repeated for each argument. Jobs may be run in parallel. The default is to run one job per CPU.

So:

parallel analysis.C -- a.0????.txt
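Because GNU Parallel and moreutils both install a command called `parallel`, it can help to check which one is present before relying on the `--` syntax. The sketch below assumes that any `parallel` which does not identify itself as GNU Parallel is the moreutils one; it uses three stand-in input files and `wc -c` as a placeholder for the analysis program (both hypothetical). The `-j` option caps the number of simultaneous jobs.

```shell
demo=$(mktemp -d)
cd "$demo" || exit 1
for i in 00001 00002 00003; do
  echo "data $i" > "a.$i.txt"
done

if ! command -v parallel >/dev/null 2>&1; then
  echo "no parallel command installed" >&2
elif parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
  echo "this is GNU Parallel; the '--' syntax below is for moreutils" >&2
else
  # Assumed to be moreutils parallel: -j caps the number of
  # simultaneous jobs, and arguments follow the '--' separator.
  parallel -j 2 wc -c -- a.0????.txt
fi
```

The glob `a.0????.txt` matches a literal `a.0` followed by exactly four characters, so it covers `a.00001.txt` through `a.01000.txt`.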

edited Aug 31 at 6:33

answered Aug 31 at 6:24

xenoid
