GNU parallel vs & (I mean background) vs xargs -P


I'm confused about the difference or advantage (if any) of running a set of tasks in a .sh script using GNU parallel, e.g. Ole Tange's answer:

    parallel ./pngout -s0 {} R{} ::: *.png

rather than, say, looping through them and putting each one in the background with &, e.g. frostschutz's answer:

    # copied from the link for illustration
    for stuff in things
    do
    ( something
    with
    stuff ) &
    done
    wait # for all the something with stuff

In short, are they just syntactically different, or practically different? And if practically different, when should I use each?

Tags: shell-script background-process xargs gnu-parallel


asked Dec 12 '13 at 0:08 by Stephen Henderson
edited Sep 11 '17 at 10:58 by Jeff Schaller

1 Answer


Putting multiple jobs in the background is a good way of using the multiple cores of a single machine. parallel, however, also lets you spread jobs across multiple servers on your network. From man parallel:

> GNU parallel is a shell tool for executing jobs in parallel using one or more computers. The typical input is a list of files, a list of hosts, a list of users, a list of URLs, or a list of tables.

Even when running on a single computer, parallel gives you far greater control over how your jobs are parallelized. Take this example from the man page:

> To convert *.wav to *.mp3 using LAME running one process per CPU core run:

    parallel lame {} -o {.}.mp3 ::: *.wav

OK, you could do the same with

    for i in *.wav; do lame "$i" -o "${i%.wav}.mp3" & done

However, that is longer and more cumbersome and, more importantly, it launches as many jobs at once as there are .wav files. If you run this on a few thousand files, it is likely to bring a normal laptop to its knees. parallel, on the other hand, will launch one job per CPU core and keep everything nice and tidy.
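If parallel is not installed, the backgrounding loop can at least be throttled by hand. The following is only a minimal sketch in POSIX sh: `run_in_batches` is a hypothetical helper name, not something from the answer or from any tool. It starts at most MAX jobs, waits for the whole batch to finish, then starts the next batch. That is cruder than parallel, which starts a new job as soon as a slot frees up.

```shell
# Hypothetical helper (not part of any tool): read one item per line from
# stdin and run "COMMAND ARGS... ITEM" in the background, at most MAX at a time.
# Usage: run_in_batches MAX COMMAND [ARGS...]
# Note: plain sh has no "local", so max/count/item are global variables.
run_in_batches() {
    max=$1
    shift
    count=0
    while IFS= read -r item; do
        # stdin is redirected so a job cannot swallow the remaining input lines
        "$@" "$item" < /dev/null &
        count=$((count + 1))
        if [ "$count" -ge "$max" ]; then
            wait        # block until every job in this batch has exited
            count=0
        fi
    done
    wait                # collect the final, possibly partial, batch
}

# Example, assuming lame is installed and file names contain no newlines:
#   ls ./*.wav | run_in_batches 4 sh -c 'lame "$1" -o "${1%.wav}.mp3"' sh
```

Because each batch runs only as fast as its slowest job, cores can sit idle near the end of a batch; that idle time is what a real job scheduler such as parallel avoids.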



Basically, parallel lets you fine-tune how your jobs are run and how much of the available resources they may use. If you really want to see the power of this tool, go through its manual or, at the very least, the examples it offers.

Simple backgrounding has nowhere near the level of sophistication needed to be compared to parallel. As for how parallel differs from xargs, the GNU Parallel documentation gives a nice breakdown. Some of the more salient points are:

• xargs deals badly with special characters (such as space, ' and ").

• xargs can run a given number of jobs in parallel, but has no support for running number-of-cpu-cores jobs in parallel.

• xargs has no support for grouping the output, so output may run together, e.g. the first half of a line is from one process and the last half of the line is from another process.

• xargs has no support for keeping the order of the output, so when running jobs in parallel using xargs, the output of the second job cannot be postponed until the first job is done.

• xargs has no support for running jobs on remote computers.

• xargs has no support for context replace, so you will have to create the arguments.
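The first two points can often be worked around on a single machine. The sketch below assumes an xargs that supports the -0 and -P extensions (GNU and BSD xargs do; they are not required by POSIX) and a system where `getconf _NPROCESSORS_ONLN` reports the number of online cores. `xargs_per_core` is a hypothetical wrapper name used only for illustration.

```shell
# Hypothetical wrapper (not part of xargs): read NUL-delimited items from stdin
# and run "COMMAND ARGS... ITEM" once per item, one job per online CPU core.
# Usage: xargs_per_core COMMAND [ARGS...]
xargs_per_core() {
    # Assumption: getconf knows _NPROCESSORS_ONLN; otherwise fall back to 1 job.
    cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null) || cores=1
    # -0: items end with a NUL byte, so spaces and quotes are passed through intact
    # -n 1: one item per command invocation
    # -P: maximum number of commands to run at the same time
    xargs -0 -n 1 -P "$cores" "$@"
}

# Example, assuming lame is installed and find supports -print0:
#   find . -name '*.wav' -print0 |
#       xargs_per_core sh -c 'lame "$1" -o "${1%.wav}.mp3"' sh
```

This does nothing about the remaining points: output from concurrent jobs is still neither grouped nor ordered, and everything runs on the local machine.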







answered Dec 12 '13 at 4:09 by terdon♦
edited Aug 10 at 19:12 by Ole Tange

Comments:

• (1) That's a good answer, thx. It sort of confirms what I guessed. I hate the parallel syntax, yet another new brand of keyboard-faceroll to memorise. But I guess the auto balancing across cores/jobs is worth it...?
  – Stephen Henderson, Dec 12 '13 at 8:02

• (3) Have a look at sem, which is part of the GNU Parallel package. That might suit your syntax requirements better.
  – Ole Tange, Dec 12 '13 at 10:53

• (1) @OleTange thx, good call
  – Stephen Henderson, Dec 12 '13 at 11:37

• > xargs has no support for context replace, so you will have to create the arguments.
  What does this mean? Isn't it xargs -I %
  – raine, Feb 18 '16 at 11:00

• (2) It's true that parallel is more powerful than xargs, but that comparison is rather biased. For example, xargs supports null-terminated strings as input to avoid problems with spaces and quotes, and can also use -d to emulate parallel (even mentioned in the comparison!). xargs -I is sufficient context replacement for most simple cases, and I usually know the number of cores on the machine. I never experienced a problem with ungrouped output.
  – Sam Brightman, Aug 26 '16 at 10:10
