Do file descriptors optimise writing to files?

Score: 6

Is it equivalent to have commands print to a file directly, as opposed to writing to a file descriptor?

Illustration

Writing to the file directly:

    for i in {1..1000}; do >>x echo "$i"; done

Using an fd:

    exec 3>&1 1>x
    for i in {1..1000}; do echo "$i"; done
    exec 1>&3 3>&-

Is the latter more efficient?

edited May 19 at 9:18 by Gilles
asked May 18 at 14:28 by Tomasz

  • 5
    Both write(2) to a file descriptor, so how do you think one is more "direct"?
    – thrig
    May 18 at 14:32

  • 3
    What about for i in {1..1000}; do echo "$i"; done > x? That only opens the file once.
    – RonJohn
    May 18 at 22:05

  • @RonJohn It's already mentioned in Matija Nalis's answer.
    – Tomasz
    May 18 at 22:10

3 Answers

Score: 12 (accepted)

The main difference between opening the file before the loop with exec and putting the redirection on the command inside the loop is that the former sets up the file descriptor just once, while the latter opens and closes the file on every iteration of the loop.

Doing it once is likely to be more efficient, but if you were to run an external command inside the loop, the difference would probably disappear in the cost of launching the command. (echo here is probably a builtin, so that doesn't apply.)

If the output is going to something other than a regular file (e.g. if x is a named pipe), the act of opening and closing the file may be visible to other processes, so there may be differences in behaviour, too.

Note that there's really no difference between a redirection through exec and a redirection on the command: they both open the file and juggle file descriptor numbers.

These two should be pretty much equivalent, in that they both open() the file and write() to it. (There are differences in how fd 1 is stored for the duration of the command, though.)

    for i in {1..1000}; do
        >>x echo "$i"
    done

    for i in {1..1000}; do
        exec 3>&1 1>>x   # assuming fd 3 is available
        echo "$i"        # here, fd 3 is visible to the command
        exec 1>&3 3>&-
    done
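As a rough check of the equivalence described in this answer, the sketch below runs both per-iteration forms and the open-once form into separate files and compares the results. The temporary directory and the file names per_cmd, per_exec and once are made up for the example, and fd 3 is assumed to be free, as in the answer.

```shell
#!/usr/bin/env bash
# Sketch: three placements of the same redirection give the same file contents.
# File names (per_cmd, per_exec, once) are hypothetical; fd 3 is assumed free.
tmp=$(mktemp -d)

# 1. Redirection on the command: the file is opened and closed on every iteration.
for i in {1..1000}; do
    >>"$tmp/per_cmd" echo "$i"
done

# 2. exec inside the loop: the same open/close per iteration, done by hand.
for i in {1..1000}; do
    exec 3>&1 1>>"$tmp/per_exec"   # save stdout in fd 3, point fd 1 at the file
    echo "$i"
    exec 1>&3 3>&-                 # restore stdout, close the saved copy
done

# 3. exec before the loop: the file is opened once for the whole loop.
exec 3>&1 1>"$tmp/once"
for i in {1..1000}; do
    echo "$i"
done
exec 1>&3 3>&-

# stdout has been restored, so these lines go wherever stdout originally pointed.
lines=$(wc -l < "$tmp/once")
echo "lines written: $((lines))"
cmp -s "$tmp/per_cmd" "$tmp/once" && cmp -s "$tmp/per_exec" "$tmp/once" && same=yes || same=no
echo "identical: $same"
```

The three files come out byte-for-byte identical; what differs is only how often the shell opens and closes the target, which is where the cost discussed in this thread comes from.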







edited May 18 at 14:47
answered May 18 at 14:32 by ilkkachu

  • 6
    Of course, you could also redirect the whole loop into x: for i in {1..1000}; do echo $i; done >x. I'd say this is the most readable version.
    – Beat Bolli
    May 18 at 18:33

  • I've measured the difference between the two forms when each is used in every iteration, over 1m iterations, and the difference is rather substantial (70-90%). See my answer: unix.stackexchange.com/a/444729/181255 The ratio is even higher with just 1k iterations (ca. 300%).
    – Tomasz
    May 19 at 0:03

Score: 7

Yes, it is more efficient.

The easiest way to test is to increase the count to, say, 500000 and time it:

    > time bash s1.sh; time bash s2.sh
    bash s1.sh 16,47s user 10,00s system 99% cpu 26,537 total
    bash s2.sh 10,51s user 3,50s system 99% cpu 14,008 total

strace(1) reveals why (we have one simple write, instead of open+5*fcntl+2*dup+2*close+write).

For the loop for i in {1..1000}; do >>x echo "$i"; done we get:

    open("x", O_WRONLY|O_CREAT|O_APPEND, 0666) = 3
    fcntl(1, F_GETFD) = 0
    fcntl(1, F_DUPFD, 10) = 10
    fcntl(1, F_GETFD) = 0
    fcntl(10, F_SETFD, FD_CLOEXEC) = 0
    dup2(3, 1) = 1
    close(3) = 0
    write(1, "997\n", 4) = 4
    dup2(10, 1) = 1
    fcntl(10, F_GETFD) = 0x1 (flags FD_CLOEXEC)
    close(10) = 0
    open("x", O_WRONLY|O_CREAT|O_APPEND, 0666) = 3
    fcntl(1, F_GETFD) = 0
    fcntl(1, F_DUPFD, 10) = 10
    fcntl(1, F_GETFD) = 0
    fcntl(10, F_SETFD, FD_CLOEXEC) = 0
    dup2(3, 1) = 1
    close(3) = 0
    write(1, "998\n", 4) = 4
    dup2(10, 1) = 1
    fcntl(10, F_GETFD) = 0x1 (flags FD_CLOEXEC)
    close(10) = 0
    open("x", O_WRONLY|O_CREAT|O_APPEND, 0666) = 3
    fcntl(1, F_GETFD) = 0
    fcntl(1, F_DUPFD, 10) = 10
    fcntl(1, F_GETFD) = 0
    fcntl(10, F_SETFD, FD_CLOEXEC) = 0
    dup2(3, 1) = 1
    close(3) = 0
    write(1, "999\n", 4) = 4
    dup2(10, 1) = 1
    fcntl(10, F_GETFD) = 0x1 (flags FD_CLOEXEC)
    close(10) = 0
    open("x", O_WRONLY|O_CREAT|O_APPEND, 0666) = 3
    fcntl(1, F_GETFD) = 0
    fcntl(1, F_DUPFD, 10) = 10
    fcntl(1, F_GETFD) = 0
    fcntl(10, F_SETFD, FD_CLOEXEC) = 0
    dup2(3, 1) = 1
    close(3) = 0
    write(1, "1000\n", 5) = 5
    dup2(10, 1) = 1
    fcntl(10, F_GETFD) = 0x1 (flags FD_CLOEXEC)
    close(10) = 0

while for exec 3>&1 1>x we get the much cleaner:

    write(1, "995\n", 4) = 4
    write(1, "996\n", 4) = 4
    write(1, "997\n", 4) = 4
    write(1, "998\n", 4) = 4
    write(1, "999\n", 4) = 4
    write(1, "1000\n", 5) = 5

But note that the difference is not due to "using an FD", but to where you do the redirection. For example, if you were to do for i in {1..1000}; do echo "$i"; done > x, you would get pretty much the same performance as your second example:

    bash s3.sh 10,35s user 3,70s system 100% cpu 14,042 total
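The answer does not show what s1.sh, s2.sh and s3.sh contain, so the sketch below guesses at them from the question and from the loop quoted above; the file contents, the output names x1, x2 and x3, and the reduced count N are all assumptions. It times the three scripts with bash's own time keyword rather than the shell used in the answer, so the report format differs.

```shell
#!/usr/bin/env bash
# Sketch of a timing run like the one above. The contents of s1.sh, s2.sh and
# s3.sh are assumptions (the answer does not show them); N is kept small.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
N=20000

# s1.sh: redirection on the command, inside the loop.
cat > s1.sh <<EOF
for i in {1..$N}; do >>x1 echo "\$i"; done
EOF

# s2.sh: exec once before the loop.
cat > s2.sh <<EOF
exec 3>&1 1>x2
for i in {1..$N}; do echo "\$i"; done
exec 1>&3 3>&-
EOF

# s3.sh: redirect the whole loop.
cat > s3.sh <<EOF
for i in {1..$N}; do echo "\$i"; done > x3
EOF

# bash's time keyword reports on stderr; TIMEFORMAT keeps it to one line each.
TIMEFORMAT='%Us user %Ss system %Rs total'
for s in s1.sh s2.sh s3.sh; do
    printf '%s: ' "$s"
    time bash "$s"
done
```

Absolute numbers will vary by machine and kernel. The thing to look at is the system time of s1.sh relative to the other two.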





edited May 18 at 16:54
answered May 18 at 16:47 by Matija Nalis

Score: 0

To sum things up and add a new bit of information to this thread, here's a comparison of four ways to do it, ordered by efficiency. I estimate the efficiency by time measurement (user + sys) for 1 million iterations, based on two test series.

1. These two are about the same:

  • Simple > loop redirection (time: 100%)

  • Using exec once for the whole loop (time: ~100%)

2. Using >> for each iteration (time: 200% - 250%)

3. Using exec for each iteration (time: 340% - 480%)

The conclusion is this:

There's a small difference between using exec and simple redirections like >> (simple is cheaper). It doesn't show at the level of a single command execution, but with a high number of repetitions the difference becomes visible. However, the execution cost of the redirected command itself overshadows the difference, as noted by ilkkachu in the other answer.
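The four variants compared in this answer could be measured along the following lines. This is only a sketch: the function names, file names and the small N are made up, fd 3 is assumed to be free, and the percentages it prints are noisy at this problem size, so they are not expected to reproduce the figures quoted here.

```shell
#!/usr/bin/env bash
# Sketch: time four redirection placements and report user+sys relative to the
# first. Function names, file names and N are made up; fd 3 is assumed free.
export LC_ALL=C            # keep '.' as the decimal separator for awk
tmp=$(mktemp -d)
N=20000

loop_redirect() { for ((i = 1; i <= N; i++)); do echo "$i"; done > "$1"; }
exec_once() {
    exec 3>&1 1>"$1"
    for ((i = 1; i <= N; i++)); do echo "$i"; done
    exec 1>&3 3>&-
}
append_each() { for ((i = 1; i <= N; i++)); do >>"$1" echo "$i"; done; }
exec_each() {
    for ((i = 1; i <= N; i++)); do
        exec 3>&1 1>>"$1"
        echo "$i"
        exec 1>&3 3>&-
    done
}

TIMEFORMAT='%U %S'         # time prints "user sys" in seconds
base=
for v in loop_redirect exec_once append_each exec_each; do
    # The time keyword writes to the shell's stderr; capture it from the group.
    t=$( { time "$v" "$tmp/$v"; } 2>&1 )
    total=$(awk -v t="$t" 'BEGIN { split(t, a, " "); printf "%.3f", a[1] + a[2] }')
    [ -z "$base" ] && base=$total
    awk -v n="$v" -v t="$total" -v b="$base" 'BEGIN {
        if (b > 0) printf "%-14s %ss  %.0f%%\n", n, t, 100 * t / b
        else       printf "%-14s %ss  n/a\n", n, t
    }'
done
```

An arithmetic for loop is used because brace expansion such as {1..$N} does not expand a variable bound in bash. Raising N towards the 1 million used in the answer should make the ratios steadier.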



























            • The relative costs depend on the OS and hardware, and the cost of what's in the loop (e.g. on x86, Spectre + Meltdown mitigation significantly increased the cost of system calls, making redirection inside the loop worse). If you literally want to generate a sequence of integers, seq 1000, or maybe printf "%s\n" {1..1000}, is going to be faster than a bash loop. (seq 1000000 > /dev/null runs in ~0.03s, time printf "%s\n" {1..1000000} > /dev/null in ~0.7s, time for i in {1..1000000}; do echo "$i"; done > /dev/null in 2.1s, on a 3.9GHz Skylake i7-6700k, Linux 4.15.8-1-ARCH.)
              – Peter Cordes
              May 19 at 3:50










            • Now I'm curious where the break-even point is for builtin printf vs. fork+exec of seq. I guess I could wrap a repeat loop around the whole thing and time that. And BTW, time awk 'BEGIN{for(i=1; i<1000000 ; i++) print i}' > /dev/null is about 3x faster than printf "%s\n" {1..1000000}, so even if you don't have seq, a good AWK implementation can beat builtins for large problem sizes.
              – Peter Cordes
              May 19 at 3:55
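To compare the generators mentioned in these comments fairly, it helps to first confirm they emit the same lines. A minimal sketch, assuming seq and awk are installed: the variable names and the small n are illustrative, and the awk loop uses <= so that it includes n itself, unlike the i<1000000 bound quoted in the comment.

```shell
#!/bin/sh
# Illustrative check that three generators produce identical output.
# n is kept small here; the comments above time n = 1000000.
n=1000

with_seq=$(seq "$n")
with_awk=$(awk -v n="$n" 'BEGIN { for (i = 1; i <= n; i++) print i }')
with_loop=$(i=1; while [ "$i" -le "$n" ]; do echo "$i"; i=$((i + 1)); done)

if [ "$with_seq" = "$with_awk" ] && [ "$with_awk" = "$with_loop" ]; then
    echo "identical output"
else
    echo "outputs differ"
fi
```

Once the outputs match, each generator can be timed on its own with its output sent to /dev/null, as in the comments.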














            answered May 18 at 23:55









            Tomasz











