Race Condition for Shell Blocks in Bash?


Update: This behavior is observed on Windows Subsystem for Linux. It seems there are two issues at play here:

  1. Some bug or race condition internal to the system. (This is incorrect; see the answers.)
  2. Default buffer size for head.

For (2), as @Kusalananda mentioned, head may have a default buffer size that consumes the input up to a certain point. On ArchLinux, for i < 10, we consistently see no output from tail. The same is true for Windows Subsystem for Linux (i.e. the output of tail is not inconsistent in that case).
For (1), it is possible that some bug internal to Windows Subsystem for Linux causes this race condition, as we do not observe such behavior on ArchLinux. (This is incorrect; see the answers. There is a "point 1", but it is different.)




I am trying to run the following commands in bash version 4.4.19:



 { for ((i = 0; i < 1000; ++i)); do echo $i; done; } | { head -n 1; echo ...; tail -n 1; }


Sometimes, I see the expected results:



$ ~ { for ((i = 0; i < 1000; ++i)); do echo $i; done; } | { head -n 1; echo ...; tail -n 1; }
0
...
999
$ ~


However, more often than not, I see the following:



$ ~ { for ((i = 0; i < 1000; ++i)); do echo $i; done; } | { head -n 1; echo ...; tail -n 1; }
0
...
$ ~


I suspect this is a race condition. However, if I add a sleep at the beginning of the second block of commands, the "race condition" still happens:



$ ~ { for ((i = 0; i < 1000; ++i)); do echo $i; done; } | { sleep 10; head -n 1; echo ...; tail -n 1; }
0
...
$ ~


Is this actually a race condition? What should I do to make the second block of code see the whole input? Note that if I use 10000 instead of 1000, then I do not see this issue (it is possible that these all just happen to be lucky cases though):



$ ~ { for ((i = 0; i < 10000; ++i)); do echo $i; done; } | { head -n 1; echo ...; tail -n 1; }
0
...
9999
$ ~ { for ((i = 0; i < 10000; ++i)); do echo $i; done; } | { head -n 1; echo ...; tail -n 1; }
0
...
9999
$ ~ { for ((i = 0; i < 10000; ++i)); do echo $i; done; } | { head -n 1; echo ...; tail -n 1; }
0
...
9999
$ ~ { for ((i = 0; i < 10000; ++i)); do echo $i; done; } | { head -n 1; echo ...; tail -n 1; }
0
...
9999
$ ~ { for ((i = 0; i < 10000; ++i)); do echo $i; done; } | { head -n 1; echo ...; tail -n 1; }
0
...
9999
$ ~ { for ((i = 0; i < 10000; ++i)); do echo $i; done; } | { head -n 1; echo ...; tail -n 1; }
0
...
9999
$ ~










  • 2




    What Unix are you on? On some Unices, the head implementation buffers too much of the input data and leaves nothing for tail to work on (this is an error). This ought to be deterministic though.
    – Kusalananda
    Nov 25 at 8:35










  • @Kusalananda I see this with i < 1000 on Windows Subsystem for Linux, and i < 10 for ArchLinux. Now that you mention it, this might be a Windows Subsystem for Linux issue (with regards to the non-determinism). For ArchLinux, at least so far, the behavior seems deterministic. I will update the question with this information.
    – nehcsivart
    Nov 25 at 9:12










  • WSL is not Linux... it emulates a Linux environment for all purposes, on top of the Windows HAL. Hmmmm... I wonder where Ipor Sircer is with his "File a bug report".
    – Rui F Ribeiro
    Nov 25 at 9:39







  • 1




    @Kusalananda this doesn't have to be deterministic (unless you mean it in a philosophical sense). In a pipeline like foo | bar, how much a read(2) inside bar will return is dependent not only on how much foo is writing into the pipe, but also on how the kernel schedules the foo and bar processes to run. Notice that pipes on Unix are not message/boundary preserving; 5 writes in foo may result in a single read in bar.
    – mosvy
    Nov 25 at 20:53















bash shell-script

asked Nov 25 at 8:20 by nehcsivart
edited Nov 28 at 10:01


















2 Answers

This is no race condition and no bug in WSL or ArchLinux.



As you mention, it's because head is reading more than it "should", and so it may leave too little, or nothing at all, for tail to work on. But nothing in the standard or elsewhere says that head should read only a certain number of bytes; it could just as well read the whole file and then discard everything but its first line.



In order to "fix" that in all possible cases, head would have to always read its input byte by byte (i.e. do a system call for each byte), which would be horrendously inefficient and absolutely useless in 99.999% of cases.
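As an illustration of the byte-by-byte approach: the shell's own read builtin is one tool that does pay this cost. On a pipe it consumes nothing beyond the newline, so the rest of the stream stays available. The sketch below relies on that behaviour. The function name first_and_last is invented for this example, and seq stands in for the loop from the question.

```shell
# Hypothetical helper (name invented for this sketch): print the first line
# of stdin, a separator, and the last line of stdin.
first_and_last() {
    # The read builtin stops consuming at the newline, even on a pipe,
    # so everything after the first line is still there for tail.
    IFS= read -r first || return
    printf '%s\n' "$first"
    echo ...
    tail -n 1
}

seq 0 999 | first_and_last
```

Only the first line is read this slowly; tail then reads the remainder in large chunks as usual.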



If you want to avoid that, you can:



1) use a temporary file instead of a pipe; then



 head -n 1 <tmpfile; tail -n 1 <tmpfile


will work as expected.



2) reimplement your head/tail combination with something else, e.g. in awk:



$ seq 10000 20000 | awk -vH=2 -vT=3 '{if(NR<=H)print; else a[i++%T]=$0}END{if((j=i-T)>0)print "..."; else j=0; while(j<i)print a[j++%T]}'
10000
10001
...
19998
19999
20000
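Option 1 can be sketched as follows, assuming mktemp is available. The variable name tmpfile matches the placeholder used above, and the function name is invented for this example. Each utility opens the file on its own, so neither depends on how much the other has read.

```shell
# Hypothetical helper: buffer stdin in a temporary file, then let head and
# tail each read that file independently.
head_and_tail_via_file() {
    tmpfile=$(mktemp) || return
    cat >"$tmpfile"            # the whole input is now stored on disk
    head -n 1 <"$tmpfile"
    echo ...
    tail -n 1 <"$tmpfile"
    rm -f "$tmpfile"
}

seq 0 999 | head_and_tail_via_file
```

The trade-off is that the whole input has to be stored before anything is printed, which the awk version avoids.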






  • I see, but can you explain in more detail the reason for the inconsistent behavior? Even in case of foo | bar where foo and bar executes at different schedules each time we run it, wouldn't the output of bar still be the same, provided bar itself is "deterministic"? I mean regardless of how the kernel schedules it, at least in the case of head and tail, would the output not always be the same given the same input, albeit with different buffered reads?
    – nehcsivart
    Nov 27 at 3:23






  • 1




    Keep in mind that if bar does e.g. a read(0, buf, 4096), the kernel will NOT wait until 4096 bytes have accumulated in the pipe buffer but will just return whatever is already there. But that depends on how much foo was able to write, which in turn depends on the load of the system, on how the kernel decided to intersperse the executions of foo, bar and all the other processes, handle interrupt storms, user input, etc. As I already said in another comment, a single write may result in multiple reads, and multiple writes in a single read.
    – mosvy
    Nov 27 at 3:52







  • 1




    @nehcsivart: You can quasi-observe the scheduling issue by delaying the left-hand side of the pipe - just put a tiny sleep into each iteration - and see that it behaves consistently then. It's no more correct either way, though, on head's part or the system's. Regardless, any time (descendents of) the same file handle are accessed from multiple processes you're well into the woods: POSIX says that "It is implementation-defined whether, and under what conditions, all input is seen exactly once.".
    – Michael Homer
    Nov 27 at 5:44










  • @MichaelHomer @mosvy Thanks, I think I understand now. Long story short, suppose head will consume input of 1000 (which it probably does). If all of the left side finishes before the right side even starts, head will consume everything, leaving nothing for tail. If, however, the left side does not finish, head will only consume those that are done. This means something is leftover for tail. I will add an answer that explains this experimentally in more detail. Thanks for the info!
    – nehcsivart
    Nov 28 at 9:54


















Note: If any information is incorrect, please comment so I can fix or delete.



As @mosvy and @MichaelHomer mentioned in the comments, this is due to the scheduler scheduling each side of the pipe differently, and at different times. To be clear, we are answering why the following has inconsistent output:



 for ((i = 0; i < 1000; ++i)); do echo $i; done | { head -n 1; echo ...; tail -n 1; }


With output like:



0
...


and:



0
...
999


Two key points are at play here. The short answer is that because the input into the right side of the pipe is not always all available at once (point 1), head will "consume" different amounts. If the whole input is available (meaning the left side finished first), then the whole input will be consumed due to the implementation of head as explained by @Kusalananda and @mosvy (point 2).



We will first show point 1. The easiest way to show this is to replace tail with head:



$ ~ for ((i = 0; i < 1000; ++i)); do echo $i; done | { head -n 1; echo ...; head -n 1; }
0
...
878
$ ~ for ((i = 0; i < 1000; ++i)); do echo $i; done | { head -n 1; echo ...; head -n 1; }
0
...
820
$ ~ for ((i = 0; i < 1000; ++i)); do echo $i; done | { head -n 1; echo ...; head -n 1; }
0
...
$ ~ for ((i = 0; i < 1000; ++i)); do echo $i; done | { head -n 1; echo ...; head -n 1; }
0
...
796


As we can see, the output of the second head is different each time. This shows that the input from the left side is not always available all at once (point 1).
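The same effect can be shown in a more controlled way. In this sketch (not part of the original experiment) the writer deliberately pauses between two writes, and dd is asked to perform exactly one read of up to 4096 bytes. The function name is invented, and the one-second pause is an arbitrary value chosen to be much longer than process start-up.

```shell
short_read_demo() {
    # Writer: two separate writes with a pause in between.
    # Reader: dd issues a single read of up to 4096 bytes, then exits.
    { printf 'abc'; sleep 1; printf 'def'; } |
        dd bs=4096 count=1 2>/dev/null
}

short_read_demo
echo
```

On a typical system this prints only abc: the read returned as soon as some data was in the pipe instead of waiting for the buffer to fill.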



For each case in which there is a number after ..., we would get an output of 999 if we used tail instead. For the case in which nothing came after ..., tail would likewise print nothing. To prove this, we will show point 2.



Although there is nothing we can really do about point 1, we can make it more stable by writing it to a file:



$ ~ for ((i = 0; i < 1000; ++i)); do echo $i; done >input


With the file, we will read it through a pipe (see below for redirection case):



$ ~ cat input | { head -n 1; echo ...; tail -n 1; }
0
...


And indeed, head consumes everything, leaving nothing for tail. As such, we have point 2. So with point 1 and point 2, we can explain the inconsistent behavior:




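In my version of head, reading from a pipe consumes whatever is available, up to one buffer-sized chunk, rather than stopping at the first newline. The 1000 lines here total only 3890 bytes, which evidently fits within one such chunk. If all of the left side finishes before the right side even starts, head will consume everything, leaving nothing for tail. If, however, the left side has not finished, head will only consume what has been written so far. This means something is left over for tail, which therefore produces output. With 10000 lines (48890 bytes), the input presumably exceeds what head reads in one go, which would explain why tail always had something to print in that case.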
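Following Michael Homer's suggestion in the comments, the second case can be forced by slowing the writer down. This is a sketch with arbitrary values (5 lines, a 0.2 second pause). Fractional sleep is an extension supported by GNU and most modern implementations, not something POSIX guarantees. By the time head performs its first read, usually only the first line is in the pipe, so everything else is left for tail.

```shell
paced_demo() {
    # Left side: emit one line, then pause, so lines arrive one at a time.
    for i in 0 1 2 3 4; do
        echo "$i"
        sleep 0.2
    done | {
        head -n 1    # its first read typically returns just the first line
        echo ...
        tail -n 1    # reads the remaining lines until end of input
    }
}

paced_demo
```

This usually prints 0, then ..., then 4 on every run. As noted in the comments, it is no more guaranteed than before; the pause only makes one scheduling outcome overwhelmingly likely.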




Redirection



In the above example, we used a pipe to feed the file to head. The reason is that with redirection we end up with a different result:



$ ~ { head -n 1; echo ...; tail -n 1; } <input
0
...
999


This differs from the explanation above. The reason is that, when used this way, head appears to consume only 1 line:



$ ~ { head -n 1; echo ...; head -n 1; } <input
0
...
1


The way to explain this is to reference the answer here. In short:




  • pipes are not lseek()'able so commands can't read some data and then rewind back, but when you redirect with > or < usually it's a file which is lseek() able object, so commands can navigate however they please.



In other words, head need not consume everything if it is able to seek within the file directly. It may read more than it needs, but once it finds the newline it can seek back to just after it, effectively putting the rest back. We can prove this by using a file with 1 byte after a newline:



$ ~ cat input
0123456789
1
$ ~ { head -n 1; head -c 1; } <input
0123456789
1$ ~


If we were to use a pipe, the whole input is consumed, with nothing left for the second head:



$ ~ cat input | { head -n 1; head -c 1; }
0123456789
$ ~


As a side note, if we use process substitution (which results in a non-seekable read, as I understand it), we get the same result:



$ ~ { head -n 1; head -c 1; } < <(cat input)
0123456789
$ ~





share|improve this answer






















    Your Answer








    StackExchange.ready(function()
    var channelOptions =
    tags: "".split(" "),
    id: "106"
    ;
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function()
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled)
    StackExchange.using("snippets", function()
    createEditor();
    );

    else
    createEditor();

    );

    function createEditor()
    StackExchange.prepareEditor(
    heartbeatType: 'answer',
    convertImagesToLinks: false,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: null,
    bindNavPrevention: true,
    postfix: "",
    imageUploader:
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    ,
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    );



    );













    draft saved

    draft discarded


















    StackExchange.ready(
    function ()
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2funix.stackexchange.com%2fquestions%2f484000%2frace-condition-for-shell-blocks-in-bash%23new-answer', 'question_page');

    );

    Post as a guest















    Required, but never shown

























    2 Answers
    2






    active

    oldest

    votes








    2 Answers
    2






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes








    up vote
    1
    down vote













    This is no race condition and no bug in WSL or ArchLinux.



    As you mention, it's because head is reading more than it "should", and so it may not leave enough or anything at all for tail to work on. But there is nothing in the standard or elsewhere which says that head should only read a certain amount of bytes; it could just as well read the whole file and then discard everything but its first line.



    In order to "fix" that in all possible cases, head would have to always read its input byte by byte (ie do a system call for each byte) and that would be horrendously inefficient, and absolutely useless in 99.999% of cases.



    If you want to avoid that, you can



    1) use a temporary file instead of a pipe; then



     head -n <tmpfile; tail -n <tmpfile; 


    will work as expected.



    2) reimplement your head/tail combination with something else, eg. in awk:



    $ seq 10000 20000 | awk -vH=2 -vT=3 'if(NR<=H)print; else a[i++%T]=$0ENDif((j=i-T)>0)print "..."; else j=0; while(j<i)print a[j++%T]'
    10000
    10001
    ...
    19998
    19999
    20000





    share|improve this answer






















    • I see, but can you explain in more detail the reason for the inconsistent behavior? Even in case of foo | bar where foo and bar executes at different schedules each time we run it, wouldn't the output of bar still be the same, provided bar itself is "deterministic"? I mean regardless of how the kernel schedules it, at least in the case of head and tail, would the output not always be the same given the same input, albeit with different buffered reads?
      – nehcsivart
      Nov 27 at 3:23






    • 1




      Keep in mind that if bar does eg. a read(0, buf, 4096), the kernel will NOT wait until 4096 bytes have accumulated in the pipe buffer but will just return whatever is already there. But that depends on how much foo was able to write, which in turn depends on the load of the system, on how the kernel decided to interspede the executions of foo, bar and all the other processes, handle interrupt storms, user input, etc. As I already said in another comment, a single write may result in multiple reads and multiple writes in a single read.
      – mosvy
      Nov 27 at 3:52







    • 1




      @nehcsivart: You can quasi-observe the scheduling issue by delaying the left-hand side of the pipe - just put a tiny sleep into each iteration - and see that it behaves consistently then. It's no more correct either way, though, on head's part or the system's. Regardless, any time (descendents of) the same file handle are accessed from multiple processes you're well into the woods: POSIX says that "It is implementation-defined whether, and under what conditions, all input is seen exactly once.".
      – Michael Homer
      Nov 27 at 5:44










    • @MichaelHomer @mosvy Thanks, I think I understand now. Long story short, suppose head will consume input of 1000 (which it probably does). If all of the left side finishes before the right side even starts, head will consume everything, leaving nothing for tail. If, however, the left side does not finish, head will only consume those that are done. This means something is leftover for tail. I will add an answer that explains this experimentally in more detail. Thanks for the info!
      – nehcsivart
      Nov 28 at 9:54














    up vote
    1
    down vote













    This is no race condition and no bug in WSL or ArchLinux.



    As you mention, it's because head is reading more than it "should", and so it may not leave enough or anything at all for tail to work on. But there is nothing in the standard or elsewhere which says that head should only read a certain amount of bytes; it could just as well read the whole file and then discard everything but its first line.



    In order to "fix" that in all possible cases, head would have to always read its input byte by byte (ie do a system call for each byte) and that would be horrendously inefficient, and absolutely useless in 99.999% of cases.



    If you want to avoid that, you can



    1) use a temporary file instead of a pipe; then



     head -n <tmpfile; tail -n <tmpfile; 


    will work as expected.



    2) reimplement your head/tail combination with something else, eg. in awk:



    $ seq 10000 20000 | awk -vH=2 -vT=3 'if(NR<=H)print; else a[i++%T]=$0ENDif((j=i-T)>0)print "..."; else j=0; while(j<i)print a[j++%T]'
    10000
    10001
    ...
    19998
    19999
    20000





    share|improve this answer






















    • I see, but can you explain in more detail the reason for the inconsistent behavior? Even in case of foo | bar where foo and bar executes at different schedules each time we run it, wouldn't the output of bar still be the same, provided bar itself is "deterministic"? I mean regardless of how the kernel schedules it, at least in the case of head and tail, would the output not always be the same given the same input, albeit with different buffered reads?
      – nehcsivart
      Nov 27 at 3:23






    • Keep in mind that if bar does e.g. a read(0, buf, 4096), the kernel will NOT wait until 4096 bytes have accumulated in the pipe buffer but will just return whatever is already there. But that depends on how much foo was able to write, which in turn depends on the load of the system, on how the kernel decided to intersperse the executions of foo, bar and all the other processes, handle interrupt storms, user input, etc. As I already said in another comment, a single write may be split across multiple reads, and multiple writes may be returned by a single read.
      – mosvy
      Nov 27 at 3:52







    • @nehcsivart: You can quasi-observe the scheduling issue by delaying the left-hand side of the pipe - just put a tiny sleep into each iteration - and see that it behaves consistently then. It's no more correct either way, though, on head's part or the system's. Regardless, any time (descendants of) the same file handle are accessed from multiple processes you're well into the woods: POSIX says that "It is implementation-defined whether, and under what conditions, all input is seen exactly once."
      – Michael Homer
      Nov 27 at 5:44










    • @MichaelHomer @mosvy Thanks, I think I understand now. Long story short, suppose head will consume input of 1000 (which it probably does). If all of the left side finishes before the right side even starts, head will consume everything, leaving nothing for tail. If, however, the left side does not finish, head will only consume those that are done. This means something is leftover for tail. I will add an answer that explains this experimentally in more detail. Thanks for the info!
      – nehcsivart
      Nov 28 at 9:54
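
The delay experiment suggested in the comments can be sketched roughly as follows. This is a reduced version with only two lines and a single pause rather than a sleep in every iteration; slow_writer is a made-up name, and the one-second sleep is simply assumed to be long enough for head to read and exit:

```shell
# The writer pauses between lines, so head's first read() can only return
# the first line; the second line is written after head has already exited.
slow_writer() {
    echo first
    sleep 1          # assumption: long enough for head to read and exit
    echo last
}

result=$(slow_writer | { head -n 1; echo ...; tail -n 1; })
printf '%s\n' "$result"
```

With the pause in place the second line should reliably be left in the pipe for tail. That makes the outcome consistent, not more correct, as the comment points out.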












    edited Nov 26 at 1:20

























    answered Nov 25 at 20:28









    mosvy

    5,046323





























    Note: If any information is incorrect, please comment so I can fix or delete.



    As @mosvy and @MichaelHomer mentioned in the comments, this is due to the scheduler scheduling each side of the pipe differently, and at different times. To be clear, we are answering why the following has inconsistent output:



     for ((i = 0; i < 1000; ++i)); do echo $i; done | { head -n 1; echo ...; tail -n 1; }


    With output like:



    0
    ...


    and:



    0
    ...
    999


    Two key points are at play here. The short answer is that because the input into the right side of the pipe is not always all available at once (point 1), head will "consume" different amounts. If the whole input is available (meaning the left side finished first), then the whole input will be consumed due to the implementation of head as explained by @Kusalananda and @mosvy (point 2).



    We will first show point 1. The easiest way to show this is to replace tail with head:



    $ ~ for ((i = 0; i < 1000; ++i)); do echo $i; done | { head -n 1; echo ...; head -n 1; }
    0
    ...
    878
    $ ~ for ((i = 0; i < 1000; ++i)); do echo $i; done | { head -n 1; echo ...; head -n 1; }
    0
    ...
    820
    $ ~ for ((i = 0; i < 1000; ++i)); do echo $i; done | { head -n 1; echo ...; head -n 1; }
    0
    ...
    $ ~ for ((i = 0; i < 1000; ++i)); do echo $i; done | { head -n 1; echo ...; head -n 1; }
    0
    ...
    796


    As we can see, the output of the second head is different each time. This shows that the input from the left side is not always available all at once (point 1).
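
To get a feel for how often each outcome occurs, the experiment can be repeated in a loop and the results tallied. This is only a sketch: the variable names and the run count of 20 are arbitrary, the loop is written in portable sh rather than with the bash for ((...)) form, and the split between the two counters will differ from run to run and from system to system:

```shell
runs=20
empty=0
nonempty=0
n=0
while [ "$n" -lt "$runs" ]; do
    # Writer: one small write per line, like the loop in the question.
    out=$(
        i=0
        while [ "$i" -lt 1000 ]; do echo "$i"; i=$((i + 1)); done |
            { head -n 1 >/dev/null; tail -n 1; }
    )
    if [ -z "$out" ]; then
        empty=$((empty + 1))        # head consumed everything
    else
        nonempty=$((nonempty + 1))  # something was left over for tail
    fi
    n=$((n + 1))
done
echo "tail saw nothing: $empty, tail saw data: $nonempty"
```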



    In each case where a number follows the ..., tail would have printed 999 in its place, because some input was left over for it. In the case where nothing follows the ..., tail would have printed nothing either. To see why, we turn to point 2.



    Although there is nothing we can really do about point 1, we can take the timing out of the picture by first writing the input to a file:



    $ ~ for ((i = 0; i < 1000; ++i)); do echo $i; done >input


    With the file, we will read it through a pipe (see below for redirection case):



    $ ~ cat input | { head -n 1; echo ...; tail -n 1; }
    0
    ...


    And indeed, head consumes everything, leaving nothing for tail. As such, we have point 2. So with point 1 and point 2, we can explain the inconsistent behavior:




    My version of head, when reading from a pipe, consumes a whole block of bytes at a time rather than a line at a time, and that block is large enough to hold all 1000 lines (3890 bytes) if they are already available. If the left side finishes before the right side even starts, head will consume everything, leaving nothing for tail. If, however, the left side has not finished, head will only consume what has been written so far. This means something is left over for tail, which then produces output.
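
The sizes involved can be checked with a quick count. This sketch assumes seq is available; the exact block size head uses is implementation-specific (4096 is only the example figure from the comments), so the comparison is indicative rather than exact:

```shell
# Total bytes produced for 0..999 and 0..9999, one number per line.
# The $(( ... )) wrapper strips any padding some wc implementations add.
bytes_1000=$(( $(seq 0 999 | wc -c) ))
bytes_10000=$(( $(seq 0 9999 | wc -c) ))
echo "0..999:  $bytes_1000 bytes"
echo "0..9999: $bytes_10000 bytes"
```

The first total is under 4 KiB, so a single read of a few kilobytes can swallow it whole. The second is far larger than such a block, which would explain why the problem was not observed in the question when 10000 was used.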




    Redirection



    The example above deliberately fed the file through a pipe. With plain redirection the result is different:



    $ ~ { head -n 1; echo ...; tail -n 1; } <input
    0
    ...
    999


    This does not match the explanation above. The reason is that, when used this way, head appears to consume only 1 line:



    $ ~ { head -n 1; echo ...; head -n 1; } <input
    0
    ...
    1


    The way to explain this is to reference the answer here. In short:




    • pipes are not lseek()-able, so commands can't read some data and then rewind, but when you redirect with > or < the target is usually a file, which is an lseek()-able object, so commands can navigate however they please.



    In other words, head need not consume everything if it can seek within the file. It may read more than it needs, but once it has found the newline it can seek back to just after it, which effectively puts the rest back. We can show this with a file that has 1 byte after a newline:



    $ ~ cat input
    0123456789
    1
    $ ~ { head -n 1; head -c 1; } <input
    0123456789
    1$ ~


    If we were to use a pipe, the whole input is consumed, with nothing left for the second head:



    $ ~ cat input | { head -n 1; head -c 1; }
    0123456789
    $ ~


    As a side note, if we used process substitution (which results in a non-seekable read as I understand it), we will get the same result:



    $ ~ { head -n 1; head -c 1; } < <(cat input)
    0123456789
    $ ~







































        edited Nov 28 at 10:01

























        answered Nov 28 at 9:55









        nehcsivart

        333310



