Race Condition for Shell Blocks in Bash?

Update: This behavior is observed on Windows Subsystem for Linux (WSL). It seems there are two issues we are dealing with here:
1. Some bug/race condition internal to the system. (This is incorrect, see answers.)
2. The default buffer size of head.
For (2), as @Kusalananda mentioned, head may have some default buffer size, so it consumes the input up to a certain point. On ArchLinux, for i < 10 we consistently see no output from tail. The same is true on WSL (i.e. for i < 10, tail consistently prints nothing).
For (1), it is possible that there is some bug internal to WSL itself that causes this race condition, as we do not observe such behavior on ArchLinux. (This is incorrect, see answers. There is a "point 1", but it is different.)
I am trying to run the following commands in bash version 4.4.19:
{ for ((i = 0; i < 1000; ++i)); do echo $i; done; } | { head -n 1; echo ...; tail -n 1; }
Sometimes, I see the expected results:
$ ~ { for ((i = 0; i < 1000; ++i)); do echo $i; done; } | { head -n 1; echo ...; tail -n 1; }
0
...
999
$ ~
However, more often than not, I see the following:
$ ~ { for ((i = 0; i < 1000; ++i)); do echo $i; done; } | { head -n 1; echo ...; tail -n 1; }
0
...
$ ~
I suspect this is a race condition. However, if I add a sleep at the beginning of the second block of commands, the "race condition" still happens:
$ ~ { for ((i = 0; i < 1000; ++i)); do echo $i; done; } | { sleep 10; head -n 1; echo ...; tail -n 1; }
0
...
$ ~
Is this actually a race condition? What should I do to make the second block of code see the whole input? Note that if I use 10000 instead of 1000, then I do not see this issue (it is possible that these all just happen to be lucky cases though):
$ ~ { for ((i = 0; i < 10000; ++i)); do echo $i; done; } | { head -n 1; echo ...; tail -n 1; }
0
...
9999
$ ~ { for ((i = 0; i < 10000; ++i)); do echo $i; done; } | { head -n 1; echo ...; tail -n 1; }
0
...
9999
$ ~ { for ((i = 0; i < 10000; ++i)); do echo $i; done; } | { head -n 1; echo ...; tail -n 1; }
0
...
9999
$ ~ { for ((i = 0; i < 10000; ++i)); do echo $i; done; } | { head -n 1; echo ...; tail -n 1; }
0
...
9999
$ ~ { for ((i = 0; i < 10000; ++i)); do echo $i; done; } | { head -n 1; echo ...; tail -n 1; }
0
...
9999
$ ~ { for ((i = 0; i < 10000; ++i)); do echo $i; done; } | { head -n 1; echo ...; tail -n 1; }
0
...
9999
$ ~
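To tell an intermittent failure from a lucky streak, the experiment can be repeated and tallied. The following is a minimal sketch, assuming bash; the function name tally_runs and the trial counts are illustrative choices, not part of the original question, and the counts it prints will vary between systems and between runs:

```shell
#!/usr/bin/env bash
# Hypothetical helper: run the question's pipeline `trials` times with a
# loop of `size` numbers and count how often tail printed a line vs. nothing.
tally_runs() {
  local trials=$1 size=$2 got_last=0 got_nothing=0 out
  for ((t = 0; t < trials; ++t)); do
    # Same shape as the question: a producer block piped into a consumer block.
    # head's output is discarded so that only tail's output is captured.
    out=$({ for ((i = 0; i < size; ++i)); do echo "$i"; done; } |
          { head -n 1 >/dev/null; tail -n 1; })
    if [[ -n $out ]]; then
      ((++got_last))
    else
      ((++got_nothing))
    fi
  done
  echo "tail printed a line: $got_last, tail printed nothing: $got_nothing"
}

tally_runs 20 1000
```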
bash shell-script
edited Nov 28 at 10:01
asked Nov 25 at 8:20
nehcsivart
What Unix are you on? On some Unices, the head implementation buffers too much of the input data and leaves nothing for tail to work on (this is an error). This ought to be deterministic though.
– Kusalananda
Nov 25 at 8:35
@Kusalananda I see this with i < 1000 on Windows Subsystem for Linux, and i < 10 for ArchLinux. Now that you mention it, this might be a Windows Subsystem for Linux issue (with regards to the non-determinism). For ArchLinux, at least so far, the behavior seems deterministic. I will update the question with this information.
– nehcsivart
Nov 25 at 9:12
WSL is not Linux... It emulates a Linux environment, for all purposes, on top of the Windows HAL. Hmmmm... I wonder: where is Ipor Sircer with his "File a bug report"?
– Rui F Ribeiro
Nov 25 at 9:39
@Kusalananda this doesn't have to be deterministic (unless you mean it in a philosophical sense). In a pipeline like foo | bar, how much a read(2) inside bar will return depends not only on how much foo is writing into the pipe, but also on how the kernel schedules the foo and bar processes to run. Notice that pipes on Unix are not message/boundary preserving; 5 writes in foo may result in a single read in bar.
– mosvy
Nov 25 at 20:53
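mosvy's point that a pipe does not preserve write boundaries can be observed with a small experiment. This is a sketch, assuming a dd (such as GNU coreutils dd) where count=1 means a single read(2), and assuming the one-second sleep gives the writer time to finish, which is likely but not guaranteed: three separate writes come back from one read, and one write is split across two reads.

```shell
#!/usr/bin/env bash
# Three separate writes (one per printf builtin), then a single read of up to
# 100 bytes. dd count=1 without iflag=fullblock performs exactly one read(2);
# after the sleep, all three writes are waiting in the pipe buffer together.
merged=$({ printf a; printf b; printf c; } |
         { sleep 1; dd bs=100 count=1 2>/dev/null; })
echo "one read saw: $merged"

# One 6-byte write, consumed by two separate 2-byte reads.
split=$(printf abcdef |
        { dd bs=2 count=1 2>/dev/null; echo; dd bs=2 count=1 2>/dev/null; })
echo "two reads saw: $split"
```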
2 Answers
Note: If any information is incorrect, please comment so I can fix or delete.
As @mosvy and @MichaelHomer mentioned in the comments, this is due to the scheduler running the two sides of the pipe at different times, which can differ from one run to the next. To be clear, we are answering why the following has inconsistent output:
for ((i = 0; i < 1000; ++i)); do echo $i; done | { head -n 1; echo ...; tail -n 1; }
With output like:
0
...
and:
0
...
999
Two key points are at play here. The short answer is that because the input into the right side of the pipe is not always all available at once (point 1), head will "consume" different amounts. If the whole input is available (meaning the left side finished first), then the whole input will be consumed due to the implementation of head as explained by @Kusalananda and @mosvy (point 2).
We will first show point 1. The easiest way to show this is to replace tail with head:
$ ~ for ((i = 0; i < 1000; ++i)); do echo $i; done | { head -n 1; echo ...; head -n 1; }
0
...
878
$ ~ for ((i = 0; i < 1000; ++i)); do echo $i; done | { head -n 1; echo ...; head -n 1; }
0
...
820
$ ~ for ((i = 0; i < 1000; ++i)); do echo $i; done | { head -n 1; echo ...; head -n 1; }
0
...
$ ~ for ((i = 0; i < 1000; ++i)); do echo $i; done | { head -n 1; echo ...; head -n 1; }
0
...
796
As we can see, the output of the second head is different each time. This shows that the input from the left side is not always available all at once (point 1).
For each of the cases in which there is a number after ..., we would get 999 if we used tail instead. For the case in which nothing comes after ..., tail would print nothing as well. To prove this, we will show point 2.
Although there is nothing we can really do about point 1, we can make it more stable by writing it to a file:
$ ~ for ((i = 0; i < 1000; ++i)); do echo $i; done >input
With the file, we will read it through a pipe (see below for the redirection case):
$ ~ cat input | { head -n 1; echo ...; tail -n 1; }
0
...
And indeed, head consumes everything, leaving nothing for tail. As such, we have point 2. So with point 1 and point 2, we can explain the inconsistent behavior:
In my version of head, when reading through a pipe, head consumes at least 1000 lines at a time if that many are available (or the whole input, if less is available). If all of the left side finishes before the right side even starts, head will consume everything, leaving nothing for tail. If, however, the left side has not finished, head will only consume the lines that have been written so far. This means something is left over for tail, which therefore produces output.
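Because head reads blocks of bytes rather than lines, it may help to check how many bytes each input really is. A sketch, assuming bash; input_bytes is a hypothetical helper name. The note after the block additionally assumes a read buffer of 8192 bytes (BUFSIZ with glibc, which GNU head uses as far as I know); that is an assumption about the head in use, not something measured here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count the bytes produced by the question's loop for
# `for ((i = 0; i < n; ++i))`.
input_bytes() {
  local n=$1
  # $(( ... )) strips any padding that some wc implementations add.
  echo $(( $(for ((i = 0; i < n; ++i)); do echo "$i"; done | wc -c) ))
}

# 1000 lines: 10*2 + 90*3 + 900*4 bytes (digits plus newline) = 3890.
echo "i < 1000:  $(input_bytes 1000) bytes"
# 10000 lines adds 9000*5 bytes on top of that = 48890.
echo "i < 10000: $(input_bytes 10000) bytes"
```

Under the 8192-byte assumption, the whole 1000-line input (3890 bytes) fits in a single read whenever the left side has already finished, while the 10000-line input (48890 bytes) never does, so tail always has something left; that would be consistent with the results in the question.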
Redirection
In the above example, we used a pipe to feed the input. The reason is that if we used redirection instead, we would get the following result:
$ ~ { head -n 1; echo ...; tail -n 1; } <input
0
...
999
This differs from the explanation above. The reason is that, when used this way, head appears to consume only 1 line:
$ ~ { head -n 1; echo ...; head -n 1; } <input
0
...
1
The way to explain this is to reference the answer here. In short:
- pipes are not lseek()'able so commands can't read some data and then rewind back, but when you redirect with > or < usually it's a file which is lseek() able object, so commands can navigate however they please.
In other words, head need not consume everything if it can seek on its input. It may still read a whole block, but once it has found the newline it needs, it can seek back to just past that newline, effectively putting the rest back. We can prove this by using a file with just 1 byte after the first newline:
$ ~ cat input
0123456789
1
$ ~ { head -n 1; head -c 1; } <input
0123456789
1$ ~
If we were to use a pipe, the whole input is consumed, with nothing left for the second head:
$ ~ cat input | { head -n 1; head -c 1; }
0123456789
$ ~
As a side note, if we used process substitution (which, as I understand it, feeds the input through a pipe and is therefore not seekable), we get the same result:
$ ~ { head -n 1; head -c 1; } < <(cat input)
0123456789
$ ~
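Since the outcome depends on whether head's standard input is seekable, it can help to check what kind of file it actually is. A sketch, assuming bash on Linux, where /dev/stdin resolves to the current standard input; stdin_kind is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether standard input is a pipe (FIFO),
# a regular file, or something else (e.g. a terminal).
stdin_kind() {
  if [[ -p /dev/stdin ]]; then
    echo pipe
  elif [[ -f /dev/stdin ]]; then
    echo "regular file"
  else
    echo other
  fi
}

tmp=$(mktemp)
printf '0123456789\n1' >"$tmp"

stdin_kind <"$tmp"            # plain redirection: a seekable regular file
cat "$tmp" | stdin_kind       # pipeline: a pipe
stdin_kind < <(cat "$tmp")    # process substitution: also a pipe on Linux

rm -f "$tmp"
```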
This is not a race condition, and not a bug in WSL or ArchLinux.
As you mention, it's because head reads more than it "should", so it may leave little or nothing for tail to work on. But there is nothing in the standard or elsewhere which says that head, when reading from a pipe, should only read a certain number of bytes; it could just as well read the whole input and then discard everything but its first line.
In order to "fix" that in all possible cases, head would always have to read its input byte by byte (i.e. do one system call per byte), which would be horrendously inefficient, and absolutely useless in 99.999% of cases.
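For just the first line, though, byte-at-a-time reading is cheap, and bash's read builtin already works this way when its input is not seekable: it stops right after the newline and leaves the rest in the pipe. The sketch below swaps read in for head -n 1 (an alternative not mentioned in this answer; first_and_last is a hypothetical name):

```shell
#!/usr/bin/env bash
# Hypothetical replacement for `{ head -n 1; echo ...; tail -n 1; }`.
# On a pipe, bash's read builtin reads one byte at a time and stops right
# after the newline, so everything after the first line is left for tail.
first_and_last() {
  local first
  if IFS= read -r first; then
    printf '%s\n' "$first"
  fi
  echo ...
  tail -n 1
}

{ for ((i = 0; i < 1000; ++i)); do echo "$i"; done; } | first_and_last
```

Only the first line costs one system call per byte; tail still reads the bulk of the input in large blocks.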
If you want to avoid that, you can
1) use a temporary file instead of a pipe; then
{ head -n 1; echo ...; tail -n 1; } <tmpfile
will work as expected.
2) reimplement your head/tail combination with something else, e.g. in awk:
$ seq 10000 20000 | awk -v H=2 -v T=3 '{if(NR<=H)print; else a[i++%T]=$0} END{if((j=i-T)>0)print "..."; else j=0; while(j<i)print a[j++%T]}'
10000
10001
...
19998
19999
20000
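Both workarounds can be wrapped as reusable functions. A sketch, assuming bash and mktemp; the names headtail_tmp and headtail_awk and their two arguments (number of head lines, number of tail lines, the latter at least 1) are hypothetical, and the awk program is the one above, spread out with comments:

```shell
#!/usr/bin/env bash
# Option 1 (hypothetical name): spool the input to a seekable temporary file.
headtail_tmp() {
  local h=$1 t=$2 tmp rc
  tmp=$(mktemp) || return
  cat >"$tmp"
  # On a regular file, head leaves the file offset just past the lines it
  # printed, so tail only looks at what follows them.
  { head -n "$h"; echo ...; tail -n "$t"; } <"$tmp"
  rc=$?
  rm -f "$tmp"
  return "$rc"
}

# Option 2 (hypothetical name): one awk pass, keeping the first H lines and a
# ring buffer of the last T lines (T must be at least 1).
headtail_awk() {
  awk -v H="$1" -v T="$2" '
    NR <= H { print; next }          # first H lines: print them immediately
    { a[i++ % T] = $0 }              # later lines: keep only the last T of them
    END {
      j = i - T
      if (j > 0) print "..."; else j = 0   # print "..." only if lines were skipped
      while (j < i) print a[j++ % T]
    }'
}

seq 10000 20000 | headtail_tmp 2 3
seq 10000 20000 | headtail_awk 2 3
```

The awk version never prints "..." when nothing was skipped, while the temporary-file version always prints it.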
edited Nov 26 at 1:20
answered Nov 25 at 20:28
mosvy
I see, but can you explain in more detail the reason for the inconsistent behavior? Even in the case of foo | bar, where foo and bar are scheduled differently each time we run it, wouldn't the output of bar still be the same, provided bar itself is "deterministic"? I mean, regardless of how the kernel schedules it, at least in the case of head and tail, would the output not always be the same given the same input, albeit with different buffered reads?
– nehcsivart
Nov 27 at 3:23
Keep in mind that if bar does e.g. a read(0, buf, 4096), the kernel will NOT wait until 4096 bytes have accumulated in the pipe buffer but will just return whatever is already there. But that depends on how much foo was able to write, which in turn depends on the load of the system, on how the kernel decided to intersperse the executions of foo, bar and all the other processes, handle interrupt storms, user input, etc. As I already said in another comment, a single write may result in multiple reads, and multiple writes in a single read.
– mosvy
Nov 27 at 3:52
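The short read described above is easy to observe. In this sketch the writer puts 3 bytes into the pipe, pauses, then writes 3 more; dd asks for a 4096-byte block but performs a single read() (GNU dd only keeps reading to fill the block when given iflag=fullblock), so it should return just the 3 bytes already in the pipe. The 1-second pause is an arbitrary choice to make the timing obvious.

```shell
# Writer: 3 bytes now, 3 more bytes one second later.
# Reader: one read() of up to 4096 bytes, which returns as soon as
# any data is available instead of waiting for the full 4096.
{ printf 'abc'; sleep 1; printf 'def'; } | dd bs=4096 count=1 2>/dev/null
echo    # dd printed no trailing newline
```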
@nehcsivart: You can quasi-observe the scheduling issue by delaying the left-hand side of the pipe (just put a tiny sleep into each iteration) and see that it behaves consistently then. It's no more correct either way, though, on head's part or the system's. Regardless, any time (descendants of) the same file handle are accessed from multiple processes you're well into the woods: POSIX says that "It is implementation-defined whether, and under what conditions, all input is seen exactly once.".
– Michael Homer
Nov 27 at 5:44
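The slowed-down experiment can be sketched as follows. slow_numbers is a hypothetical helper, the 0.01 s pause and 50 iterations are arbitrary, and fractional arguments to sleep are a GNU/BSD extension rather than POSIX.

```shell
# Hypothetical helper: print 0..N-1, one echo (one write) per line,
# pausing after each line so the writer is slower than the reader.
slow_numbers() {
    i=0
    while [ "$i" -lt "$1" ]; do
        echo "$i"
        sleep 0.01
        i=$((i + 1))
    done
}

# head's first read() now only finds the first line or two in the pipe,
# so tail reliably gets the rest, including the last line.
slow_numbers 50 | { head -n 1; echo ...; tail -n 1; }
```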
@MichaelHomer @mosvy Thanks, I think I understand now. Long story short, suppose head consumes up to 1000 lines of input at a time (which it probably does). If all of the left side finishes before the right side even starts, head will consume everything, leaving nothing for tail. If, however, the left side has not finished, head will only consume the lines written so far. This means something is left over for tail. I will add an answer that explains this experimentally in more detail. Thanks for the info!
– nehcsivart
Nov 28 at 9:54
Note: If any information is incorrect, please comment so I can fix or delete.
As @mosvy and @MichaelHomer mentioned in the comments, this is because the scheduler runs the two sides of the pipe at different, unpredictable times. To be clear, we are answering why the following produces inconsistent output:
for ((i = 0; i < 1000; ++i)); do echo $i; done | { head -n 1; echo ...; tail -n 1; }
Sometimes it prints:
0
...
and other times:
0
...
999
Two key points are at play here. The short answer is that because the input into the right side of the pipe is not always all available at once (point 1), head will "consume" different amounts. If the whole input is available (meaning the left side finished first), then the whole input will be consumed due to the implementation of head as explained by @Kusalananda and @mosvy (point 2).
We will first show point 1. The easiest way to show this is to replace tail with head:
$ ~ for ((i = 0; i < 1000; ++i)); do echo $i; done | { head -n 1; echo ...; head -n 1; }
0
...
878
$ ~ for ((i = 0; i < 1000; ++i)); do echo $i; done | { head -n 1; echo ...; head -n 1; }
0
...
820
$ ~ for ((i = 0; i < 1000; ++i)); do echo $i; done | { head -n 1; echo ...; head -n 1; }
0
...
$ ~ for ((i = 0; i < 1000; ++i)); do echo $i; done | { head -n 1; echo ...; head -n 1; }
0
...
796
As we can see, the output of the second head is different each time. This shows that the input from the left side is not always available all at once (point 1).
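To see point 1 as a distribution rather than a handful of runs, the pipeline can be repeated and the number of bytes head leaves behind recorded each time. In this sketch numbers and leftover_bytes are hypothetical helpers; numbers mimics the question's loop with one echo (one write) per line, which matters, because a tool like seq typically buffers its output and writes it in large chunks. Results depend on the machine and its load, so no particular output is implied.

```shell
# Hypothetical helper: print 0..N-1 with one write per line, like the
# question's for loop with echo.
numbers() {
    i=0
    while [ "$i" -lt "$1" ]; do
        echo "$i"
        i=$((i + 1))
    done
}

# Hypothetical helper: run the pipeline N times; for each run print how
# many bytes head -n 1 left in the pipe for the next command.
leftover_bytes() {
    run=0
    while [ "$run" -lt "$1" ]; do
        numbers 1000 | { head -n 1 >/dev/null; wc -c; }
        run=$((run + 1))
    done
}

# Tally the outcomes: 0 means head swallowed the whole input.
leftover_bytes 20 | sort -n | uniq -c
```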
In each case where a number follows ..., using tail instead would print 999. In the case where nothing follows ..., tail prints nothing as well. To show why, we demonstrate point 2.
Although there is nothing we can really do about point 1, we can make the input stable by writing it to a file first:
$ ~ for ((i = 0; i < 1000; ++i)); do echo $i; done >input
We then read the file through a pipe (see below for the redirection case):
$ ~ cat input | { head -n 1; echo ...; tail -n 1; }
0
...
And indeed, head consumes everything, leaving nothing for tail. This is point 2. Putting points 1 and 2 together explains the inconsistent behavior:
In my version of head, at least 1000 lines are consumed at a time when reading through a pipe, provided at least 1000 lines are available (otherwise, everything that is available). If all of the left side finishes before the right side even starts, head will consume everything, leaving nothing for tail. If, however, the left side has not finished, head will only consume the lines written so far. This means something is left over for tail, which then prints a line.
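One plausible reason the cut-off sits around 1000 lines: the whole 1000-line input is only 3890 bytes, small enough for head to grab in a single read() once the left side has finished writing (GNU head, for example, reads a pipe in chunks of BUFSIZ bytes, typically 8192 with glibc; treat the exact size as an assumption about your system). The 10000-line input is about 48 KB, more than one such chunk, so something would always be left for tail, which fits the question's observation. This sketch assumes GNU coreutils.

```shell
# Recreate the 1000-line input (seq 0 999 prints the same lines as the
# for loop in the question).
seq 0 999 >input
wc -c <input                                    # total size: 3890 bytes

# Bytes head -n 1 leaves in the pipe for a following command.
# cat writes the small file in one go, so head's first read() gets it all.
cat input | { head -n 1 >/dev/null; wc -c; }    # 0 with GNU head

# The 10000-line variant from the question is much larger.
seq 0 9999 | wc -c                              # 48890 bytes
```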
Redirection
In the example above, we fed the file through a pipe. That is because if we use redirection instead, we get the following result:
$ ~ { head -n 1; echo ...; tail -n 1; } <input
0
...
999
This differs from the explanation above: when used this way, head appears to read only one line:
$ ~ { head -n 1; echo ...; head -n 1; } <input
0
...
1
The answer here explains this. In short:
- pipes are not lseek()'able so commands can't read some data and then rewind back, but when you redirect with > or < usually it's a file which is an lseek()-able object, so commands can navigate however they please.
In other words, head need not consume everything if it can seek on the file. It may read more than it needs, but once it has found the newline it can seek back to just after it, leaving the rest for the next command. We can demonstrate this with a file that has 1 byte after the first newline:
$ ~ cat input
0123456789
1
$ ~ { head -n 1; head -c 1; } <input
0123456789
1$ ~
With a pipe instead, the whole input is consumed, leaving nothing for the second head:
$ ~ cat input | { head -n 1; head -c 1; }
0123456789
$ ~
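The file-versus-pipe difference can also be measured by counting what is left after head -n 1. POSIX requires a utility that stops reading a seekable input early to leave the file offset just past the last byte it processed, which is what lets the second command start at the right place; a pipe offers no way to give bytes back. A sketch, assuming the file ends with a newline after the final 1:

```shell
printf '0123456789\n1\n' >input     # 11-byte first line, then "1\n"

# Redirection from a regular file: head may read more than one line,
# but it seeks back to just after the first newline.
{ head -n 1 >/dev/null; wc -c; } <input        # 2 bytes left ("1\n")

# Through a pipe: whatever head read is gone for good.
cat input | { head -n 1 >/dev/null; wc -c; }   # 0 bytes left
```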
As a side note, process substitution (which, as I understand it, gives a non-seekable pipe) produces the same result:
$ ~ { head -n 1; head -c 1; } < <(cat input)
0123456789
$ ~
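Whether standard input is a pipe or a regular file can be checked from the shell. stdin_kind is a hypothetical helper that relies on /dev/stdin, which exists on Linux (bash's test builtin also treats /dev/stdin as file descriptor 0); it is a sketch, not a portable test.

```shell
# Hypothetical helper: report what kind of object standard input is.
stdin_kind() {
    if [ -p /dev/stdin ]; then
        echo pipe            # not seekable: a buffered head cannot give data back
    elif [ -f /dev/stdin ]; then
        echo "regular file"  # seekable: head can seek back after the newline
    else
        echo other           # e.g. a terminal
    fi
}

printf '0123456789\n1\n' >input
stdin_kind <input            # regular file
cat input | stdin_kind       # pipe
```

In bash, stdin_kind < <(cat input) should print pipe as well, since process substitution hands the reader a pipe.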
edited Nov 28 at 10:01
answered Nov 28 at 9:55
nehcsivart
333310
What Unix are you on? On some Unices, the head implementation buffers too much of the input data and leaves nothing for tail to work on (this is an error). This ought to be deterministic though.
– Kusalananda
Nov 25 at 8:35
@Kusalananda I see this with i < 1000 on Windows Subsystem for Linux, and i < 10 for ArchLinux. Now that you mention it, this might be a Windows Subsystem for Linux issue (with regards to the non-determinism). For ArchLinux, at least so far, the behavior seems deterministic. I will update the question with this information.
– nehcsivart
Nov 25 at 9:12
WSL is not Linux... it emulates a Linux environment for all purposes, on top of the Windows HAL. Hmmmm... I wonder where Ipor Sircer and his "File a bug report" are.
– Rui F Ribeiro
Nov 25 at 9:39
@Kusalananda this doesn't have to be deterministic (unless you mean it in a philosophical sense). In a pipeline like foo | bar, how much a read(2) inside bar will return depends not only on how much foo is writing into the pipe, but also on how the kernel schedules the foo and bar processes to run. Notice that pipes on Unix are not message/boundary preserving; 5 writes in foo may result in a single read in bar.
– mosvy
Nov 25 at 20:53
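The lack of write boundaries can be seen from the other direction as the short-read case: several small writes made while the reader is busy are all returned by one read(). In this sketch the 1-second sleep is an arbitrary delay that lets the writer finish first, and dd performs a single read() of up to 4096 bytes.

```shell
# Three separate one-byte writes (each printf is its own write())...
# ...collected by a single read() once the reader wakes up.
{ printf 'a'; printf 'b'; printf 'c'; } | { sleep 1; dd bs=4096 count=1 2>/dev/null; }
echo    # dd printed no trailing newline
```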