Explanation of unpredictable behaviour of tee













I encountered a behaviour that I don't understand while testing a script that sums the outputs from repeated executions of a program. To reproduce it, create two text files: out, which represents the output of my program, and sum, which holds the sum of the values returned by previous executions and starts out as a copy of out:



cat > out << EOF
2 20
5 50
EOF
cp out sum


The strange thing happens on running



paste out sum | awk '{$1 += $3; $2 += $4; NF = 2; print}' | tee sum


several times (15-20 runs might be needed). Each run should add the values in out to the corresponding values in sum and write the results back to sum. Instead, it works an unpredictable number of times, and then sum reverts back to



2 20
5 50


I have since learned that I cannot redirect or tee output to the same file I'm reading from, and I solved the issue using a temporary file. Still, this behaviour baffles me:



  • why does … | tee sum work at all (even if only for a limited number of iterations), while … > sum never overwrites sum?


  • why doesn't it work a predictable number of times?

































  • "while … > sum never overwrites sum" -- can you elaborate on that? What command, exactly, never overwrites sum?
    – ilkkachu
    Aug 26 at 18:49










  • paste out sum | awk '{$1 += $3; $2 += $4; NF = 2; print}' > sum. If you try that and then cat sum, you should see that sum has not changed.
    – Arch Stanton
    Aug 26 at 18:52






















bash io-redirection tee














edited Aug 26 at 18:29









Jeff Schaller











asked Aug 26 at 17:21









Arch Stanton












1 Answer



accepted










This,



paste out sum | awk ... | tee sum


has a race condition. paste opens sum to read it, and tee opens it for writing, truncating it. The shell starts both at approximately the same time, so it's up to chance which one gets to open the file first.



Of course in practice, the shell has to start the utilities one at a time, in some particular order. It probably does that from left to right, so paste might have a better chance of going first, but that's an implementation detail, and in any case the OS scheduler decides what runs when.



If paste gets to go first, it opens the file with the data still intact, and probably has enough time to read the data too. If tee gets to open the file before paste has read it, then paste sees an empty file instead.
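The two outcomes of the race can be reproduced deterministically by forcing each ordering by hand. The following is only a sketch under assumptions: it works in a scratch directory created with mktemp, uses `: > sum` to stand in for tee's open-with-truncate, and uses a simplified awk program (`print $1 + $3, $2 + $4`) rather than the exact one from the question.

```shell
#!/bin/sh
# Work in a scratch directory so no real files are touched.
dir=$(mktemp -d) && cd "$dir" || exit 1
printf '2 20\n5 50\n' > out

# Ordering 1: the reader (paste) reads sum before the writer truncates it.
cp out sum
reader_first=$(paste out sum | awk '{ print $1 + $3, $2 + $4 }')
: > sum    # stand-in for tee opening sum for writing afterwards

# Ordering 2: the writer truncates sum before the reader gets to it.
cp out sum
: > sum    # stand-in for tee opening sum for writing first
writer_first=$(paste out sum | awk '{ print $1 + $3, $2 + $4 }')

printf 'reader first:\n%s\n' "$reader_first"
printf 'writer first:\n%s\n' "$writer_first"
```

In the second ordering paste sees an empty sum, so the missing fields count as zero and the result is just the contents of out again, which matches the "reverted" values in the question.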



Here,



paste out sum | awk ... > sum


the shell itself opens sum for writing, truncating it, as part of setting up the redirection for awk. That might happen in parallel with starting paste, but since truncating sum doesn't involve starting another utility, it probably happens first. (I'm not sure whether there's a rule about the order of processing redirections and starting the commands in a pipeline like this, and I wouldn't count on one.)
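For a single command with no pipeline, there is no race at all: the shell performs the redirections before it runs the command. The sketch below illustrates only that simpler case, not the pipeline above; the scratch directory and the use of wc are assumptions made for the demonstration.

```shell
#!/bin/sh
# Work in a scratch directory so no real files are touched.
dir=$(mktemp -d) && cd "$dir" || exit 1
printf '2 20\n5 50\n' > sum

# The shell opens sum for reading, then opens it again for writing
# (truncating it), and only then runs wc. wc therefore reads an empty file.
wc -c < sum > sum

# sum now holds wc's byte count of the already truncated file.
cat sum
```

The byte count that ends up in sum is zero, because the original ten bytes were gone before wc read anything. In the pipeline case the same truncation still has to compete with paste, which runs as a separate process.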



There's a tool called sponge (from the moreutils package) to fix this issue, and a dozen questions about it. It collects all the input it gets and only writes it out after the input is closed. This should always update sum correctly:



paste out sum | awk ... | sponge sum
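Where sponge isn't installed, the temporary-file approach mentioned in the question does the same job. This is a minimal sketch, assuming a scratch directory, a simplified awk program (`print $1 + $3, $2 + $4`) in place of the one from the question, and a hypothetical temporary name template `sum.XXXXXX`:

```shell
#!/bin/sh
# Work in a scratch directory so no real files are touched.
dir=$(mktemp -d) && cd "$dir" || exit 1
printf '2 20\n5 50\n' > out
cp out sum

# Add out to sum three times, never writing to sum while it is being read.
for i in 1 2 3; do
    # Create the temporary file next to sum so that mv is a plain rename.
    tmp=$(mktemp sum.XXXXXX) || exit 1
    paste out sum | awk '{ print $1 + $3, $2 + $4 }' > "$tmp" &&
        mv "$tmp" sum
done

cat sum
```

Since sum starts as a copy of out, three additions leave it holding four times the values in out. The mv only runs if awk succeeded, so a failed run leaves the old sum in place.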



























  • Interesting, thanks! By the way, I guess with "If sum gets to open the file before paste" you really meant "If tee gets to open..."
    – Arch Stanton
    Aug 26 at 21:50











  • @ArchStanton, yep, exactly. Thanks.
    – ilkkachu
    Aug 26 at 22:06


















edited Aug 26 at 22:04

























answered Aug 26 at 18:52









ilkkachu
