Why do `time` and `strace -c` differ?

 Clash Royale CLAN TAG#URR8PPP
Clash Royale CLAN TAG#URR8PPP
up vote
4
down vote
favorite
To my understanding, time would record the total amount of time spend in syscalls. Then I would expect the cumulative totals for sys time as reported by time and strace -fc to be the same. But they are wildly different (13.5 vs 0.005). What is happening?
# time php index.php >/dev/null
real 0m16.292s
user 0m2.728s
sys 0m13.548s
# strace -fc php index.php >/dev/null
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
 83.72 0.000365 0 54845 munmap
 10.09 0.000044 0 36705 mmap
 6.19 0.000027 0 18249 madvise
 0.00 0.000000 0 289 read
 0.00 0.000000 0 119 write
 0.00 0.000000 0 118 3 open
 0.00 0.000000 0 118 close
 0.00 0.000000 0 23 stat
[ cut 0 duration syscalls for brevity ]
100.00 0.000436 110951 82 total
(I've rerun these tests for about 50 times and they both give consistent results)
linux time strace syscalls
add a comment |Â
up vote
4
down vote
favorite
To my understanding, time would record the total amount of time spend in syscalls. Then I would expect the cumulative totals for sys time as reported by time and strace -fc to be the same. But they are wildly different (13.5 vs 0.005). What is happening?
# time php index.php >/dev/null
real 0m16.292s
user 0m2.728s
sys 0m13.548s
# strace -fc php index.php >/dev/null
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
 83.72 0.000365 0 54845 munmap
 10.09 0.000044 0 36705 mmap
 6.19 0.000027 0 18249 madvise
 0.00 0.000000 0 289 read
 0.00 0.000000 0 119 write
 0.00 0.000000 0 118 3 open
 0.00 0.000000 0 118 close
 0.00 0.000000 0 23 stat
[ cut 0 duration syscalls for brevity ]
100.00 0.000436 110951 82 total
(I've rerun these tests for about 50 times and they both give consistent results)
linux time strace syscalls
 
 
 1
 
 
 
 
 Here- timeis measuring- strace, not- php.
 â Andy Dalton
 Dec 14 '17 at 14:50
 
 
 
 
 
 
 
 
 
 Thanks, however that did not seem to matter much. Updated question to show distinct tests.
 â Willem
 Dec 14 '17 at 15:35
 
 
 
add a comment |Â
up vote
4
down vote
favorite
up vote
4
down vote
favorite
To my understanding, time would record the total amount of time spend in syscalls. Then I would expect the cumulative totals for sys time as reported by time and strace -fc to be the same. But they are wildly different (13.5 vs 0.005). What is happening?
# time php index.php >/dev/null
real 0m16.292s
user 0m2.728s
sys 0m13.548s
# strace -fc php index.php >/dev/null
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
 83.72 0.000365 0 54845 munmap
 10.09 0.000044 0 36705 mmap
 6.19 0.000027 0 18249 madvise
 0.00 0.000000 0 289 read
 0.00 0.000000 0 119 write
 0.00 0.000000 0 118 3 open
 0.00 0.000000 0 118 close
 0.00 0.000000 0 23 stat
[ cut 0 duration syscalls for brevity ]
100.00 0.000436 110951 82 total
(I've rerun these tests for about 50 times and they both give consistent results)
linux time strace syscalls
To my understanding, time would record the total amount of time spend in syscalls. Then I would expect the cumulative totals for sys time as reported by time and strace -fc to be the same. But they are wildly different (13.5 vs 0.005). What is happening?
# time php index.php >/dev/null
real 0m16.292s
user 0m2.728s
sys 0m13.548s
# strace -fc php index.php >/dev/null
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
 83.72 0.000365 0 54845 munmap
 10.09 0.000044 0 36705 mmap
 6.19 0.000027 0 18249 madvise
 0.00 0.000000 0 289 read
 0.00 0.000000 0 119 write
 0.00 0.000000 0 118 3 open
 0.00 0.000000 0 118 close
 0.00 0.000000 0 23 stat
[ cut 0 duration syscalls for brevity ]
100.00 0.000436 110951 82 total
(I've rerun these tests for about 50 times and they both give consistent results)
linux time strace syscalls
edited Dec 30 '17 at 10:48
asked Dec 14 '17 at 14:09
Willem
1487
1487
 
 
 1
 
 
 
 
 Here- timeis measuring- strace, not- php.
 â Andy Dalton
 Dec 14 '17 at 14:50
 
 
 
 
 
 
 
 
 
 Thanks, however that did not seem to matter much. Updated question to show distinct tests.
 â Willem
 Dec 14 '17 at 15:35
 
 
 
add a comment |Â
 
 
 1
 
 
 
 
 Here- timeis measuring- strace, not- php.
 â Andy Dalton
 Dec 14 '17 at 14:50
 
 
 
 
 
 
 
 
 
 Thanks, however that did not seem to matter much. Updated question to show distinct tests.
 â Willem
 Dec 14 '17 at 15:35
 
 
 
1
1
Here
time is measuring strace, not php.â Andy Dalton
Dec 14 '17 at 14:50
Here
time is measuring strace, not php.â Andy Dalton
Dec 14 '17 at 14:50
Thanks, however that did not seem to matter much. Updated question to show distinct tests.
â Willem
Dec 14 '17 at 15:35
Thanks, however that did not seem to matter much. Updated question to show distinct tests.
â Willem
Dec 14 '17 at 15:35
add a comment |Â
 1 Answer
 1
 
active
oldest
votes
up vote
6
down vote
accepted
A process can spend any amount of time not doing any system call.
For instance a shell doing a while :; do :; done will spend an infinite amount of time not doing any system call and not spending any sys CPU time, only user CPU time.
strace -c tries to count the sys CPU time by each system call. A nanosleep(1000000000) will spend close to 0ms CPU time but 1s wall clock time.
$ bash -c 'time strace -fc sleep 1'
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
 [...]
 0.00 0.000000 0 1 nanosleep
 [...]
100.00 0.000000 33 3 total
real 0m1.006s
user 0m0.001s
sys 0m0.003s
Comparing the sys time with the time reported by strace would make more sense but note that not all sys CPU time accounted to the process is spent in those syscalls. Accessing mmaped data for instance can make the process spend a lot of system time without doing any system call.
Here, those numerous mmaps() calls could be to allocate memory. Those are instantaneous, as they just update some mappings. But the first time you write to those, that's where some system time is needed to back those writes with actual memory.
Or they could be to map object files like shared libraries (that's also a likely possibility as the number is not far from the number of open()s). Then again, the mmap() is quick, but reading the memory later does also mean some page faults and actual time to read the data off disk, which is not accounted to any system call.
More fundamentally, if you do time strace -f your-application as in an earlier version of your question, time will time both the command and strace.
strace adds a lot of overhead. It does itself a few system calls for each system call of the straced application.
Doing
strace -fc time your-application
instead of
time strace -fc your-application
is more likely to give you a better match.
What I find though is that for the wait*() system calls that processes make to wait their children, strace counts the time reported by those wait*() system calls as system time, which means that the time of the children processes (at least the ones that are waited for) is counted several times. That matters in strace -f time cmd as time does run cmd in a child process and waits for it.
$ strace -c time head -c 100M /dev/urandom > /dev/null
0.00user 0.76system 0:00.76elapsed 99%CPU (0avgtext+0avgdata 1796maxresident)k
0inputs+0outputs (0major+83minor)pagefaults 0swaps
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00 0.762288 762288 1 wait4
 0.00 0.000000 0 1 read
 0.00 0.000000 0 112 write
 0.00 0.000000 0 2 open
 0.00 0.000000 0 2 close
 0.00 0.000000 0 2 fstat
 0.00 0.000000 0 6 mmap
 0.00 0.000000 0 4 mprotect
 0.00 0.000000 0 1 munmap
 0.00 0.000000 0 1 brk
 0.00 0.000000 0 4 rt_sigaction
 0.00 0.000000 0 3 3 access
 0.00 0.000000 0 1 clone
 0.00 0.000000 0 1 execve
 0.00 0.000000 0 1 arch_prctl
------ ----------- ----------- --------- --------- ----------------
100.00 0.762288 142 3 total
time and strace report the same sys time (as returned by the wait4() system call), but with -f:
$ strace -fc time head -c 100M /dev/urandom > /dev/null
strace: Process 2298 attached
0.01user 1.33system 0:01.91elapsed 69%CPU (0avgtext+0avgdata 1920maxresident)k
0inputs+0outputs (0major+84minor)pagefaults 0swaps
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
 54.60 1.331335 1331335 1 wait4
 39.43 0.961497 75 12804 read
 5.94 0.144825 6 25711 write
 0.01 0.000148 11 13 6 open
 0.00 0.000104 8 13 mmap
 0.00 0.000094 19 5 3 execve
 0.00 0.000063 8 8 mprotect
 0.00 0.000050 6 9 close
 0.00 0.000041 7 6 6 access
 0.00 0.000037 5 7 fstat
 0.00 0.000031 16 2 munmap
 0.00 0.000030 8 4 brk
 0.00 0.000007 4 2 arch_prctl
 0.00 0.000006 6 1 1 ioctl
 0.00 0.000000 0 4 rt_sigaction
 0.00 0.000000 0 1 clone
------ ----------- ----------- --------- --------- ----------------
100.00 2.438268 38591 16 total
The 1.33 is the time reported by the one wait4() system call that time does. That reports the sys time of head (time's child).
However, strace also tries to get the system time for every syscall of its head grandchild on top of that, which means that is counted twice (not exactly). That stracing of the grand child does still incur some overhead accounted to the straced process as we get 1.33 as opposed to 0.76 earlier. That is considerably reduced if I force strace, time and head to run on the same processor (with taskset 1).
 
 
 
 
 
 
 Thanks, but my test case (even with split runs) still doesn't make sense. The sum of sys time as reported by- timeis quite off from the sum as reported by- strace.
 â Willem
 Dec 14 '17 at 15:37
 
 
 
 
 
 
 
 
 
 My observed behaviour is that- stracereports less than expected systime, not more. See my updated case where I did split tests (1 with- timeand 1 with- strace -fc)
 â Willem
 Dec 14 '17 at 16:21
 
 
 
 
 
 
 
 
 
 @Willem, it looks like in your case, the time is not spent in system calls. Given the- mmap()s, the fact that quite some time is spent in munmap, it's not surprising.
 â Stéphane Chazelas
 Dec 14 '17 at 16:39
 
 
 
 
 
 1
 
 
 
 
 @Willem, both- timeand- stracewould report correct times, but- timereturns the system time, while- straceonly reports the system time spent in system calls. See also the edit.
 â Stéphane Chazelas
 Dec 14 '17 at 16:55
 
 
 
 
 
 1
 
 
 
 
 @Willem- straceis right in that it is a system call tracer, so it would be wrong if it reported time not spent in system calls. Here, it's useful in that it tells you the time is likely not spent in system calls. You could try- time -v php...(Assuming GNU time, the backslash is to bypass bash's keyword) to see if it's not just allocating too much memory, or inspect the log of- strace -ttTf -o log php...to see between which events most of the time is spent, or instrument the php code. btrace (blktrace) could help telling you which disk IO is accounted to that php process.
 â Stéphane Chazelas
 Dec 14 '17 at 17:31
 
 
 
 
 |Â
show 2 more comments
 1 Answer
 1
 
active
oldest
votes
 1 Answer
 1
 
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
6
down vote
accepted
A process can spend any amount of time not doing any system call.
For instance a shell doing a while :; do :; done will spend an infinite amount of time not doing any system call and not spending any sys CPU time, only user CPU time.
strace -c tries to count the sys CPU time by each system call. A nanosleep(1000000000) will spend close to 0ms CPU time but 1s wall clock time.
$ bash -c 'time strace -fc sleep 1'
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
 [...]
 0.00 0.000000 0 1 nanosleep
 [...]
100.00 0.000000 33 3 total
real 0m1.006s
user 0m0.001s
sys 0m0.003s
Comparing the sys time with the time reported by strace would make more sense but note that not all sys CPU time accounted to the process is spent in those syscalls. Accessing mmaped data for instance can make the process spend a lot of system time without doing any system call.
Here, those numerous mmaps() calls could be to allocate memory. Those are instantaneous, as they just update some mappings. But the first time you write to those, that's where some system time is needed to back those writes with actual memory.
Or they could be to map object files like shared libraries (that's also a likely possibility as the number is not far from the number of open()s). Then again, the mmap() is quick, but reading the memory later does also mean some page faults and actual time to read the data off disk, which is not accounted to any system call.
More fundamentally, if you do time strace -f your-application as in an earlier version of your question, time will time both the command and strace.
strace adds a lot of overhead. It does itself a few system calls for each system call of the straced application.
Doing
strace -fc time your-application
instead of
time strace -fc your-application
is more likely to give you a better match.
What I find though is that for the wait*() system calls that processes make to wait their children, strace counts the time reported by those wait*() system calls as system time, which means that the time of the children processes (at least the ones that are waited for) is counted several times. That matters in strace -f time cmd as time does run cmd in a child process and waits for it.
$ strace -c time head -c 100M /dev/urandom > /dev/null
0.00user 0.76system 0:00.76elapsed 99%CPU (0avgtext+0avgdata 1796maxresident)k
0inputs+0outputs (0major+83minor)pagefaults 0swaps
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00 0.762288 762288 1 wait4
 0.00 0.000000 0 1 read
 0.00 0.000000 0 112 write
 0.00 0.000000 0 2 open
 0.00 0.000000 0 2 close
 0.00 0.000000 0 2 fstat
 0.00 0.000000 0 6 mmap
 0.00 0.000000 0 4 mprotect
 0.00 0.000000 0 1 munmap
 0.00 0.000000 0 1 brk
 0.00 0.000000 0 4 rt_sigaction
 0.00 0.000000 0 3 3 access
 0.00 0.000000 0 1 clone
 0.00 0.000000 0 1 execve
 0.00 0.000000 0 1 arch_prctl
------ ----------- ----------- --------- --------- ----------------
100.00 0.762288 142 3 total
time and strace report the same sys time (as returned by the wait4() system call), but with -f:
$ strace -fc time head -c 100M /dev/urandom > /dev/null
strace: Process 2298 attached
0.01user 1.33system 0:01.91elapsed 69%CPU (0avgtext+0avgdata 1920maxresident)k
0inputs+0outputs (0major+84minor)pagefaults 0swaps
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
 54.60 1.331335 1331335 1 wait4
 39.43 0.961497 75 12804 read
 5.94 0.144825 6 25711 write
 0.01 0.000148 11 13 6 open
 0.00 0.000104 8 13 mmap
 0.00 0.000094 19 5 3 execve
 0.00 0.000063 8 8 mprotect
 0.00 0.000050 6 9 close
 0.00 0.000041 7 6 6 access
 0.00 0.000037 5 7 fstat
 0.00 0.000031 16 2 munmap
 0.00 0.000030 8 4 brk
 0.00 0.000007 4 2 arch_prctl
 0.00 0.000006 6 1 1 ioctl
 0.00 0.000000 0 4 rt_sigaction
 0.00 0.000000 0 1 clone
------ ----------- ----------- --------- --------- ----------------
100.00 2.438268 38591 16 total
The 1.33 is the time reported by the one wait4() system call that time does. That reports the sys time of head (time's child).
However, strace also tries to get the system time for every syscall of its head grandchild on top of that, which means that is counted twice (not exactly). That stracing of the grand child does still incur some overhead accounted to the straced process as we get 1.33 as opposed to 0.76 earlier. That is considerably reduced if I force strace, time and head to run on the same processor (with taskset 1).
 
 
 
 
 
 
 Thanks, but my test case (even with split runs) still doesn't make sense. The sum of sys time as reported by- timeis quite off from the sum as reported by- strace.
 â Willem
 Dec 14 '17 at 15:37
 
 
 
 
 
 
 
 
 
 My observed behaviour is that- stracereports less than expected systime, not more. See my updated case where I did split tests (1 with- timeand 1 with- strace -fc)
 â Willem
 Dec 14 '17 at 16:21
 
 
 
 
 
 
 
 
 
 @Willem, it looks like in your case, the time is not spent in system calls. Given the- mmap()s, the fact that quite some time is spent in munmap, it's not surprising.
 â Stéphane Chazelas
 Dec 14 '17 at 16:39
 
 
 
 
 
 1
 
 
 
 
 @Willem, both- timeand- stracewould report correct times, but- timereturns the system time, while- straceonly reports the system time spent in system calls. See also the edit.
 â Stéphane Chazelas
 Dec 14 '17 at 16:55
 
 
 
 
 
 1
 
 
 
 
 @Willem- straceis right in that it is a system call tracer, so it would be wrong if it reported time not spent in system calls. Here, it's useful in that it tells you the time is likely not spent in system calls. You could try- time -v php...(Assuming GNU time, the backslash is to bypass bash's keyword) to see if it's not just allocating too much memory, or inspect the log of- strace -ttTf -o log php...to see between which events most of the time is spent, or instrument the php code. btrace (blktrace) could help telling you which disk IO is accounted to that php process.
 â Stéphane Chazelas
 Dec 14 '17 at 17:31
 
 
 
 
 |Â
show 2 more comments
up vote
6
down vote
accepted
A process can spend any amount of time not doing any system call.
For instance a shell doing a while :; do :; done will spend an infinite amount of time not doing any system call and not spending any sys CPU time, only user CPU time.
strace -c tries to count the sys CPU time by each system call. A nanosleep(1000000000) will spend close to 0ms CPU time but 1s wall clock time.
$ bash -c 'time strace -fc sleep 1'
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
 [...]
 0.00 0.000000 0 1 nanosleep
 [...]
100.00 0.000000 33 3 total
real 0m1.006s
user 0m0.001s
sys 0m0.003s
Comparing the sys time with the time reported by strace would make more sense but note that not all sys CPU time accounted to the process is spent in those syscalls. Accessing mmaped data for instance can make the process spend a lot of system time without doing any system call.
Here, those numerous mmaps() calls could be to allocate memory. Those are instantaneous, as they just update some mappings. But the first time you write to those, that's where some system time is needed to back those writes with actual memory.
Or they could be to map object files like shared libraries (that's also a likely possibility as the number is not far from the number of open()s). Then again, the mmap() is quick, but reading the memory later does also mean some page faults and actual time to read the data off disk, which is not accounted to any system call.
More fundamentally, if you do time strace -f your-application as in an earlier version of your question, time will time both the command and strace.
strace adds a lot of overhead. It does itself a few system calls for each system call of the straced application.
Doing
strace -fc time your-application
instead of
time strace -fc your-application
is more likely to give you a better match.
What I find though is that for the wait*() system calls that processes make to wait their children, strace counts the time reported by those wait*() system calls as system time, which means that the time of the children processes (at least the ones that are waited for) is counted several times. That matters in strace -f time cmd as time does run cmd in a child process and waits for it.
$ strace -c time head -c 100M /dev/urandom > /dev/null
0.00user 0.76system 0:00.76elapsed 99%CPU (0avgtext+0avgdata 1796maxresident)k
0inputs+0outputs (0major+83minor)pagefaults 0swaps
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00 0.762288 762288 1 wait4
 0.00 0.000000 0 1 read
 0.00 0.000000 0 112 write
 0.00 0.000000 0 2 open
 0.00 0.000000 0 2 close
 0.00 0.000000 0 2 fstat
 0.00 0.000000 0 6 mmap
 0.00 0.000000 0 4 mprotect
 0.00 0.000000 0 1 munmap
 0.00 0.000000 0 1 brk
 0.00 0.000000 0 4 rt_sigaction
 0.00 0.000000 0 3 3 access
 0.00 0.000000 0 1 clone
 0.00 0.000000 0 1 execve
 0.00 0.000000 0 1 arch_prctl
------ ----------- ----------- --------- --------- ----------------
100.00 0.762288 142 3 total
time and strace report the same sys time (as returned by the wait4() system call), but with -f:
$ strace -fc time head -c 100M /dev/urandom > /dev/null
strace: Process 2298 attached
0.01user 1.33system 0:01.91elapsed 69%CPU (0avgtext+0avgdata 1920maxresident)k
0inputs+0outputs (0major+84minor)pagefaults 0swaps
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
 54.60 1.331335 1331335 1 wait4
 39.43 0.961497 75 12804 read
 5.94 0.144825 6 25711 write
 0.01 0.000148 11 13 6 open
 0.00 0.000104 8 13 mmap
 0.00 0.000094 19 5 3 execve
 0.00 0.000063 8 8 mprotect
 0.00 0.000050 6 9 close
 0.00 0.000041 7 6 6 access
 0.00 0.000037 5 7 fstat
 0.00 0.000031 16 2 munmap
 0.00 0.000030 8 4 brk
 0.00 0.000007 4 2 arch_prctl
 0.00 0.000006 6 1 1 ioctl
 0.00 0.000000 0 4 rt_sigaction
 0.00 0.000000 0 1 clone
------ ----------- ----------- --------- --------- ----------------
100.00 2.438268 38591 16 total
The 1.33 is the time reported by the one wait4() system call that time does. That reports the sys time of head (time's child).
However, strace also tries to get the system time for every syscall of its head grandchild on top of that, which means that is counted twice (not exactly). That stracing of the grand child does still incur some overhead accounted to the straced process as we get 1.33 as opposed to 0.76 earlier. That is considerably reduced if I force strace, time and head to run on the same processor (with taskset 1).
 
 
 
 
 
 
 Thanks, but my test case (even with split runs) still doesn't make sense. The sum of sys time as reported by- timeis quite off from the sum as reported by- strace.
 â Willem
 Dec 14 '17 at 15:37
 
 
 
 
 
 
 
 
 
 My observed behaviour is that- stracereports less than expected systime, not more. See my updated case where I did split tests (1 with- timeand 1 with- strace -fc)
 â Willem
 Dec 14 '17 at 16:21
 
 
 
 
 
 
 
 
 
 @Willem, it looks like in your case, the time is not spent in system calls. Given the- mmap()s, the fact that quite some time is spent in munmap, it's not surprising.
 â Stéphane Chazelas
 Dec 14 '17 at 16:39
 
 
 
 
 
 1
 
 
 
 
 @Willem, both- timeand- stracewould report correct times, but- timereturns the system time, while- straceonly reports the system time spent in system calls. See also the edit.
 â Stéphane Chazelas
 Dec 14 '17 at 16:55
 
 
 
 
 
 1
 
 
 
 
 @Willem- straceis right in that it is a system call tracer, so it would be wrong if it reported time not spent in system calls. Here, it's useful in that it tells you the time is likely not spent in system calls. You could try- time -v php...(Assuming GNU time, the backslash is to bypass bash's keyword) to see if it's not just allocating too much memory, or inspect the log of- strace -ttTf -o log php...to see between which events most of the time is spent, or instrument the php code. btrace (blktrace) could help telling you which disk IO is accounted to that php process.
 â Stéphane Chazelas
 Dec 14 '17 at 17:31
 
 
 
 
 |Â
show 2 more comments
up vote
6
down vote
accepted
up vote
6
down vote
accepted
A process can spend any amount of time not doing any system call.
For instance a shell doing a while :; do :; done will spend an infinite amount of time not doing any system call and not spending any sys CPU time, only user CPU time.
strace -c tries to count the sys CPU time by each system call. A nanosleep(1000000000) will spend close to 0ms CPU time but 1s wall clock time.
$ bash -c 'time strace -fc sleep 1'
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
 [...]
 0.00 0.000000 0 1 nanosleep
 [...]
100.00 0.000000 33 3 total
real 0m1.006s
user 0m0.001s
sys 0m0.003s
Comparing the sys time with the time reported by strace would make more sense but note that not all sys CPU time accounted to the process is spent in those syscalls. Accessing mmaped data for instance can make the process spend a lot of system time without doing any system call.
Here, those numerous mmaps() calls could be to allocate memory. Those are instantaneous, as they just update some mappings. But the first time you write to those, that's where some system time is needed to back those writes with actual memory.
Or they could be to map object files like shared libraries (that's also a likely possibility as the number is not far from the number of open()s). Then again, the mmap() is quick, but reading the memory later does also mean some page faults and actual time to read the data off disk, which is not accounted to any system call.
More fundamentally, if you do time strace -f your-application as in an earlier version of your question, time will time both the command and strace.
strace adds a lot of overhead. It does itself a few system calls for each system call of the straced application.
Doing
strace -fc time your-application
instead of
time strace -fc your-application
is more likely to give you a better match.
What I find though is that for the wait*() system calls that processes make to wait their children, strace counts the time reported by those wait*() system calls as system time, which means that the time of the children processes (at least the ones that are waited for) is counted several times. That matters in strace -f time cmd as time does run cmd in a child process and waits for it.
$ strace -c time head -c 100M /dev/urandom > /dev/null
0.00user 0.76system 0:00.76elapsed 99%CPU (0avgtext+0avgdata 1796maxresident)k
0inputs+0outputs (0major+83minor)pagefaults 0swaps
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00 0.762288 762288 1 wait4
 0.00 0.000000 0 1 read
 0.00 0.000000 0 112 write
 0.00 0.000000 0 2 open
 0.00 0.000000 0 2 close
 0.00 0.000000 0 2 fstat
 0.00 0.000000 0 6 mmap
 0.00 0.000000 0 4 mprotect
 0.00 0.000000 0 1 munmap
 0.00 0.000000 0 1 brk
 0.00 0.000000 0 4 rt_sigaction
 0.00 0.000000 0 3 3 access
 0.00 0.000000 0 1 clone
 0.00 0.000000 0 1 execve
 0.00 0.000000 0 1 arch_prctl
------ ----------- ----------- --------- --------- ----------------
100.00 0.762288 142 3 total
time and strace report the same sys time (as returned by the wait4() system call), but with -f:
$ strace -fc time head -c 100M /dev/urandom > /dev/null
strace: Process 2298 attached
0.01user 1.33system 0:01.91elapsed 69%CPU (0avgtext+0avgdata 1920maxresident)k
0inputs+0outputs (0major+84minor)pagefaults 0swaps
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
 54.60 1.331335 1331335 1 wait4
 39.43 0.961497 75 12804 read
 5.94 0.144825 6 25711 write
 0.01 0.000148 11 13 6 open
 0.00 0.000104 8 13 mmap
 0.00 0.000094 19 5 3 execve
 0.00 0.000063 8 8 mprotect
 0.00 0.000050 6 9 close
 0.00 0.000041 7 6 6 access
 0.00 0.000037 5 7 fstat
 0.00 0.000031 16 2 munmap
 0.00 0.000030 8 4 brk
 0.00 0.000007 4 2 arch_prctl
 0.00 0.000006 6 1 1 ioctl
 0.00 0.000000 0 4 rt_sigaction
 0.00 0.000000 0 1 clone
------ ----------- ----------- --------- --------- ----------------
100.00 2.438268 38591 16 total
The 1.33 is the time reported by the one wait4() system call that time does. That reports the sys time of head (time's child).
However, strace also tries to get the system time for every syscall of its head grandchild on top of that, which means that is counted twice (not exactly). That stracing of the grand child does still incur some overhead accounted to the straced process as we get 1.33 as opposed to 0.76 earlier. That is considerably reduced if I force strace, time and head to run on the same processor (with taskset 1).
A process can spend any amount of time not doing any system call.
For instance a shell doing a while :; do :; done will spend an infinite amount of time not doing any system call and not spending any sys CPU time, only user CPU time.
strace -c tries to count the sys CPU time by each system call. A nanosleep(1000000000) will spend close to 0ms CPU time but 1s wall clock time.
$ bash -c 'time strace -fc sleep 1'
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
 [...]
 0.00 0.000000 0 1 nanosleep
 [...]
100.00 0.000000 33 3 total
real 0m1.006s
user 0m0.001s
sys 0m0.003s
Comparing the sys time with the time reported by strace would make more sense but note that not all sys CPU time accounted to the process is spent in those syscalls. Accessing mmaped data for instance can make the process spend a lot of system time without doing any system call.
Here, those numerous mmaps() calls could be to allocate memory. Those are instantaneous, as they just update some mappings. But the first time you write to those, that's where some system time is needed to back those writes with actual memory.
Or they could be to map object files like shared libraries (that's also a likely possibility as the number is not far from the number of open()s). Then again, the mmap() is quick, but reading the memory later does also mean some page faults and actual time to read the data off disk, which is not accounted to any system call.
More fundamentally, if you do time strace -f your-application as in an earlier version of your question, time will time both the command and strace.
strace adds a lot of overhead. It does itself a few system calls for each system call of the straced application.
Doing
strace -fc time your-application
instead of
time strace -fc your-application
is more likely to give you a better match.
What I find though is that for the wait*() system calls that processes make to wait their children, strace counts the time reported by those wait*() system calls as system time, which means that the time of the children processes (at least the ones that are waited for) is counted several times. That matters in strace -f time cmd as time does run cmd in a child process and waits for it.
$ strace -c time head -c 100M /dev/urandom > /dev/null
0.00user 0.76system 0:00.76elapsed 99%CPU (0avgtext+0avgdata 1796maxresident)k
0inputs+0outputs (0major+83minor)pagefaults 0swaps
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00 0.762288 762288 1 wait4
 0.00 0.000000 0 1 read
 0.00 0.000000 0 112 write
 0.00 0.000000 0 2 open
 0.00 0.000000 0 2 close
 0.00 0.000000 0 2 fstat
 0.00 0.000000 0 6 mmap
 0.00 0.000000 0 4 mprotect
 0.00 0.000000 0 1 munmap
 0.00 0.000000 0 1 brk
 0.00 0.000000 0 4 rt_sigaction
 0.00 0.000000 0 3 3 access
 0.00 0.000000 0 1 clone
 0.00 0.000000 0 1 execve
 0.00 0.000000 0 1 arch_prctl
------ ----------- ----------- --------- --------- ----------------
100.00 0.762288 142 3 total
time and strace report the same sys time (as returned by the wait4() system call), but with -f:
$ strace -fc time head -c 100M /dev/urandom > /dev/null
strace: Process 2298 attached
0.01user 1.33system 0:01.91elapsed 69%CPU (0avgtext+0avgdata 1920maxresident)k
0inputs+0outputs (0major+84minor)pagefaults 0swaps
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
 54.60 1.331335 1331335 1 wait4
 39.43 0.961497 75 12804 read
 5.94 0.144825 6 25711 write
 0.01 0.000148 11 13 6 open
 0.00 0.000104 8 13 mmap
 0.00 0.000094 19 5 3 execve
 0.00 0.000063 8 8 mprotect
 0.00 0.000050 6 9 close
 0.00 0.000041 7 6 6 access
 0.00 0.000037 5 7 fstat
 0.00 0.000031 16 2 munmap
 0.00 0.000030 8 4 brk
 0.00 0.000007 4 2 arch_prctl
 0.00 0.000006 6 1 1 ioctl
 0.00 0.000000 0 4 rt_sigaction
 0.00 0.000000 0 1 clone
------ ----------- ----------- --------- --------- ----------------
100.00 2.438268 38591 16 total
The 1.33 is the time reported by the one wait4() system call that time does. That reports the sys time of head (time's child).
However, strace also tries to get the system time for every syscall of its head grandchild on top of that, which means that is counted twice (not exactly). That stracing of the grand child does still incur some overhead accounted to the straced process as we get 1.33 as opposed to 0.76 earlier. That is considerably reduced if I force strace, time and head to run on the same processor (with taskset 1).
edited Dec 14 '17 at 19:12


Jeff Schaller
31.9k848109
31.9k848109
answered Dec 14 '17 at 14:51


Stéphane Chazelas
282k53520854
282k53520854
 
 
 
 
 
 
 Thanks, but my test case (even with split runs) still doesn't make sense. The sum of sys time as reported by- timeis quite off from the sum as reported by- strace.
 â Willem
 Dec 14 '17 at 15:37
 
 
 
 
 
 
 
 
 
 My observed behaviour is that- stracereports less than expected systime, not more. See my updated case where I did split tests (1 with- timeand 1 with- strace -fc)
 â Willem
 Dec 14 '17 at 16:21
 
 
 
 
 
 
 
 
 
 @Willem, it looks like in your case, the time is not spent in system calls. Given the- mmap()s, the fact that quite some time is spent in munmap, it's not surprising.
 â Stéphane Chazelas
 Dec 14 '17 at 16:39
 
 
 
 
 
 1
 
 
 
 
 @Willem, both- timeand- stracewould report correct times, but- timereturns the system time, while- straceonly reports the system time spent in system calls. See also the edit.
 â Stéphane Chazelas
 Dec 14 '17 at 16:55
 
 
 
 
 
 1
 
 
 
 
 @Willem- straceis right in that it is a system call tracer, so it would be wrong if it reported time not spent in system calls. Here, it's useful in that it tells you the time is likely not spent in system calls. You could try- time -v php...(Assuming GNU time, the backslash is to bypass bash's keyword) to see if it's not just allocating too much memory, or inspect the log of- strace -ttTf -o log php...to see between which events most of the time is spent, or instrument the php code. btrace (blktrace) could help telling you which disk IO is accounted to that php process.
 â Stéphane Chazelas
 Dec 14 '17 at 17:31
 
 
 
 
 |Â
show 2 more comments
 
 
 
 
 
 
 Thanks, but my test case (even with split runs) still doesn't make sense. The sum of sys time as reported by- timeis quite off from the sum as reported by- strace.
 â Willem
 Dec 14 '17 at 15:37
 
 
 
 
 
 
 
 
 
 My observed behaviour is that- stracereports less than expected systime, not more. See my updated case where I did split tests (1 with- timeand 1 with- strace -fc)
 â Willem
 Dec 14 '17 at 16:21
 
 
 
 
 
 
 
 
 
 @Willem, it looks like in your case, the time is not spent in system calls. Given the- mmap()s, the fact that quite some time is spent in munmap, it's not surprising.
 â Stéphane Chazelas
 Dec 14 '17 at 16:39
 
 
 
 
 
 1
 
 
 
 
 @Willem, both- timeand- stracewould report correct times, but- timereturns the system time, while- straceonly reports the system time spent in system calls. See also the edit.
 â Stéphane Chazelas
 Dec 14 '17 at 16:55
 
 
 
 
 
 1
 
 
 
 
 @Willem- straceis right in that it is a system call tracer, so it would be wrong if it reported time not spent in system calls. Here, it's useful in that it tells you the time is likely not spent in system calls. You could try- time -v php...(Assuming GNU time, the backslash is to bypass bash's keyword) to see if it's not just allocating too much memory, or inspect the log of- strace -ttTf -o log php...to see between which events most of the time is spent, or instrument the php code. btrace (blktrace) could help telling you which disk IO is accounted to that php process.
 â Stéphane Chazelas
 Dec 14 '17 at 17:31
 
 
 
 
Thanks, but my test case (even with split runs) still doesn't make sense. The sum of sys time as reported by
time is quite off from the sum as reported by strace.â Willem
Dec 14 '17 at 15:37
Thanks, but my test case (even with split runs) still doesn't make sense. The sum of sys time as reported by
time is quite off from the sum as reported by strace.â Willem
Dec 14 '17 at 15:37
My observed behaviour is that
strace reports less than expected systime, not more. See my updated case where I did split tests (1 with time and 1 with strace -fc)â Willem
Dec 14 '17 at 16:21
My observed behaviour is that
strace reports less than expected systime, not more. See my updated case where I did split tests (1 with time and 1 with strace -fc)â Willem
Dec 14 '17 at 16:21
@Willem, it looks like in your case, the time is not spent in system calls. Given the
mmap()s, the fact that quite some time is spent in munmap, it's not surprising.â Stéphane Chazelas
Dec 14 '17 at 16:39
@Willem, it looks like in your case, the time is not spent in system calls. Given the
mmap()s, the fact that quite some time is spent in munmap, it's not surprising.â Stéphane Chazelas
Dec 14 '17 at 16:39
1
1
@Willem, both
time and strace would report correct times, but time returns the system time, while strace only reports the system time spent in system calls. See also the edit.â Stéphane Chazelas
Dec 14 '17 at 16:55
@Willem, both
time and strace would report correct times, but time returns the system time, while strace only reports the system time spent in system calls. See also the edit.â Stéphane Chazelas
Dec 14 '17 at 16:55
1
1
@Willem
strace is right in that it is a system call tracer, so it would be wrong if it reported time not spent in system calls. Here, it's useful in that it tells you the time is likely not spent in system calls. You could try time -v php... (Assuming GNU time, the backslash is to bypass bash's keyword) to see if it's not just allocating too much memory, or inspect the log of strace -ttTf -o log php... to see between which events most of the time is spent, or instrument the php code. btrace (blktrace) could help telling you which disk IO is accounted to that php process.â Stéphane Chazelas
Dec 14 '17 at 17:31
@Willem
strace is right in that it is a system call tracer, so it would be wrong if it reported time not spent in system calls. Here, it's useful in that it tells you the time is likely not spent in system calls. You could try time -v php... (Assuming GNU time, the backslash is to bypass bash's keyword) to see if it's not just allocating too much memory, or inspect the log of strace -ttTf -o log php... to see between which events most of the time is spent, or instrument the php code. btrace (blktrace) could help telling you which disk IO is accounted to that php process.â Stéphane Chazelas
Dec 14 '17 at 17:31
 |Â
show 2 more comments
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2funix.stackexchange.com%2fquestions%2f410877%2fwhy-do-time-and-strace-c-differ%23new-answer', 'question_page');
);
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
1
Here
timeis measuringstrace, notphp.â Andy Dalton
Dec 14 '17 at 14:50
Thanks, however that did not seem to matter much. Updated question to show distinct tests.
â Willem
Dec 14 '17 at 15:35