Causes for inefficient I/O?

I'm rsyncing a huge folder from an external to an internal 3.5" HDD, both 5,400 rpm. When I watch the current throughput with dstat, I regularly see patterns like this:



--total-cpu-usage-- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai stl| read writ| recv send| in out | int csw
20 6 34 40 0| 98M 90M|1272B 3652B| 0 0 |1702 4455
21 6 37 37 0| 121M 90M|1646B 4678B| 0 0 |2057 6488
17 24 29 30 0| 77M 95M| 630B 2416B| 0 0 |1581 4644
20 5 33 43 0| 86M 84M|1372B 2980B| 0 0 |1560 4146
20 6 30 44 0| 80M 75M| 700B 2722B| 0 0 |1484 3942
11 2 47 39 0| 39M 65M| 642B 1332B| 0 0 | 765 1507
0 0 50 50 0| 0 91M| 70B 354B| 0 0 | 136 70
0 0 50 49 0| 0 71M| 306B 346B| 0 0 | 146 119
0 0 50 50 0| 0 83M| 70B 346B| 0 0 | 145 60
0 0 50 50 0| 0 0 | 70B 346B| 0 0 | 36 84
0 0 50 50 0| 0 0 | 164B 646B| 0 0 | 35 71
0 0 50 50 0| 0 0 | 140B 802B| 0 0 | 30 64
0 0 50 50 0| 0 0 | 70B 346B| 0 0 | 27 68
0 0 50 50 0| 0 34M| 134B 346B| 0 0 | 86 68
0 0 50 50 0| 0 0 | 70B 346B| 0 0 | 30 71
0 0 50 50 0| 0 0 |2320B 346B| 0 0 | 40 76
0 0 50 50 0| 0 0 | 70B 346B| 0 0 | 29 71
0 0 50 50 0| 0 0 | 70B 346B| 0 0 | 25 50
-----------------------------[snip]------------------------------
0 0 50 50 0| 0 0 |2230B 346B| 0 0 | 35 61
0 0 50 50 0| 0 60M| 70B 346B| 0 0 | 118 83
1 7 42 50 0| 256k 104M| 230B 500B| 0 0 | 281 480
21 5 31 42 0| 117M 76M|1120B 3392B| 0 0 |1849 4309
23 5 36 36 0| 137M 56M|1202B 3958B| 0 0 |2149 5782
24 5 36 35 0| 138M 100M|1284B 4112B| 0 0 |2174 6021


For several seconds, sometimes up to a minute, both read and write throughput drop to zero. What's the bottleneck here?



Since both drives are about the same speed, neither of them should be idle for long. More to the point, at least one drive should always be reading or writing. What is the system waiting for?



The system is otherwise idle; the only thing eating CPU is the rsync task. Memory is 8 GB, the CPU is a 7th-gen quad-core i5. The internal HDD is connected via SATA to the mainboard, a Gigabyte G170X-Ultra Gaming. The filesystem is ext4 in both cases, encrypted with dm-crypt/LUKS on the internal (write) side. Might that be the cause? If so, how can I check the performance of dm-crypt? I see that the CPU is 50% idle and 50% waiting when the transfer drops occur. What can I conclude from that?
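
One quick, hedged way to gauge whether dm-crypt itself could be the limit: cryptsetup can benchmark the ciphers in memory, and you can compare the raw read throughput of the underlying disk against the mapped device (device and mapper names below are placeholders, substitute your own):

# In-memory cipher throughput, no disk involved; a modern i5 with AES-NI
# should report well over a GiB/s for aes-xts, far above 5,400 rpm disk speed.
cryptsetup benchmark

# Raw sequential read speed: plain device vs. the opened LUKS mapping
sudo hdparm -t /dev/sdX
sudo hdparm -t /dev/mapper/your-luks-volume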



It's an up-to-date Arch Linux with kernel version 4.13.11-1-ARCH. Anything to look out for? Thanks in advance.



UPDATE: iotop was pointed out as more accurate than dstat. Unfortunately, iotop shows zero throughput as well whenever dstat drops to zero. I've recorded a screencast to show it.







asked Nov 4 '17 at 20:12 by LukeLR, edited Nov 4 '17 at 22:07

  • I think your external hdd is not in the best condition.
    – Ipor Sircer
    Nov 4 '17 at 21:39










  • You can try to grab a stack trace of your rsync process during an I/O pause (see the sketch below) to get some data on what it's trying to do at that time. FWIW, rsync is an extremely inefficient way to transfer data - it has to do a lot of I/O operations just to identify what needs to be synced before even starting to copy data. Something like ZFS send/receive of incremental snapshots is much more efficient because the filesystem keeps track of changes as they are made - there's no crawling two entire filesystems trying to find what needs to be sent.
    – Andrew Henle
    Nov 5 '17 at 14:36
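
A minimal sketch of the stack-trace suggestion above (assumes root; if several rsync processes are running, pick a single PID instead of the pgrep substitution):

# Kernel-side stack: shows where in the kernel the process is blocked
sudo cat /proc/$(pgrep -x rsync | head -1)/stack

# Userspace backtrace, taken without stopping the process for long
sudo gdb -p $(pgrep -x rsync | head -1) -batch -ex bt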














2 Answers

There are two sets of tools for getting some block-level device statistics. The first is iolatency from Brendan Gregg's perf-tools. It produces a simple histogram of disk operation latency, such as:



>=(ms) .. <(ms) : I/O |Distribution |
0 -> 1 : 1913 |######################################|
1 -> 2 : 438 |######### |
2 -> 4 : 100 |## |
4 -> 8 : 145 |### |
8 -> 16 : 43 |# |
16 -> 32 : 43 |# |
32 -> 64 : 1 |# |
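
A hedged example of collecting such a histogram during a stall (assumes a checkout of the perf-tools repository, where the script sits in the repository root):

git clone https://github.com/brendangregg/perf-tools
cd perf-tools
sudo ./iolatency 1        # print one latency histogram per second; watch it while the transfer pauses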


Another script in the toolset is iosnoop, which shows commands and their operations, e.g.:



COMM PID TYPE DEV BLOCK BYTES LATms
/usr/bin/mon 31456 R 8,0 9741888 4096 2.14
/usr/bin/mon 31456 R 8,0 9751408 4096 0.16
/usr/bin/mon 31456 R 8,0 20022728 4096 1.44
/usr/bin/mon 31456 R 8,0 19851752 4096 0.26
jbd2/sda3-41 416 WS 8,0 130618232 65536 1.89
jbd2/sda3-41 416 WS 8,0 209996928 65536 1.92
jbd2/sda3-41 416 WS 8,0 210006528 8192 1.94
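
Run from the same perf-tools checkout, a sketch of tracing a stall might look like this (the optional duration argument is in seconds):

sudo ./iosnoop 10         # trace all block I/O for 10 seconds; an empty trace during a pause
                          # means no requests are reaching the devices at all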


Then there is the blktrace package, which records low-level block operations with blktrace and then displays all sorts of information with blkparse and several other commands, including the simple summary from btt (see its PDF user guide):



$ sudo blktrace /dev/sda # ^C to stop
=== sda ===
CPU 0: 180 events, 9 KiB data
CPU 1: 1958 events, 92 KiB data
Total: 2138 events (dropped 0), 101 KiB data
$ ls -ltra # one file per cpu
-rw-r--r-- 1 root root 8640 Nov 5 10:16 sda.blktrace.0
-rw-r--r-- 1 root root 93992 Nov 5 10:16 sda.blktrace.1
$ blkparse -O -d combined.output sda.blktrace.* # combine cpus
$ btt -i combined.output
ALL MIN AVG MAX N
Q2Q 0.000001053 0.106888548 6.376503027 253
Q2G 0.000000795 0.000002266 0.000011060 184
G2I 0.000000874 0.000979485 0.002588781 328
Q2M 0.000000331 0.000000599 0.000002716 70
I2D 0.000000393 0.000480112 0.002435491 328
M2D 0.000002044 0.000028418 0.000126845 70
D2C 0.000080986 0.001925224 0.010111418 254
Q2C 0.000087025 0.002603157 0.010120629 254
...


D2C, for example, is how long it takes the hardware device to do an operation.



You might also run sudo smartctl -a /dev/sda on each disc to see if there are any failures showing.
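
A hedged helper for pulling out the attributes that usually matter for a dying drive (the device names and the USB pass-through hint are assumptions, adjust for your setup):

for d in /dev/sda /dev/sdb; do
    echo "== $d =="
    # USB enclosures sometimes need an extra '-d sat' option to reach SMART data
    sudo smartctl -a "$d" | grep -i -E 'reallocated|pending|uncorrect|crc_error'
done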






answered Nov 5 '17 at 9:20 by meuh, edited Nov 5 '17 at 13:43 (accepted answer)

I guess that dstat uses I/O statistics at the file-descriptor level, i.e. the application calls write(), and as soon as the syscall returns, the amount of data dstat sees increases.



But that does not mean that the data has actually been written. I guess that these apparent pauses are the phases in which the buffers are written to the block device. It makes sense that during these times the I/O wait value is even higher than during the phases in which dstat measures data transfer.
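
One way to test that theory is to watch the kernel's dirty-page counters during a pause; if the stalls really are writeback phases, Dirty should be large when a pause starts and shrink while Writeback stays non-zero (a sketch using standard /proc/meminfo fields):

watch -n 1 'grep -E "^(Dirty|Writeback):" /proc/meminfo'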



iotop tells apart writes to and reads from the disk and the cache. Maybe that tool can deliver interesting additional information.
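
For example, logging iotop in batch mode during the transfer makes that distinction visible: per its man page, the "Total DISK WRITE" line is the bandwidth between processes and the kernel block subsystem, while "Actual DISK WRITE" is what the block subsystem pushes to the hardware, and the two can diverge sharply during a writeback stall (a sketch; flags are the stock iotop options):

# -b: batch (log to stdout), -o: only tasks doing I/O, -d 1: 1 s interval, -n 120: stop after 120 samples
sudo iotop -b -o -d 1 -n 120 | tee iotop.log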






answered Nov 4 '17 at 20:28 by Hauke Laging

  • Thanks for your reply! Unfortunately, iotop shows the same drops as dstat, see above...
    – LukeLR
    Nov 4 '17 at 22:08

  • @LukeLR You misunderstood me. When I/O blocks, neither dstat nor iotop should see a change. I meant that before the pauses, iotop should show that a lot of the writing goes to the cache.
    – Hauke Laging
    Nov 4 '17 at 22:19

  • Okay, thanks. But is there a way to confirm whether the disk is still active during these zero-periods, or whether it is waiting for some other bottleneck?
    – LukeLR
    Nov 4 '17 at 23:33

  • @LukeLR Unfortunately I do not know that. Maybe with the tools from the other answer. But another thought: if the buffers are being flushed during a pause, then you would expect the size of the buffers (e.g. in top) to change.
    – Hauke Laging
    Nov 5 '17 at 9:46









