Why were "USB-stick stall" problems reported in 2013? Why wasn't this problem solved by the existing "No-I/O dirty throttling" code?

The pernicious USB-stick stall problem - LWN.net, 2013.
Artem S. Tashkinov recently encountered a problem that will be familiar to at least some LWN readers. Plug a slow storage device (a USB stick, say, or a media player) into a Linux machine and write a lot of data to it. The entire system proceeds to just hang, possibly for minutes.
This time around, though, Artem made an interesting observation: the system would stall when running with a 64-bit kernel, but no such problem was experienced when using a 32-bit kernel on the same hardware.
On 64-bit x86, the writeback cache was allowed to grow to 20% of system RAM by default. Linus suggested effectively limiting it to ~180MB on all platforms, mimicking a limitation of the 32-bit x86 code. However, current Linux (v4.18) does not do this. Compare Linus's suggested patch to the current function in Linux 4.18.
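To put these numbers in perspective, here is a back-of-the-envelope sketch. It assumes the dirty limit is simply a percentage of "dirtyable" memory (the real kernel calculation has more terms), and it assumes a hypothetical USB stick that sustains 5 MiB/s. My understanding, which is an assumption and not stated in the excerpt above, is that the ~180MB figure comes from 32-bit x86 counting only low memory (roughly 896 MiB) as dirtyable: 20% of 896 MiB is about 179 MiB. The helper names are hypothetical.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB


def dirty_limit_bytes(dirtyable_bytes: int, ratio_percent: int = 20) -> int:
    """Simplified dirty limit: a percentage of dirtyable memory."""
    return dirtyable_bytes * ratio_percent // 100


def drain_seconds(dirty_bytes: int, device_bytes_per_sec: int) -> float:
    """Time to write back `dirty_bytes` at a constant device speed."""
    return dirty_bytes / device_bytes_per_sec


# Assumed example machine and device (illustrative only).
ram_64bit = 16 * GIB       # 64-bit kernel: assume all RAM counts as dirtyable
lowmem_32bit = 896 * MIB   # 32-bit x86: assume only lowmem counts
usb_speed = 5 * MIB        # hypothetical slow USB stick, 5 MiB/s

for label, mem in [("64-bit, 16 GiB RAM", ram_64bit),
                   ("32-bit, lowmem only", lowmem_32bit)]:
    limit = dirty_limit_bytes(mem)
    minutes = drain_seconds(limit, usb_speed) / 60
    print(f"{label}: limit = {limit / MIB:.0f} MiB, "
          f"drain at 5 MiB/s = {minutes:.1f} min")
```

Under these assumptions the 64-bit limit is about 3277 MiB, which takes roughly 11 minutes to flush to the stick, while the ~179 MiB limit drains in well under a minute.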
But I also read another article, describing some code which was merged in 2011 (Linux 3.2):
No-I/O dirty throttling - LWN.net, 2011
That is where Fengguang's patch set comes in. He is attempting to create a control loop capable of determining how many pages each process should be allowed to dirty at any given time. Processes exceeding their limit are simply put to sleep for a while to allow the writeback system to catch up with them.
[...]
The goal of the system is to keep the number of dirty pages at the setpoint; if things get out of line, increasing amounts of force will be applied to bring things back to where they should be.
[...]
This ratio cannot really be calculated, though, without taking the backing device (BDI) into account. A process may be dirtying pages stored on a given BDI, and the system may have a surfeit of dirty pages at the moment, but the wisdom of throttling that process depends also on how many dirty pages exist for that BDI. If a given BDI is swamped with dirty pages, it may make sense to throttle a dirtying process even if the system as a whole is doing OK. On the other hand, a BDI with few dirty pages can clear its backlog quickly, so it can probably afford to have a few more, even if the system is somewhat more dirty than one might like. So the patch set tweaks the calculated pos_ratio for a specific BDI using a complicated formula looking at how far that specific BDI is from its own setpoint and its observed bandwidth. The end result is a modified pos_ratio describing whether the system should be dirtying more or fewer pages backed by the given BDI, and by how much.
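The control loop described in the excerpt can be sketched roughly as follows. This is a simplified model, not the kernel's code. The cubic global curve follows the shape described in the comments of the kernel's `mm/page-writeback.c` as I understand them; the per-BDI adjustment is a simplified linear stand-in for the "complicated formula" the article mentions, and `bdi_share` is a hypothetical proxy for a device's observed bandwidth. All function names, the 200 ms pause cap, and the numbers are assumptions for illustration.

```python
MAX_PAUSE = 0.2  # seconds; assumed cap on a single throttling sleep


def global_pos_ratio(dirty: float, freerun: float, limit: float) -> float:
    """Cubic control line over the system-wide dirty page count.

    Returns 2.0 at `freerun`, 1.0 at the setpoint (midway between freerun
    and limit) and 0.0 at `limit`; clamped at 0.0 beyond the limit.
    """
    setpoint = (freerun + limit) / 2
    x = (setpoint - dirty) / (limit - setpoint)
    return max(0.0, 1.0 + x ** 3)


def bdi_pos_ratio(pos_ratio: float, setpoint: float, bdi_share: float,
                  bdi_dirty: float) -> float:
    """Hypothetical, simplified linear per-BDI adjustment.

    `bdi_share` is the fraction of recent writeback completed by this
    device. A BDI above its own setpoint gets a smaller ratio; a BDI
    below it gets a larger one.
    """
    bdi_setpoint = setpoint * bdi_share
    scale = 1.0 - (bdi_dirty - bdi_setpoint) / bdi_setpoint
    return pos_ratio * max(0.0, scale)


def pause_seconds(pages_dirtied: int, base_rate: float, pos_ratio: float) -> float:
    """Sleep so the task's dirtying rate tracks base_rate * pos_ratio (pages/s)."""
    task_rate = base_rate * pos_ratio
    if task_rate <= 0:
        return MAX_PAUSE  # at or past the limit: sleep as long as allowed
    return min(MAX_PAUSE, pages_dirtied / task_rate)


# Illustrative numbers, in pages: freerun 1000, limit 3000 -> setpoint 2000.
freerun, limit = 1000, 3000
setpoint = (freerun + limit) / 2
g = global_pos_ratio(dirty=2500, freerun=freerun, limit=limit)
# A slow USB stick doing 5% of recent writeback, with 150 dirty pages.
b = bdi_pos_ratio(g, setpoint, bdi_share=0.05, bdi_dirty=150)
# The task dirtied 32 pages; assumed base rate 1280 pages/s (5 MiB/s of 4 KiB pages).
print(f"global ratio {g:.3f}, bdi ratio {b:.4f}, "
      f"pause {pause_seconds(32, 1280, b) * 1000:.1f} ms")
```

The cubic shape keeps corrections gentle near the setpoint and makes them grow sharply as the dirty count approaches the limit, which matches the article's description of "increasing amounts of force".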
So Linux has had some per-device control over the dirty page cache since at least 2011 (Linux 3.2). Why did we still have the problem in 2013, where a stall while writing to a USB stick affects the whole system?
linux cache
asked 3 hours ago by sourcejedi (edited 59 mins ago)
1 Answer
The "USB-stick stall" article provides no evidence for its claim. I think it is very misleading, and maybe just wrong.
Artem did not report the entire system hanging while it flushed cached writes to a USB stick. His report only complained that sync could take up to "dozens of minutes".
In a follow-up, Artem reported "the server almost stalls and other IO requests take a lot more time to complete even though mysqldump is run with ionice -c3". But this was not the USB-stick problem. It happened after creating a 10GB file on an internal disk.
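One way to see how much data a `sync` would have to wait for is to look at the `Dirty:` and `Writeback:` lines of `/proc/meminfo` (values are reported in kB). A minimal sketch follows; the helper names are hypothetical, and the final read only works on Linux.

```python
from pathlib import Path


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo text into {field: numeric value}."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            values[name.strip()] = int(parts[0])
    return values


def pending_writeback_kb(text: str) -> int:
    """Dirty + Writeback: data not yet on disk that a sync must wait for."""
    info = parse_meminfo(text)
    return info.get("Dirty", 0) + info.get("Writeback", 0)


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")
    if meminfo.exists():
        kb = pending_writeback_kb(meminfo.read_text())
        print(f"{kb} kB dirty or under writeback")
```

Dividing that figure by the device's write speed gives a rough lower bound on how long a `sync` will block.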
answered 3 hours ago by sourcejedi (edited 1 hour ago)