Why were “USB-stick stall” problems reported in 2013? Why wasn't this problem solved by the existing “No-I/O dirty throttling” code?


























The pernicious USB-stick stall problem - LWN.net, 2013.



Artem S. Tashkinov recently encountered a problem that will be familiar to at least some LWN readers. Plug a slow storage device (a USB stick, say, or a media player) into a Linux machine and write a lot of data to it. The entire system proceeds to just hang, possibly for minutes.



This time around, though, Artem made an interesting observation: the system would stall when running with a 64-bit kernel, but no such problem was experienced when using a 32-bit kernel on the same hardware.




On 64-bit x86, the writeback cache was allowed to grow to 20% of system RAM by default. Linus suggested effectively limiting it to ~180MB on all platforms, mimicking a limitation of the 32-bit x86 code. However, current Linux (v4.18) does not do this. Compare Linus's suggested patch with the current function in Linux 4.18.
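To make the 20% versus ~180MB comparison concrete, here is a rough arithmetic sketch. It assumes the default dirty ratio of 20%, and it assumes that a 32-bit x86 kernel counts only about 896 MiB of low memory as dirtyable; both the function name and the 16 GiB machine are hypothetical, and the real kernel works from available memory rather than total RAM.

```python
MIB = 1024 * 1024

def dirty_limit_bytes(dirtyable_bytes, dirty_ratio_percent=20):
    """Approximate ceiling on dirty page cache for a given amount of
    dirtyable memory (illustrative only, not the kernel's calculation)."""
    return dirtyable_bytes * dirty_ratio_percent // 100

# Hypothetical 64-bit machine: all 16 GiB assumed dirtyable.
limit_64 = dirty_limit_bytes(16 * 1024 * MIB)

# 32-bit x86: assume only ~896 MiB of low memory counts as dirtyable.
limit_32 = dirty_limit_bytes(896 * MIB)

print(limit_64 // MIB, "MiB")
print(limit_32 // MIB, "MiB")
```

Under these assumptions the 32-bit figure lands close to the ~180MB mentioned above, while the 64-bit figure is several gigabytes, which is a lot of data to queue for a slow USB stick.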



But I also read another article, describing some code which was merged in 2011 (Linux 3.2):





No-I/O dirty throttling - LWN.net, 2011



That is where Fengguang's patch set comes in. He is attempting to create a control loop capable of determining how many pages each process should be allowed to dirty at any given time. Processes exceeding their limit are simply put to sleep for a while to allow the writeback system to catch up with them.



[...]



The goal of the system is to keep the number of dirty pages at the setpoint; if things get out of line, increasing amounts of force will be applied to bring things back to where they should be.



[...]



This ratio cannot really be calculated, though, without taking the backing device (BDI) into account. A process may be dirtying pages stored on a given BDI, and the system may have a surfeit of dirty pages at the moment, but the wisdom of throttling that process depends also on how many dirty pages exist for that BDI. If a given BDI is swamped with dirty pages, it may make sense to throttle a dirtying process even if the system as a whole is doing OK. On the other hand, a BDI with few dirty pages can clear its backlog quickly, so it can probably afford to have a few more, even if the system is somewhat more dirty than one might like. So the patch set tweaks the calculated pos_ratio for a specific BDI using a complicated formula looking at how far that specific BDI is from its own setpoint and its observed bandwidth. The end result is a modified pos_ratio describing whether the system should be dirtying more or fewer pages backed by the given BDI, and by how much.
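The control-loop idea in the quoted passage can be sketched as follows. This is only an illustration: it assumes a cubic curve that equals 1.0 at the setpoint and drops to 0 at the limit, clamped to the range 0 to 2, and it leaves out the per-BDI adjustment entirely. The function and parameter names are hypothetical, not the kernel's.

```python
def global_pos_ratio(dirty, setpoint, limit):
    """Illustrative position ratio: 1.0 at the setpoint, smaller as the
    dirty page count approaches the limit, larger when it is below the
    setpoint. Result is clamped to [0.0, 2.0]."""
    if dirty >= limit:
        return 0.0
    x = (setpoint - dirty) / (limit - setpoint)
    return min(2.0, max(0.0, 1.0 + x ** 3))

def throttled_rate(base_rate, dirty, setpoint, limit):
    """Scale a task's allowed dirtying rate by the position ratio."""
    return base_rate * global_pos_ratio(dirty, setpoint, limit)

# Hypothetical numbers: setpoint of 1000 pages, limit of 2000 pages.
for dirty in (0, 1000, 1500, 2000):
    print(dirty, global_pos_ratio(dirty, 1000, 2000))
```

The cubic shape means the correction is gentle near the setpoint and becomes forceful only as the dirty count nears the limit, which matches the "increasing amounts of force" described in the article.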




So Linux has had some per-device control over the dirty page cache since at least 2011 (Linux 3.2). Why, then, did a USB-stick stall still affect the whole system in 2013?










      linux cache































asked 3 hours ago by sourcejedi




















1 Answer













The "USB-stick stall" article provides no evidence for its claim. I think it is very misleading, and maybe just wrong.

Artem did not report the entire system hanging while it flushed cached writes to a USB stick. His report only complained that sync could take up to "dozens of minutes".

In a follow-up, Artem reported "the server almost stalls and other IO requests take a lot more time to complete even though mysqldump is run with ionice -c3". But this was not the USB-stick problem. It happened after creating a 10GB file on an internal disk.































answered 3 hours ago by sourcejedi



























                     
