OOM killer doesn't work properly, leads to a frozen OS

For years, the OOM killer on my system has not worked properly, and the result is a frozen system.

When memory usage is very high, the whole system tends to freeze for hours or even days (the longest I have recorded is 7 days before a reset) instead of killing processes to free memory.

In this situation, iowait is very high (~70%). iotop shows that every program is reading from my hard drive at a very high throughput (tens of MB/s).

What are those programs reading?

- The directory hierarchy?
- The executable code itself?

I don't know exactly.
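For anyone who wants to reproduce the iowait figure without a monitoring tool, here is a minimal Python sketch. It assumes the field order of the aggregate "cpu" line in /proc/stat as documented in proc(5). The function names are made up for this illustration and are not part of any existing tool.

```python
from typing import Dict

# Field names of the aggregate "cpu" line in /proc/stat, in order (see proc(5)).
# The trailing guest fields are left out because guest time is already
# included in "user" and "nice".
CPU_FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(stat_text: str) -> Dict[str, int]:
    """Return the counters of the aggregate 'cpu' line from /proc/stat text."""
    for line in stat_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":
            values = [int(v) for v in parts[1:1 + len(CPU_FIELDS)]]
            return dict(zip(CPU_FIELDS, values))
    raise ValueError("no aggregate 'cpu' line found")


def iowait_percent(before: str, after: str) -> float:
    """Share of CPU time spent in iowait between two /proc/stat snapshots."""
    a, b = parse_cpu_line(before), parse_cpu_line(after)
    deltas = {name: b[name] - a[name] for name in a}
    total = sum(deltas.values())
    return 100.0 * deltas["iowait"] / total if total else 0.0
```

To use it, read /proc/stat twice a few seconds apart and pass both texts to iowait_percent. The counters are cumulative since boot, so only the difference between two snapshots is meaningful.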



I use an up-to-date Arch Linux (currently kernel 4.9.27-1-lts) with 16 GB of physical RAM and no swap partition enabled. Given how much RAM I have, I don't want to enable a swap partition, since it would only delay the onset of the problem.

I believe the problem is that Linux drops essential data from its caches, so the system freezes because it has to read everything from the hard drive again, every time.

I even wonder whether Linux drops the executable code pages of running programs, which would explain why programs that normally don't read much data behave this way in this situation.

I have tried several things in the hope of fixing this. One was to set /proc/sys/vm/min_free_kbytes to 1000000 (about 1 GB). Since this 1 GB should remain free, I thought Linux would reserve that memory for caching important data. But it didn't work.

I should also add that, however good it sounds in theory, restricting virtual memory to the size of physical memory by setting /proc/sys/vm/overcommit_memory to 2 is not practical in my situation, because the applications I use reserve more virtual memory than they actually use. According to /proc/meminfo, Committed_AS is often more than double the physical RAM on my system (16 GB of RAM, Committed_AS often > 32 GB).
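The comparison between Committed_AS and physical RAM can be checked with a short script. This is only a sketch, assuming the usual "Name: value kB" layout of /proc/meminfo. The helper names parse_meminfo and commit_ratio are hypothetical.

```python
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field name: numeric value}."""
    fields: Dict[str, int] = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        tokens = rest.split()
        if tokens:
            # Most values are in kB; HugePages_* lines are plain counts.
            fields[name.strip()] = int(tokens[0])
    return fields


def commit_ratio(meminfo: Dict[str, int]) -> float:
    """Committed_AS expressed as a multiple of MemTotal."""
    return meminfo["Committed_AS"] / meminfo["MemTotal"]
```

Calling commit_ratio(parse_meminfo(open("/proc/meminfo").read())) on the system described here would be expected to return a value above 2 whenever Committed_AS exceeds twice the physical RAM.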



I have experienced this problem with /proc/sys/vm/overcommit_memory at its default value of 0. Some time ago I set it to 1, because I prefer programs to be killed by the OOM killer rather than misbehave because they don't check the return value of malloc when an allocation is refused.

When I talked about this issue on IRC, I met other Linux users who had experienced the very same problem, so I suspect many users are affected. To me this is not acceptable, since even Windows copes better with high memory usage.

If you need more information or have a suggestion, please tell me.

  • I think this is what you should expect if you're thrashing but not really approaching 100% "used", i.e. there is too much memory usage which is file-backed, counted as "buff/cache". (Ugh, this phrasing assumes your tmpfs allocations are trivial, as these show up as "buff/cache" but cannot be paged out to a physical filesystem.) min_free_kbytes is not relevant; it's not a reserve for cached pages. AFAICT none of the vm sysctls allow reserving any memory specifically for cached pages, i.e. limiting MAP_ANONYMOUS allocations :(.
    – sourcejedi
    Jun 28 '17 at 19:41

  • Thank you for confirming that changing min_free_kbytes is not the right way to solve this issue. However, I disagree with your statement "this is what you should expect". What I expect is a system good enough to free memory by killing non-vital processes when memory usage is high, and certainly not a frozen system that is unusable for days.
    – parasite
    Jun 29 '17 at 14:53

  • Ugh again. I was trying to make a technical statement, not a normative one. To be clear, I agree it's annoying there seems no way to reserve memory for cached pages, which you point out is the obvious way to try and mitigate this. I'm not smart enough to say if there's a sensible way to account the necessary working set of cached pages to a process in a similar way to how anonymous pages are accounted to a process
    – sourcejedi
    Jun 29 '17 at 15:55

  • I'm not smart enough to say if there's a sensible way to account the necessary working set of cached pages to a process in a similar way to how anonymous pages are accounted to a process. I imagine the latter is easier... there's less ambiguity about whether the app is expected to use the page again later. Running out of memory sucks and Linux hasn't managed to mitigate that very much.
    – sourcejedi
    Jun 29 '17 at 16:01

  • I've been looking for a solution to this exact issue for years now without any success. I believe I first noticed the problem after replacing my HDD with an SSD, which also led me to disable swapping altogether, but I can't guarantee that it never happened before these changes, so it might be unrelated. I'm on Arch Linux btw.
    – brunocodutra
    Aug 8 '17 at 11:59


linux arch-linux out-of-memory






edited Jul 3 '17 at 12:55
asked Jun 25 '17 at 19:45
parasite


2 Answers

I believe this was a kernel bug that was fixed sometime before 4.14.58 / 4.9.115; I had similar issues around the version you reported, and they are now fixed.


answered Jul 27 at 14:33
user1133275


I've found two explanations (of the same thing) for why kswapd0 reads from disk constantly, well before the OOM killer kills the offending process:

1. See the answer and comment on this askubuntu SE answer.
2. See the answer and David Schwartz's comments on this answer on unix SE.

I'll quote here the comment from 1., which really opened my eyes as to why I was getting constant disk reads while everything was frozen:

              For example, consider a case where you have zero swap and system is
              nearly running out of RAM. The kernel will take memory from e.g.
              Firefox (it can do this because Firefox is running executable code
              that has been loaded from disk - the code can be loaded from disk
              again if needed). If Firefox then needs to access that RAM again N
              seconds later, the CPU generates "hard fault" which forces Linux to
              free some RAM (e.g. take some RAM from another process), load the
              missing data from disk and then allow Firefox to continue as usual.
              This is pretty similar to normal swapping and kswapd0 does it. – Mikko
              Rantalainen Feb 15 at 13:08
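
One way to observe the "hard fault" behaviour described in the quote is to watch the kernel's major page fault counter. The following is a minimal sketch, assuming a Linux system where /proc/vmstat exposes a `pgmajfault` line; the function names (`parse_vmstat`, `major_fault_rate`, `watch`) are made up for this illustration and are not part of any tool mentioned above.

```python
import time


def parse_vmstat(text):
    """Parse /proc/vmstat-style 'name value' lines into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def major_fault_rate(before, after, seconds):
    """Major (hard) page faults per second between two samples."""
    return (after["pgmajfault"] - before["pgmajfault"]) / seconds


def read_vmstat(path="/proc/vmstat"):
    """Read and parse the live counters (Linux only)."""
    with open(path) as f:
        return parse_vmstat(f.read())


def watch(interval=1.0, samples=5):
    """Print the major fault rate a few times; hypothetical helper."""
    previous = read_vmstat()
    for _ in range(samples):
        time.sleep(interval)
        current = read_vmstat()
        rate = major_fault_rate(previous, current, interval)
        print(f"major faults/s: {rate:.1f}")
        previous = current
```

Calling `watch()` while memory is nearly exhausted should show whether pages are being read back from disk at a sustained rate. A rate that stays high alongside high iowait would be consistent with the explanation in the quote, though it does not prove it.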




              If anyone knows how to disable this behavior (perhaps by recompiling the kernel with certain options?), please let me know. Much appreciated, thanks!

              EDIT: I've just found out and tested that vm.overcommit_memory=2 avoids the disk thrashing that leads to a frozen OS. The OOM killer doesn't get a chance to trigger, which makes sense because with vm.overcommit_memory=0 it only triggered well after the disk thrashing anyway. Instead, allocations fail with errors like: cc1plus: out of memory allocating 127440 bytes after a total of 897024 bytes

              EDIT2: OK, the above worked for me with the default vm.overcommit_ratio=50, but if I set it to 200 then the disk thrashing is back!

              EDIT3: Ignore these two EDITs; they are not the way to fix this, because processes that wouldn't have died before now die sooner. Also, vm.overcommit_ratio=0 will cause everything to die, or something. I'll try to find a better way, likely requiring a kernel recompile, and I'm looking into the relevant GFP flags...

              EDIT4: I found a way, by patching the kernel, that works for me; see the patch inside this question.
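
The vm.overcommit_ratio observation in EDIT2 can be illustrated with the commit limit the kernel documents for vm.overcommit_memory=2: (total RAM - huge pages) * overcommit_ratio / 100 + swap. The following is a rough sketch, assuming vm.overcommit_kbytes is left unset and using a hypothetical 16 GiB machine with no swap, as in the question; `commit_limit_kib` is a name invented for this example.

```python
GIB = 1024 * 1024  # KiB per GiB


def commit_limit_kib(mem_total_kib, swap_total_kib, overcommit_ratio,
                     hugetlb_kib=0):
    """Approximate CommitLimit (KiB) under vm.overcommit_memory=2.

    Follows the formula from the kernel documentation; assumes
    vm.overcommit_kbytes is not in use.
    """
    return (mem_total_kib - hugetlb_kib) * overcommit_ratio // 100 + swap_total_kib


ram = 16 * GIB  # assumed machine size, no swap
for ratio in (50, 100, 200):
    limit = commit_limit_kib(ram, 0, ratio)
    print(f"ratio={ratio}: limit={limit / GIB:.0f} GiB, "
          f"exceeds RAM: {limit > ram}")
```

Under these assumptions, a ratio of 50 caps committed memory at half of RAM, while a ratio of 200 allows commitments of twice the physical RAM. That may be why the thrashing came back at 200: allocations stop failing early, so the kernel is again pushed into evicting file-backed pages.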














              answered Aug 19 at 10:23 by Marcus Linsner (edited Aug 29 at 11:16)



























                       
