squashfs double caching problem

Let's say there's a large squashfs image stored in a file, which is mounted through a loopback device. As I understand it, kernels from 4.4 onwards have eliminated double caching on loopback devices, but unfortunately not for squashfs.
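
For concreteness, a minimal sketch of such a setup might look like the following (the image name root.squashfs matches the dd command further down; the mountpoint is just a hypothetical example):

mount -o loop root.squashfs /mnt/squash   # /mnt/squash is an arbitrary, hypothetical mountpoint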



When you read something from the mounted squashfs, the compressed portion of the image is read and cached by Linux. The decompressed data that you access is also cached, so reading it again will be very fast and won't require decompressing it again.



The second cache is very useful, since it provides fast access. The first cache is redundant and essentially useless: it pollutes RAM and evicts other cached entries that are actually useful (or forces applications to swap). This is basically the double caching problem.



As long as the files are already cached in decompressed form, there's no point keeping a cached copy of the compressed data.



If the kernel really had to drop these caches later, it would drop the compressed data cache first (it's less recently used) and then the decompressed data. You'd only notice that on the next read, because the kernel would have to re-read from the drive and decompress again. So caching the compressed data from the drive is pointless!



So, to summarize:



  • Keep the decompressed data cached so it can be accessed very fast

  • Don't cache the compressed squashfs data (the backing file) at all

I tried mounting with the sync option, but it doesn't help.
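
For reference, that attempt was nothing more than adding the generic mount option (same assumed names as above):

mount -o loop,sync root.squashfs /mnt/squash   # sync mount option; as noted above, it made no difference to caching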



There is a workaround, though it's more of a kludge. The following command drops the cache for the compressed squashfs data (good) while leaving the decompressed data cached (also good):



dd if=root.squashfs iflag=nocache count=0


I don't target the mountpoint, as that would drop the decompressed data (which would be bad). Instead I target the loop device's underlying file, since that's the data I don't want cached (it's pointless to keep).
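
To confirm that the backing file really is sitting in the page cache, fincore from util-linux (assuming a reasonably recent version is installed) reports how much of a given file is resident:

fincore root.squashfs   # shows resident pages vs. file size for the compressed image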



The problem is that the command has to be run over and over again, since reads can happen from any application at any time. So the "kludge" is setting up the above command to execute every second or so, as sketched below.
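
A minimal sketch of that polling loop (purely illustrative; the one-second interval is arbitrary):

while true; do
    dd if=root.squashfs iflag=nocache count=0   # drop the page cache for the compressed image file
    sleep 1
done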



Clearly this is inelegant and a complete hack, but at least it shows exactly what I'm after: it drops the page cache for the image file itself (not for the decompressed files). Imagine that command running every millisecond; that's exactly what I want, but without polling like this. Are there any better ways to do it?







asked May 17 at 15:41 · kktsuri