Cgroup memory.usage_in_bytes shows incorrect value

I created a cgroup on my Linux device and noticed that the value of memory.usage_in_bytes (4325376) is bigger than the sum of RSS, CACHE and SWAP (4194304).

I have read in the documentation that memory.usage_in_bytes doesn't show the exact value of memory (and swap) usage; if you want to know the memory usage more exactly, you should use the RSS+CACHE(+SWAP) values from memory.stat.

Does anyone have an idea why this happens? Should I use the value of memory.usage_in_bytes?
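
For reference, here is a minimal sketch of how the two figures can be read and compared on a cgroup v1 system. The mount point /sys/fs/cgroup/memory and the group name "mygroup" are assumptions; adjust them to match your own hierarchy.

#!/usr/bin/env python3
# Compare memory.usage_in_bytes with the RSS+CACHE(+SWAP) sum from memory.stat.
# Assumes a cgroup v1 memory controller mounted at /sys/fs/cgroup/memory and a
# hypothetical cgroup named "mygroup" - adjust both for your system.
from pathlib import Path

CGROUP = Path("/sys/fs/cgroup/memory/mygroup")   # hypothetical group name

def read_int(path):
    return int(path.read_text().strip())

def read_stat(path):
    # memory.stat is a sequence of "<key> <value>" lines
    stats = {}
    for line in path.read_text().splitlines():
        key, value = line.split()
        stats[key] = int(value)
    return stats

usage = read_int(CGROUP / "memory.usage_in_bytes")
stat = read_stat(CGROUP / "memory.stat")

# "swap" only appears when swap accounting is enabled, hence the .get()
exact = stat["rss"] + stat["cache"] + stat.get("swap", 0)

print("memory.usage_in_bytes:", usage)
print("rss + cache (+ swap): ", exact)
print("difference:           ", usage - exact)

On the numbers in the question the difference is 131072 bytes, i.e. 32 pages of 4 KiB, which is the kind of small slack the answer below attributes to the kernel's per-CPU accounting optimization.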







  • A related question is unix.stackexchange.com/questions/449911 .
    – JdeBP
    Jun 22 at 10:16














asked Jun 22 at 8:36 by DOAN MINH HUNG

1 Answer

usage_in_bytes (quoting the kernel's cgroup v1 memory controller documentation):



For efficiency, as other kernel components, memory cgroup uses some optimization
to avoid unnecessary cacheline false sharing. usage_in_bytes is affected by the
method and doesn't show 'exact' value of memory (and swap) usage, it's a fuzz
value for efficient access. (Of course, when necessary, it's synchronized.)
If you want to know more exact memory usage, you should use RSS+CACHE(+SWAP)
value in memory.stat(see 5.2).




If you are reading the accurate memory.stat values repeatedly, and you have measured your system performance as being limited by CPU usage, you might consider sampling the cheaper memory.usage_in_bytes instead.



I don't know exactly how it works. However, as an LWN.net reader, this sounds similar to a situation where there are multiple counters kept local to where they are used, and where reading usage_in_bytes does not force an immediate synchronization and summation of those counters.




Over the years, kernel developers have made increasing use of per-CPU data in an effort to minimize memory contention and its associated performance penalties. As a simple example, consider the disk operation statistics maintained by the block layer. Incrementing a global counter for every disk operation would cause the associated cache line to bounce continually between processors; disk operations are frequent enough that the performance cost would be measurable. So each CPU maintains its own set of counters locally; it never has to contend with any other CPU to increment one of those counters. When a total count is needed, all of the per-CPU counters are added up. Given that the counters are queried far more rarely than they are modified, storing them in per-CPU form yields a significant performance improvement.




-- https://lwn.net/Articles/258238/



If you have a small system - not a large number of CPUs, and not maintaining a large number of cgroups - the overhead doesn't sound very significant. It doesn't cost very much to bounce a few cache lines between CPUs. But the optimization might be useful if you are Google and running thousands of containers on a system :), or if you have a system with thousands of CPUs.
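
To make the per-CPU counter pattern concrete, here is a toy sketch in Python (not the kernel's actual implementation): each worker increments only its own private slot, and a total is assembled only when a reader asks for it, so a snapshot taken while the workers are running can lag slightly behind the true sum - the same trade-off the documentation calls a "fuzz value for efficient access".

#!/usr/bin/env python3
# Toy illustration of per-CPU-style counters - not the kernel's real code.
# Each worker thread increments its own private slot, so there is no shared
# counter to contend on; the total is only summed up on demand.
import threading

NUM_WORKERS = 4
local_counts = [0] * NUM_WORKERS   # one private counter per worker

def worker(idx, iterations):
    for _ in range(iterations):
        local_counts[idx] += 1     # cheap: touches only this worker's slot

def read_total():
    # Relatively expensive: walk every per-worker slot and add them up.
    # This is the analogue of summing memory.stat; a fuzzy cached value
    # would simply skip this walk most of the time.
    return sum(local_counts)

threads = [threading.Thread(target=worker, args=(i, 100_000))
           for i in range(NUM_WORKERS)]
for t in threads:
    t.start()

print("snapshot while running:", read_total())   # may lag the true count
for t in threads:
    t.join()
print("final total:          ", read_total())    # now exact: 400000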






answered Jun 22 at 11:25 by sourcejedi