Cgroup memory.usage_in_bytes shows incorrect value

I created a cgroup on my Linux device and noticed that the value of memory.usage_in_bytes (4325376) is bigger than the sum of the RSS, CACHE and SWAP counters in memory.stat (4194304).

I read in a reliable piece of documentation that memory.usage_in_bytes doesn't show the exact value of memory (and swap) usage, and that if you want a more exact figure you should use the RSS+CACHE(+SWAP) values from memory.stat.

Does anyone have an idea why the two readings differ? Should I use the value of memory.usage_in_bytes at all?
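A minimal sketch of how this comparison can be made, assuming a cgroup-v1 memory controller mounted at /sys/fs/cgroup/memory and a hypothetical group named test (the path is an assumption, adjust it for your system):

```python
#!/usr/bin/env python3
# Minimal sketch: compare memory.usage_in_bytes with the rss+cache(+swap) sum
# from memory.stat for a single cgroup-v1 memory cgroup.
# The mount point and group name are assumptions -- adjust for your system.

from pathlib import Path

CGROUP = Path("/sys/fs/cgroup/memory/test")   # hypothetical cgroup path

def read_stat(path):
    """Parse memory.stat into a dict of counter name -> integer value."""
    stats = {}
    for line in path.read_text().splitlines():
        key, value = line.split()
        stats[key] = int(value)
    return stats

usage = int((CGROUP / "memory.usage_in_bytes").read_text())
stat = read_stat(CGROUP / "memory.stat")

# "swap" only appears in memory.stat when swap accounting is enabled,
# hence the default of 0.
accurate = stat["rss"] + stat["cache"] + stat.get("swap", 0)

print("usage_in_bytes:  ", usage)
print("rss+cache(+swap):", accurate)
print("difference:      ", usage - accurate, "bytes")
```

With the numbers above, the difference is 4325376 - 4194304 = 131072 bytes, i.e. 32 pages of 4 KiB.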







  • A related question is unix.stackexchange.com/questions/449911.
    – JdeBP
    Jun 22 at 10:16














asked Jun 22 at 8:36 by DOAN MINH HUNG

1 Answer

usage_in_bytes



For efficiency, as other kernel components, memory cgroup uses some optimization
to avoid unnecessary cacheline false sharing. usage_in_bytes is affected by the
method and doesn't show 'exact' value of memory (and swap) usage, it's a fuzz
value for efficient access. (Of course, when necessary, it's synchronized.)
If you want to know more exact memory usage, you should use RSS+CACHE(+SWAP)
value in memory.stat(see 5.2).




If you are reading the accurate memory.stat files repeatedly, and you have measured your system's performance as being limited by CPU usage, you might consider sampling memory.usage_in_bytes instead.
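As a rough sketch of that trade-off (the cgroup path, the 1-second interval and the 1-in-10 ratio below are illustrative assumptions), one could poll the cheap counter most of the time and take the accurate memory.stat sum only occasionally:

```python
#!/usr/bin/env python3
# Sketch of the sampling idea: read the cheap, possibly fuzzy counter most of
# the time, and the more expensive, accurate memory.stat sum only now and then.
# The cgroup path, interval and sampling ratio are illustrative assumptions.

import time
from pathlib import Path

CGROUP = Path("/sys/fs/cgroup/memory/test")   # hypothetical cgroup path

def accurate_usage():
    stats = dict(line.split() for line in
                 (CGROUP / "memory.stat").read_text().splitlines())
    return int(stats["rss"]) + int(stats["cache"]) + int(stats.get("swap", 0))

for tick in range(60):
    if tick % 10 == 0:
        print("accurate:", accurate_usage())   # occasional exact reading
    else:
        print("fuzzy:   ", int((CGROUP / "memory.usage_in_bytes").read_text()))
    time.sleep(1)
```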



I don't know exactly how it works. However, as an LWN.net reader, this sounds similar to a situation where there are multiple counters kept local to where they are used, and where reading usage_in_bytes does not force an immediate synchronization and summation of those counters.




Over the years, kernel developers have made increasing use of per-CPU data in an effort to minimize memory contention and its associated performance penalties. As a simple example, consider the disk operation statistics maintained by the block layer. Incrementing a global counter for every disk operation would cause the associated cache line to bounce continually between processors; disk operations are frequent enough that the performance cost would be measurable. So each CPU maintains its own set of counters locally; it never has to contend with any other CPU to increment one of those counters. When a total count is needed, all of the per-CPU counters are added up. Given that the counters are queried far more rarely than they are modified, storing them in per-CPU form yields a significant performance improvement.




-- https://lwn.net/Articles/258238/
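To make the pattern concrete, here is a toy model of such sharded counting (it illustrates the idea in the quote, not the kernel's actual data structures, and the batch size of 32 is arbitrary; whether the cheap value runs below or above the exact sum depends on such batching details):

```python
# Toy model of per-CPU ("sharded") counting, to illustrate the quote above.
# Not the kernel's implementation: it only shows why a cheap read of a shared
# total can disagree, by a bounded amount, with the exact sum of all the
# shard-local counters.

import threading

class ShardedCounter:
    BATCH = 32   # fold a shard's local delta into the shared total in batches

    def __init__(self, nshards):
        self.shared_total = 0          # cheap, "fuzzy" value
        self.local = [0] * nshards     # per-shard (per-"CPU") deltas
        self.lock = threading.Lock()   # only the slow paths contend on this

    def add(self, shard, n=1):
        # Fast path: touch only the shard-local counter, no shared cache line.
        self.local[shard] += n
        if self.local[shard] >= self.BATCH:
            # Slow path: occasionally publish the local delta.
            with self.lock:
                self.shared_total += self.local[shard]
            self.local[shard] = 0

    def fuzzy_read(self):
        # Analogous to a cheap usage_in_bytes read: may lag by up to
        # BATCH - 1 per shard.
        return self.shared_total

    def exact_read(self):
        # Analogous to summing memory.stat: has to visit every shard.
        with self.lock:
            return self.shared_total + sum(self.local)

c = ShardedCounter(nshards=4)
for i in range(200):
    c.add(shard=i % 4)
print(c.fuzzy_read(), c.exact_read())   # prints "128 200" here
```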



If you have a small system - not a large number of CPUs, and not maintaining a large number of cgroups - the overhead doesn't sound very significant. It doesn't cost very much to bounce a few cache lines between CPUs. But the efficiency might matter if you are Google and running thousands of containers on one system :), or if you have a system with thousands of CPUs.






answered Jun 22 at 11:25 by sourcejedi
