Cgroup memory.usage_in_bytes shows incorrect value
I created a cgroup on my Linux device and noticed that the value of memory.usage_in_bytes (4325376) is larger than the sum of the RSS, CACHE and SWAP values from memory.stat (4194304).
I read in the kernel documentation that memory.usage_in_bytes doesn't show the exact value of memory (and swap) usage, and that if you want a more exact figure you should use the RSS+CACHE(+SWAP) values from memory.stat.
Does anyone have an idea why this happens? Should I use the value of memory.usage_in_bytes at all?
Tags: memory, cgroups
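The comparison in the question can be reproduced by summing fields from memory.stat. The sketch below is a minimal illustration: the field names (`rss`, `cache`, `swap`) are the ones cgroup v1's memory.stat actually uses, but the sample values here are made up to match the numbers in the question, and on a real system you would read the text from `/sys/fs/cgroup/memory/<group>/memory.stat`.

```python
# Hypothetical memory.stat contents; field names match cgroup v1 memory.stat,
# values are chosen to match the question (1 MiB rss + 3 MiB cache, no swap).
stat_text = """\
cache 3145728
rss 1048576
swap 0
"""

def exact_usage(stat_text):
    """Sum rss + cache (+ swap), the 'more exact' figure the kernel docs suggest."""
    fields = {}
    for line in stat_text.splitlines():
        key, value = line.split()
        fields[key] = int(value)
    # swap may be absent if the kernel was built without swap accounting
    return fields["rss"] + fields["cache"] + fields.get("swap", 0)

print(exact_usage(stat_text))  # 4194304, versus 4325376 from usage_in_bytes
```

On a real system, `exact_usage(open(".../memory.stat").read())` would give the figure to compare against `memory.usage_in_bytes`.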
A related question is unix.stackexchange.com/questions/449911 .
– JdeBP
Jun 22 at 10:16
asked Jun 22 at 8:36
DOAN MINH HUNG
1 Answer
From the kernel's cgroup v1 memory controller documentation (Documentation/cgroup-v1/memory.txt), on usage_in_bytes:
For efficiency, as other kernel components, memory cgroup uses some optimization
to avoid unnecessary cacheline false sharing. usage_in_bytes is affected by the
method and doesn't show 'exact' value of memory (and swap) usage, it's a fuzz
value for efficient access. (Of course, when necessary, it's synchronized.)
If you want to know more exact memory usage, you should use RSS+CACHE(+SWAP)
value in memory.stat(see 5.2).
If you are reading the exact memory.stat values repeatedly, and you have measured your system performance as being limited by CPU usage, you might consider sampling usage_in_bytes instead.
I don't know exactly how it works. However, as an LWN.net reader, this sounds similar to a situation where multiple counters are kept local to where they are used, and reading usage_in_bytes does not force an immediate synchronization and summation of those counters.
Over the years, kernel developers have made increasing use of per-CPU data in an effort to minimize memory contention and its associated performance penalties. As a simple example, consider the disk operation statistics maintained by the block layer. Incrementing a global counter for every disk operation would cause the associated cache line to bounce continually between processors; disk operations are frequent enough that the performance cost would be measurable. So each CPU maintains its own set of counters locally; it never has to contend with any other CPU to increment one of those counters. When a total count is needed, all of the per-CPU counters are added up. Given that the counters are queried far more rarely than they are modified, storing them in per-CPU form yields a significant performance improvement.
-- https://lwn.net/Articles/258238/
If you have a small system - not a large number of CPUs, and not maintaining a large number of cgroups - the overhead does not sound very significant: it does not cost much to bounce a few cache lines between CPUs. But the optimization could matter if you are Google running thousands of containers on one system :), or if you have a machine with thousands of CPUs.
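The per-CPU counter idea above can be sketched in a few lines. This is a toy model, not the kernel's actual implementation: the class name, the `BATCH` threshold, and the single-threaded "cpu" indices are all inventions for illustration, loosely modelled on how a batched per-CPU counter behaves. Each "CPU" accumulates charges locally and only folds them into the shared total once its local delta exceeds the batch size, so the cheap shared read is a fuzz value, like usage_in_bytes, while the exact read has to visit every per-CPU slot, like summing memory.stat.

```python
BATCH = 32  # hypothetical threshold before folding into the shared total

class FuzzyCounter:
    """Toy model of a batched per-CPU counter (illustrative, not kernel code)."""

    def __init__(self, ncpus):
        self.shared = 0            # cheap to read, possibly stale
        self.local = [0] * ncpus   # per-CPU deltas, updated without contention

    def add(self, cpu, n):
        self.local[cpu] += n
        if abs(self.local[cpu]) >= BATCH:   # batch threshold reached:
            self.shared += self.local[cpu]  # synchronize into the shared total
            self.local[cpu] = 0

    def fuzzy_read(self):
        # Like usage_in_bytes: one cheap read, may lag the true total.
        return self.shared

    def exact_read(self):
        # Like summing memory.stat: must visit every per-CPU slot.
        return self.shared + sum(self.local)

c = FuzzyCounter(ncpus=4)
c.add(0, 10)   # below BATCH, stays local
c.add(1, 40)   # exceeds BATCH, folded into shared
print(c.fuzzy_read())  # 40 (misses cpu 0's pending 10)
print(c.exact_read())  # 50 (the true total)
```

This also shows why the two values can legitimately disagree by a small amount, as in the question: the fuzzy total is only synchronized when a per-CPU delta crosses the batch threshold.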
answered Jun 22 at 11:25
sourcejedi