Cgroup memory.usage_in_bytes shows incorrect value

I created a cgroup on my Linux device and noticed that the value of memory.usage_in_bytes (4325376) is bigger than the sum of the RSS, CACHE and SWAP counters in memory.stat (4194304).

I read in a reliable piece of documentation that memory.usage_in_bytes doesn't show the exact value of memory (and swap) usage, and that if you want a more exact figure you should use the RSS+CACHE(+SWAP) values from memory.stat.

Does anyone have an idea why the two readings differ? Should I use the value of memory.usage_in_bytes at all?
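A minimal sketch of how this comparison can be made, assuming a cgroup-v1 memory controller mounted at /sys/fs/cgroup/memory and a hypothetical group named test (the path is an assumption, adjust it for your system):

```python
#!/usr/bin/env python3
# Minimal sketch: compare memory.usage_in_bytes with the rss+cache(+swap) sum
# from memory.stat for a single cgroup-v1 memory cgroup.
# The mount point and group name are assumptions -- adjust for your system.

from pathlib import Path

CGROUP = Path("/sys/fs/cgroup/memory/test")   # hypothetical cgroup path

def read_stat(path):
    """Parse memory.stat into a dict of counter name -> integer value."""
    stats = {}
    for line in path.read_text().splitlines():
        key, value = line.split()
        stats[key] = int(value)
    return stats

usage = int((CGROUP / "memory.usage_in_bytes").read_text())
stat = read_stat(CGROUP / "memory.stat")

# "swap" only appears in memory.stat when swap accounting is enabled,
# hence the default of 0.
accurate = stat["rss"] + stat["cache"] + stat.get("swap", 0)

print("usage_in_bytes:  ", usage)
print("rss+cache(+swap):", accurate)
print("difference:      ", usage - accurate, "bytes")
```

With the numbers above, the difference is 4325376 - 4194304 = 131072 bytes, i.e. 32 pages of 4 KiB.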







  • A related question is unix.stackexchange.com/questions/449911.
    – JdeBP
    Jun 22 at 10:16














asked Jun 22 at 8:36 by DOAN MINH HUNG

1 Answer

usage_in_bytes



For efficiency, as other kernel components, memory cgroup uses some optimization
to avoid unnecessary cacheline false sharing. usage_in_bytes is affected by the
method and doesn't show 'exact' value of memory (and swap) usage, it's a fuzz
value for efficient access. (Of course, when necessary, it's synchronized.)
If you want to know more exact memory usage, you should use RSS+CACHE(+SWAP)
value in memory.stat(see 5.2).




If you are reading the accurate memory.stat files repeatedly, and you have measured your system's performance as being limited by CPU usage, you might consider sampling memory.usage_in_bytes instead.
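As a rough sketch of that trade-off (the cgroup path, the 1-second interval and the 1-in-10 ratio below are illustrative assumptions), one could poll the cheap counter most of the time and take the accurate memory.stat sum only occasionally:

```python
#!/usr/bin/env python3
# Sketch of the sampling idea: read the cheap, possibly fuzzy counter most of
# the time, and the more expensive, accurate memory.stat sum only now and then.
# The cgroup path, interval and sampling ratio are illustrative assumptions.

import time
from pathlib import Path

CGROUP = Path("/sys/fs/cgroup/memory/test")   # hypothetical cgroup path

def accurate_usage():
    stats = dict(line.split() for line in
                 (CGROUP / "memory.stat").read_text().splitlines())
    return int(stats["rss"]) + int(stats["cache"]) + int(stats.get("swap", 0))

for tick in range(60):
    if tick % 10 == 0:
        print("accurate:", accurate_usage())   # occasional exact reading
    else:
        print("fuzzy:   ", int((CGROUP / "memory.usage_in_bytes").read_text()))
    time.sleep(1)
```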



I don't know exactly how it works. However, as an LWN.net reader, this sounds similar to a situation where there are multiple counters kept local to where they are used, and where reading usage_in_bytes does not force an immediate synchronization and summation of those counters.




Over the years, kernel developers have made increasing use of per-CPU data in an effort to minimize memory contention and its associated performance penalties. As a simple example, consider the disk operation statistics maintained by the block layer. Incrementing a global counter for every disk operation would cause the associated cache line to bounce continually between processors; disk operations are frequent enough that the performance cost would be measurable. So each CPU maintains its own set of counters locally; it never has to contend with any other CPU to increment one of those counters. When a total count is needed, all of the per-CPU counters are added up. Given that the counters are queried far more rarely than they are modified, storing them in per-CPU form yields a significant performance improvement.




-- https://lwn.net/Articles/258238/
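To make the pattern concrete, here is a toy model of such sharded counting (it illustrates the idea in the quote, not the kernel's actual data structures, and the batch size of 32 is arbitrary; whether the cheap value runs below or above the exact sum depends on such batching details):

```python
# Toy model of per-CPU ("sharded") counting, to illustrate the quote above.
# Not the kernel's implementation: it only shows why a cheap read of a shared
# total can disagree, by a bounded amount, with the exact sum of all the
# shard-local counters.

import threading

class ShardedCounter:
    BATCH = 32   # fold a shard's local delta into the shared total in batches

    def __init__(self, nshards):
        self.shared_total = 0          # cheap, "fuzzy" value
        self.local = [0] * nshards     # per-shard (per-"CPU") deltas
        self.lock = threading.Lock()   # only the slow paths contend on this

    def add(self, shard, n=1):
        # Fast path: touch only the shard-local counter, no shared cache line.
        self.local[shard] += n
        if self.local[shard] >= self.BATCH:
            # Slow path: occasionally publish the local delta.
            with self.lock:
                self.shared_total += self.local[shard]
            self.local[shard] = 0

    def fuzzy_read(self):
        # Analogous to a cheap usage_in_bytes read: may lag by up to
        # BATCH - 1 per shard.
        return self.shared_total

    def exact_read(self):
        # Analogous to summing memory.stat: has to visit every shard.
        with self.lock:
            return self.shared_total + sum(self.local)

c = ShardedCounter(nshards=4)
for i in range(200):
    c.add(shard=i % 4)
print(c.fuzzy_read(), c.exact_read())   # prints "128 200" here
```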



If you have a small system - not a large number of CPUs, and not maintaining a large number of cgroups - the overhead doesn't sound very significant. It doesn't cost very much to bounce a few cache lines between CPUs. But the efficiency might matter if you are Google and running thousands of containers on one system :), or if you have a system with thousands of CPUs.






answered Jun 22 at 11:25 by sourcejedi
