How to cause the kernel to compact fragmented memory

I am running Fedora 26.



This is for a very strange assignment given by my algorithms professor. The assignment says:




Memory fragmentation in C:

Design, implement, and execute a C program that does the following: it allocates memory for a sequence of 3m arrays of size 800,000 elements each; then it explicitly deallocates all even-numbered arrays and allocates a sequence of m arrays of size 900,000 elements each. Measure the amounts of time your program requires for the allocation of the first sequence and for the second sequence. Choose m to exhaust almost all of the main memory available to your program.




The overall goal of this is to fragment memory, then request slightly more than what is available as a contiguous chunk, forcing the operating system to compact or defragment memory.
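
For concreteness, here is a minimal sketch of the kind of program the assignment describes. Treating each element as one byte, taking m from the command line, and omitting error handling are my own simplifications:

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    /* Seconds elapsed between two CLOCK_MONOTONIC readings. */
    static double elapsed(struct timespec a, struct timespec b)
    {
        return (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9;
    }

    int main(int argc, char **argv)
    {
        long m = argc > 1 ? atol(argv[1]) : 1000; /* pick m to nearly exhaust memory */
        char **first = malloc(3 * m * sizeof *first);
        char **second = malloc(m * sizeof *second);
        struct timespec t0, t1;

        /* First sequence: 3m arrays of 800,000 elements each. */
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (long i = 0; i < 3 * m; i++)
            first[i] = malloc(800000);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        printf("first sequence:  %f s\n", elapsed(t0, t1));

        /* Explicitly deallocate all even-numbered arrays. */
        for (long i = 0; i < 3 * m; i += 2) {
            free(first[i]);
            first[i] = NULL;
        }

        /* Second sequence: m arrays of 900,000 elements each. */
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (long i = 0; i < m; i++)
            second[i] = malloc(900000);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        printf("second sequence: %f s\n", elapsed(t0, t1));

        return 0;
    }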



In class I asked how we should go about doing this, since memory is virtualized and not actually contiguous, to which he replied: "Well, you'll have to turn [virtual memory] off."
Some other students asked in class how we should know when we have hit this "garbage collection," and he said: "The timing for the second allocation should be greater than the first because of the time taken by garbage collection."



After searching around a bit, the closest thing I could find to disabling virtual memory was disabling swap with swapoff -a. I also shut down my desktop environment and compiled and ran my program from the native console, to avoid possible interference from other processes (especially a heavy one like the desktop environment).



I ran the program with increasing m and eventually found a point where the time for the second allocation was greater than the time for the first. Along the way, however, I hit a point where the process was killed before the second allocation. I checked dmesg and saw that it had been killed by the oom-killer. I found and read several articles about the oom-killer and discovered that you can disable overcommitting of memory by the kernel.
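
(For reference, the knob that controls this is the vm.overcommit_memory sysctl; setting it to 2, e.g. with sysctl vm.overcommit_memory=2, makes the kernel refuse allocations that would exceed its commit limit instead of overcommitting.)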



I did this and ran my program again, only this time I was not able to find an m such that the timing of the second allocation was higher than the first. Eventually, with larger and larger m (though much smaller than when overcommit was enabled), malloc would fail and my program would terminate.



I have three questions, the first of which isn't really that important:



  1. Is garbage collection the correct term for this? My professor is very adamant that this is garbage collection, but I was under the impression that garbage collection was something done by programming language runtimes, and that this would more accurately be called defragmentation.


  2. Is compaction like he wants possible on a Linux system?


  3. Why was I able to reach a point where the time for the second allocation was higher than the first when I disabled swap but still had overcommit enabled? Did compaction actually take place? If so, why was I not able to hit a point where compaction happened after I disabled overcommit?







asked Feb 8 at 5:10 by xylafur (edited Feb 8 at 6:07)
  • You cannot "turn off" virtual memory. Also, be aware that in Linux you can logically allocate more memory than you actually have -- the kernel won't actually allocate the pages until you write to them.
    – Andy Dalton
    Feb 8 at 6:29






  • I was expecting yet another Write my homework! question. But rather this is a My homework seems badly specified, ill-thought-out, and impossible. Is it? question. Some of this is Stack Overflow territory, and you will find a lot of Q&As there along the lines of (just to pick one example at random) stackoverflow.com/questions/4039274.
    – JdeBP
    Feb 8 at 7:12
1 Answer









Kudos on your research so far; this is indeed an interesting set of questions.



There’s an important aspect to consider here generally: memory allocation is partly the operating system’s responsibility, and partly each running process’s responsibility (ignoring old systems without memory protection and virtual address spaces). The operating system takes care of providing each process with its own address space, and allocating physical memory to processes when necessary. Each process takes care of carving up its address space (to some extent) and ensuring it’s used appropriately. Note that a process’s responsibility will be largely invisible to programmers, since the runtime environment takes care of most things.



Now, to answer your questions...




  1. In my mind garbage collection is one step removed from what you’re doing here. I imagine you’re writing in C, using malloc() and free(). Garbage collection, where supported by the programming language and runtime environment, takes care of the latter part: it identifies blocks of memory which were previously allocated but are no longer in use (and, importantly, can never again be used), and returns them to the allocator. The question linked in JdeBP’s comment provides some background, but I find it mostly interesting because it demonstrates that different people have very different opinions on garbage collection, and even what constitutes garbage collection.



    In the context we’re interested in, I would use “memory compaction” to talk about the process under discussion.




  2. From a userspace programming perspective, what your professor is asking for isn’t possible, in C, under Linux, for one simple reason: what we care about here isn’t physical memory fragmentation, it’s address space fragmentation. When you allocate your many 800,000-byte blocks, you’ll end up with just as many pointers to each block. On Linux, at this point, the operating system itself hasn’t done much, and you don’t necessarily have physical memory backing each allocation (as an aside, with smaller allocations the operating system wouldn’t be involved at all, just your C library’s allocator; but the allocations here are large enough that the C library will use mmap, which is handled by the kernel). When you free the even-numbered blocks, you get back those blocks of address space, but you can’t change the pointers you have to the remaining blocks. If you print the pointers out as you go (see the sketch after this list), you’ll see the difference between them is not much more than the allocation request (802,816 bytes on my system); there’s no room between two pointers for a 900,000-byte block. Because your program has actual pointers to each block, rather than some more abstract value (in other contexts, a handle), the runtime environment can’t do anything about that, and so it can’t compact its memory to coalesce free blocks.



    If you use a programming language where pointers aren’t a programmer-visible concept, then memory compaction is possible, under Linux. Another possibility would be to use a memory allocation API where the returned values aren’t pointers; see for example the handle-based heap allocation functions under Windows (where pointers are only valid when a handle is locked).




  3. Your professor’s exercise is effectively measuring the performance of mmap, which includes its free-block-walking algorithm. You first allocate 3 × m blocks, then free half of them, and then start allocating m blocks again; freeing all those blocks dumps a huge load of free blocks on the kernel’s allocator, which it needs to keep track of (and the time taken by the free calls shows that there’s no optimisation being done at this point). If you track the allocation times of each individual block (again, see the sketch after this list), you’ll see that the first 900k allocation takes much, much longer than the others (three orders of magnitude on my system), the second is much faster but still takes a lot longer (two orders of magnitude), and the third allocation is back to typical performance levels. So there’s something going on, but the returned pointers show that it wasn’t memory compaction, at least not allocated-block compaction (which, as explained above, is impossible); presumably the time corresponds to processing the data structures the kernel uses to keep track of the available address space in the process (I’m checking this and will update later). These lengthy allocations can grow to dwarf the overall allocation sequences you’re measuring, which is when the 900k allocations end up taking longer overall than the 800k allocations.



    The reason overcommit changes the behaviour you see is that it changes the exercise from purely manipulating address space, to actually allocating memory, and thus reduces the size of your playground. When you can overcommit, the kernel is only limited by your process’s address space, so you can allocate far more blocks and put much more pressure on the allocator. When you disable overcommit, the kernel is limited by the available memory, which reduces the value you can have for m down to levels where the allocator isn’t stressed enough for the allocation times to blow up.
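
Here is a minimal instrumentation sketch along those lines; it prints each returned pointer and the time the individual malloc call took. The scaled-down m, the element-as-byte sizing, and the output format are my own choices, not part of the exercise:

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    /* Allocate n blocks of sz bytes each, printing every returned pointer and
       how long the individual malloc call took, so that both the gaps between
       addresses and any unusually slow allocations become visible. */
    static void timed_alloc(char **blocks, long n, size_t sz)
    {
        struct timespec t0, t1;
        for (long i = 0; i < n; i++) {
            clock_gettime(CLOCK_MONOTONIC, &t0);
            blocks[i] = malloc(sz);
            clock_gettime(CLOCK_MONOTONIC, &t1);
            long ns = (t1.tv_sec - t0.tv_sec) * 1000000000L
                      + (t1.tv_nsec - t0.tv_nsec);
            printf("block %ld: %p (%ld ns)\n", i, (void *)blocks[i], ns);
        }
    }

    int main(void)
    {
        enum { M = 4 };                       /* tiny m, for illustration only */
        char *first[3 * M], *second[M];

        timed_alloc(first, 3 * M, 800000);
        for (long i = 0; i < 3 * M; i += 2) { /* free the even-numbered blocks */
            free(first[i]);
            first[i] = NULL;
        }
        timed_alloc(second, M, 900000);
        return 0;
    }

If the behaviour described above holds, the differences between consecutive pointers in the first sequence should come out just above the request size, and the first couple of 900,000-byte allocations should stand out in the per-call timings.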







answered Feb 8 at 9:18 by Stephen Kitt

  • Would using calloc() or actually writing to the allocated arrays make any difference here?
    – Kusalananda
    Feb 8 at 9:35










  • Writing to the allocated memory would cancel the overcommit capability, but it’s hard to deal with because failure then results in the OOM-killer stepping in (and not necessarily killing the over-allocating process). calloc() with large allocations behaves the same as malloc() on Linux, using mmap() to allocate an anonymous mapping, which is zero-filled when first used (so overcommit still works).
    – Stephen Kitt
    Feb 8 at 9:48










