What could explain this strange sparse file handling of/in tmpfs?
On my ext4
filesystem partition I can run the following code:
fs="/mnt/ext4"
#create sparse 100M file on $fs
dd if=/dev/zero of=$fs/sparse100M conv=sparse seek=$((100*2*1024-1)) count=1 2> /dev/null
#show its actual used size before
echo "Before:"
ls $fs/sparse100M -s
#setting the sparse file up as loopback and run md5sum on loopback
losetup /dev/loop0 $fs/sparse100M
md5sum /dev/loop0
#show its actual used size afterwards
echo "After:"
ls $fs/sparse100M -s
#release loopback and remove file
losetup -d /dev/loop0
rm $fs/sparse100M
which yields
Before:
0 sparse100M
2f282b84e7e608d5852449ed940bfc51 /dev/loop0
After:
0 sparse100M
Doing the very same thing on tmpfs, with:
fs="/tmp"
yields
Before:
0 /tmp/sparse100M
2f282b84e7e608d5852449ed940bfc51 /dev/loop0
After:
102400 /tmp/sparse100M
which basically means that something I expected to merely read the data caused the sparse file to "blow up like a balloon".
I suspect this is because of less complete support for sparse files in the tmpfs filesystem, in particular the missing FIEMAP ioctl, but I am not sure what actually causes this behaviour. Can you tell me?
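For reference, a quick way to check whether a file is actually sparse (a sketch; the path is just an example) is to compare its apparent size with its allocated blocks:

```shell
f=/tmp/sparse100M
# create a 100 MiB sparse file without writing any data blocks
truncate -s 100M "$f"
# apparent size in bytes vs. allocated 512-byte blocks
stat -c 'apparent=%s bytes allocated=%b blocks' "$f"
du -h --apparent-size "$f"   # reports the logical size (100M)
du -h "$f"                   # reports the space actually allocated
rm "$f"
```

On both ext4 and tmpfs a freshly truncated file should show 0 allocated blocks.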
ext4 tmpfs sparse-files
hum. There is a shared (copy-on-write) zero page, that could be used when a sparse page needed to be mmap()ed, for example. So I'm not sure why any type of read from a sparse tmpfs file would require allocating real memory. lwn.net/Articles/517465 . I wondered if this was some side effect of the conversion of loop to use direct io, but it seems there should not be any difference when you try to use the new type of loop on tmpfs. spinics.net/lists/linux-fsdevel/msg60337.html
– sourcejedi
Sep 15 at 19:26
maybe this might get an answer if it were on SO? just a thought
– Marcus Linsner
Sep 15 at 22:22
The output of /tmp has different files Before/After. Is that a typo? Before: 0 /tmp/sparse100 (without M at the end) After: 102400 /tmp/sparse100M (with the trailing M).
– YoMismo
Sep 19 at 13:16
@YoMismo, yes, it was only a little typo
– humanityANDpeace
Sep 21 at 8:13
1 Answer
First off you're not alone in puzzling about these sorts of issues.
This is not just limited to tmpfs
but has been a concern cited with
NFSv4.
If an application reads 'holes' in a sparse file, the file system converts empty blocks into "real" blocks filled with zeros, and returns them to the application.
When md5sum scans a file it explicitly chooses to read it in
sequential order, which makes a lot of sense given what md5sum is
trying to do.
As there are fundamentally "holes" in the file, this sequential reading is going
(in some situations) to trigger a copy-on-write-like operation that fills out the file. This then gets
into a deeper question of whether fallocate()
as implemented in the
filesystem supports FALLOC_FL_PUNCH_HOLE.
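Hole-punching can be sketched from the shell (assuming util-linux's fallocate is available; the temp file is just an illustration):

```shell
f=$(mktemp)
# write 1 MiB of real data so blocks actually get allocated
dd if=/dev/urandom of="$f" bs=1M count=1 status=none
stat -c 'allocated=%b blocks' "$f"   # typically 2048 (512-byte blocks)
# punch the written range back out; --keep-size preserves the file length
fallocate --punch-hole --keep-size --offset 0 --length 1M "$f"
stat -c 'allocated=%b blocks' "$f"   # drops back toward 0 where PUNCH_HOLE is supported
rm "$f"
```

The apparent size stays at 1 MiB throughout; only the allocated blocks change.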
Fortunately, not only does tmpfs
support this, but there is also a mechanism to
"dig" the holes back out.
Using the CLI utility fallocate
we can successfully detect and re-dig these
holes.
As per man 1 fallocate
:
-d, --dig-holes
Detect and dig holes. This makes the file sparse in-place, without
using extra disk space. The minimum size of the hole depends on
filesystem I/O block size (usually 4096 bytes). Also, when using
this option, --keep-size is implied. If no range is specified by
--offset and --length, then the entire file is analyzed for holes.
You can think of this option as doing a "cp --sparse" and then
renaming the destination file to the original, without the need for
extra disk space.
See --punch-hole for a list of supported filesystems.
fallocate
operates on the file level, though, and when you run md5sum
against a block device (requesting sequential reads) you're tripping over the
exact gap in how the fallocate()
syscall operates. Using your example, we see the following:
$ fs=$(mktemp -d)
$ echo $fs
/tmp/tmp.ONTGAS8L06
$ dd if=/dev/zero of=$fs/sparse100M conv=sparse seek=$((100*2*1024-1)) count=1 2>/dev/null
$ echo "Before:" "$(ls $fs/sparse100M -s)"
Before: 0 /tmp/tmp.ONTGAS8L06/sparse100M
$ sudo losetup /dev/loop0 $fs/sparse100M
$ sudo md5sum /dev/loop0
2f282b84e7e608d5852449ed940bfc51 /dev/loop0
$ echo "After:" "$(ls $fs/sparse100M -s)"
After: 102400 /tmp/tmp.ONTGAS8L06/sparse100M
$ fallocate -d $fs/sparse100M
$ echo "After:" "$(ls $fs/sparse100M -s)"
After: 0 /tmp/tmp.ONTGAS8L06/sparse100M
Now... that answers your basic question. My general motto is "get weird" so I
dug in further...
$ fs=$(mktemp -d)
$ echo $fs
/tmp/tmp.ZcAxvW32GY
$ dd if=/dev/zero of=$fs/sparse100M conv=sparse seek=$((100*2*1024-1)) count=1 2>/dev/null
$ echo "Before:" "$(ls $fs/sparse100M -s)"
Before: 0 /tmp/tmp.ZcAxvW32GY/sparse100M
$ sudo losetup /dev/loop0 $fs/sparse100M
$ echo "After:" "$(ls $fs/sparse100M -s)"
After: 1036 /tmp/tmp.ZcAxvW32GY/sparse100M
$ sudo md5sum $fs/sparse100M
2f282b84e7e608d5852449ed940bfc51 /tmp/tmp.ZcAxvW32GY/sparse100M
$ echo "After:" "$(ls $fs/sparse100M -s)"
After: 1036 /tmp/tmp.ZcAxvW32GY/sparse100M
$ fallocate -d $fs/sparse100M
$ echo "After:" "$(ls $fs/sparse100M -s)"
After: 520 /tmp/tmp.ZcAxvW32GY/sparse100M
$ sudo md5sum $fs/sparse100M
2f282b84e7e608d5852449ed940bfc51 /tmp/tmp.ZcAxvW32GY/sparse100M
$ echo "After:" "$(ls $fs/sparse100M -s)"
After: 520 /tmp/tmp.ZcAxvW32GY/sparse100M
$ fallocate -d $fs/sparse100M
$ echo "After:" "$(ls $fs/sparse100M -s)"
After: 516 /tmp/tmp.ZcAxvW32GY/sparse100M
$ fallocate -d $fs/sparse100M
$ sudo md5sum $fs/sparse100M
2f282b84e7e608d5852449ed940bfc51 /tmp/tmp.ZcAxvW32GY/sparse100M
$ echo "After:" "$(ls $fs/sparse100M -s)"
After: 512 /tmp/tmp.ZcAxvW32GY/sparse100M
$ fallocate -d $fs/sparse100M
$ echo "After:" "$(ls $fs/sparse100M -s)"
After: 0 /tmp/tmp.ZcAxvW32GY/sparse100M
$ sudo md5sum $fs/sparse100M
2f282b84e7e608d5852449ed940bfc51 /tmp/tmp.ZcAxvW32GY/sparse100M
$ echo "After:" "$(ls $fs/sparse100M -s)"
After: 0 /tmp/tmp.ZcAxvW32GY/sparse100M
You can see that the mere act of running losetup
changes the allocated size of
the sparse file. So this becomes an interesting intersection of tmpfs,
the HOLE_PUNCH mechanism, fallocate,
and block devices.
Thanks for your answer. I'm aware tmpfs supports sparse files and punch_hole. That's what makes it so confusing – tmpfs supports this, so why go and fill the sparse holes when reading through a loop device? losetup doesn't change the file size, but it creates a block device, which on most systems is then scanned for content like: is there a partition table? is there a filesystem with UUID? should I create a /dev/disk/by-uuid/ symlink then? And those reads already cause parts of the sparse file to be allocated, because for some mysterious reason, tmpfs fills holes on (some) reads.
– frostschutz
Sep 20 at 18:12
Can you clarify "sequential reading is going to (in some situations) cause a copy on write like operation", please? I'm curious to understand how a read operation would trigger a copy on write action. Thanks!
– roaima
Sep 20 at 18:12
This is odd. On my system I followed the same steps, though manually and not in a script. First I did a 100M file just like the OP. Then I repeated the steps with only a 10MB file. First result: ls -s sparse100M was 102400. But ls -s on the 10MB file was only 328 blocks. ??
– Patrick Taylor
Sep 21 at 5:16
@PatrickTaylor ~328K is about what's used after the UUID scanners came by, but you didn't cat / md5sum the loop device for a full read.
– frostschutz
Sep 21 at 10:53
I was digging through the source for the loop kernel module (in loop.c) and saw that there are two relevant functions: lo_read_simple & lo_read_transfer. There are some minor differences in how they do low level memory allocation... lo_read_transfer is actually requesting non-blocking io from slab.h (GFP_NOIO) while performing an alloc_page() call. lo_read_simple() on the other hand is not performing alloc_page().
– Brian Redbeard
Sep 21 at 19:23
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
4
down vote
First off you're not alone in puzzling about these sorts of issues.
This is not just limited to tmpfs
but has been a concern cited with
NFSv4.
If an application reads 'holes' in a sparse file, the file system converts empty blocks into "real" blocks filled with zeros, and returns them to the application.
When md5sum
is attempting to scan a file it explicitly chooses to do this in
sequential order, which makes a lot of sense based on what md5sum is
attempting to do.
As there are fundamentally "holes" in the file, this sequential reading is going
to (in some situations) cause a copy on write like operation to fill out the file. This then gets
into a deeper issue around whether or not fallocate()
as implemented in the
filesystem supports FALLOC_FL_PUNCH_HOLE
.
Fortunately, not only does tmpfs
support this but there is a mechanism to
"dig" the holes back out.
Using the CLI utility fallocate
we can successfuly detect and re-dig these
holes.
As per man 1 fallocate
:
-d, --dig-holes
Detect and dig holes. This makes the file sparse in-place, without
using extra disk space. The minimum size of the hole depends on
filesystem I/O block size (usually 4096 bytes). Also, when using
this option, --keep-size is implied. If no range is specified by
--offset and --length, then the entire file is analyzed for holes.
You can think of this option as doing a "cp --sparse" and then
renaming the destination file to the original, without the need for
extra disk space.
See --punch-hole for a list of supported filesystems.
fallocate
operates on the file level though and when you are running md5sum
against a block device (requesting sequential reads) you're tripping up on the
exact gap between how the fallocate()
syscall should operate. We can see this
in action:
In action, using your example we see the following:
$ fs=$(mktemp -d)
$ echo $fs
/tmp/tmp.ONTGAS8L06
$ dd if=/dev/zero of=$fs/sparse100M conv=sparse seek=$((100*2*1024-1)) count=1 2>/dev/null
$ echo "Before:" "$(ls $fs/sparse100M -s)"
Before: 0 /tmp/tmp.ONTGAS8L06/sparse100M
$ sudo losetup /dev/loop0 $fs/sparse100M
$ sudo md5sum /dev/loop0
2f282b84e7e608d5852449ed940bfc51 /dev/loop0
$ echo "After:" "$(ls $fs/sparse100M -s)"
After: 102400 /tmp/tmp.ONTGAS8L06/sparse100M
$ fallocate -d $fs/sparse100M
$ echo "After:" "$(ls $fs/sparse100M -s)"
After: 0 /tmp/tmp.ONTGAS8L06/sparse100M
Now... that answers your basic question. My general motto is "get weird" so I
dug in further...
$ fs=$(mktemp -d)
$ echo $fs
/tmp/tmp.ZcAxvW32GY
$ dd if=/dev/zero of=$fs/sparse100M conv=sparse seek=$((100*2*1024-1)) count=1 2>/dev/null
$ echo "Before:" "$(ls $fs/sparse100M -s)"
Before: 0 /tmp/tmp.ZcAxvW32GY/sparse100M
$ sudo losetup /dev/loop0 $fs/sparse100M
$ echo "After:" "$(ls $fs/sparse100M -s)"
After: 1036 /tmp/tmp.ZcAxvW32GY/sparse100M
$ sudo md5sum $fs/sparse100M
2f282b84e7e608d5852449ed940bfc51 /tmp/tmp.ZcAxvW32GY/sparse100M
$ echo "After:" "$(ls $fs/sparse100M -s)"
After: 1036 /tmp/tmp.ZcAxvW32GY/sparse100M
$ fallocate -d $fs/sparse100M
$ echo "After:" "$(ls $fs/sparse100M -s)"
After: 520 /tmp/tmp.ZcAxvW32GY/sparse100M
$ sudo md5sum $fs/sparse100M
2f282b84e7e608d5852449ed940bfc51 /tmp/tmp.ZcAxvW32GY/sparse100M
$ echo "After:" "$(ls $fs/sparse100M -s)"
After: 520 /tmp/tmp.ZcAxvW32GY/sparse100M
$ fallocate -d $fs/sparse100M
$ echo "After:" "$(ls $fs/sparse100M -s)"
After: 516 /tmp/tmp.ZcAxvW32GY/sparse100M
$ fallocate -d $fs/sparse100M
$ sudo md5sum $fs/sparse100M
2f282b84e7e608d5852449ed940bfc51 /tmp/tmp.ZcAxvW32GY/sparse100M
$ echo "After:" "$(ls $fs/sparse100M -s)"
After: 512 /tmp/tmp.ZcAxvW32GY/sparse100M
$ fallocate -d $fs/sparse100M
$ echo "After:" "$(ls $fs/sparse100M -s)"
After: 0 /tmp/tmp.ZcAxvW32GY/sparse100M
$ sudo md5sum $fs/sparse100M
2f282b84e7e608d5852449ed940bfc51 /tmp/tmp.ZcAxvW32GY/sparse100M
$ echo "After:" "$(ls $fs/sparse100M -s)"
After: 0 /tmp/tmp.ZcAxvW32GY/sparse100M
You see that merely the act of performing the losetup
changes the size of
the sparse file. So this becomes an interesting combination of where tmpfs
,
the HOLE_PUNCH mechanism, fallocate
, and block devices intersect.
Thanks for your answer. I'm awaretmpfs
supports sparse files and punch_hole. That's what makes it so confusing -tmpfs
supports this, so why go and fill the sparse holes when reading through a loop device?losetup
doesn't change the file size, but it creates a block device, which on most systems is then scanned for content like: is there a partition table? is there a filesystem with UUID? should I create a /dev/disk/by-uuid/ symlink then? And those reads already cause parts of the sparse file to be allocated, because for some mysterious reason, tmpfs fills holes on (some) reads.
â frostschutz
Sep 20 at 18:12
1
Can you clarify "sequential reading is going to (in some situations) cause a copy on write like operation", please? I'm curious to understand how a read operation would trigger a copy on write action. Thanks!
â roaima
Sep 20 at 18:12
This is odd. On my system I followed the same steps, though manually and not in a script. First I did a 100M file just like the OP. Then I repeated the steps with only a 10MB file. First result : ls -s sparse100M was 102400. But ls -s on the 10MB file was only 328 blocks. ??
â Patrick Taylor
Sep 21 at 5:16
1
@PatrickTaylor ~328K is about what's used after the UUID scanners came by, but you didn't cat / md5sum the loop device for a full read.
â frostschutz
Sep 21 at 10:53
1
I was digging through the source for the loop kernel module (inloop.c
) and saw that there are two relevant functions:lo_read_simple
&lo_read_transfer
. There are some minor differences in how they do low level memory allocation...lo_read_transfer
is actually requesting non-blocking io fromslab.h
(GFP_NOIO
) while performing aalloc_page()
call.lo_read_simple()
on the other hand is not performingalloc_page()
.
â Brian Redbeard
Sep 21 at 19:23
 |Â
show 4 more comments
up vote
4
down vote
First off you're not alone in puzzling about these sorts of issues.
This is not just limited to tmpfs
but has been a concern cited with
NFSv4.
If an application reads 'holes' in a sparse file, the file system converts empty blocks into "real" blocks filled with zeros, and returns them to the application.
When md5sum
is attempting to scan a file it explicitly chooses to do this in
sequential order, which makes a lot of sense based on what md5sum is
attempting to do.
As there are fundamentally "holes" in the file, this sequential reading is going
to (in some situations) cause a copy on write like operation to fill out the file. This then gets
into a deeper issue around whether or not fallocate()
as implemented in the
filesystem supports FALLOC_FL_PUNCH_HOLE
.
Fortunately, not only does tmpfs
support this but there is a mechanism to
"dig" the holes back out.
Using the CLI utility fallocate
we can successfuly detect and re-dig these
holes.
As per man 1 fallocate
:
-d, --dig-holes
Detect and dig holes. This makes the file sparse in-place, without
using extra disk space. The minimum size of the hole depends on
filesystem I/O block size (usually 4096 bytes). Also, when using
this option, --keep-size is implied. If no range is specified by
--offset and --length, then the entire file is analyzed for holes.
You can think of this option as doing a "cp --sparse" and then
renaming the destination file to the original, without the need for
extra disk space.
See --punch-hole for a list of supported filesystems.
fallocate
operates on the file level though and when you are running md5sum
against a block device (requesting sequential reads) you're tripping up on the
exact gap between how the fallocate()
syscall should operate. We can see this
in action:
In action, using your example we see the following:
$ fs=$(mktemp -d)
$ echo $fs
/tmp/tmp.ONTGAS8L06
$ dd if=/dev/zero of=$fs/sparse100M conv=sparse seek=$((100*2*1024-1)) count=1 2>/dev/null
$ echo "Before:" "$(ls $fs/sparse100M -s)"
Before: 0 /tmp/tmp.ONTGAS8L06/sparse100M
$ sudo losetup /dev/loop0 $fs/sparse100M
$ sudo md5sum /dev/loop0
2f282b84e7e608d5852449ed940bfc51 /dev/loop0
$ echo "After:" "$(ls $fs/sparse100M -s)"
After: 102400 /tmp/tmp.ONTGAS8L06/sparse100M
$ fallocate -d $fs/sparse100M
$ echo "After:" "$(ls $fs/sparse100M -s)"
After: 0 /tmp/tmp.ONTGAS8L06/sparse100M
Now... that answers your basic question. My general motto is "get weird" so I
dug in further...
$ fs=$(mktemp -d)
$ echo $fs
/tmp/tmp.ZcAxvW32GY
$ dd if=/dev/zero of=$fs/sparse100M conv=sparse seek=$((100*2*1024-1)) count=1 2>/dev/null
$ echo "Before:" "$(ls $fs/sparse100M -s)"
Before: 0 /tmp/tmp.ZcAxvW32GY/sparse100M
$ sudo losetup /dev/loop0 $fs/sparse100M
$ echo "After:" "$(ls $fs/sparse100M -s)"
After: 1036 /tmp/tmp.ZcAxvW32GY/sparse100M
$ sudo md5sum $fs/sparse100M
2f282b84e7e608d5852449ed940bfc51 /tmp/tmp.ZcAxvW32GY/sparse100M
$ echo "After:" "$(ls $fs/sparse100M -s)"
After: 1036 /tmp/tmp.ZcAxvW32GY/sparse100M
$ fallocate -d $fs/sparse100M
$ echo "After:" "$(ls $fs/sparse100M -s)"
After: 520 /tmp/tmp.ZcAxvW32GY/sparse100M
$ sudo md5sum $fs/sparse100M
2f282b84e7e608d5852449ed940bfc51 /tmp/tmp.ZcAxvW32GY/sparse100M
$ echo "After:" "$(ls $fs/sparse100M -s)"
After: 520 /tmp/tmp.ZcAxvW32GY/sparse100M
$ fallocate -d $fs/sparse100M
$ echo "After:" "$(ls $fs/sparse100M -s)"
After: 516 /tmp/tmp.ZcAxvW32GY/sparse100M
$ fallocate -d $fs/sparse100M
$ sudo md5sum $fs/sparse100M
2f282b84e7e608d5852449ed940bfc51 /tmp/tmp.ZcAxvW32GY/sparse100M
$ echo "After:" "$(ls $fs/sparse100M -s)"
After: 512 /tmp/tmp.ZcAxvW32GY/sparse100M
$ fallocate -d $fs/sparse100M
$ echo "After:" "$(ls $fs/sparse100M -s)"
After: 0 /tmp/tmp.ZcAxvW32GY/sparse100M
$ sudo md5sum $fs/sparse100M
2f282b84e7e608d5852449ed940bfc51 /tmp/tmp.ZcAxvW32GY/sparse100M
$ echo "After:" "$(ls $fs/sparse100M -s)"
After: 0 /tmp/tmp.ZcAxvW32GY/sparse100M
You see that merely the act of performing the losetup
changes the size of
the sparse file. So this becomes an interesting combination of where tmpfs
,
the HOLE_PUNCH mechanism, fallocate
, and block devices intersect.
Thanks for your answer. I'm awaretmpfs
supports sparse files and punch_hole. That's what makes it so confusing -tmpfs
supports this, so why go and fill the sparse holes when reading through a loop device?losetup
doesn't change the file size, but it creates a block device, which on most systems is then scanned for content like: is there a partition table? is there a filesystem with UUID? should I create a /dev/disk/by-uuid/ symlink then? And those reads already cause parts of the sparse file to be allocated, because for some mysterious reason, tmpfs fills holes on (some) reads.
â frostschutz
Sep 20 at 18:12
1
Can you clarify "sequential reading is going to (in some situations) cause a copy on write like operation", please? I'm curious to understand how a read operation would trigger a copy on write action. Thanks!
â roaima
Sep 20 at 18:12
This is odd. On my system I followed the same steps, though manually and not in a script. First I did a 100M file just like the OP. Then I repeated the steps with only a 10MB file. First result : ls -s sparse100M was 102400. But ls -s on the 10MB file was only 328 blocks. ??
â Patrick Taylor
Sep 21 at 5:16
1
@PatrickTaylor ~328K is about what's used after the UUID scanners came by, but you didn't cat / md5sum the loop device for a full read.
â frostschutz
Sep 21 at 10:53
1
I was digging through the source for the loop kernel module (inloop.c
) and saw that there are two relevant functions:lo_read_simple
&lo_read_transfer
. There are some minor differences in how they do low level memory allocation...lo_read_transfer
is actually requesting non-blocking io fromslab.h
(GFP_NOIO
) while performing aalloc_page()
call.lo_read_simple()
on the other hand is not performingalloc_page()
.
â Brian Redbeard
Sep 21 at 19:23
 |Â
show 4 more comments
up vote
4
down vote
up vote
4
down vote
First off you're not alone in puzzling about these sorts of issues.
This is not just limited to tmpfs
but has been a concern cited with
NFSv4.
If an application reads 'holes' in a sparse file, the file system converts empty blocks into "real" blocks filled with zeros, and returns them to the application.
When md5sum
is attempting to scan a file it explicitly chooses to do this in
sequential order, which makes a lot of sense based on what md5sum is
attempting to do.
As there are fundamentally "holes" in the file, this sequential reading is going
to (in some situations) cause a copy on write like operation to fill out the file. This then gets
into a deeper issue around whether or not fallocate()
as implemented in the
filesystem supports FALLOC_FL_PUNCH_HOLE
.
Fortunately, not only does tmpfs
support this but there is a mechanism to
"dig" the holes back out.
Using the CLI utility fallocate
we can successfuly detect and re-dig these
holes.
As per man 1 fallocate
:
-d, --dig-holes
Detect and dig holes. This makes the file sparse in-place, without
using extra disk space. The minimum size of the hole depends on
filesystem I/O block size (usually 4096 bytes). Also, when using
this option, --keep-size is implied. If no range is specified by
--offset and --length, then the entire file is analyzed for holes.
You can think of this option as doing a "cp --sparse" and then
renaming the destination file to the original, without the need for
extra disk space.
See --punch-hole for a list of supported filesystems.
fallocate
operates on the file level though and when you are running md5sum
against a block device (requesting sequential reads) you're tripping up on the
exact gap between how the fallocate()
syscall should operate. We can see this
in action:
In action, using your example we see the following:
$ fs=$(mktemp -d)
$ echo $fs
/tmp/tmp.ONTGAS8L06
$ dd if=/dev/zero of=$fs/sparse100M conv=sparse seek=$((100*2*1024-1)) count=1 2>/dev/null
$ echo "Before:" "$(ls $fs/sparse100M -s)"
Before: 0 /tmp/tmp.ONTGAS8L06/sparse100M
$ sudo losetup /dev/loop0 $fs/sparse100M
$ sudo md5sum /dev/loop0
2f282b84e7e608d5852449ed940bfc51 /dev/loop0
$ echo "After:" "$(ls $fs/sparse100M -s)"
After: 102400 /tmp/tmp.ONTGAS8L06/sparse100M
$ fallocate -d $fs/sparse100M
$ echo "After:" "$(ls $fs/sparse100M -s)"
After: 0 /tmp/tmp.ONTGAS8L06/sparse100M
Now... that answers your basic question. My general motto is "get weird" so I
dug in further...
$ fs=$(mktemp -d)
$ echo $fs
/tmp/tmp.ZcAxvW32GY
$ dd if=/dev/zero of=$fs/sparse100M conv=sparse seek=$((100*2*1024-1)) count=1 2>/dev/null
$ echo "Before:" "$(ls $fs/sparse100M -s)"
Before: 0 /tmp/tmp.ZcAxvW32GY/sparse100M
$ sudo losetup /dev/loop0 $fs/sparse100M
$ echo "After:" "$(ls $fs/sparse100M -s)"
After: 1036 /tmp/tmp.ZcAxvW32GY/sparse100M
$ sudo md5sum $fs/sparse100M
2f282b84e7e608d5852449ed940bfc51 /tmp/tmp.ZcAxvW32GY/sparse100M
$ echo "After:" "$(ls $fs/sparse100M -s)"
After: 1036 /tmp/tmp.ZcAxvW32GY/sparse100M
$ fallocate -d $fs/sparse100M
$ echo "After:" "$(ls $fs/sparse100M -s)"
After: 520 /tmp/tmp.ZcAxvW32GY/sparse100M
$ sudo md5sum $fs/sparse100M
2f282b84e7e608d5852449ed940bfc51 /tmp/tmp.ZcAxvW32GY/sparse100M
$ echo "After:" "$(ls $fs/sparse100M -s)"
After: 520 /tmp/tmp.ZcAxvW32GY/sparse100M
$ fallocate -d $fs/sparse100M
$ echo "After:" "$(ls $fs/sparse100M -s)"
After: 516 /tmp/tmp.ZcAxvW32GY/sparse100M
$ fallocate -d $fs/sparse100M
$ sudo md5sum $fs/sparse100M
2f282b84e7e608d5852449ed940bfc51 /tmp/tmp.ZcAxvW32GY/sparse100M
$ echo "After:" "$(ls $fs/sparse100M -s)"
After: 512 /tmp/tmp.ZcAxvW32GY/sparse100M
$ fallocate -d $fs/sparse100M
$ echo "After:" "$(ls $fs/sparse100M -s)"
After: 0 /tmp/tmp.ZcAxvW32GY/sparse100M
$ sudo md5sum $fs/sparse100M
2f282b84e7e608d5852449ed940bfc51 /tmp/tmp.ZcAxvW32GY/sparse100M
$ echo "After:" "$(ls $fs/sparse100M -s)"
After: 0 /tmp/tmp.ZcAxvW32GY/sparse100M
You see that merely the act of performing the losetup
changes the size of
the sparse file. So this becomes an interesting combination of where tmpfs
,
the HOLE_PUNCH mechanism, fallocate
, and block devices intersect.
First off you're not alone in puzzling about these sorts of issues.
This is not just limited to tmpfs
but has been a concern cited with
NFSv4.
If an application reads 'holes' in a sparse file, the file system converts empty blocks into "real" blocks filled with zeros, and returns them to the application.
When md5sum scans a file, it explicitly chooses to read it in sequential order, which makes a lot of sense given what md5sum is trying to do.
As there are fundamentally "holes" in the file, this sequential reading is going to (in some situations) trigger a copy-on-write-like operation that fills out the file. This then gets into a deeper issue: whether or not fallocate(), as implemented in the filesystem, supports FALLOC_FL_PUNCH_HOLE.
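That flag is exposed at the command line through the fallocate(1) utility, so you can punch a hole by hand. A minimal sketch (assuming util-linux fallocate and a filesystem with punch-hole support, such as ext4, xfs, or tmpfs):

```shell
scratch=$(mktemp -d)
# Write 1MiB of real (allocated) data.
dd if=/dev/zero of="$scratch/f" bs=1M count=1 2>/dev/null

size_before=$(stat -c %s "$scratch/f")    # apparent size in bytes
blocks_before=$(stat -c %b "$scratch/f")  # allocated 512-byte blocks

# Punch a hole over the first 512KiB. --keep-size leaves the apparent
# size untouched while deallocating the underlying blocks.
fallocate --punch-hole --keep-size --offset 0 --length $((512*1024)) "$scratch/f"

size_after=$(stat -c %s "$scratch/f")
blocks_after=$(stat -c %b "$scratch/f")
echo "size: $size_before -> $size_after  blocks: $blocks_before -> $blocks_after"
rm -r "$scratch"
```

The apparent size is unchanged, but the allocation count drops for the punched range; reads of that range now return zeros again.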
Fortunately, not only does tmpfs support this, but there is also a mechanism to "dig" the holes back out. Using the CLI utility fallocate we can successfully detect and re-dig these holes. As per man 1 fallocate:
-d, --dig-holes
Detect and dig holes. This makes the file sparse in-place, without
using extra disk space. The minimum size of the hole depends on
filesystem I/O block size (usually 4096 bytes). Also, when using
this option, --keep-size is implied. If no range is specified by
--offset and --length, then the entire file is analyzed for holes.
You can think of this option as doing a "cp --sparse" and then
renaming the destination file to the original, without the need for
extra disk space.
See --punch-hole for a list of supported filesystems.
fallocate operates at the file level, though, and when you run md5sum against a block device (requesting sequential reads) you trip over the exact gap in how the fallocate() syscall operates there. We can see this in action using your example:
$ fs=$(mktemp -d)
$ echo $fs
/tmp/tmp.ONTGAS8L06
$ dd if=/dev/zero of=$fs/sparse100M conv=sparse seek=$((100*2*1024-1)) count=1 2>/dev/null
$ echo "Before:" "$(ls $fs/sparse100M -s)"
Before: 0 /tmp/tmp.ONTGAS8L06/sparse100M
$ sudo losetup /dev/loop0 $fs/sparse100M
$ sudo md5sum /dev/loop0
2f282b84e7e608d5852449ed940bfc51 /dev/loop0
$ echo "After:" "$(ls $fs/sparse100M -s)"
After: 102400 /tmp/tmp.ONTGAS8L06/sparse100M
$ fallocate -d $fs/sparse100M
$ echo "After:" "$(ls $fs/sparse100M -s)"
After: 0 /tmp/tmp.ONTGAS8L06/sparse100M
Now... that answers your basic question. My general motto is "get weird", so I dug in further...
$ fs=$(mktemp -d)
$ echo $fs
/tmp/tmp.ZcAxvW32GY
$ dd if=/dev/zero of=$fs/sparse100M conv=sparse seek=$((100*2*1024-1)) count=1 2>/dev/null
$ echo "Before:" "$(ls $fs/sparse100M -s)"
Before: 0 /tmp/tmp.ZcAxvW32GY/sparse100M
$ sudo losetup /dev/loop0 $fs/sparse100M
$ echo "After:" "$(ls $fs/sparse100M -s)"
After: 1036 /tmp/tmp.ZcAxvW32GY/sparse100M
$ sudo md5sum $fs/sparse100M
2f282b84e7e608d5852449ed940bfc51 /tmp/tmp.ZcAxvW32GY/sparse100M
$ echo "After:" "$(ls $fs/sparse100M -s)"
After: 1036 /tmp/tmp.ZcAxvW32GY/sparse100M
$ fallocate -d $fs/sparse100M
$ echo "After:" "$(ls $fs/sparse100M -s)"
After: 520 /tmp/tmp.ZcAxvW32GY/sparse100M
$ sudo md5sum $fs/sparse100M
2f282b84e7e608d5852449ed940bfc51 /tmp/tmp.ZcAxvW32GY/sparse100M
$ echo "After:" "$(ls $fs/sparse100M -s)"
After: 520 /tmp/tmp.ZcAxvW32GY/sparse100M
$ fallocate -d $fs/sparse100M
$ echo "After:" "$(ls $fs/sparse100M -s)"
After: 516 /tmp/tmp.ZcAxvW32GY/sparse100M
$ fallocate -d $fs/sparse100M
$ sudo md5sum $fs/sparse100M
2f282b84e7e608d5852449ed940bfc51 /tmp/tmp.ZcAxvW32GY/sparse100M
$ echo "After:" "$(ls $fs/sparse100M -s)"
After: 512 /tmp/tmp.ZcAxvW32GY/sparse100M
$ fallocate -d $fs/sparse100M
$ echo "After:" "$(ls $fs/sparse100M -s)"
After: 0 /tmp/tmp.ZcAxvW32GY/sparse100M
$ sudo md5sum $fs/sparse100M
2f282b84e7e608d5852449ed940bfc51 /tmp/tmp.ZcAxvW32GY/sparse100M
$ echo "After:" "$(ls $fs/sparse100M -s)"
After: 0 /tmp/tmp.ZcAxvW32GY/sparse100M
You see that merely the act of performing the losetup changes the size of the sparse file. So this becomes an interesting combination of where tmpfs, the HOLE_PUNCH mechanism, fallocate, and block devices intersect.
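As a file-level sanity check, the same round trip can be scripted to confirm that fallocate -d frees blocks without altering content, which is what the matching md5sums above demonstrate. A sketch assuming GNU coreutils and util-linux:

```shell
scratch=$(mktemp -d)
# Build a file with one real data block followed by a long run of zeros
# that is actually allocated (a plain write, no conv=sparse).
dd if=/dev/urandom of="$scratch/f" bs=4096 count=1 2>/dev/null
dd if=/dev/zero of="$scratch/f" bs=4096 seek=1 count=255 conv=notrunc 2>/dev/null

sum_before=$(md5sum "$scratch/f" | awk '{print $1}')
blocks_before=$(stat -c %b "$scratch/f")

# Detect the zero runs and dig them back out as holes.
fallocate -d "$scratch/f"

sum_after=$(md5sum "$scratch/f" | awk '{print $1}')
blocks_after=$(stat -c %b "$scratch/f")
echo "md5 unchanged: $([ "$sum_before" = "$sum_after" ] && echo yes || echo no)"
echo "blocks: $blocks_before -> $blocks_after"
rm -r "$scratch"
```

The checksum is identical before and after, while the allocation count shrinks back toward just the non-zero block.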
answered Sep 20 at 18:00
Brian Redbeard
1,578827
Thanks for your answer. I'm aware tmpfs supports sparse files and punch_hole. That's what makes it so confusing - tmpfs supports this, so why go and fill the sparse holes when reading through a loop device? losetup doesn't change the file size, but it creates a block device, which on most systems is then scanned for content: is there a partition table? is there a filesystem with a UUID? should I create a /dev/disk/by-uuid/ symlink? And those reads already cause parts of the sparse file to be allocated, because for some mysterious reason, tmpfs fills holes on (some) reads.
– frostschutz Sep 20 at 18:12
Can you clarify "sequential reading is going to (in some situations) cause a copy on write like operation", please? I'm curious to understand how a read operation would trigger a copy-on-write action. Thanks!
– roaima Sep 20 at 18:12
This is odd. On my system I followed the same steps, though manually and not in a script. First I did a 100M file just like the OP. Then I repeated the steps with only a 10MB file. First result: ls -s sparse100M was 102400. But ls -s on the 10MB file was only 328 blocks. ??
– Patrick Taylor Sep 21 at 5:16
@PatrickTaylor ~328K is about what's used after the UUID scanners came by, but you didn't cat / md5sum the loop device for a full read.
– frostschutz Sep 21 at 10:53
I was digging through the source for the loop kernel module (in loop.c) and saw that there are two relevant functions: lo_read_simple and lo_read_transfer. There are some minor differences in how they do low-level memory allocation... lo_read_transfer is actually requesting non-blocking IO from slab.h (GFP_NOIO) while performing an alloc_page() call. lo_read_simple(), on the other hand, is not performing alloc_page().
– Brian Redbeard Sep 21 at 19:23
hum. There is a shared (copy-on-write) zero page that could be used when a sparse page needs to be mmap()ed, for example. So I'm not sure why any type of read from a sparse tmpfs file would require allocating real memory. lwn.net/Articles/517465 . I wondered if this was some side effect of the conversion of loop to use direct IO, but it seems there should not be any difference when you try to use the new type of loop on tmpfs. spinics.net/lists/linux-fsdevel/msg60337.html
– sourcejedi Sep 15 at 19:26
maybe this might get an answer if it were on SO? just a thought
– Marcus Linsner Sep 15 at 22:22
The output of /tmp has different files Before/After. Is that a typo? Before: 0 /tmp/sparse100 (without M at the end); After: 102400 /tmp/sparse100M (with the trailing M).
– YoMismo Sep 19 at 13:16
@YoMismo, yes, that was only a little typo.
– humanityANDpeace Sep 21 at 8:13