ZFS on Linux: file-level disk usage confusion


I am using ZFS on Linux and I am confused about why the actual disk usage of files (as reported by 'du') seems so all over the place.

I created a pool called 'vault' on a hardware Dell PERC RAID (just /dev/sdb), taking all defaults except autoexpand=on.

I then created a dataset on it using:

-o reserv=2040G -o quota=2040G -o recsize=4k -o acltype=posixacl

I then rsync'ed an ext4 volume over to it. For example, this volume contains two data files (MATLAB *.mat files) that are 13104 and 11264 bytes in size. On the ext4 filesystem, du reports 16K and 12K for these files respectively, corresponding to the 4K block size. Any file smaller than 4K always shows 4K in du.
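The ext4 numbers are simply each file's size rounded up to a whole number of 4K blocks. A small shell sketch of that arithmetic (a simplification that ignores ext4 features such as inline data):

```shell
#!/bin/sh
# Round a byte count up to a multiple of the filesystem block size,
# i.e. the space ext4 allocates for a small regular file's data.
roundup() {
  n=$1 bs=$2
  echo $(( (n + bs - 1) / bs * bs ))
}

for size in 13104 11264 1; do
  alloc=$(roundup "$size" 4096)
  echo "$size bytes -> $((alloc / 1024))K"
done
```

Running it prints 16K, 12K and 4K, matching the ext4 du output quoted above.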

In contrast, on ZFS du reports 25K and 21K for these two files respectively, while a 1-byte file shows 4.5K. The extra 0.5K on the latter is not too alarming; I guess it is due to various metadata. Some other files under 4K that I have run du on, though, come back as exactly 4K. Most confusing: why does du report nearly twice the "real" data size for the *.mat files?
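To see the raw numbers behind du on either filesystem, GNU stat can print a file's logical size and its allocated block count side by side. A minimal sketch, assuming GNU coreutils; the file name `sample.mat` is hypothetical, and the test file is filled with random data so that compression (if enabled) does not shrink it:

```shell
#!/bin/sh
# Compare a file's logical size with the space allocated for it.
# Assumes GNU coreutils; "sample.mat" is a hypothetical file name.
f=${1:-sample.mat}

# Create a 13104-byte test file (same size as one of the .mat files) if needed.
[ -e "$f" ] || head -c 13104 /dev/urandom > "$f"

# Flush pending writes; on ZFS the allocated size may lag until they are committed.
sync

logical=$(stat -c %s "$f")   # size in bytes, as ls -l shows it
blocks=$(stat -c %b "$f")    # number of allocated blocks
bsize=$(stat -c %B "$f")     # bytes per block counted by %b
allocated=$((blocks * bsize))

echo "logical:   $logical bytes"
echo "allocated: $allocated bytes"
du -h --apparent-size "$f"   # based on the logical size
du -h "$f"                   # based on the allocated blocks
```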

edited Sep 7 at 23:49 by Rui F Ribeiro

asked Sep 7 at 20:25 by raines

I believe it has something to do with how copy-on-write filesystems like ZFS work, which can make determining actual disk usage with standard tools less reliable than with other filesystem types, due to snapshots, file revision copies, and metadata.
– Mioriin
Sep 7 at 22:07


1 Answer

You can use zdb to find this out: all you need is the file's inode number and its dataset. For example, if your dataset is named tank/foo, use ls -i to find the inode number, and then run zdb -ddddd tank/foo $INODE to dump the information.
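The two lookups can be wrapped in a small helper. This is a sketch, not part of the original answer: the function name `zdb_cmd_for` is hypothetical, and it assumes GNU coreutils (`stat -c %i` gives the same number as `ls -i`, and `df --output=source` prints the dataset name for a file on a mounted ZFS filesystem). It only prints the zdb command, which you would then run as root. Note also that `mkfile`, used in the example below, is a Solaris utility; on Linux, `head -c 13104 /dev/zero > file1` creates an equivalent file.

```shell
#!/bin/sh
# Print the zdb command that dumps the on-disk layout of one file.
# Hypothetical helper; assumes GNU coreutils and a file on a mounted ZFS dataset.
zdb_cmd_for() {
  ino=$(stat -c %i "$1")                     # object number, as shown by ls -i
  ds=$(df --output=source "$1" | tail -n 1)  # dataset name on ZFS, e.g. vault/data
  echo "zdb -ddddd $ds $ino"
}

# Usage (hypothetical path): zdb_cmd_for /vault/data/results.mat
```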

Here's an example on my machine:

# cd /var/tmp
# mkfile 13104 file1
# ls -i file1
 4125 file1
# zdb -ddddd rpool/VARSHARE/tmp 4125
Dataset rpool/VARSHARE/tmp [ZPL], ID 4128, cr_txg 928175, 8.24G, 1223 objects, rootbp DVA[0]=<0:1f62907a00:200:STD:1> DVA[1]=<0:f4a2eda00:200:STD:1> [L0 DMU objset] fletcher4 lzjb LE unique unencrypted size=800L/200P birth=14962025L/14962025P fill=1223 contiguous 2-copy cksum=1b152e5f60:7b661ed2ecb:1445f97091785:278e0e37baf85b

 Object lvl iblk dblk dsize lsize %full type
 4125 1 16K 13.0K 13.0K 13.0K 100.00 ZFS plain file
 168 bonus System attributes
 dnode flags: USED_BYTES USERUSED_ACCOUNTED 
 dnode maxblkid: 0
 path /file1
 uid 0
 gid 0
 atime Fri Sep 7 18:48:00 2018
 mtime Fri Sep 7 18:48:00 2018
 ctime Fri Sep 7 18:48:00 2018
 crtime Fri Sep 7 18:48:00 2018
 gen 14962023
 mode 0100600
 size 13104
 parent 4
 links 1
 pflags 0x40800000204
Indirect blocks:
 0 L0 0:0x1f55eb0200:0x3400 0x3400L/0x3400P F=1 B=14962023/14962023 ---

 segment [000000000000000000, 0x0000000000003400) size 13.0K
#

This will let you know the size of the data, and the amount of metadata (labelled "indirect blocks") that your file is consuming.

In the case here, it allocated a block of 13k exactly, and is using a single indirect block of 16k. So, it's using 29k to store the 13k file. I'm guessing your numbers will be similar.

Note that the 16k "iblk" is very probably compressed, so it's a good bet it's physically taking up only 4k.
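The 13.0K figure can be cross-checked against the block pointer on the L0 line of the zdb output, where `0x3400L/0x3400P` gives the logical and physical block size in hex. A small shell sketch of that arithmetic; the rounding to 512-byte units is an assumption about how ZFS sizes a file that fits in a single block smaller than the recordsize:

```shell
#!/bin/sh
# Decode the block size from the zdb block pointer "0x3400L/0x3400P"
# and compare it with the file size rounded up to 512-byte units.
file_size=13104
bp_size=$((0x3400))                          # hex size taken from the zdb output
rounded=$(( (file_size + 511) / 512 * 512 ))  # assumed 512-byte granularity
echo "block pointer size: $bp_size bytes ($((bp_size / 1024))K)"
echo "file size rounded to 512: $rounded bytes"
```

Both lines come out to 13312 bytes, which is the 13.0K shown in the dblk, dsize and lsize columns.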

answered Sep 7 at 22:54 by mmusante
