ZFS on Linux: file-level disk usage confusion

I am using ZFS on Linux and I am confused about why the actual disk usage of files (as reported by du) seems so all over the place.
I created a pool called 'vault' on a hardware Dell PERC RAID (just /dev/sdb), taking all defaults except autoexpand 'on'.
I then created a filesystem dataset on it using
-o reserv=2040G -o quota=2040G -o recsize=4k -o acltype=posixacl
I then rsync'ed an ext4 volume over to it. For example, on this volume there are two data files (Matlab *.mat files) that are 13104 and 11264 bytes in size. On the ext4 file system, du reports 16K and 12K for these files respectively, matching the 4K block size. Any file under 4K always shows 4K in du.
In contrast, on ZFS du shows 25K and 21K for these two files respectively, while a 1-byte file shows 4.5K. The extra 0.5K on the latter is not too alarming; I guess it's various metadata. Some other files under 4K that I have run du on, though, come back as exactly 4K. Most confusing: why does du report nearly twice the "real" data size for the *.mat files?
zfs
edited Sep 7 at 23:49
Rui F Ribeiro
asked Sep 7 at 20:25
raines
I believe it has something to do with how copy-on-write filesystems like ZFS work, which can make determining actual disk usage with standard tools less reliable than with other filesystem types, due to snapshots, file revision copies and metadata.
Mioriin
Sep 7 at 22:07
1 Answer
You can use zdb to find this out: all you need is the file's inode number and the dataset it lives in. For example, if your dataset is named tank/foo, use ls -i to get the inode number, then run zdb -ddddd tank/foo $INODE to dump out the information.
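The ls -i plus zdb lookup described above can be wrapped in a couple of shell functions. This is a minimal sketch, assuming a Linux system with GNU coreutils (stat -c and df --output) and assuming df reports the ZFS dataset name as the filesystem source, which is how ZFS on Linux mounts normally appear. The function names inode_of, dataset_of and zdb_file are hypothetical helpers, not part of any ZFS tool.

```shell
#!/bin/sh
# Sketch only: assumes GNU coreutils and that df shows the ZFS dataset
# name (e.g. vault/data) as the "source" of a ZFS mount.

# Hypothetical helper: inode number of a file. On ZFS this is the
# object number that zdb expects as its second argument.
inode_of() {
    stat -c %i -- "$1"
}

# Hypothetical helper: filesystem source for the file's mount; for a
# ZFS mount this is the dataset name. tail drops df's header line.
dataset_of() {
    df --output=source -- "$1" | tail -n 1
}

# Hypothetical helper: dump the file's object with the same verbosity
# as the example below (zdb usually needs to run as root).
zdb_file() {
    zdb -ddddd "$(dataset_of "$1")" "$(inode_of "$1")"
}
```

After sourcing these functions, running zdb_file /vault/some/file.mat as root (the path is hypothetical) should print the same kind of object dump as the example below.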
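Here's an example on my machine (mkfile isn't usually available on Linux; head -c 13104 /dev/zero > file1 creates an equivalent zero-filled file):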
# cd /var/tmp
# mkfile 13104 file1
# ls -i file1
4125 file1
# zdb -ddddd rpool/VARSHARE/tmp 4125
Dataset rpool/VARSHARE/tmp [ZPL], ID 4128, cr_txg 928175, 8.24G, 1223 objects, rootbp DVA[0]=<0:1f62907a00:200:STD:1> DVA[1]=<0:f4a2eda00:200:STD:1> [L0 DMU objset] fletcher4 lzjb LE unique unencrypted size=800L/200P birth=14962025L/14962025P fill=1223 contiguous 2-copy cksum=1b152e5f60:7b661ed2ecb:1445f97091785:278e0e37baf85b
Object lvl iblk dblk dsize lsize %full type
4125 1 16K 13.0K 13.0K 13.0K 100.00 ZFS plain file
168 bonus System attributes
dnode flags: USED_BYTES USERUSED_ACCOUNTED
dnode maxblkid: 0
path /file1
uid 0
gid 0
atime Fri Sep 7 18:48:00 2018
mtime Fri Sep 7 18:48:00 2018
ctime Fri Sep 7 18:48:00 2018
crtime Fri Sep 7 18:48:00 2018
gen 14962023
mode 0100600
size 13104
parent 4
links 1
pflags 0x40800000204
Indirect blocks:
0 L0 0:0x1f55eb0200:0x3400 0x3400L/0x3400P F=1 B=14962023/14962023 ---
segment [000000000000000000, 0x0000000000003400) size 13.0K
#
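This shows the size of the file's data and the amount of metadata it is consuming. The "Indirect blocks" section lists the file's whole block tree: L0 entries are data blocks, and L1 and higher entries are the actual indirect (metadata) blocks.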
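In the case here, the file fits in a single data block of exactly 13k (0x3400 bytes). The object has only one level (lvl 1) and the only entry under "Indirect blocks" is that L0 data block, so no separate indirect block was allocated: dsize shows 13.0K in use, and the 16k "iblk" is just the size indirect blocks would have if the file needed any. With your recsize=4k, a 13104-byte file is split into four 4k records, which I'd expect to need an indirect block (lvl 2). That metadata is a likely source of your extra space, especially since ZFS keeps two copies of metadata blocks by default.
Note that indirect blocks are normally compressed, so one is a good bet to take up much less space on disk than its "iblk" size suggests.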
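To see how much of the difference is explained by data blocks alone, the sizes can be worked out by hand. This sketch assumes compression=off and the usual ZFS block sizing: a file no larger than recordsize gets one block rounded up to the allocation unit (2^ashift bytes), and a larger file is stored as whole recordsize blocks. The ashift values are assumptions: 9 for the example above, because its allocated size 0x3400 is a multiple of 512 but not of 4096, and 12 for the question's pool, whose real ashift isn't stated. data_bytes is a hypothetical helper, and indirect blocks, ditto copies and xattr/ACL storage are deliberately left out.

```shell
#!/bin/sh
# Rough estimate of the space taken by a file's *data* blocks on ZFS,
# assuming compression=off. Indirect blocks, ditto copies and any
# xattr/ACL storage are NOT included.
data_bytes() {
    size=$1      # file size in bytes, as shown by ls -l
    recsize=$2   # dataset recordsize in bytes
    ashift=$3    # pool ashift (assumed: 9 = 512-byte, 12 = 4K sectors)
    unit=$((1 << ashift))
    if [ "$size" -le "$recsize" ]; then
        # One block, rounded up to the allocation unit.
        blk=$(( (size + unit - 1) / unit * unit ))
    else
        # Whole recordsize blocks; without compression the tail is full size.
        blk=$(( (size + recsize - 1) / recsize * recsize ))
    fi
    echo "$blk"
}

# The 0x3400 in the zdb block pointer, in bytes:
echo "0x3400 = $((0x3400)) bytes"   # prints 0x3400 = 13312 bytes
# Example above: default 128K recordsize, 512-byte sectors assumed.
data_bytes 13104 131072 9           # prints 13312
# The question's files with recsize=4k, 4K sectors assumed.
data_bytes 13104 4096 12            # prints 16384
data_bytes 11264 4096 12            # prints 12288
```

On data alone, recsize=4k rounds the two .mat files up to 16K and 12K, the same as ext4, so the remaining difference in du presumably comes from metadata such as indirect blocks.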
answered Sep 7 at 22:54
mmusante