ZFS: re-compress existing files after change in compression algorithm

I have a pool that was created in 2011, using lzjb compression, and it wasn't until a couple of years later that an upgrade allowed me to set the compression to lz4. I estimate that at least 20% of the content (by space) on the array was created prior to 2013, which means it's still compressed using lzjb.



I can think of a couple of options to fix this and regain (some) space:



  1. Back up and restore to a new pool. Not really practical, as I do not have sufficient redundant storage to hold the temporary copy. The restore would also require the pool to be offline for several hours.


  2. Write a script to re-copy any file with a timestamp older than 2013. Potentially risky, especially if it chokes on spaces or other special characters and ends up mangling the original name.
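Option 2 can be made reasonably safe against odd filenames by using null-delimited paths and `--`. A minimal sketch (assumes GNU find for `-newermt` and bash for `read -d ''`; `/pool/data` is a placeholder for the dataset mountpoint):

```shell
#!/bin/bash
# Hedged sketch: rewrite every file last modified before 2013 so ZFS
# writes fresh blocks with the current (lz4) compression setting.
# ROOT is a placeholder; pass the dataset mountpoint as the first argument.
# Caveat: existing snapshots will keep the old lzjb blocks referenced.
ROOT=${1:-/pool/data}

find "$ROOT" -type f ! -newermt '2013-01-01' -print0 |
while IFS= read -r -d '' f; do             # null-delimited: safe for spaces/newlines
    tmp=$f.recompress.$$                   # temp copy alongside the original
    cp -p -- "$f" "$tmp" && mv -- "$tmp" "$f"   # copy, then replace in one rename
done
```

`cp -p` keeps the original timestamps and mode; the `mv` back over the original is a single rename, so the name is never mangled even if the script is interrupted.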


So... is there some way to get ZFS to recompress any legacy blocks using the current compression algorithm? Kind of like a scrub, but healing the compression.



A related question: is there some way to see the usage of each type of compression algorithm? zdb just shows overall compression stats, rather than breaking them down into individual algorithms.



Thanks.










  • I'm pretty sure you named the only two options. See also the discussion in issue 3013 for why this functionality doesn't exist and why you might not want to do this at all. – Michael Hampton♦ Oct 1 at 2:14

  • lz4 is supposedly at most 10% better at compressing than lzjb. If 20% of your data can be compressed 10% better, you'll get at most 2% more free space. Is it worth it? – pipe Oct 1 at 12:01

  • If you write a shell script to do the copy, add export LC_ALL=C at the beginning of the script, and all non-ASCII characters in filenames will be kept intact. Keeping whitespace and dashes intact is trickier: use double quotes and --, e.g. cp -- "$SOURCE" "$TARGET". – pts Oct 1 at 12:57

  • @pipe Space is one (very) small advantage, but I'm more interested in decompression speed. From the FreeBSD zpool-features manpage: "Typically, lz4 compression is approximately 50% faster on compressible data and 200% faster on incompressible data than lzjb. It is also approximately 80% faster on decompression, while giving approximately 10% better compression ratio." – rowan194 Oct 1 at 13:13

  • @pts I wouldn't call obeying fundamental shell programming rules (double quotes around variables, or using --) "trickier". That's as important as avoiding SQL injection, for example. – glglgl Oct 1 at 14:52














Tagged: zfs

edited Oct 1 at 2:33 · asked Oct 1 at 2:04
rowan194
1 Answer
You'd have to re-copy the data (in full or in part), or zfs send/receive the data to a new pool or ZFS filesystem.



There aren't any other options.
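The send/receive route might look like the following sketch; "tank/data" and "newtank" are placeholder names, and this assumes enough space exists on the destination pool:

```shell
# Hedged sketch: migrate a dataset so its blocks are rewritten with lz4.
# A plain (uncompressed) send is required here: a compressed send
# (zfs send -c, where supported) would carry the old lzjb blocks across.
zfs set compression=lz4 newtank            # received blocks inherit lz4
zfs snapshot tank/data@migrate
zfs send tank/data@migrate | zfs receive newtank/data
```

Because zfs receive writes the stream through the destination's normal write path, every block lands on disk compressed with the destination's current compression property.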






answered Oct 1 at 2:28
ewwhite