LVM Snapshot without copy-on-write
I've been experimenting with LVM and how I can use it for managing data on my NFS server. Despite everything that I've read about snapshots, I am still unsure about how they behave in real life. Why does one need to allocate space for a snapshot if it is just a bunch of pointers to the original data? What is the point of a snapshot if modifying files on the origin also triggers modification of the snapshot via copy-on-write? I thought that a snapshot is supposed to be a "static" point-in-time view of the original.
My expectation is such:
$ ls origin-data
> file1 file2
$ snapshot origin-data to origin-data-snapshot
$ modify origin-data and add new stuff
$ ls origin-data
> file1-modified file2 file3 file4
$ ls origin-data-snapshot
> file1 file2
$ sizeof origin-data-snapshot
> 0 bytes because they're all just pointers to blocks in origin-data!
If I'm misunderstanding, please explain and also explain how snapshots could be used in the way I'm expecting (like git commits, static, non-changing, pointers to data in a point in time that don't care about changes made to the origin). Does it involve RO or RW snapshots?
UPDATE: I've been experimenting with some test partitions and have a bit more understanding. With both origin and its snapshot mounted, new files in origin obviously show up in something like df -h but not in the snapshot. Meanwhile, lvdisplay shows the percentage for "Allocated to snapshot" increasing. Using 10 MB test files and 1 GB test partitions, I see exactly how this percentage behaves in relation to my data, but why must this be so? Why is the new data counted against the snapshot and not origin? I would think the blocks behave like hard links: old data stays where it is because the snapshot points to it, while new blocks are created next to them because origin points to the new and modified blocks. No?
backup lvm snapshot
edited Mar 11 '14 at 6:17
asked Mar 11 '14 at 5:36
brianclements
2 Answers
The cost of a snapshot cannot possibly be zero bytes. When a block is changed in the source volume while a snapshot exists, a copy of the original block must be made before the modification: the original data must be available somewhere so that it is accessible from the snapshot.
That's what the snapshot size is (plus some metadata): original copies of blocks that have since been changed in the source.
Note that it might be an "accounting trick": an implementation could choose not to overwrite the original block on disk, but rather store the new data somewhere else and update the source block list (or whatever it uses to track blocks). In this case the snapshot is "static" as per your definition. But it still causes the overall number of allocated blocks to grow whenever a source block is modified. This space usage should be (and is) accounted against the snapshot.
This is true for both RO and RW snapshots, except that it's a bit more complex in the RW case (for example, you don't want to overwrite a block that was modified in the snapshot with an original block from the source if that block is modified too).
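To make the copy-on-write idea concrete, here is a minimal Python sketch of a read-only snapshot. It is only a toy model, not how LVM is implemented: the Origin and Snapshot classes, the saved dictionary and the four-block volume are all made up for illustration.

```python
class Origin:
    """A toy block device with at most one copy-on-write snapshot."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.snapshot = None

    def take_snapshot(self):
        self.snapshot = Snapshot(self)
        return self.snapshot

    def write(self, index, data):
        snap = self.snapshot
        # Preserve the old contents once, before the first overwrite.
        if snap is not None and index not in snap.saved:
            snap.saved[index] = self.blocks[index]
        self.blocks[index] = data


class Snapshot:
    """Hypothetical snapshot: stores only blocks that changed in the origin."""

    def __init__(self, origin):
        self.origin = origin
        self.saved = {}  # block index -> contents at snapshot time

    def read(self, index):
        # Unchanged blocks are still read straight from the origin.
        return self.saved.get(index, self.origin.blocks[index])

    def allocated(self):
        return len(self.saved)


origin = Origin(["file1", "file2", "", ""])
snap = origin.take_snapshot()
print(snap.allocated())                  # 0: nothing copied yet

origin.write(0, "file1-modified")
origin.write(0, "file1-modified-again")  # second write copies nothing new

print([snap.read(i) for i in range(4)])  # ['file1', 'file2', '', '']
print(snap.allocated())                  # 1
```

The snapshot starts out empty and only grows when a block in the origin is overwritten for the first time, which is why its size cannot stay at zero once the origin changes.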
answered Mar 11 '14 at 6:02
Mat
This pretty much answers it. It makes sense that old data overwritten on origin is pulled into the snapshot to maintain its moment in time. But what about creating new data on origin? I still see the "Allocated to snapshot" percentage increase in lvdisplay on the snapshot when I continue to create brand-new files on origin. Why doesn't that just count against origin?
– brianclements
Mar 11 '14 at 6:23
The snapshot doesn't work at the filesystem level, but at the block level. LVM doesn't know or understand the filesystem that sits on top of it, so it has to copy any block that is modified in the source in order to preserve it. That includes the blocks that were modified just for metadata (wherever the FS stored the fact that there is a new file), and all the newly touched data blocks in the source. A filesystem-level snapshot would (most likely) have different characteristics in this scenario.
– Mat
Mar 11 '14 at 6:38
Ah OK. So LVM simply cannot tell the difference between a file modification and a new file; it just treats them all the same and puts new/changed blocks in the snapshot.
– brianclements
Mar 11 '14 at 7:00
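The point from this comment thread can be sketched in a few lines of Python. This is a simplified illustration, not LVM code: cow_write, the eight-block layout and the block contents are hypothetical. The block layer sees only "block N was written", so writing a new file into previously unused blocks still causes the old (empty) contents to be saved.

```python
def cow_write(origin, saved, index, data):
    """Write one block to the origin, saving its old contents first."""
    if index not in saved:
        saved[index] = origin[index]  # copied even if the block was "free"
    origin[index] = data


# Hypothetical 8-block volume: block 0 is filesystem metadata,
# blocks 1-2 hold file data, blocks 3-7 are unused by the filesystem.
origin = ["dir: a", "a-data-1", "a-data-2", "", "", "", "", ""]
saved = {}  # the snapshot's store, empty right after snapshot creation

# Creating a brand-new file "b" touches a metadata block and two free blocks.
cow_write(origin, saved, 0, "dir: a, b")
cow_write(origin, saved, 3, "b-data-1")
cow_write(origin, saved, 4, "b-data-2")

print(len(saved), "of", len(origin), "blocks copied to the snapshot")
print(sorted(saved))  # [0, 3, 4]
```

Note that what lands in the snapshot's store is the old contents of the touched blocks, not the new file data; the new data itself is written to the origin.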
I just looked into this topic. As with the OP, my core point of confusion stemmed from "thinking in files" while LVM works with physical extents.
Usually, LVM is located between the HDD and a file system. Each of these three layers has its own term for the concept of "equally sized chunks of bytes":
HDD: sectors (512 bytes) -> LVM: physical extents (4MB) -> file system: blocks (e.g. 4K)
I created a 200MB loop device, with 100MB for a logical volume (testlv) and 60MB for a snapshot LV (snaplv).
The 100MB LV can be thought of as consisting of 25 physical extents (PEs), each representing 4MB worth of file system blocks. The snapshot LV initially references these same PEs; it does not use its own 15 PEs at this point. Whenever the user writes to either logical volume's file system, the file system changes the contents of one or more blocks, which of course are themselves stored in LVM physical extents.
Modifying a PE from testlv therefore means:
- copy the contents of the PE to one of the spare snaplv PEs (copy-on-write)
- change snaplv's reference to this "new" PE
- update the contents of the "original" testlv PE
Changing a PE from snaplv is almost the same; only the final step differs, in that it is snaplv's copy of the PE that is updated.
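The steps above can be sketched in Python, using the sizes from this answer (100MB and 60MB at 4MB per extent). Everything else is an assumption for illustration: the function names, the snap_private dictionary and the error raised when the spare extents run out are not LVM's API. The sketch also follows this answer's extent-level model; as far as I know, real LVM snapshots track changes in smaller chunks (see the --chunksize option of lvcreate), so treat the granularity as a simplification.

```python
PE_MB = 4
origin_pes = 100 // PE_MB  # 25 extents for testlv
spare_pes = 60 // PE_MB    # 15 extents reserved for snaplv

testlv = [f"pe{i}-v0" for i in range(origin_pes)]
snap_private = {}  # extent index -> snaplv's own copy (hypothetical store)


def ensure_private_copy(index):
    """Steps 1 and 2: copy the extent into a spare one and repoint snaplv."""
    if index not in snap_private:
        if len(snap_private) == spare_pes:
            # Assumption: model a full snapshot as a plain error.
            raise RuntimeError("snapshot is out of spare extents")
        snap_private[index] = testlv[index]


def write_testlv(index, data):
    ensure_private_copy(index)
    testlv[index] = data        # step 3: update the original extent


def write_snaplv(index, data):
    ensure_private_copy(index)
    snap_private[index] = data  # step 3 variant: update snaplv's copy


def read_snaplv(index):
    # Extents without a private copy are still shared with testlv.
    return snap_private.get(index, testlv[index])


write_testlv(0, "pe0-v1")
write_snaplv(1, "pe1-snap-edit")

print(origin_pes, spare_pes)       # 25 15
print(testlv[0], read_snaplv(0))   # pe0-v1 pe0-v0
print(testlv[1], read_snaplv(1))   # pe1-v0 pe1-snap-edit
print(len(snap_private), "of", spare_pes, "spare extents used")
```

Either kind of write consumes one spare extent the first time a given extent is touched, which matches the "Allocated to snapshot" figure growing no matter which volume is written to.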
answered 27 mins ago
T Nierath