btrfs replace on RAID1 is super slow with failed disk present
I'm trying to replace a failed disk in a RAID1 btrfs filesystem.
I can still mount the partition rw (after about a 5-minute delay and lots of I/O kernel errors).
I started replace with -r in an attempt to prevent the failed disk from slowing down the operation:

    -r  only read from <srcdev> if no other zero-defect mirror exists.
        (enable this if your drive has lots of read errors, the access
        would be very slow)
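For context, the replace was started roughly like this (a sketch: the source and target device names are the ones that appear in the logs and answer below, and the mountpoint is hypothetical):

    # -r: avoid reading from the failing source device where a good mirror exists
    btrfs replace start -r /dev/mapper/vg4TBd2-ark /dev/mapper/vg6TBd1-ark /mountpoint
    btrfs replace status /mountpoint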
Still, I'm getting really poor performance. The partition is 3.6TiB, and in 9.25 hours I got:
3.8% done, 0 write errs, 0 uncorr. read errs
At this rate, it will take over 10 days to complete!!!
Due to circumstances beyond my control, this is too long to wait.
I'm seeing kernel errors about the failed disk every 5 minutes or so:
Jan 26 09:31:53 tara kernel: print_req_error: I/O error, dev sdc, sector 68044920
Jan 26 09:31:53 tara kernel: BTRFS warning (device dm-3): lost page write due to IO error on /dev/mapper/vg4TBd2-ark
Jan 26 09:31:53 tara kernel: BTRFS error (device dm-3): bdev /dev/mapper/vg4TBd2-ark errs: wr 8396, rd 3024, flush 58, corrupt 0, gen 3
Jan 26 09:31:53 tara kernel: BTRFS error (device dm-3): error writing primary super block to device 2
Jan 26 09:32:32 tara kernel: sd 2:0:0:0: [sdc] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jan 26 09:32:32 tara kernel: sd 2:0:0:0: [sdc] tag#0 Sense Key : Medium Error [current]
Jan 26 09:32:32 tara kernel: sd 2:0:0:0: [sdc] tag#0 Add. Sense: Unrecovered read error
Jan 26 09:32:32 tara kernel: sd 2:0:0:0: [sdc] tag#0 CDB: Read(10) 28 00 02 eb 9e 23 00 00 04 00
Jan 26 09:32:32 tara kernel: print_req_error: critical medium error, dev sdc, sector 391967000
I'm guessing that the errors are due to btrfs trying to write accounting data to the disk (even though it is completely idle).
Even when mounted ro, btrfs may try to write to a disk. From the description of the -o nologreplay mount option:

    Warning
    currently, the tree log is replayed even with a read-only
    mount! To disable that behaviour, mount also with nologreplay.
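For example, a read-only mount that skips log replay could look like this (a sketch; the device and mountpoint names are placeholders):

    mount -o ro,nologreplay /dev/<fs-member-device> /mountpoint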
How can I speed up the process?
This article says that a replace will continue after reboot.
I'm thinking:

- Cancel the current replace
- Remove the failed disk
- mount -o degraded,rw
- Hope there's no power outage (given the gotcha of this one-time-only mount option)
At this point, I propose to simultaneously:

- Allow the replace to continue without the failed disk present (a recent scrub showed that the good disk has all the data)
- Convert the data to single to allow mounting rw again in case of a power outage during the process
Is this a sound plan to have the replace complete earlier?
My calculations say 6.5 hours (not 10 days) would be feasible given disk I/O speeds.
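A rough sketch of the proposed sequence (device names and mountpoint are hypothetical; this is the proposal, not something already tested here):

    btrfs replace cancel /mountpoint
    # physically remove or disable the failed disk, then:
    mount -o degraded,rw /dev/<good-device> /mountpoint
    # the replace should resume on its own; convert data to single so a later
    # degraded,rw mount stays possible if power is lost
    # (note: btrfs generally treats balance and replace as exclusive operations,
    #  so the convert may have to wait for the replace to finish)
    btrfs balance start -dconvert=single /mountpoint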
linux btrfs replace raid1
If you convert the data to single (and metadata to dup), why would you need to bother with the replace? After the profile change you should be able to remove the bad device, add the replacement, change your profile back to RAID1 and then re-balance. I suspect that if you attempt to use replace after changing your profile, replace isn't going to "know" what to do; since you won't have RAID1 anymore, replace won't know to complete the steps described above.
– Emmanuel Rosa
Jan 26 at 3:16
@EmmanuelRosa I'm proposing to do the convert after the replace (re)starts. I've read that doing a replace is much faster than doing a balance. There must have been a reason that replace was created at a later date (as opposed to continuing to use add then remove, which would rebalance as part of the remove).
– Tom Hale
Jan 26 at 6:34
This says replace is 2-3x faster than rebalancing at remove. I also read that replace operates at 90% of the disk's I/O capacity; perhaps this is why. However, things may be different if the failed drive is already removed.
– Tom Hale
Jan 26 at 7:04
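For reference, the profile-change route Emmanuel describes would look roughly like this (a sketch with a hypothetical mountpoint, reusing the device names from elsewhere in this question; not the route ultimately taken below):

    # drop redundancy so the filesystem no longer depends on the failed mirror
    btrfs balance start -dconvert=single -mconvert=dup /mountpoint
    # swap the devices
    btrfs device remove /dev/mapper/vg4TBd2-ark /mountpoint
    btrfs device add /dev/mapper/vg6TBd1-ark /mountpoint
    # restore RAID1 onto the new device
    btrfs balance start -dconvert=raid1 -mconvert=raid1 /mountpoint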
2 Answers
This answer mentions writes to the failed disk causing the replace to grind to a halt.
It suggests using dmsetup to set up a COW device on top of the failed disk so that any writes succeed.
Caution: in that case the filesystem was enclosed within a dmcrypt device. See my comment there regarding the "gotcha" and potential data loss if this is not the case.
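A very rough sketch of the idea (hypothetical names and sizes; assumes the duplicate-UUID gotcha is handled, e.g. because the origin is hidden behind dmcrypt):

    # absorb writes to the failing device in a temporary copy-on-write layer
    truncate -s 10G /tmp/failing-cow.img      # size is a guess; must hold all writes
    losetup /dev/loop0 /tmp/failing-cow.img
    SECTORS=$(blockdev --getsz /dev/mapper/vg4TBd2-ark)
    # dm snapshot target: <start> <length> snapshot <origin> <cow-dev> <persistent: N> <chunk-size>
    dmsetup create ark-cow --table "0 $SECTORS snapshot /dev/mapper/vg4TBd2-ark /dev/loop0 N 8"
    # then point btrfs at /dev/mapper/ark-cow instead of the failing device;
    # reads of unwritten chunks still go to the failing origin and can still fail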
Given that the replace was crawling, I did the following:

- Ensured that the degraded filesystem was noauto in /etc/fstab
- Rebooted the machine (which took about 20 minutes due to I/O hangs)
- Disabled the LVM VG containing the btrfs fs on the failed drive:
  sudo vgchange -an <failed-vg>
- Disabled the failed device:
  echo 1 | sudo tee /sys/block/sdb/device/delete
- Mounted the filesystem -o ro,degraded (degraded can only be used once)
- Checked replace status and saw it was suspended:
  Started on 26.Jan 00:36:12, suspended on 26.Jan 10:13:30 at 4.1%, 0 write errs, 0
- Mounted -o remount,rw and saw the replace continue:
  kernel: BTRFS info (device dm-5): continuing dev_replace from <missing disk> (devid 2) to target /dev/mapper/vg6TBd1-ark @4%
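For reference, the mount and status commands behind the last three steps above might look like this (the mountpoint and device path are hypothetical; adapt to your layout):

    mount -o ro,degraded /dev/<good-device> /mountpoint
    btrfs replace status /mountpoint
    mount -o remount,rw /mountpoint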
As I'm writing this:

- replace status shows a healthy 0.1% progress every 30 seconds or so
- iostat -d 1 -m <target-dev> shows about 145MB/s (Seagate advertises 160MB/s)
Update:
After completion, I noticed that btrfs device usage /mountpoint was showing some Data,DUP and Metadata,single, rather than only RAID1, so I rebalanced:

    btrfs balance start -dconvert=raid1,soft -mconvert=raid1,soft /mountpoint
Also, consider resizing if both devices now contain slack:

    btrfs filesystem resize max /mountpoint
I would also recommend that you scrub, as I had 262016 correctable csum errors seemingly related to the interrupted replace.
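A minimal sketch of that scrub (the mountpoint is hypothetical):

    btrfs scrub start /mountpoint     # runs in the background
    btrfs scrub status /mountpoint    # shows progress and corrected-error counts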
BTRFS on top of LVM??
– roaima
Jan 27 at 10:22
@roaima I have some filesystems as RAID1 and some as single, and the ability to dynamically resize the block devices on which they reside.
– Tom Hale
Jan 28 at 3:43