How to recover a btrfs filesystem with two identical devices
TL;DR: due to a long story, I got a Btrfs RAID1 filesystem comprised of devices /dev/sde1 and /dev/sde1 (IDs 1 and 2). Btrfs will not mount rw, saying one device is missing. How do I figure out which of these devices is the working one so I can remove the other, and how do I add the correct second drive (/dev/sdb1)? Currently, adding a drive fails because I can only mount read-only.
I have two external hard drives with Btrfs in RAID1 (mirror). Drive A is fine, but drive B got millions of errors during a regular scrub. While testing, device B seems fine, so I guess they just got out of sync (the host is a laptop and can survive a power outage due to its battery, but the drives cannot, so I guess one came online before the other or something). I wanted to rebuild the mirror from device A on device B.
After some searching, I figured the replace subcommand was the thing to use. I want to replace device B with, yup, device B. Naturally, I tried whether btrfs understands this command:
btrfs replace start /dev/deviceB /dev/deviceB /mountpoint
Unfortunately that didn't work. The man page says: "On a live filesystem, [start] duplicate[s] the data to the target device which is currently stored on the source device." So I just passed the other available device as the source instead, because it can duplicate from there:
btrfs replace start /dev/deviceA /dev/deviceB /mountpoint
I should have read the man page better, because later on it says "After completion of the operation, the source device is removed from the filesystem." So now I have a filesystem with only /dev/deviceB in it.
But it never removed the original (corrupt) device B.
So now I have this situation:
$ btrfs device usage /mountpoint
/dev/sde1, ID: 1
   Device size:         3.64TiB
   Device slack:          0.00B
   Data,single:         1.00GiB
   Data,RAID1:          2.00TiB
   Data,DUP:           40.91GiB
   Metadata,single:     1.00GiB
   Metadata,RAID1:      5.00GiB
   Metadata,DUP:        3.00GiB
   System,single:      32.00MiB
   System,RAID1:       32.00MiB
   System,DUP:        128.00MiB
   Unallocated:         1.59TiB
/dev/sde1, ID: 2
   Device size:         3.64TiB
   Device slack:          0.00B
   Data,RAID1:          2.00TiB
   Metadata,RAID1:      5.00GiB
   System,RAID1:       32.00MiB
   Unallocated:         1.63TiB
(Where /dev/sde1 is device B. I am able to mount it with -o degraded,ro.)
How should I resolve this situation?
I tried adding device A (sdb1), but that fails, saying "ERROR: error adding device '/dev/sdb1': Read-only file system". I am not sure how to proceed, as I cannot tell which device ID is which, so removing either (in order to let it mount rw) might be catastrophic. And I'm not sure removing a device is the best course of action at this point anyway. Perhaps I should (after figuring out which device ID it is) use replace with a device ID as argument instead?
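One way to tell which physical device carries which devid (my suggestion, not from the original post) is to run btrfs inspect-internal dump-super against each device node and read the dev_item.devid and dev_item.uuid fields. A rough sketch that parses such output; the exact field layout is an assumption based on typical btrfs-progs output and may vary between versions:

```python
import re

def parse_dump_super(text):
    """Extract (devid, device UUID) from `btrfs inspect-internal
    dump-super` output. Field names assumed from btrfs-progs output."""
    devid = None
    dev_uuid = None
    for line in text.splitlines():
        m = re.match(r"\s*dev_item\.devid\s+(\d+)", line)
        if m:
            devid = int(m.group(1))
        m = re.match(r"\s*dev_item\.uuid\s+([0-9a-fA-F-]+)", line)
        if m:
            dev_uuid = m.group(1)
    return devid, dev_uuid

# Hypothetical dump-super excerpt for illustration:
sample = """\
superblock: bytenr=65536, device=/dev/sde1
dev_item.uuid       11111111-2222-3333-4444-555555555555
dev_item.fsid       aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee
dev_item.devid      2
"""
print(parse_dump_super(sample))  # → (2, '11111111-2222-3333-4444-555555555555')
```

Running this on the output for each device node should disambiguate the two identically named /dev/sde1 entries.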
The filesystem on device A is no longer recognized as Btrfs, and when inspecting it with a hex dumper, it indeed seems invalid: it used to contain the literal string BTRFS somewhere near the beginning (iirc just after 0x10000) but no longer does. The data still seems to be there, just not the correct header (the first non-zero data is now at 0x400000).
Tags: data-recovery, btrfs
asked Apr 10 at 16:54 by Luc
1 Answer
I'll start by giving the usual caveat that goes unheeded (I'm guilty too): back up your data NOW. Of course, if you had enough free space to back up your data, you would just recreate the filesystem, right? And keep in mind this is an art and hard to get right, which is why I suggest conversing with the folks on the btrfs IRC channel on irc.freenode.net.
The first thing I'd try is to recover device A. This might be accomplished with btrfs rescue super-recover /dev/deviceA or btrfsck --repair /dev/deviceA. If either is successful, then you can wipe deviceB and add it as a new device (or perhaps replace it, if deviceA still thinks it's RAIDed).
Otherwise, in a situation like this, I like to first use dm-snapshot to create a snapshot of the device and work on the snapshot, so I don't make bad things worse. Sometimes it takes me a few tries to get the sequence of btrfs commands right. You'll need a lot of free space for the snapshot file (based on the above, I'm thinking 10-100G).
Looking at the output above, devid 1 appears to be the one you want to keep, because it has more used space than devid 2. Running btrfs filesystem show can also give more information about which drive is missing (look for the devid that isn't listed or that has no device path next to it). Make sure that you've not mounted the btrfs as read-only, because otherwise you won't be able to do any writes to fix it. You could try first removing the device using btrfs device delete missing /mountpoint, and if that doesn't work, btrfs device remove 2 /mountpoint. If that fails, try converting blocks from RAID1 to single with btrfs balance start -mconvert=single -sconvert=single -dconvert=single /mountpoint and then try the device removal again. If anything is successful, then you can add deviceA as a device and reconvert everything back to RAID1. These convert commands can take a lot of time, so patience is required.
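The "look for the devid with no device path" check can be scripted. A rough sketch; the sample here is a hypothetical btrfs filesystem show transcript I made up for illustration (real btrfs-progs output varies: a missing device may be printed with path MISSING or summarized as "Some devices missing"):

```python
import re

def missing_devids(show_output, expected_devids):
    """Given `btrfs filesystem show` output and the devids we expect,
    return the devids that are absent or lack a real device path."""
    present = set()
    for line in show_output.splitlines():
        m = re.search(r"devid\s+(\d+)\s+.*path\s+(\S+)", line)
        if m and m.group(2) != "MISSING":
            present.add(int(m.group(1)))
    return sorted(set(expected_devids) - present)

# Hypothetical transcript: devid 2 is attached but has no path line.
sample = """\
Label: none  uuid: aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee
\tTotal devices 2 FS bytes used 2.04TiB
\tdevid    1 size 3.64TiB used 2.05TiB path /dev/sde1
\t*** Some devices missing
"""
print(missing_devids(sample, [1, 2]))  # → [2]
```

Whichever devid this reports as missing is the one to pass to btrfs device remove.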
answered Jun 23 at 9:28 by crass, edited Jun 23 at 9:38