How to recover a btrfs filesystem with two identical devices

TL;DR: through a long story, I ended up with a Btrfs RAID1 filesystem that consists of devices /dev/sde1 and /dev/sde1 (IDs 1 and 2). Btrfs will not mount it read-write, saying one device is missing. How do I figure out which of these devices is the working one so I can remove the other, and how do I add the correct second drive (/dev/sdb1)? Currently, adding a drive fails because I can only mount read-only.




I have two external hard drives in Btrfs RAID1 (mirror). Drive A is fine, but drive B got millions of errors during a routine scrub. When tested on its own, device B seems healthy, so I guess the two just got out of sync (the host is a laptop that can survive a power outage on its battery, but the drives cannot, so perhaps one came online before the other). I wanted to rebuild the mirror from device A onto device B.



After some searching, I figured the replace subcommand was the tool to use. I want to replace device B with, yes, device B, so I first tried whether btrfs would accept this command:



btrfs replace start /dev/deviceB /dev/deviceB /mountpoint



Unfortunately that didn't work. The man page says: "On a live filesystem, [start] duplicate[s] the data to the target device which is currently stored on the source device." So I passed the other available device as the source instead, since it could duplicate from there:



btrfs replace start /dev/deviceA /dev/deviceB /mountpoint



I should have read the man page more carefully, because it goes on to say: "After completion of the operation, the source device is removed from the filesystem." So now I have a filesystem with only /dev/deviceB in it.



But it never removed the original (corrupt) device B.



So now I have this situation:



$ btrfs device usage /mountpoint
/dev/sde1, ID: 1
   Device size:         3.64TiB
   Device slack:          0.00B
   Data,single:         1.00GiB
   Data,RAID1:          2.00TiB
   Data,DUP:           40.91GiB
   Metadata,single:     1.00GiB
   Metadata,RAID1:      5.00GiB
   Metadata,DUP:        3.00GiB
   System,single:      32.00MiB
   System,RAID1:       32.00MiB
   System,DUP:        128.00MiB
   Unallocated:         1.59TiB

/dev/sde1, ID: 2
   Device size:         3.64TiB
   Device slack:          0.00B
   Data,RAID1:          2.00TiB
   Metadata,RAID1:      5.00GiB
   System,RAID1:       32.00MiB
   Unallocated:         1.63TiB


(Where /dev/sde1 is device B. I am able to mount it with -o degraded,ro.)



How should I resolve this situation?



I tried adding device A (sdb1), but that fails with "ERROR: error adding device '/dev/sdb1': Read-only file system". I am not sure how to proceed: I cannot tell which device ID is which, so removing either one (in order to let the filesystem mount read-write) might be catastrophic. And I'm not sure removing a device is the best course of action at this point anyway. Perhaps I should (after figuring out which device ID it is) use replace with a device ID as the argument instead?
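One way to tell the devids apart before removing anything (a sketch; the device names are taken from the question, and the serial-number mapping depends on your drives and enclosures):

```shell
# List the filesystem's devices with their devids and current paths;
# a missing device shows up without a path (or as "missing").
btrfs filesystem show /mountpoint

# Map kernel device names to physical drives via size/serial/model,
# so you know which enclosure holds which disk before detaching one.
lsblk -o NAME,SIZE,SERIAL,MODEL

# Per-device I/O and corruption error counters can also hint at
# which copy is the damaged one.
btrfs device stats /mountpoint
```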



The filesystem on device A is no longer recognized as Btrfs, and when I inspect it with a hex dumper it indeed looks invalid: it used to contain the literal string BTRFS near the beginning (IIRC just after offset 0x10000) but no longer does. The data still seems to be there, just without the correct superblock (the first non-zero data is now at offset 0x400000).







asked Apr 10 at 16:54 by Luc
1 Answer





I'll start with the usual caveat that goes unheeded (I'm guilty too): back up your data NOW. Of course, if you had enough free space to back up your data, you would just recreate the filesystem, right? Keep in mind that this kind of recovery is an art and hard to get right, which is why I suggest talking to the folks in the #btrfs IRC channel on irc.freenode.net.



The first thing I'd try is to recover device A. This might be accomplished with btrfs rescue super-recover /dev/deviceA or btrfs check --repair /dev/deviceA (btrfsck is the older name for btrfs check). If either succeeds, you can wipe device B and add it back as a new device (or perhaps use replace, if device A still thinks it is part of a RAID).
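Concretely, that first attempt might look like this (a sketch; /dev/sdb1 is device A per the question, and you should verify flags against your btrfs-progs version):

```shell
# Try to restore device A's superblock from its backup copies
# (-v prints which superblocks were found and used).
btrfs rescue super-recover -v /dev/sdb1

# If the superblock comes back, run a read-only check first:
btrfs check /dev/sdb1

# Only as a last resort, since --repair can rewrite metadata destructively:
# btrfs check --repair /dev/sdb1
```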



Otherwise, in a situation like this, I like to first use dm-snapshot to create a snapshot of the device and work on the snapshot, so I don't make a bad situation worse. Sometimes it takes me a few tries to get the sequence of btrfs commands right. You'll need a fair amount of free space for the snapshot's copy-on-write file (based on the output above, I'd guess 10-100G).
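A minimal device-mapper snapshot setup could look like this (a sketch; sizes, file paths, and the snapshot name are illustrative, and the whole thing must run as root):

```shell
# Back the copy-on-write store with a sparse file on a loop device.
truncate -s 100G /tmp/cow.img
COW=$(losetup --find --show /tmp/cow.img)

# Create a writable snapshot of /dev/sde1: writes land in the COW store,
# the original device is never modified. Table format:
#   start length snapshot <origin> <cow> <persistent?> <chunksize>
dmsetup create sde1-snap --table \
  "0 $(blockdev --getsz /dev/sde1) snapshot /dev/sde1 $COW N 8"

# Experiment on /dev/mapper/sde1-snap; when done, tear it down with:
#   dmsetup remove sde1-snap && losetup -d "$COW"
```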



Looking at the output above, devid 1 appears to be the one you want to keep, because it has more used space than devid 2. Running btrfs filesystem show can also tell you which device is missing (look for the devid that isn't listed, or that has no device path next to it). Make sure the filesystem is not mounted read-only, because otherwise you won't be able to make any of the writes needed to fix it. First try removing the missing device with btrfs device delete missing /mountpoint, and if that doesn't work, btrfs device remove 2 /mountpoint. If that also fails, try converting the block groups from RAID1 to single with btrfs balance start -dconvert=single -mconvert=single -sconvert=single /mountpoint and then retry the device removal. If any of this succeeds, you can add device A back and reconvert everything to RAID1. These convert operations can take a long time, so patience is required.
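Put together, one possible sequence is the following (a sketch based on the steps above; device names are from the question, and you should check the btrfs-balance man page for your version, since -f is required when a conversion reduces redundancy, e.g. for -sconvert):

```shell
# Mount degraded but writable (i.e. drop "ro" from the options).
mount -o degraded /dev/sde1 /mountpoint

# Drop the phantom device; fall back to the explicit devid if needed.
btrfs device delete missing /mountpoint   # or: btrfs device remove 2 /mountpoint

# If removal is refused, convert block groups to the single profile
# first (converting system chunks needs -f), then retry the removal.
btrfs balance start -f -dconvert=single -mconvert=single -sconvert=single /mountpoint
btrfs device delete missing /mountpoint

# Once single-device, re-add the second drive and convert back to RAID1.
btrfs device add /dev/sdb1 /mountpoint
btrfs balance start -dconvert=raid1 -mconvert=raid1 /mountpoint
```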






answered Jun 23 at 9:28 by crass (edited Jun 23 at 9:38)






















                     
