Why isn't a disconnected and then reconnected mdadm RAID1 disk automatically re-attached and re-synced?

I'm doing the following test with Debian 9 under Hyper-V: I have two (virtual) disks, each with one partition, and /dev/md0 defined as a RAID1 of /dev/sda1 and /dev/sdb1. The root EXT4 file system on md0 works fine.
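(For reference, the array was created along these lines; this is only a sketch of the typical commands rather than my exact steps, since the Debian installer sets most of this up when the root file system is on md0.)

    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
    mkfs.ext4 /dev/md0                                # root file system lives here
    mdadm --detail --scan >> /etc/mdadm/mdadm.conf    # so the array is assembled at boot
    update-initramfs -u                               # Debian: include mdadm.conf in the initramfs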



Then I reboot the machine after removing the sdb disk by deleting the virtual hardware. Everything works fine: I get an email saying that the array is now clean but degraded:



md0 : active raid1 sda1[0]
4877312 blocks super 1.2 [2/1] [U_]


OK, as expected. I re-attach sdb by creating a new virtual disk that uses the same disk file, and reboot the machine, but the system doesn't seem to re-detect the disk. Nothing changes: the array still has one drive and is in the clean, degraded state. mdadm --detail /dev/md0 still reports the disk as removed.



I expected the disk to be re-detected, re-attached and re-synced automatically on the next boot, since the UUID and disk name match.
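(One way to check this is to compare the member superblock against the running array; a sketch, assuming the returning disk shows up again as /dev/sdb:)

    mdadm --examine /dev/sdb1    # member superblock: Array UUID, event count, device role
    mdadm --detail /dev/md0      # running array: UUID, state, which slots are "removed"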



I re-added the disk manually with mdadm --manage /dev/md0 --add /dev/sdb1; the system syncs it and the array goes back to clean.
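(For completeness, the manual recovery sequence, roughly, with /dev/sdb1 as the returning member:)

    mdadm --manage /dev/md0 --add /dev/sdb1    # add the member back; a full resync starts
    watch cat /proc/mdstat                     # follow the recovery progress
    mdadm --detail /dev/md0                    # reports "clean" again once the resync is done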



Is this the way the system is supposed to work?



PS: I see a significant number of "mdadm: Found some drive for an array that is already active" messages at boot, followed by "mdadm: giving up".










debian mdadm

asked Feb 5 at 17:59 by ragazzojp, edited Feb 6 at 8:05


1 Answer
































If a RAID disk just vanishes suddenly (as far as the OS is concerned), causing the RAID array to become degraded, and then comes back, is it because the system administrator intentionally pulled the disk and then reinserted it? Or is it because there is an intermittent connection somewhere, perhaps a bad cable or a loose connection?



          If the system had a way of knowing that the removal and restoration was intentional, it could automatically pick it up. But a software RAID has no such knowledge, so it assumes the worst and acts as if the disk or its connection has become unreliable for some reason, until told otherwise.
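With Linux software RAID, "telling it otherwise" is a manual step. Roughly, and assuming the device names from the question, the options look like this:

    # Re-attach a member whose superblock still matches the array.
    # With a write-intent bitmap only the blocks changed since the
    # disconnect are resynced; without one, mdadm may refuse the
    # re-add and a plain --add (full resync) is needed instead.
    mdadm --manage /dev/md0 --re-add /dev/sdb1

    # Optionally add an internal write-intent bitmap so that future
    # re-adds after a temporary disconnect are fast.
    mdadm --grow /dev/md0 --bitmap=internal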



          A hardware RAID controller with hot-pluggable disks may have extra circuitry to detect when a disk in a particular slot has been physically removed and replaced, and may work with an extra assumption that all the disks in slots associated with that RAID controller are always supposed to be RAID disks.



          So when a disk vanishes and a hot-plug monitor circuit indicates the disk was physically removed, the controller can check any disk subsequently plugged into that same slot for its own type of RAID metadata. If there is no metadata present, it's presumably a new disk fresh from the factory, and can be freely overwritten as soon as the hot-plug monitor circuit indicates the disk is fully slotted in. Likewise, if the metadata indicates it's the same disk being re-inserted, the RAID set can be automatically recovered.



          If the metadata is present and indicates that the disk used to belong to a different RAID set, the system administrator might be setting up to rescue data from a different server that uses the same type of RAID controller, and so it will be best to wait for further instructions: the administrator will decide whether to overwrite the disk or to import its RAID set as another RAID volume with its existing data.



          But if a disk vanishes while the hot-plug monitor circuit indicates it's still physically present, the hardware RAID controller will have a very good case for declaring it faulty, even if it later re-appears on its own.



(An important side lesson: If you are moving hardware RAID disks from a failed server to another similar server to salvage the failed server's data, make sure the recipient server does not have any of its own RAID sets in a degraded state before plugging in the disk containing the only copy of some critical data.)






answered Feb 6 at 11:41 by telcoM























          • Ok, so you're saying that this is the intended behavior, and even the weekly cron job won't reattach it.

            – ragazzojp
            Feb 6 at 12:46











          • Yes. It's the difference between having the right one bit of information, or not having it.

            – telcoM
            Feb 6 at 15:02









