Linux VM's disk went read-only = no choice but reboot?

I have several Linux VMs on VMware + SAN.



What happened



A problem occurred on the SAN (a failed path), so for some time there were I/O errors on the Linux VMs' drives. By the time the path failover completed, it was too late: every Linux machine had decided most of its drives were no longer "trustworthy" and had set them read-only. The drives backing the root filesystems were affected as well.



What I tried




  • mount -o rw,remount / (without success),

  • echo running > /sys/block/sda/device/state (without success),

  • digging into /sys for another solution (also without success).

What I may not have tried



  • blockdev --setrw /dev/sda (see the sketch below)
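Putting those pieces together, a full recovery attempt would presumably look something like this (a sketch only, assuming /dev/sda backs the read-only root filesystem; as the answers below suggest, it may still not help once the kernel has flagged the device itself):

blockdev --getro /dev/sda                     # prints 1 if the kernel marked the device read-only
echo running > /sys/block/sda/device/state    # try to bring the SCSI device back to the "running" state
blockdev --setrw /dev/sda                     # clear the read-only flag on the block device
mount -o rw,remount /                         # then remount the filesystem read-write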

Finally...



I had to reboot all my Linux VMs. The Windows VMs were fine...



Some more info from VMware...



The problem is described here. VMware suggests increasing the Linux SCSI timeout to prevent this problem from happening.
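For context, the timeout in question is the per-device SCSI command timeout that the kernel exposes through sysfs. A minimal sketch of raising it (assuming the disk is sda and using the 180-second value from VMware's guidance; the exact udev rule shipped with VMware Tools may differ):

cat /sys/block/sda/device/timeout          # current SCSI command timeout in seconds (30 is a common default)
echo 180 > /sys/block/sda/device/timeout   # raise it at runtime
# To persist across reboots, a udev rule can re-apply the value when the disk appears,
# e.g. in /etc/udev/rules.d/99-scsi-timeout.rules (illustrative only):
#   ACTION=="add", SUBSYSTEM=="block", ENV{DEVTYPE}=="disk", RUN+="/bin/sh -c 'echo 180 > /sys%p/device/timeout'"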



The question!



However, when the problem does eventually happen, is there a way to get the drives back into read-write mode once the SAN is back to normal?










linux vmware block-device reboot readonly






asked Nov 2 '13 at 16:18, edited Apr 8 '14 at 16:08
– Totor







  • I have been able to get a disk back to read-write with mount -o remount /mountpoint on a real system. Perhaps that would work inside a VM too. – Michael Suelmann, Nov 2 '13 at 17:08

  • Thanks, but I already tried that, and it didn't work... I've edited my question accordingly. – Totor, Nov 2 '13 at 17:35












4 Answers
We have had this problem here a couple of times, usually because the network went down for an extended period. The problem is not that the filesystem is read-only but that the disk device itself is marked read-only. There was no option here other than a reboot. Increasing the SCSI timeout will work for transient glitches such as a path failover; it won't work well for a 15-minute network outage.
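To tell the two situations apart, a quick check might look like this (a sketch, assuming the affected disk is sda):

grep ' / ' /proc/mounts       # "ro" among the mount options means the filesystem was remounted read-only
cat /sys/block/sda/ro         # 1 here means the block device itself is flagged read-only
blockdev --getro /dev/sda     # same information via blockdev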






answered Mar 19 '14 at 12:51
– Doug O'Neal

  • Hmm, this is a Linux kernel limitation then. It deserves a bug report... – Totor, Mar 20 '14 at 14:34

  • Did you try blockdev --setrw /dev/sda? I edited my question accordingly. – Totor, Apr 8 '14 at 17:37

  • That's a new one to me. Hopefully I'll never see the problem again but I'll try this when it does happen. Thanks. – Doug O'Neal, Apr 10 '14 at 12:34

















From the man page of mount:

 errors={continue|remount-ro|panic}
        Define the behavior when an error is encountered. (Either
        ignore errors and just mark the filesystem erroneous and
        continue, or remount the filesystem read-only, or panic and
        halt the system.) The default is set in the filesystem
        superblock, and can be changed using tune2fs(8).

So you should mount your VM's filesystems with the continue option instead of remount-ro:



mount -o remount,errors=continue /
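As the man page excerpt notes, the default error behavior is stored in the superblock, so it can also be changed persistently with tune2fs (a sketch; /dev/sda1 is a placeholder for the filesystem's actual block device):

tune2fs -e continue /dev/sda1    # store "continue" as the default error behavior (ext2/3/4)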





answered Jul 6 '17 at 9:19
– Pierre.Sassoulas
I've had this happen on a RHEL system when rebooting or reconfiguring the attached SAN. What worked for me was to deactivate the volume group and its logical volumes, and then reactivate them.

vgchange -a n vg_group_name
lvchange -a n vg_group_name/lv_name

Then you must reactivate them:

vgchange -a y vg_group_name
lvchange -a y vg_group_name/lv_name

Then just try to remount everything with mount -a.






answered Aug 4 '17 at 15:58
– G_Style

  • Probably doesn't work for the root / filesystem, which was the problematic fs in my case... – Totor, Aug 10 '17 at 13:55

  • Not sure. Hopefully this will help someone fighting with production SAN issues like I have been. – G_Style, Aug 10 '17 at 20:18

















Having run test cases with a test VM on an NFS datastore that I intentionally disabled, I haven't found anything that works. The blockdev command didn't work, and the vgchange / lvchange commands refuse to operate on a mounted root filesystem.

At this point, the best option seems to be to set errors=panic in /etc/fstab so the VM just hard-fails.
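For reference, that setup might look like the following (a sketch; the device, filesystem type, and the 10-second delay are placeholder choices, the sysctl being there so the VM reboots on its own after the panic):

# /etc/fstab: panic on filesystem errors instead of remounting read-only
/dev/sda1  /  ext4  defaults,errors=panic  0  1

# /etc/sysctl.d/99-panic.conf: reboot automatically 10 seconds after a kernel panic
kernel.panic = 10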






answered Nov 19 at 16:20
– gerardw