RAID array 'clean, degraded'?

Today I noticed that there are a bunch of messages complaining about the RAID array (it's a software RAID10), so I started looking into it, but I need help because I'm not sure I'm interpreting the status output correctly (I've kinda forgotten the actual RAID set-up, because the machine is at a remote location and I configured it about a year or two ago)... If I remember correctly, the system was supposed to have 8x 2TB disks, but that's about all I can remember.



System mail:



N 14 root@edmedia.loca  Wed May 25 21:30  32/1059  Fail event on /dev/md/0:EDMedia
N 15 root@edmedia.loca  Thu May 26 06:25  30/1025  DegradedArray event on /dev/md/0:EDMedia
N 16 root@edmedia.loca  Thu May 26 06:25  30/1025  SparesMissing event on /dev/md/0:EDMedia


The bit that's specifically confusing me, now that I'm looking at the outputs, is this:



Number Major Minor RaidDevice State
0 0 0 0 removed


Does it mean that a disk has been removed (or that it dropped out of the array)? Should I try re-adding '/dev/sda1' to it? And is there any way I can tell whether '/dev/sda1' was part of '/dev/md0', without accidentally adding a partitioned disk that's in use by something else and making things worse?
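
For reference, one non-destructive way to check this should be to read the md superblock straight off the partition. The command below is read-only; if '/dev/sda1' was ever a member of the array, its Array UUID should match the UUID reported by 'mdadm -D /dev/md0':

# Read-only: dump the md superblock (if any) from the partition and
# compare its Array UUID against the one shown by 'mdadm -D /dev/md0'.
mdadm --examine /dev/sda1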




Status outputs:




'mdadm -D /dev/md0' output:



/dev/md0:
Version : 1.2
Creation Time : Mon Feb 8 23:15:33 2016
Raid Level : raid10
Array Size : 2197509120 (2095.71 GiB 2250.25 GB)
Used Dev Size : 1465006080 (1397.14 GiB 1500.17 GB)
Raid Devices : 3
Total Devices : 2
Persistence : Superblock is persistent

Intent Bitmap : Internal

Update Time : Thu Sep 1 19:54:05 2016
State : clean, degraded
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0

Layout : near=2
Chunk Size : 512K

Name : EDMEDIA:0
UUID : 6ebf98c8:d52a13f0:7ab1bffb:4dbe22b6
Events : 4963861

Number Major Minor RaidDevice State
0 0 0 0 removed
1 8 17 1 active sync /dev/sdb1
2 8 33 2 active sync /dev/sdc1


'lsblk' output:



NAME                       MAJ:MIN RM   SIZE RO TYPE   MOUNTPOINT
sda                          8:0    0   1.4T  0 disk
└─sda1                       8:1    0   1.4T  0 part
sdb                          8:16   0   1.4T  0 disk
└─sdb1                       8:17   0   1.4T  0 part
  └─md0                      9:0    0     2T  0 raid10
    ├─md0p1                259:0    0   1.5M  0 md
    ├─md0p2                259:1    0 244.5M  0 md     /boot
    └─md0p3                259:2    0     2T  0 md
      ├─EDMedia--vg-root   253:0    0     2T  0 lvm    /
      └─EDMedia--vg-swap_1 253:1    0    16G  0 lvm    [SWAP]
sdc                          8:32   0   1.4T  0 disk
└─sdc1                       8:33   0   1.4T  0 part
  └─md0                      9:0    0     2T  0 raid10
    ├─md0p1                259:0    0   1.5M  0 md
    ├─md0p2                259:1    0 244.5M  0 md     /boot
    └─md0p3                259:2    0     2T  0 md
      ├─EDMedia--vg-root   253:0    0     2T  0 lvm    /
      └─EDMedia--vg-swap_1 253:1    0    16G  0 lvm    [SWAP]
sdd                          8:48   0   1.4T  0 disk
└─sdd1                       8:49   0   1.4T  0 part
sdj                          8:144  0 298.1G  0 disk
└─sdj1                       8:145  0 298.1G  0 part
sr0                         11:0    1  1024M  0 rom


'df' output:



Filesystem 1K-blocks Used Available Use% Mounted on
/dev/dm-0 2146148144 1235118212 801988884 61% /
udev 10240 0 10240 0% /dev
tmpfs 1637644 17124 1620520 2% /run
tmpfs 4094104 0 4094104 0% /dev/shm
tmpfs 5120 0 5120 0% /run/lock
tmpfs 4094104 0 4094104 0% /sys/fs/cgroup
/dev/md0p2 242446 34463 195465 15% /boot


'watch -n1 cat /proc/mdstat' output:



Every 1.0s: cat /proc/mdstat Thu Sep 1 21:26:22 2016

Personalities : [raid10]
md0 : active raid10 sdb1[1] sdc1[2]
2197509120 blocks super 1.2 512K chunks 2 near-copies [3/2] [_UU]
bitmap: 16/17 pages [64KB], 65536KB chunk

unused devices: <none>









  • Worth adding – the system works, boots, and doesn't seem to have any other issues, hence I'm trying to figure out whether there's an actual problem with the array, or whether it's simply because of 'spare=1' in the mdadm config...
    – Kārlis K.
    Sep 1 '16 at 17:50










  • Maybe worth adding the relevant contents from /proc/mdstat as well.
    – Stephen Harris
    Sep 1 '16 at 18:09










  • Output added. Shouldn't it be '/dev/sda1' & '/dev/sdb1' ... not sdb1 & sdc1?
    – Kārlis K.
    Sep 1 '16 at 18:26










  • /etc/mdadm.conf or /etc/mdadm/mdadm.conf could be also helpful.
    – rudimeier
    Sep 1 '16 at 18:53
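
For reference, a quick way to compare what the config expects against what is actually assembled; a minimal sketch, assuming the Debian-style path /etc/mdadm/mdadm.conf (it may be /etc/mdadm.conf on other distros):

# ARRAY line(s) derived from the running array / superblocks:
mdadm --detail --scan

# ARRAY line(s) the monitor compares against; a 'spares=1' here with
# no spare actually present is what triggers the SparesMissing mails.
cat /etc/mdadm/mdadm.conf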










  • 'clean' means there were no pending writes when the array was shut down. 'degraded' means the array is missing at least one component.
    – Mark
    Feb 15 '17 at 21:30















2 Answers














It seems that your raid10 array was configured to have two active drives plus one spare, and the spare is missing.



This can have several reasons:



  1. Maybe you removed the spare disk from the server.

  2. Maybe one drive died and the existing hot spare has since become active after a rebuild.

  3. Maybe the hot spare died before it could ever be used.

  4. Maybe one drive (or cable) was broken at some point in the past and was automatically removed from the array.

Check whether your server has a broken disk that no longer even shows up in the lsblk output. It could also be that one of your other drives (sda1 or sdd1) was part of your array in the past but is broken now. (It can't be sdj1 because it's too small.)



Remove all broken drives from the server.



To get rid of the warnings, re-add a hot spare drive (maybe one of the unused, non-broken ones) or configure your array not to expect a hot spare anymore.
Be aware that in case 4 the probability that the same drive will fail again is high.
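
A minimal sketch of both options, assuming the replacement partition is /dev/sdd1 (substitute whichever healthy, unused partition you actually have) and that the config lives in /etc/mdadm/mdadm.conf:

# Option 1: add a healthy, unused partition back to the array.
# Because the array is currently degraded, the new member will be
# rebuilt into the missing slot; on a complete array it would simply
# stay registered as a hot spare.
mdadm /dev/md0 --add /dev/sdd1

# Option 2: stop expecting a spare, so the monitor no longer sends
# SparesMissing mails: drop the 'spares=1' parameter from the ARRAY
# line in mdadm.conf, e.g. change
#   ARRAY /dev/md0 metadata=1.2 spares=1 UUID=6ebf98c8:d52a13f0:7ab1bffb:4dbe22b6
# to
#   ARRAY /dev/md0 metadata=1.2 UUID=6ebf98c8:d52a13f0:7ab1bffb:4dbe22b6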



BTW, to see what exactly happened in the past, you could grep the old log files for the relevant messages.
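
A rough sketch of that log search, assuming a Debian-style syslog layout (file names and rotation differ between distros):

# Current and rotated logs: look for md/RAID events around the time
# the Fail/DegradedArray mails were sent (late May).
grep -iE 'md0|mdadm|raid' /var/log/syslog /var/log/messages 2>/dev/null
zgrep -iE 'md0|mdadm|raid' /var/log/syslog.*.gz /var/log/messages*.gz 2>/dev/null

# The kernel ring buffer only covers the current boot, but is still
# worth a quick look:
dmesg | grep -iE 'md0|raid'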






  • According to 'fdisk -l' there is still a disk at "/dev/sda1". Up until now I was worried that it might be the system disk, but perhaps it is the missing hot spare and I should attempt re-adding it to the array? Disk /dev/sda: 1.4 TiB, 1500301910016 bytes, 2930277168 sectors; Units: sectors of 1 * 512 = 512 bytes; Sector size (logical/physical): 512 bytes / 512 bytes; I/O size (minimum/optimal): 512 bytes / 512 bytes; Disklabel type: dos; Disk identifier: 0xb42fa1af
    – Kārlis K.
    Sep 1 '16 at 19:01











  • Before re-adding anything I would check all drives (smartctl, badblocks). Also, as mentioned, check the old logs to see which drive failed in the past.
    – rudimeier
    Sep 1 '16 at 19:09
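
A minimal, read-only sketch of those checks (device names are just examples; badblocks without -w or -n does not write anything, but a full pass over a 1.5 TB drive takes hours):

# SMART health summary and full attribute/error log dump:
smartctl -H /dev/sda
smartctl -a /dev/sda

# Optionally start a long self-test and read the result later with -a:
smartctl -t long /dev/sda

# Read-only surface scan:
badblocks -sv /dev/sda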










  • Yeah, it's for the best to do so – I'll analyze the logs first, then do some drive checks/tests... the system is located rather far away from me, so it might be a while until I get to physically inspect it.
    – Kārlis K.
    Sep 1 '16 at 19:25










  • BTW, about drive names: note that if one drive is missing, removed, or broken, the other drives may have had different names in the past. Maybe sdc was sdd before, etc. You may find information about the physical SATA ports or the drives' serial numbers in the logs.
    – rudimeier
    Sep 1 '16 at 19:58
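
To pin down which physical drive is which regardless of sdX renumbering, something like this should help; it pairs model/serial-based names with the current device nodes:

# Persistent names built from model and serial number, symlinked to
# the current sdX nodes (ignore the per-partition entries):
ls -l /dev/disk/by-id/ | grep -v part

# Model and serial number of one specific drive:
smartctl -i /dev/sda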


















I inspected the system logs as rudimeier suggested and found out there had been a power outage back in May, after which the RAID array errors started popping up. Since this is a software RAID10 (1+0), I'm thankful that only the spare disk flew out of the array instead of the whole array irreversibly crashing. After doing a couple of HDD tests with the trusty old Hiren's BootCD and, just for variety, a Partition Wizard bootable... all the suspicious disks checked out with no errors/issues.



I erased the disk (with the Partition Wizard bootable, so that it would be left unformatted and unpartitioned) and then re-added the spare using:



mdadm --add /dev/md0 /dev/sda1
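
For reference, the same wipe-and-re-add can also be done from the running system, and the rebuild watched afterwards; a rough sketch (wipefs comes with util-linux, and the wipe is destructive only for the old signatures on /dev/sda1):

# Clear any stale filesystem/RAID signatures from the partition
# (an alternative to erasing it from a bootable partitioning tool):
wipefs -a /dev/sda1

# Re-add it to the array, then watch the recovery and the final state:
mdadm --add /dev/md0 /dev/sda1
watch -n1 cat /proc/mdstat
mdadm -D /dev/md0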




