RAID array 'clean, degraded'?
Today I noticed that there are a bunch of messages complaining about the RAID array (it's a software RAID10), so I started looking into it, but I need help because I'm unsure whether I'm interpreting the status output correctly (I've kinda forgotten the actual RAID set-up because the machine is at a remote location and I configured it about a year or two ago)... If I remember correctly the system was supposed to have 8x 2TB disks, but that's about all I can remember.
System mail:
N 14 root@edmedia.loca Wed May 25 21:30 32/1059 Fail event on /dev/md/0:EDMedia
N 15 root@edmedia.loca Thu May 26 06:25 30/1025 DegradedArray event on /dev/md/0:EDMedia
N 16 root@edmedia.loca Thu May 26 06:25 30/1025 SparesMissing event on /dev/md/0:EDMedia
The bit that's specifically confusing me, now that I'm looking at the outputs, is this:
Number Major Minor RaidDevice State
0 0 0 0 removed
Does this mean that a disk has been removed (or that it dropped out of the array)? Should I try re-adding '/dev/sda1' to it? And is there any way to tell whether '/dev/sda1' was part of '/dev/md0', without accidentally adding a partitioned disk that's in use by something else and making things worse?
Status outputs:
'mdadm -D /dev/md0' output:
/dev/md0:
Version : 1.2
Creation Time : Mon Feb 8 23:15:33 2016
Raid Level : raid10
Array Size : 2197509120 (2095.71 GiB 2250.25 GB)
Used Dev Size : 1465006080 (1397.14 GiB 1500.17 GB)
Raid Devices : 3
Total Devices : 2
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Thu Sep 1 19:54:05 2016
State : clean, degraded
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Layout : near=2
Chunk Size : 512K
Name : EDMEDIA:0
UUID : 6ebf98c8:d52a13f0:7ab1bffb:4dbe22b6
Events : 4963861
Number Major Minor RaidDevice State
0 0 0 0 removed
1 8 17 1 active sync /dev/sdb1
2 8 33 2 active sync /dev/sdc1
'lsblk' output:
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 1.4T 0 disk
└─sda1 8:1 0 1.4T 0 part
sdb 8:16 0 1.4T 0 disk
└─sdb1 8:17 0 1.4T 0 part
└─md0 9:0 0 2T 0 raid10
├─md0p1 259:0 0 1.5M 0 md
├─md0p2 259:1 0 244.5M 0 md /boot
└─md0p3 259:2 0 2T 0 md
├─EDMedia--vg-root 253:0 0 2T 0 lvm /
└─EDMedia--vg-swap_1 253:1 0 16G 0 lvm [SWAP]
sdc 8:32 0 1.4T 0 disk
└─sdc1 8:33 0 1.4T 0 part
└─md0 9:0 0 2T 0 raid10
├─md0p1 259:0 0 1.5M 0 md
├─md0p2 259:1 0 244.5M 0 md /boot
└─md0p3 259:2 0 2T 0 md
├─EDMedia--vg-root 253:0 0 2T 0 lvm /
└─EDMedia--vg-swap_1 253:1 0 16G 0 lvm [SWAP]
sdd 8:48 0 1.4T 0 disk
└─sdd1 8:49 0 1.4T 0 part
sdj 8:144 0 298.1G 0 disk
└─sdj1 8:145 0 298.1G 0 part
sr0 11:0 1 1024M 0 rom
'df' output:
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/dm-0 2146148144 1235118212 801988884 61% /
udev 10240 0 10240 0% /dev
tmpfs 1637644 17124 1620520 2% /run
tmpfs 4094104 0 4094104 0% /dev/shm
tmpfs 5120 0 5120 0% /run/lock
tmpfs 4094104 0 4094104 0% /sys/fs/cgroup
/dev/md0p2 242446 34463 195465 15% /boot
'watch -n1 cat /proc/mdstat' output:
Every 1.0s: cat /proc/mdstat Thu Sep 1 21:26:22 2016
Personalities : [raid10]
md0 : active raid10 sdb1[1] sdc1[2]
2197509120 blocks super 1.2 512K chunks 2 near-copies [3/2] [_UU]
bitmap: 16/17 pages [64KB], 65536KB chunk
unused devices: <none>
Tags: raid
Worth adding - the system works, boots, and doesn't seem to have any other issues, hence I'm trying to figure out whether this is an actual problem with the array or simply due to 'spare=1' in the mdadm config... – Kārlis K., Sep 1 '16 at 17:50
Maybe worth adding the relevant contents from /proc/mdstat as well. – Stephen Harris, Sep 1 '16 at 18:09
Output added. Shouldn't it be '/dev/sda1' & '/dev/sdb1' ... not sdb1 & sdc1? – Kārlis K., Sep 1 '16 at 18:26
/etc/mdadm.conf or /etc/mdadm/mdadm.conf could also be helpful. – rudimeier, Sep 1 '16 at 18:53
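For reference, the SparesMissing warning is typically driven by a spares= count on the ARRAY line in that config file. A hypothetical entry matching this array (the exact line on the machine may differ) might look like:

# /etc/mdadm/mdadm.conf (path varies by distro)
ARRAY /dev/md/0 metadata=1.2 spares=1 name=EDMEDIA:0 UUID=6ebf98c8:d52a13f0:7ab1bffb:4dbe22b6
# mdadm compares 'spares=1' against the running array; if no spare is attached
# it sends the SparesMissing event. Removing 'spares=1' (or re-adding a spare)
# makes that particular warning go away.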
'clean' means there were no pending writes when the array was shut down. 'degraded' means the array is missing at least one component. – Mark, Feb 15 '17 at 21:30
2 Answers
It seems your RAID10 array was configured to have two active drives plus one spare, and the spare is missing.
This can have several possible causes:
- Maybe you removed the spare disk from the server.
- Maybe one drive died and the existing hot spare became active after a rebuild.
- Maybe the hot spare died before it was ever used.
- Maybe one drive (or cable) broke at some point in the past and was automatically removed from the array.
Check whether your server has a broken disk that no longer even shows up in the lsblk output. It could also be that one of your other drives (sda1 or sdd1) was part of the array in the past but is broken now. (It can't be sdj1, because it's too small.)
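For what it's worth, one non-destructive way to check this (device names taken from the lsblk output above) is to look for an md superblock on the candidate partitions and compare the reported Array UUID with the one from mdadm -D:

# Read-only: prints the md superblock on the partition, if there is one
mdadm --examine /dev/sda1
mdadm --examine /dev/sdd1
# If the "Array UUID" matches 6ebf98c8:d52a13f0:7ab1bffb:4dbe22b6, the
# partition once belonged to /dev/md0; "No md superblock detected" means
# it never did (or it has since been wiped).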
Remove all broken drives from the server.
To avoid the warnings, re-add a hot spare drive (maybe one of the unused, non-broken ones) or configure your array to not expect a hot spare anymore.
Be aware that in case 4 the probability that the same drive will fail again is high.
BTW, to see what exactly happened in the past, you could grep the old log files for relevant messages.
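A rough sketch of what that could look like (log file names and rotation vary by distro; these paths are just examples):

# md/RAID events usually end up in syslog or the kernel log
zgrep -i -E 'md0|raid|fail' /var/log/syslog* /var/log/messages* 2>/dev/null | less
# kernel messages about a disk dropping off the SATA bus
zgrep -i 'hard resetting link' /var/log/kern.log* 2>/dev/null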
According to 'fdisk -l' there is still a disk at /dev/sda1. Up until now I was worried that it might be the system disk, but perhaps it was the missing hot spare and I should attempt re-adding it to the array?
Disk /dev/sda: 1.4 TiB, 1500301910016 bytes, 2930277168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0xb42fa1af
– Kārlis K., Sep 1 '16 at 19:01
Before re-adding anything I would check all drives (smartctl, badblocks). Also, as mentioned, check the old logs to see which drive failed in the past. – rudimeier, Sep 1 '16 at 19:09
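For reference, a minimal version of those checks might look like this (sda is used as an example device; the badblocks read-only scan is safe but slow, while its write modes are destructive):

# SMART health summary and attribute table
smartctl -H -A /dev/sda
# start a long self-test; check the result later with: smartctl -l selftest /dev/sda
smartctl -t long /dev/sda
# read-only surface scan (can take many hours on a 1.5 TB disk)
badblocks -sv /dev/sda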
Yeah, it's for the best to do so - I'll analyze the logs first, then do some drive checks/tests... The system is located rather far away from me, so it might be a while till I get to physically inspect it. – Kārlis K., Sep 1 '16 at 19:25
BTW, about drive names: note that if one drive is missing, removed, or broken, the other drives may have had different names in the past. Maybe sdc was sdd before, etc. You may find information about the physical SATA ports or the drives' serial numbers in the logs. – rudimeier, Sep 1 '16 at 19:58
I inspected the system logs as rudimeier suggested and found that there had been a power outage back in May, after which the RAID array errors started popping up. Since this is a software RAID10 (1+0), I'm thankful that only the spare disk dropped out of the array instead of the whole array irreversibly crashing. After running a couple of HDD tests with the trusty old Hiren's BootCD and, just for variety, the Partition Wizard bootable... all suspicious disks checked out with no errors/issues.
I erased the disk (with the Partition Wizard bootable, so that it would be unformatted and unpartitioned) and then re-added the spare using:
mdadm --add /dev/md0 /dev/sda1
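In case it helps anyone else, the result can be verified afterwards (depending on whether the array still expects a third active member, the newly added device either starts rebuilding into the missing slot or just sits there as a spare):

# watch the recovery/rebuild progress, if any
watch -n1 cat /proc/mdstat
# confirm the device is listed (active sync or spare) and check the array state
mdadm -D /dev/md0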