Repairing a RAID5 array
I'm trying to repair a RAID5 array consisting of three 2 TB disks. After working perfectly for quite some time, the computer (running Debian) suddenly wouldn't boot anymore and got stuck at a GRUB prompt. I'm pretty sure it has to do with the RAID array.
Since it is difficult to give a full account of everything I have tried already, I will describe the current status.
mdadm --detail /dev/md0
outputs:
/dev/md0:
Version : 1.2
Creation Time : Sun Mar 22 15:13:25 2015
Raid Level : raid5
Used Dev Size : 1953381888 (1862.89 GiB 2000.26 GB)
Raid Devices : 3
Total Devices : 2
Persistence : Superblock is persistent
Update Time : Sun Mar 22 16:18:56 2015
State : active, degraded, Not Started
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 512K
Name : ubuntu:0 (local to host ubuntu)
UUID : ae2b72c0:60444678:25797b77:3695130a
Events : 57
Number Major Minor RaidDevice State
0 0 0 0 removed
1 8 17 1 active sync /dev/sdb1
2 8 33 2 active sync /dev/sdc1
mdadm --examine /dev/sda1
gives:
mdadm: No md superblock detected on /dev/sda1.
which makes sense, because I reformatted this partition, believing it to be the faulty one.
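For reference, a quick way to compare what each member thinks about the array (which role it held, its event counter and array state) is to filter the --examine output; this is only a sketch using the same device names as above:
# Summarise the md superblock view of each member partition
mdadm --examine /dev/sd[abc]1 | grep -E '^/dev|Device Role|Array State|Events'
# /dev/sda1 will report "No md superblock detected", matching the output above.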
mdadm --examine /dev/sdb1
gives:
/dev/sdb1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : ae2b72c0:60444678:25797b77:3695130a
Name : ubuntu:0 (local to host ubuntu)
Creation Time : Sun Mar 22 15:13:25 2015
Raid Level : raid5
Raid Devices : 3
Avail Dev Size : 3906764800 (1862.89 GiB 2000.26 GB)
Array Size : 3906763776 (3725.78 GiB 4000.53 GB)
Used Dev Size : 3906763776 (1862.89 GiB 2000.26 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
State : clean
Device UUID : f1817af9:1d964693:774d5d63:bfa69e3d
Update Time : Sun Mar 22 16:18:56 2015
Checksum : ab7c79ae - correct
Events : 57
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 1
Array State : .AA ('A' == active, '.' == missing)
mdadm --examine /dev/sdc1
gives:
/dev/sdc1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : ae2b72c0:60444678:25797b77:3695130a
Name : ubuntu:0 (local to host ubuntu)
Creation Time : Sun Mar 22 15:13:25 2015
Raid Level : raid5
Raid Devices : 3
Avail Dev Size : 3906764800 (1862.89 GiB 2000.26 GB)
Array Size : 3906763776 (3725.78 GiB 4000.53 GB)
Used Dev Size : 3906763776 (1862.89 GiB 2000.26 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
State : clean
Device UUID : f076b568:007e3f9b:71a19ea2:474e5fe9
Update Time : Sun Mar 22 16:18:56 2015
Checksum : db25214 - correct
Events : 57
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 2
Array State : .AA ('A' == active, '.' == missing)
cat /proc/mdstat
:
Personalities : [raid6] [raid5] [raid4]
md0 : inactive sdb1[1] sdc1[2]
3906764800 blocks super 1.2
unused devices: <none>
fdisk -l
:
Disk /dev/sda: 2000.4 GB, 2000398934016 bytes
81 heads, 63 sectors/track, 765633 cylinders, total 3907029168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x000d84fa
Device Boot Start End Blocks Id System
/dev/sda1 2048 3907029167 1953513560 fd Linux raid autodetect
Disk /dev/sdb: 2000.4 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders, total 3907029168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x000802d9
Device Boot Start End Blocks Id System
/dev/sdb1 * 2048 3907028991 1953513472 fd Linux raid autodetect
Disk /dev/sdc: 2000.4 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders, total 3907029168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x000a8dca
Device Boot Start End Blocks Id System
/dev/sdc1 2048 3907028991 1953513472 fd Linux raid autodetect
Disk /dev/sdd: 7756 MB, 7756087296 bytes
255 heads, 63 sectors/track, 942 cylinders, total 15148608 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x128faec9
Device Boot Start End Blocks Id System
/dev/sdd1 * 2048 15148607 7573280 c W95 FAT32 (LBA)
And of course I've tried to add /dev/sda1
again. mdadm --manage /dev/md0 --add /dev/sda1
gives:
mdadm: add new device failed for /dev/sda1 as 3: Invalid argument
If the RAID is fixed, I will probably also need to get GRUB up and running again, so that it can detect the RAID/LVM and boot.
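For reference, reinstalling GRUB from a live environment once the array and LVM are back typically looks roughly like the sketch below; the volume-group and logical-volume names (vg0/root) are placeholders, not taken from this system:
# Sketch only: chroot into the installed system from a live CD and reinstall GRUB
mdadm --assemble --scan            # assemble the (degraded) array
vgchange -ay                       # activate any LVM volume groups on top of it
mount /dev/vg0/root /mnt           # vg0/root is an assumed LV name; mount a separate /boot too if one exists
mount --bind /dev /mnt/dev
mount --bind /proc /mnt/proc
mount --bind /sys /mnt/sys
chroot /mnt
grub-install /dev/sdb              # install to a disk that is still healthy
update-grub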
EDIT (added smartctl test results)
Output of the smartctl tests, smartctl -a /dev/sda:
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.16.0-30-generic] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Caviar Green (AF, SATA 6Gb/s)
Device Model: WDC WD20EZRX-00D8PB0
Serial Number: WD-WMC4M0760056
LU WWN Device Id: 5 0014ee 003a4a444
Firmware Version: 80.00A80
User Capacity: 2,000,398,934,016 bytes [2.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-2 (minor revision not indicated)
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Tue Mar 24 22:07:08 2015 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 121) The previous self-test completed having
the read element of the test failed.
Total time to complete Offline
data collection: (26280) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 266) minutes.
Conveyance self-test routine
recommended polling time: ( 5) minutes.
SCT capabilities: (0x7035) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 3401
3 Spin_Up_Time 0x0027 172 172 021 Pre-fail Always - 4375
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 59
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0
9 Power_On_Hours 0x0032 087 087 000 Old_age Always - 9697
10 Spin_Retry_Count 0x0032 100 253 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 253 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 59
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 51
193 Load_Cycle_Count 0x0032 115 115 000 Old_age Always - 255276
194 Temperature_Celsius 0x0022 119 106 000 Old_age Always - 28
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 12
198 Offline_Uncorrectable 0x0030 200 200 000 Old_age Offline - 1
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0008 200 200 000 Old_age Offline - 1
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed: read failure 90% 9692 2057
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
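For reference, the extended self-test shown in that log can be started and read back with standard smartctl usage (not quoted from the question):
smartctl -t long /dev/sda      # start an extended offline self-test (runs in the background on the drive)
smartctl -l selftest /dev/sda  # read the self-test log once it has finished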
raid software-raid raid5
asked Mar 22 '15 at 17:22 by jlmr; migrated from serverfault.com Mar 22 '15 at 19:29
I wouldn't use RAID5 with 3 drives. Linux md RAID10 can stripe 2 copies of your data across 3 drives, so you get RAID0 read performance (with the "far" layout). Having 2 copies of every block means your redundancy overhead is 50%, instead of 33% for RAID5, though. You'd have 3 TB of usable space instead of 4 TB. I think btrfs's built-in redundancy (its raid1 mode) is supposed to be mostly stable, if you want to risk that. Not posting an answer since roaima already gave the correct one. You might need a kernel command-line option to get your initrd to start your array even though it's degraded.
– Peter Cordes
Mar 25 '15 at 7:55
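As a sketch of the alternative layout this comment describes (the device names are placeholders, not this system's disks), a three-disk md RAID10 with the "far 2" layout could be created like this:
# Sketch: 3-disk md RAID10, far-2 layout, as suggested in the comment above
mdadm --create /dev/md1 --level=10 --layout=f2 --raid-devices=3 \
      /dev/sdX1 /dev/sdY1 /dev/sdZ1
cat /proc/mdstat    # confirm the new array is building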
1 Answer
You're missing one of the three drives of the /dev/md0
RAID5 array. Therefore, mdadm
will assemble the array but not run it.
-R, --run
Attempt to start the array even if fewer drives were given than were present last time the array was active. Normally if not all the expected drives are found and --scan is not used, then the array will be assembled but not started. With --run an attempt will be made to start it anyway.
So, all you should need to do is mdadm --run /dev/md0. If you're cautious you can try mdadm --run --readonly /dev/md0 and follow that with mount -o ro,norecover /dev/md0 /mnt to check that it looks OK. (The converse of --readonly is, of course, --readwrite.)
Once it's running you can add back a new disk.
I wouldn't recommend adding your existing disk, because it's showing SMART disk errors, as evidenced by this recent report from your test:
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed: read failure 90% 9692 2057
However, if you really want to try to re-add your existing disk, it's probably a very good idea to run --zero-superblock on that partition first. But I'd still recommend replacing the disk.
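A hedged sketch of the replacement route, assuming the new drive also appears as /dev/sda and that copying the MBR partition layout from a healthy member is acceptable:
# Copy the partition layout from a healthy member to the replacement disk
# (assumes MBR partition tables, as in the fdisk output above)
sfdisk -d /dev/sdb | sfdisk /dev/sda
# Make sure no stale md metadata is left on the new partition
mdadm --zero-superblock /dev/sda1
# Add it to the running, degraded array and watch the rebuild
mdadm --manage /dev/md0 --add /dev/sda1
watch cat /proc/mdstat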
Thanks for your answer, the array will indeed run fine. The LiveCD I used mounts the filesystems and you can see all the files again. However, adding /dev/sda1 gives the same error as when it is not running. I already tried --zero-superblock on it as well, but it didn't change anything. Is there something to the as 3 part of the error message mdadm: add new device failed for /dev/sda1 as 3: Invalid argument? Could it be that it tries to add the drive as a fourth (0,1,2,3) drive?
– jlmr Mar 23 '15 at 6:47
@jlmr I can't reproduce the "invalid argument" error here, but TBH I'm using /dev/loop1,2,3 on top of small (100MB) files rather than three physical disks. At this point I would be inclined to use smartctl -t long /dev/sda, followed later by smartctl -a /dev/sda, to see if you have real disk errors.
– roaima Mar 24 '15 at 0:00
I added the smartctl test results to the original question.
– jlmr Mar 25 '15 at 7:14
@jlmr get a new drive, this one is broken.
– frostschutz Mar 25 '15 at 8:58
@frostschutz, could you explain what exactly is wrong with the drive? I find it hard to interpret the test results. It would be nice to understand it, also because of the warranty. The drive is quite new, so perhaps a refund or something can be arranged.
– jlmr Mar 26 '15 at 16:29