Repairing a RAID5 array
I'm trying to repair a RAID5 array consisting of three 2 TB disks. After working perfectly for quite some time, the computer (running Debian) suddenly wouldn't boot anymore and got stuck at a GRUB prompt. I'm pretty sure it has to do with the RAID array.
Since it is difficult to give a full account of everything I have tried already, I will describe the current status.
mdadm --detail /dev/md0
outputs:
/dev/md0:
Version : 1.2
Creation Time : Sun Mar 22 15:13:25 2015
Raid Level : raid5
Used Dev Size : 1953381888 (1862.89 GiB 2000.26 GB)
Raid Devices : 3
Total Devices : 2
Persistence : Superblock is persistent
Update Time : Sun Mar 22 16:18:56 2015
State : active, degraded, Not Started
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 512K
Name : ubuntu:0 (local to host ubuntu)
UUID : ae2b72c0:60444678:25797b77:3695130a
Events : 57
Number Major Minor RaidDevice State
0 0 0 0 removed
1 8 17 1 active sync /dev/sdb1
2 8 33 2 active sync /dev/sdc1
mdadm --examine /dev/sda1
gives:
mdadm: No md superblock detected on /dev/sda1.
which makes sense, because I reformatted this partition, believing it to be the faulty one.
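For reference, a quick way to compare what each member thinks about the array (which role it held, its event counter and array state) is to filter the --examine output; this is only a sketch using the same device names as above:
# Summarise the md superblock view of each member partition
mdadm --examine /dev/sd[abc]1 | grep -E '^/dev|Device Role|Array State|Events'
# /dev/sda1 will report "No md superblock detected", matching the output above.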
mdadm --examine /dev/sdb1
gives:
/dev/sdb1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : ae2b72c0:60444678:25797b77:3695130a
Name : ubuntu:0 (local to host ubuntu)
Creation Time : Sun Mar 22 15:13:25 2015
Raid Level : raid5
Raid Devices : 3
Avail Dev Size : 3906764800 (1862.89 GiB 2000.26 GB)
Array Size : 3906763776 (3725.78 GiB 4000.53 GB)
Used Dev Size : 3906763776 (1862.89 GiB 2000.26 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
State : clean
Device UUID : f1817af9:1d964693:774d5d63:bfa69e3d
Update Time : Sun Mar 22 16:18:56 2015
Checksum : ab7c79ae - correct
Events : 57
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 1
Array State : .AA ('A' == active, '.' == missing)
mdadm --examine /dev/sdc1
gives:
/dev/sdc1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : ae2b72c0:60444678:25797b77:3695130a
Name : ubuntu:0 (local to host ubuntu)
Creation Time : Sun Mar 22 15:13:25 2015
Raid Level : raid5
Raid Devices : 3
Avail Dev Size : 3906764800 (1862.89 GiB 2000.26 GB)
Array Size : 3906763776 (3725.78 GiB 4000.53 GB)
Used Dev Size : 3906763776 (1862.89 GiB 2000.26 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
State : clean
Device UUID : f076b568:007e3f9b:71a19ea2:474e5fe9
Update Time : Sun Mar 22 16:18:56 2015
Checksum : db25214 - correct
Events : 57
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 2
Array State : .AA ('A' == active, '.' == missing)
cat /proc/mdstat
:
Personalities : [raid6] [raid5] [raid4]
md0 : inactive sdb1[1] sdc1[2]
3906764800 blocks super 1.2
unused devices: <none>
fdisk -l
:
Disk /dev/sda: 2000.4 GB, 2000398934016 bytes
81 heads, 63 sectors/track, 765633 cylinders, total 3907029168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x000d84fa
Device Boot Start End Blocks Id System
/dev/sda1 2048 3907029167 1953513560 fd Linux raid autodetect
Disk /dev/sdb: 2000.4 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders, total 3907029168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x000802d9
Device Boot Start End Blocks Id System
/dev/sdb1 * 2048 3907028991 1953513472 fd Linux raid autodetect
Disk /dev/sdc: 2000.4 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders, total 3907029168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x000a8dca
Device Boot Start End Blocks Id System
/dev/sdc1 2048 3907028991 1953513472 fd Linux raid autodetect
Disk /dev/sdd: 7756 MB, 7756087296 bytes
255 heads, 63 sectors/track, 942 cylinders, total 15148608 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x128faec9
Device Boot Start End Blocks Id System
/dev/sdd1 * 2048 15148607 7573280 c W95 FAT32 (LBA)
And of course I've tried to add /dev/sda1
again. mdadm --manage /dev/md0 --add /dev/sda1
gives:
mdadm: add new device failed for /dev/sda1 as 3: Invalid argument
If the RAID is fixed, I will probably also need to get GRUB up and running again, so that it can detect the RAID/LVM and boot.
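For reference, reinstalling GRUB from a live environment once the array and LVM are back typically looks roughly like the sketch below; the volume-group and logical-volume names (vg0/root) are placeholders, not taken from this system:
# Sketch only: chroot into the installed system from a live CD and reinstall GRUB
mdadm --assemble --scan            # assemble the (degraded) array
vgchange -ay                       # activate any LVM volume groups on top of it
mount /dev/vg0/root /mnt           # vg0/root is an assumed LV name; mount a separate /boot too if one exists
mount --bind /dev /mnt/dev
mount --bind /proc /mnt/proc
mount --bind /sys /mnt/sys
chroot /mnt
grub-install /dev/sdb              # install to a disk that is still healthy
update-grub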
EDIT (added smartctl test results)
Output of the smartctl tests, smartctl -a /dev/sda:
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.16.0-30-generic] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Caviar Green (AF, SATA 6Gb/s)
Device Model: WDC WD20EZRX-00D8PB0
Serial Number: WD-WMC4M0760056
LU WWN Device Id: 5 0014ee 003a4a444
Firmware Version: 80.00A80
User Capacity: 2,000,398,934,016 bytes [2.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-2 (minor revision not indicated)
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Tue Mar 24 22:07:08 2015 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 121) The previous self-test completed having
the read element of the test failed.
Total time to complete Offline
data collection: (26280) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 266) minutes.
Conveyance self-test routine
recommended polling time: ( 5) minutes.
SCT capabilities: (0x7035) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 3401
3 Spin_Up_Time 0x0027 172 172 021 Pre-fail Always - 4375
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 59
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0
9 Power_On_Hours 0x0032 087 087 000 Old_age Always - 9697
10 Spin_Retry_Count 0x0032 100 253 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 253 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 59
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 51
193 Load_Cycle_Count 0x0032 115 115 000 Old_age Always - 255276
194 Temperature_Celsius 0x0022 119 106 000 Old_age Always - 28
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 12
198 Offline_Uncorrectable 0x0030 200 200 000 Old_age Offline - 1
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0008 200 200 000 Old_age Offline - 1
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed: read failure 90% 9692 2057
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
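For reference, the extended self-test shown in that log can be started and read back with standard smartctl usage (not quoted from the question):
smartctl -t long /dev/sda      # start an extended offline self-test (runs in the background on the drive)
smartctl -l selftest /dev/sda  # read the self-test log once it has finished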
raid software-raid raid5
asked Mar 22 '15 at 17:22 by jlmr; migrated from serverfault.com Mar 22 '15 at 19:29
I wouldn't use RAID5 with 3 drives. Linux md RAID10 can stripe 2 copies of your data across 3 drives, so you get RAID0 read performance (with the "far" layout). Having 2 copies of every block means your redundancy overhead is 50%, instead of 33% for RAID5, though. You'd have 3 TB of usable space instead of 4 TB. I think btrfs's built-in redundancy (its raid1 mode) is supposed to be mostly stable, if you want to risk that. Not posting an answer since roaima already gave the correct one. You might need a kernel command-line option to get your initrd to start your array even though it's degraded.
– Peter Cordes
Mar 25 '15 at 7:55
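As a sketch of the alternative layout this comment describes (the device names are placeholders, not this system's disks), a three-disk md RAID10 with the "far 2" layout could be created like this:
# Sketch: 3-disk md RAID10, far-2 layout, as suggested in the comment above
mdadm --create /dev/md1 --level=10 --layout=f2 --raid-devices=3 \
      /dev/sdX1 /dev/sdY1 /dev/sdZ1
cat /proc/mdstat    # confirm the new array is building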
1 Answer
You're missing one of the three drives of the /dev/md0
RAID5 array. Therefore, mdadm
will assemble the array but not run it.
-R, --run
Attempt to start the array even if fewer drives were given than were present last time the array was active. Normally if not all the expected drives are found and --scan is not used, then the array will be assembled but not started. With --run an attempt will be made to start it anyway.
So, all you should need to do is mdadm --run /dev/md0. If you're cautious you can try mdadm --run --readonly /dev/md0 and follow that with mount -o ro,norecover /dev/md0 /mnt to check that it looks OK. (The converse of --readonly is, of course, --readwrite.)
Once it's running you can add back a new disk.
I wouldn't recommend adding your existing disk, because it's showing SMART disk errors, as evidenced by this recent report from your test:
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed: read failure 90% 9692 2057
However, if you really want to try to re-add your existing disk, it's probably a very good idea to run --zero-superblock on that partition first. But I'd still recommend replacing the disk.
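A hedged sketch of the replacement route, assuming the new drive also appears as /dev/sda and that copying the MBR partition layout from a healthy member is acceptable:
# Copy the partition layout from a healthy member to the replacement disk
# (assumes MBR partition tables, as in the fdisk output above)
sfdisk -d /dev/sdb | sfdisk /dev/sda
# Make sure no stale md metadata is left on the new partition
mdadm --zero-superblock /dev/sda1
# Add it to the running, degraded array and watch the rebuild
mdadm --manage /dev/md0 --add /dev/sda1
watch cat /proc/mdstat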
Thanks for your answer, the array will indeed run fine. The LiveCD I used mounts the filesystems and you can see all the files again. However, adding /dev/sda1 gives the same error as when it is not running. I already tried --zero-superblock on it as well, but it didn't change anything. Is there something to the as 3 part of the error message mdadm: add new device failed for /dev/sda1 as 3: Invalid argument? Could it be that it tries to add the drive as a fourth (0,1,2,3) drive?
– jlmr Mar 23 '15 at 6:47
@jlmr I can't reproduce the "invalid argument" error here, but TBH I'm using /dev/loop1,2,3 on top of small (100MB) files rather than three physical disks. At this point I would be inclined to use smartctl -t long /dev/sda, followed later by smartctl -a /dev/sda, to see if you have real disk errors.
– roaima Mar 24 '15 at 0:00
I added the smartctl test results to the original question.
– jlmr Mar 25 '15 at 7:14
@jlmr get a new drive, this one is broken.
– frostschutz Mar 25 '15 at 8:58
@frostschutz, could you explain what exactly is wrong with the drive? I find it hard to interpret the test results. It would be nice to understand it, also because of the warranty. The drive is quite new, so perhaps a refund or something can be arranged.
– jlmr Mar 26 '15 at 16:29