Multipathd config for LSI HBA 3008
I have 5 JBODs attached to an LSI SAS3008 controller.
I'm running Arch Linux (kernel 4.14.41-1-lts) with multipath-tools v0.7.6 (2018-03-10).
My problem: when a disk starts throwing I/O errors and flapping on and off the bus, multipathd keeps checking it and reinstating the failed path.
Jul 23 04:59:51 FKM1 multipathd[5315]: 35000c50093d4e7c7: sdbe - tur checker timed out
Jul 23 04:59:51 FKM1 multipathd[5315]: checker failed path 67:128 in map 35000c50093d4e7c7
Jul 23 04:59:51 FKM1 multipathd[5315]: 35000c50093d4e7c7: remaining active paths: 0
Jul 23 04:59:51 FKM1 multipathd[5315]: sdbe: mark as failed
Jul 23 04:59:56 FKM1 multipathd[5315]: checker failed path 67:128 in map 35000c50093d4e7c7
Jul 23 05:04:37 FKM1 multipathd[5315]: 67:128: reinstated
Jul 23 05:04:37 FKM1 multipathd[5315]: 35000c50093d4e7c7: remaining active paths: 1
Jul 23 05:05:27 FKM1 multipathd[5315]: 35000c50093d4e7c7: sdbe - tur checker timed out
Jul 23 05:05:27 FKM1 multipathd[5315]: checker failed path 67:128 in map 35000c50093d4e7c7
Jul 23 05:05:27 FKM1 multipathd[5315]: 35000c50093d4e7c7: remaining active paths: 0
Jul 23 05:05:27 FKM1 multipathd[5315]: sdbe: mark as failed
Because of the faulty disk, multipathd tries to reinstate the path every time the disk reappears.
[Fri Aug 3 00:18:37 2018] alua: device handler registered
[Fri Aug 3 00:18:37 2018] emc: device handler registered
[Fri Aug 3 00:18:37 2018] rdac: device handler registered
[Fri Aug 3 00:18:37 2018] device-mapper: uevent: version 1.0.3
[Fri Aug 3 00:18:37 2018] device-mapper: ioctl: 4.37.0-ioctl (2017-09-20) initialised: dm-devel@redhat.com
[Fri Aug 3 00:18:43 2018] device-mapper: multipath service-time: version 0.3.0 loaded
[Fri Aug 3 00:18:43 2018] device-mapper: table: 254:0: multipath: error getting device
[Fri Aug 3 00:18:43 2018] device-mapper: ioctl: error adding target to table
[Fri Aug 3 00:18:43 2018] device-mapper: table: 254:0: multipath: error getting device
[Fri Aug 3 00:18:43 2018] device-mapper: ioctl: error adding target to table
[Fri Aug 3 00:21:19 2018] sd 12:0:16:0: attempting task abort! scmd(ffffa03a6c4de948)
[Fri Aug 3 00:21:19 2018] sd 12:0:16:0: [sdbh] tag#1 CDB: opcode=0x88 88 00 00 00 00 02 ba a0 f0 00 00 00 02 00 00 00
[Fri Aug 3 00:21:19 2018] scsi target12:0:16: handle(0x001c), sas_address(0x5000c50093d5135d), phy(8)
[Fri Aug 3 00:21:19 2018] scsi target12:0:16: enclosure_logical_id(0x500304800929f87f), slot(8)
[Fri Aug 3 00:21:19 2018] scsi target12:0:16: enclosure level(0x0001),connector name(1 )
[Fri Aug 3 00:21:19 2018] sd 12:0:16:0: task abort: SUCCESS scmd(ffffa03a6c4de948)
[Fri Aug 3 00:21:19 2018] sd 12:0:16:0: attempting task abort! scmd(ffffa07b2eb87d48)
[Fri Aug 3 00:21:19 2018] sd 12:0:16:0: [sdbh] tag#0 CDB: opcode=0x88 88 00 00 00 00 02 ba a0 f0 00 00 00 02 00 00 00
[Fri Aug 3 00:21:19 2018] scsi target12:0:16: handle(0x001c), sas_address(0x5000c50093d5135d), phy(8)
[Fri Aug 3 00:21:19 2018] scsi target12:0:16: enclosure_logical_id(0x500304800929f87f), slot(8)
[Fri Aug 3 00:21:19 2018] scsi target12:0:16: enclosure level(0x0001),connector name(1 )
[Fri Aug 3 00:21:19 2018] sd 12:0:16:0: task abort: SUCCESS scmd(ffffa07b2eb87d48)
[Fri Aug 3 00:21:21 2018] device-mapper: multipath: Failing path 67:176.
[Fri Aug 3 00:21:21 2018] sd 12:0:16:0: attempting task abort! scmd(ffffa03a89b38148)
[Fri Aug 3 00:21:21 2018] sd 12:0:16:0: [sdbh] tag#11 CDB: opcode=0x0 00 00 00 00 00 00
[Fri Aug 3 00:21:21 2018] scsi target12:0:16: handle(0x001c), sas_address(0x5000c50093d5135d), phy(8)
[Fri Aug 3 00:21:21 2018] scsi target12:0:16: enclosure_logical_id(0x500304800929f87f), slot(8)
[Fri Aug 3 00:21:21 2018] scsi target12:0:16: enclosure level(0x0001),connector name(1 )
[Fri Aug 3 00:21:21 2018] sd 12:0:16:0: task abort: SUCCESS scmd(ffffa03a89b38148)
[Fri Aug 3 00:21:26 2018] print_req_error: I/O error, dev dm-208, sector 11721044480
[Fri Aug 3 00:21:26 2018] print_req_error: I/O error, dev dm-208, sector 0
[Fri Aug 3 00:21:26 2018] print_req_error: I/O error, dev dm-208, sector 512
[Fri Aug 3 00:21:26 2018] print_req_error: I/O error, dev dm-208, sector 11721043968
[Fri Aug 3 00:21:26 2018] print_req_error: I/O error, dev dm-208, sector 11721044480
[Fri Aug 3 00:21:26 2018] print_req_error: I/O error, dev dm-208, sector 0
[Fri Aug 3 00:21:26 2018] print_req_error: I/O error, dev dm-208, sector 512
[Fri Aug 3 00:21:26 2018] print_req_error: I/O error, dev dm-208, sector 11721043968
[Fri Aug 3 00:21:26 2018] print_req_error: I/O error, dev dm-208, sector 11721044480
[Fri Aug 3 00:21:57 2018] sd 12:0:16:0: attempting task abort! scmd(ffffa03a89b3f148)
After a while the cycle repeats until the mpt3sas driver gives up and resets the LSI card.
[Fri Aug 3 00:18:12 2018] mpt3sas_cm3: iomem(0x00000000fbe40000), mapped(0xffffbe0e8dca0000), size(65536)
[Fri Aug 3 00:18:12 2018] mpt3sas_cm3: ioport(0x000000000000e000), size(256)
[Fri Aug 3 00:18:12 2018] usb 2-1-port6: over-current condition
[Fri Aug 3 00:18:12 2018] mpt3sas_cm3: sending message unit reset !!
[Fri Aug 3 00:18:12 2018] mpt3sas_cm3: message unit reset: SUCCESS
[Fri Aug 3 00:18:12 2018] mpt3sas_cm3: Allocated physical memory: size(20778 kB)
[Fri Aug 3 00:18:12 2018] mpt3sas_cm3: Current Controller Queue Depth(9564),Max Controller Queue Depth(9664)
[Fri Aug 3 00:18:12 2018] mpt3sas_cm3: Scatter Gather Elements per IO(128)
[Fri Aug 3 00:18:12 2018] usb 3-14.1: new low-speed USB device number 3 using xhci_hcd
[Fri Aug 3 00:18:12 2018] mpt3sas_cm3: LSISAS3008: FWVersion(15.00.02.00), ChipRevision(0x02), BiosVersion(08.35.00.00)
[Fri Aug 3 00:18:12 2018] mpt3sas_cm3: Protocol=(Initiator,Target), Capabilities=(TLR,EEDP,Snapshot Buffer,Diag Trace Buffer,Task Set Full,NCQ)
[Fri Aug 3 00:18:12 2018] scsi host13: Fusion MPT SAS Host
[Fri Aug 3 00:18:12 2018] mpt3sas_cm3: sending port enable !!
[Fri Aug 3 00:18:12 2018] mpt3sas_cm4: 64 BIT PCI BUS DMA ADDRESSING SUPPORTED, total mem (528262416 kB)
[Fri Aug 3 00:18:12 2018] mpt3sas_cm3: host_add: handle(0x0001), sas_addr(0x500605b00c482a80), phys(8)
[Fri Aug 3 00:18:12 2018] mpt3sas_cm3: expander_add: handle(0x0009), parent(0x0001), sas_addr(0x5003048017aed57f), phys(38)
[Fri Aug 3 00:18:12 2018] scsi 13:0:0:0: Direct-Access SEAGATE ST800FM0173 0007 PQ: 0 ANSI: 6
When mpt3sas issues a "diag reset" it means I lose an entire JBOD (90 disks) at once!
Because of this, a single faulty disk can suspend my whole ZFS pool.
I'm now looking for a solution. My idea is to tell multipath "do not reinstate a path if the disk has failed 3 times"; then the pool would stop using the disk, and a disk the pool no longer touches cannot generate I/O errors.
Put simply, I'm looking for a way to permanently take a failed disk out of use.
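In other words, I want multipathd to do automatically what I would otherwise have to do by hand. A rough sketch of that manual fencing (the pool name "tank" is a placeholder, the WWID is the flapping disk from the log above, and the commands are the standard zfs / multipath-tools tools):
# take the flapping disk out of the ZFS pool so nothing sends I/O to it
zpool offline tank 35000c50093d4e7c7
# flush its multipath map and keep multipathd from recreating it
multipath -f 35000c50093d4e7c7
# then add "wwid 35000c50093d4e7c7" to the blacklist section of
# /etc/multipath.conf and reload the daemon configuration:
multipathd reconfigure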
I found a few settings for /etc/multipath.conf, but I'm not sure whether they will solve my problem.
Can you tell me the best solution? This is what I have so far:
defaults {
    user_friendly_names   no
    path_grouping_policy  failover
    polling_interval      10
    path_selector         "round-robin 0"
    path_checker          readsector0
    failback              manual
    no_path_retry         3
    prio                  rdac
}
blacklist_exceptions {
    property "(ID_SERIAL)"
}
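The multipath.conf(5) man page shipped with this release also lists options aimed specifically at flapping paths. Something like the sketch below is what I imagine; the option names are taken from the man page, but the numbers are just my guesses, so I don't know if this is the right approach:
# additions to the defaults section above
# watch a freshly reinstated path for 30 checker cycles; if it fails
# again in that window, hold it back for 60 cycles before reusing it
delay_watch_checks 30
delay_wait_checks  60
# newer alternative (if this build supports it, and apparently not meant
# to be combined with the delay_* options): classify paths whose error
# rate is too high as "marginal" and keep them out of the map
# marginal_path_double_failed_time   60
# marginal_path_err_sample_time      120
# marginal_path_err_rate_threshold   10
# marginal_path_err_recheck_gap_time 300
If this behaves as described, a path that keeps timing out in the tur checker would stay failed long enough for me to replace the disk, instead of being reinstated every few minutes.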
The full dmesg log is here: https://paste.ubuntu.com/p/XZZ2CScmHP/
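To see which of these options the installed multipath-tools actually understands, and whether my edits are picked up, I believe the effective configuration can be dumped with the standard tools:
multipath -t | less                                   # built-in table plus parsed configuration
multipathd show config | grep -E 'delay_|marginal_|no_path_retry'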
linux storage multipath-storage
asked Aug 7 at 15:02 by Morphinz, edited Aug 7 at 15:07