Multipathd config for LSI HBA 3008

I have 5 JBODs attached to the controller through an LSI SAS3008 HBA.
I'm running Arch Linux (kernel 4.14.41-1-lts) with multipath-tools v0.7.6 (03/10, 2018).



My problem: when a disk starts returning I/O errors and flapping, multipathd keeps checking the disk and reinstating the failed path.



Jul 23 04:59:51 FKM1 multipathd[5315]: 35000c50093d4e7c7: sdbe - tur checker timed out
Jul 23 04:59:51 FKM1 multipathd[5315]: checker failed path 67:128 in map 35000c50093d4e7c7
Jul 23 04:59:51 FKM1 multipathd[5315]: 35000c50093d4e7c7: remaining active paths: 0
Jul 23 04:59:51 FKM1 multipathd[5315]: sdbe: mark as failed
Jul 23 04:59:56 FKM1 multipathd[5315]: checker failed path 67:128 in map 35000c50093d4e7c7
Jul 23 05:04:37 FKM1 multipathd[5315]: 67:128: reinstated
Jul 23 05:04:37 FKM1 multipathd[5315]: 35000c50093d4e7c7: remaining active paths: 1
Jul 23 05:05:27 FKM1 multipathd[5315]: 35000c50093d4e7c7: sdbe - tur checker timed out
Jul 23 05:05:27 FKM1 multipathd[5315]: checker failed path 67:128 in map 35000c50093d4e7c7
Jul 23 05:05:27 FKM1 multipathd[5315]: 35000c50093d4e7c7: remaining active paths: 0
Jul 23 05:05:27 FKM1 multipathd[5315]: sdbe: mark as failed
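The log above shows the path failing, being reinstated at 05:04:37 and failing again within a minute. multipath-tools documents two options aimed at this kind of flapping path, `delay_watch_checks` and `delay_wait_checks`. A minimal sketch, assuming the v0.7.6 build supports them (confirm in `man multipath.conf` for your version); the value 12 is an arbitrary placeholder, not a tuned recommendation:

```
defaults {
    # watch a path that has just been reinstated for 12 checks
    delay_watch_checks 12
    # if it fails again while being watched, do not use it again
    # until it has stayed up for 12 checks
    delay_wait_checks  12
}
```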


Because of the faulty disk, multipath tries to remap it every time the disk shows up again.



[Fri Aug 3 00:18:37 2018] alua: device handler registered
[Fri Aug 3 00:18:37 2018] emc: device handler registered
[Fri Aug 3 00:18:37 2018] rdac: device handler registered
[Fri Aug 3 00:18:37 2018] device-mapper: uevent: version 1.0.3
[Fri Aug 3 00:18:37 2018] device-mapper: ioctl: 4.37.0-ioctl (2017-09-20) initialised: dm-devel@redhat.com
[Fri Aug 3 00:18:43 2018] device-mapper: multipath service-time: version 0.3.0 loaded
[Fri Aug 3 00:18:43 2018] device-mapper: table: 254:0: multipath: error getting device
[Fri Aug 3 00:18:43 2018] device-mapper: ioctl: error adding target to table
[Fri Aug 3 00:18:43 2018] device-mapper: table: 254:0: multipath: error getting device
[Fri Aug 3 00:18:43 2018] device-mapper: ioctl: error adding target to table
[Fri Aug 3 00:21:19 2018] sd 12:0:16:0: attempting task abort! scmd(ffffa03a6c4de948)
[Fri Aug 3 00:21:19 2018] sd 12:0:16:0: [sdbh] tag#1 CDB: opcode=0x88 88 00 00 00 00 02 ba a0 f0 00 00 00 02 00 00 00
[Fri Aug 3 00:21:19 2018] scsi target12:0:16: handle(0x001c), sas_address(0x5000c50093d5135d), phy(8)
[Fri Aug 3 00:21:19 2018] scsi target12:0:16: enclosure_logical_id(0x500304800929f87f), slot(8)
[Fri Aug 3 00:21:19 2018] scsi target12:0:16: enclosure level(0x0001),connector name(1 )
[Fri Aug 3 00:21:19 2018] sd 12:0:16:0: task abort: SUCCESS scmd(ffffa03a6c4de948)
[Fri Aug 3 00:21:19 2018] sd 12:0:16:0: attempting task abort! scmd(ffffa07b2eb87d48)
[Fri Aug 3 00:21:19 2018] sd 12:0:16:0: [sdbh] tag#0 CDB: opcode=0x88 88 00 00 00 00 02 ba a0 f0 00 00 00 02 00 00 00
[Fri Aug 3 00:21:19 2018] scsi target12:0:16: handle(0x001c), sas_address(0x5000c50093d5135d), phy(8)
[Fri Aug 3 00:21:19 2018] scsi target12:0:16: enclosure_logical_id(0x500304800929f87f), slot(8)
[Fri Aug 3 00:21:19 2018] scsi target12:0:16: enclosure level(0x0001),connector name(1 )
[Fri Aug 3 00:21:19 2018] sd 12:0:16:0: task abort: SUCCESS scmd(ffffa07b2eb87d48)
[Fri Aug 3 00:21:21 2018] device-mapper: multipath: Failing path 67:176.
[Fri Aug 3 00:21:21 2018] sd 12:0:16:0: attempting task abort! scmd(ffffa03a89b38148)
[Fri Aug 3 00:21:21 2018] sd 12:0:16:0: [sdbh] tag#11 CDB: opcode=0x0 00 00 00 00 00 00
[Fri Aug 3 00:21:21 2018] scsi target12:0:16: handle(0x001c), sas_address(0x5000c50093d5135d), phy(8)
[Fri Aug 3 00:21:21 2018] scsi target12:0:16: enclosure_logical_id(0x500304800929f87f), slot(8)
[Fri Aug 3 00:21:21 2018] scsi target12:0:16: enclosure level(0x0001),connector name(1 )
[Fri Aug 3 00:21:21 2018] sd 12:0:16:0: task abort: SUCCESS scmd(ffffa03a89b38148)
[Fri Aug 3 00:21:26 2018] print_req_error: I/O error, dev dm-208, sector 11721044480
[Fri Aug 3 00:21:26 2018] print_req_error: I/O error, dev dm-208, sector 0
[Fri Aug 3 00:21:26 2018] print_req_error: I/O error, dev dm-208, sector 512
[Fri Aug 3 00:21:26 2018] print_req_error: I/O error, dev dm-208, sector 11721043968
[Fri Aug 3 00:21:26 2018] print_req_error: I/O error, dev dm-208, sector 11721044480
[Fri Aug 3 00:21:26 2018] print_req_error: I/O error, dev dm-208, sector 0
[Fri Aug 3 00:21:26 2018] print_req_error: I/O error, dev dm-208, sector 512
[Fri Aug 3 00:21:26 2018] print_req_error: I/O error, dev dm-208, sector 11721043968
[Fri Aug 3 00:21:26 2018] print_req_error: I/O error, dev dm-208, sector 11721044480
[Fri Aug 3 00:21:57 2018] sd 12:0:16:0: attempting task abort! scmd(ffffa03a89b3f148)


The cycle repeats until, after a while, the mpt3sas driver gives up and resets the LSI card:



[Fri Aug 3 00:18:12 2018] mpt3sas_cm3: iomem(0x00000000fbe40000), mapped(0xffffbe0e8dca0000), size(65536)
[Fri Aug 3 00:18:12 2018] mpt3sas_cm3: ioport(0x000000000000e000), size(256)
[Fri Aug 3 00:18:12 2018] usb 2-1-port6: over-current condition
[Fri Aug 3 00:18:12 2018] mpt3sas_cm3: sending message unit reset !!
[Fri Aug 3 00:18:12 2018] mpt3sas_cm3: message unit reset: SUCCESS
[Fri Aug 3 00:18:12 2018] mpt3sas_cm3: Allocated physical memory: size(20778 kB)
[Fri Aug 3 00:18:12 2018] mpt3sas_cm3: Current Controller Queue Depth(9564),Max Controller Queue Depth(9664)
[Fri Aug 3 00:18:12 2018] mpt3sas_cm3: Scatter Gather Elements per IO(128)
[Fri Aug 3 00:18:12 2018] usb 3-14.1: new low-speed USB device number 3 using xhci_hcd
[Fri Aug 3 00:18:12 2018] mpt3sas_cm3: LSISAS3008: FWVersion(15.00.02.00), ChipRevision(0x02), BiosVersion(08.35.00.00)
[Fri Aug 3 00:18:12 2018] mpt3sas_cm3: Protocol=(
[Fri Aug 3 00:18:12 2018] Initiator
[Fri Aug 3 00:18:12 2018] ,Target
[Fri Aug 3 00:18:12 2018] ),
[Fri Aug 3 00:18:12 2018] Capabilities=(
[Fri Aug 3 00:18:12 2018] TLR
[Fri Aug 3 00:18:12 2018] ,EEDP
[Fri Aug 3 00:18:12 2018] ,Snapshot Buffer
[Fri Aug 3 00:18:12 2018] ,Diag Trace Buffer
[Fri Aug 3 00:18:12 2018] ,Task Set Full
[Fri Aug 3 00:18:12 2018] ,NCQ
[Fri Aug 3 00:18:12 2018] )
[Fri Aug 3 00:18:12 2018] scsi host13: Fusion MPT SAS Host
[Fri Aug 3 00:18:12 2018] mpt3sas_cm3: sending port enable !!
[Fri Aug 3 00:18:12 2018] mpt3sas_cm4: 64 BIT PCI BUS DMA ADDRESSING SUPPORTED, total mem (528262416 kB)
[Fri Aug 3 00:18:12 2018] mpt3sas_cm3: host_add: handle(0x0001), sas_addr(0x500605b00c482a80), phys(8)
[Fri Aug 3 00:18:12 2018] mpt3sas_cm3: expander_add: handle(0x0009), parent(0x0001), sas_addr(0x5003048017aed57f), phys(38)
[Fri Aug 3 00:18:12 2018] scsi 13:0:0:0: Direct-Access SEAGATE ST800FM0173 0007 PQ: 0 ANSI: 6


When mpt3sas issues a "diag reset", I lose an entire JBOD (90 disks) at once!
Because of this, a single faulty disk can suspend my ZFS pool.



I'm looking for a solution. I think that if I could tell multipath "do not reinstate a path after it has failed 3 times", my problem would be solved: the pool would stop using the faulty disk, and a disk that isn't being used can't produce I/O errors.
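multipath-tools 0.7.x documents a group of `san_path_err_*` options that implement roughly this idea: once a path has failed more than a threshold number of times within a window of path checks, it is kept in the failed state for a recovery time instead of being reinstated. A sketch for the existing `defaults` section, assuming these options are present in v0.7.6 and are meant to be set together; all three values are placeholders:

```
defaults {
    # a path that fails more than 2 times (i.e. a 3rd time) ...
    san_path_err_threshold     2
    # ... within 100 path checks ...
    san_path_err_forget_rate   100
    # ... is kept failed for 3600 seconds before it may be reinstated
    san_path_err_recovery_time 3600
}
```

Since each disk here appears to have a single path (the log shows `remaining active paths: 0`), keeping that path failed leaves the map with no usable path, so what ZFS sees at that point is governed by `no_path_retry`.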



Put simply, I'm looking for a way to stop using a failed disk.
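The bluntest way to stop using one known-bad disk is to blacklist it by WWID, so that multipath no longer builds a map for it. A sketch that uses the WWID from the multipathd log above purely as an example; the configuration then has to be reloaded (for example with `multipathd -k"reconfigure"`), and an existing map can only be flushed with `multipath -f` once nothing holds it open. ZFS will then treat that disk as missing:

```
blacklist {
    # WWID of the failing disk, taken from the multipathd log (example only)
    wwid "35000c50093d4e7c7"
}
```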



I found a few settings for /etc/multipath.conf, but I'm not sure whether they will solve my problem.
What is the best solution?



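defaults {
    # name maps by WWID (e.g. 35000c50093d4e7c7) instead of mpathN
    user_friendly_names  no
    path_grouping_policy failover
    # seconds between path checks
    polling_interval     10
    path_selector        "round-robin 0"
    path_checker         readsector0
    failback             manual
    # with no usable path left, keep queueing I/O for 3 checks, then fail it
    no_path_retry        3
    prio                 rdac
}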



blacklist_exceptions ID_SERIAL)"



The full dmesg log is here: https://paste.ubuntu.com/p/XZZ2CScmHP/










linux storage multipath-storage

asked Aug 7 at 15:02 by Morphinz (edited Aug 7 at 15:07)
























