Understanding smartctl and hard-drive errors

I have a raidz2 ZFS pool. Two of its disks started returning I/O errors, after which ZFS marked them as faulted. (The dmesg log was linked from the original post.)

I removed the disks and ran some tests on them. smartctl reports:



DISK 1 (full log linked from the original post): SMART Health Status: DATA CHANNEL IMPENDING FAILURE DATA ERROR RATE TOO HIGH [asc=5d, ascq=32]
DISK 2 (full log linked from the original post): SMART Health Status: HARDWARE IMPENDING FAILURE GENERAL HARD DRIVE FAILURE [asc=5d, ascq=10]
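
For readers of these two lines: the bracketed asc/ascq pair is the SCSI additional sense code and its qualifier, and asc 5d is the "failure prediction threshold exceeded" family, meaning the drive itself is predicting a failure. Below is a minimal sketch of pulling those codes out of a health line. The function names parse_sense and classify_health are made up for this illustration and are not part of smartctl.

```shell
#!/bin/sh
# Extract the additional sense code (asc) and qualifier (ascq) from a
# "SMART Health Status" line as printed by smartctl for SCSI/SAS disks.
# Prints "asc ascq" in lowercase hex, or nothing if the line has no codes.
parse_sense() {
    printf '%s\n' "$1" |
        sed -n 's/.*\[asc=\([0-9a-fA-F]*\), *ascq=\([0-9a-fA-F]*\)].*/\1 \2/p' |
        tr 'A-F' 'a-f'
}

# Classify a health line. asc 0x5d is the SCSI "failure prediction threshold
# exceeded" family; anything else is only reported, not interpreted.
classify_health() {
    # Intentionally unquoted: split "asc ascq" into $1 and $2.
    set -- $(parse_sense "$1")
    case "$1" in
        5d) echo "predictive-failure (asc=$1 ascq=$2)" ;;
        "") echo "no-sense-codes" ;;
        *)  echo "other (asc=$1 ascq=$2)" ;;
    esac
}

# Example with the two kinds of line seen in practice.
classify_health 'SMART Health Status: DATA CHANNEL IMPENDING FAILURE DATA ERROR RATE TOO HIGH [asc=5d, ascq=32]'
classify_health 'SMART Health Status: OK'
```

A check like this only restates what the drive reports; it does not measure anything the drive has not already decided.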



I created a new pool from DISK 1 and started a fio test, but I did not see any I/O errors on the disk, nothing like the earlier ones. The disk works normally. I also created a pool with 4 disks, and disk utilization was normal there too.

I ran this test for 4 days without encountering an error. The disk now behaves like the others.



    fio --randrepeat=0 --ioengine=libaio --name=test --filename=/disktest/fiofile \
        --bs=1024k --iodepth=64 --size=5T --readwrite=readwrite --rwmixread=60 --numjobs=20


I have a few questions:

1- Why does the disk no longer report errors?

2- If the disk works normally, why did it cause I/O errors in the first pool?

3- What is the best way to tell whether a hard drive is faulty?

4- How can the hard drive's error counters be reset?

5- Is the disk garbage or not?



The disks are attached as follows: Controller -> LSI3008HBA -> 2x SAS-cable -> "SC946ED-R2KJBOD" 2xExpander -> Multipath SAS disks.







linux hard-disk zfs smartctl






asked Aug 13 at 12:29 by Morphinz (edited Aug 14 at 9:25)

  • First thing I'd do is to look at the SMART attributes, but -d scsi prevents them from being shown. You didn't say how your disks are attached, so if possible, try again without -d scsi. VALUE is normalized to 100, lower is worse.
    – dirkt
    Aug 13 at 18:39










  • @dirkt Controller -> LSI3008HBA -> 2x SAS-cable -> "SC946ED-R2KJBOD" 2xExpander -> Multipath SAS disks. I use -d scsi because it does not work any other way, or at least I could not get it to. :) What is your advice?
    – Morphinz
    Aug 14 at 9:19










  • For LSI controllers, try -d megaraid,N with a suitable N (a link with details was given in the original comment).
    – dirkt
    Aug 14 at 10:25










  • @dirkt It's an HBA, not a RAID card. Your method works on RAID cards.
    – Morphinz
    Sep 5 at 12:13










  • Even if it has "raid" in the name, it might also work on other LSI controllers, so it's worth a try. More specifically, it will work on any hardware that supports this particular access method. If your card doesn't support it, then it doesn't; in that case there will probably be no way to get at this information unless you dig up a datasheet that describes how to send SMART commands for your controller.
    – dirkt
    Sep 5 at 14:08
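
The access methods discussed in these comments can be tried one after another. The sketch below only prints the commands unless DRY_RUN=0 is set. The device path /dev/sdX and the index 0 are placeholders, and whether -d megaraid,N works on an HBA such as the one described here is exactly the open question in the thread. For SAS drives, smartctl normally reports SCSI health and error-counter logs rather than an ATA-style attribute table.

```shell
#!/bin/sh
# DRY_RUN=1 (the default here) only prints each command; set DRY_RUN=0 and
# run as root to execute the commands against real hardware.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Try the access methods mentioned in the comments, in order, for one device.
# $1 = block device (placeholder), $2 = megaraid index N (placeholder)
probe_smart() {
    dev=$1
    n=${2:-0}
    run smartctl --scan                       # list devices and the type smartctl picks
    run smartctl -a "$dev"                    # auto-detected device type
    run smartctl -a -d scsi "$dev"            # force plain SCSI, as the asker did
    run smartctl -a -d "megaraid,$n" "$dev"   # LSI MegaRAID pass-through
}

probe_smart /dev/sdX 0
```

If the last command fails on a given controller, that simply means the controller does not support that access method, as the final comment above says.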
















1 Answer
  1. Some faults can come and go. Nothing guarantees that you will be warned before a disk dies, but if SMART starts reporting impending-failure errors, it's better not to risk it and just replace the drive.

  2. Errors can come and go because the disk sometimes keeps retrying problem regions until it succeeds (at which point it will generally try to avoid using that region again if it can).

  3. You could run a long SMART self-test and/or read/write every LBA in use (ZFS has a scrub process, closely related to resilvering, that can be started with zpool scrub). Watch out though: these might make the disk fail for good...

  4. You can't.

  5. Hard to say, but let's put it another way: is the money saved by not replacing it worth the risk of having it suddenly fail?
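
The checks from point 3 can be sketched as a short sequence of commands. This is a sketch under assumptions, not a definitive procedure: the pool name mypool and device /dev/sdX are placeholders, and the script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# DRY_RUN=1 (the default here) only prints each command; set DRY_RUN=0 and
# run as root to execute the commands against real hardware.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# $1 = pool name (placeholder), $2 = block device (placeholder)
check_disk() {
    pool=$1
    dev=$2
    run smartctl -H "$dev"            # overall health status
    run smartctl -t long "$dev"       # start the long self-test inside the drive
    run smartctl -l selftest "$dev"   # self-test log; look again once the test ends
    run smartctl -l error "$dev"      # error log / error counters
    run zpool scrub "$pool"           # have ZFS read and verify all allocated data
    run zpool status -v "$pool"       # scrub progress and per-device error counts
}

check_disk mypool /dev/sdX
```

The long self-test runs in the background on the drive, so the self-test log is only meaningful after it has finished. As the answer warns, both the self-test and the scrub put real load on a marginal disk.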






answered Aug 14 at 6:46 by Anon

  • @Morphinz did this answer your questions?
    – Anon
    Sep 6 at 3:06











  • Thank you for your answer, but it tells me that I have no control over my disks. My problem is getting information about my disks and being able to control them. Because of one faulted disk I'm facing a suspend issue. When a disk starts causing trouble, I need to identify it and stop it, or the kernel, multipathd, or ZFS should do this for me. I don't want to reboot my server because of one disk. For example, I could watch dmesg, and when I see "attempting task abort, I/O errors", set the disk offline. Or I could flush multipath and drop the disk via "/sys/block". That way the disk would no longer be a problem.
    – Morphinz
    Sep 11 at 8:39










  • Even when I try "zpool import mypool", the faulted disk causes an HBA reset while ZFS is searching the disks. That makes "zpool import" hang for 1-2 minutes, after which its output shows "Unavail, or faulted" for every disk. This issue is really important, and I can't understand why nobody cares about it. Maybe my Broadcom 3008 (LSI) HBA cards cause this, and that is why other users do not have the problem. I really, really need to stop these HBA resets. If I can disable that code in the kernel, I will... Or if I need to replace these HBA cards, I will...
    – Morphinz
    Sep 11 at 8:45










  • You might be better off asking a new ZFS-specific question, as your comments show you need a different answer from the ones to your original 5 questions...
    – Anon
    Sep 12 at 18:16










  • (FYI: grox.net/sysadm/unix/linux_disk_hotplug_helpful_commands talks about using echo offline > /sys/block/<blockdev>/device/state to offline a disk.)
    – Anon
    Sep 12 at 18:19
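
The approach discussed in these comments (watch the kernel log, then take the offending disk offline) could look roughly like the sketch below. Several things here are assumptions: the sample log lines are invented to show the shape of the message and may not match what a given driver prints, mypool and sdX are placeholders, and running zpool offline before the sysfs write is an added step, not something the comments prescribe. With multipath, each disk has more than one sdX path, so each path would need the same treatment. The script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# DRY_RUN=1 (the default here) only prints each command; set DRY_RUN=0 and
# run as root to execute the commands against real hardware.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Read kernel log lines on stdin and print the SCSI address (H:C:T:L) of
# every device that logged "attempting task abort", without repeats.
aborting_devices() {
    sed -n 's/.*sd \([0-9]*:[0-9]*:[0-9]*:[0-9]*\): attempting task abort.*/\1/p' |
        sort -u
}

# Take one disk out of service: tell ZFS first, then the SCSI layer.
# $1 = pool name (placeholder), $2 = kernel device name such as sdX (placeholder)
offline_disk() {
    pool=$1
    dev=$2
    run zpool offline "$pool" "$dev"
    run sh -c "echo offline > /sys/block/$dev/device/state"
}

# Demo on invented log lines; on a live system the input would be dmesg output.
printf '%s\n' \
    '[100.0] sd 0:0:12:0: attempting task abort! scmd(example)' \
    '[101.0] sd 0:0:12:0: attempting task abort! scmd(example)' \
    '[102.0] sd 0:0:3:0: some unrelated message' |
    aborting_devices

offline_disk mypool sdX
```

Mapping a SCSI address back to a block device name is left out. On many Linux systems it can typically be found under /sys/class/scsi_device/<H:C:T:L>/device/block/, but that should be verified on the machine in question.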
















