dd hanging & uninterruptible sleep (kernel quirk?)

So, this has been annoying me for years.



This happens with more programs than just dd, but I find it comes up most often with programs that involve raw filesystem or device manipulation.



When I'm copying with dd -- e.g., making a bootable USB disk by doing sudo dd if=somelinuxdistro.iso of=/dev/sdb bs=64K status=progress -- it's as if all my signals are ignored by the application (or by the kernel, in the case of SIGKILL). htop shows status D, which apparently means "uninterruptible sleep". The process can stay in this state for ages if there's a hardware glitch, and in regular usage I can't find any way of detaching it from the terminal so I can keep working -- often I just end up switching to a different terminal to finish what I was doing.
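
For reference, a minimal way to confirm the state and see which kernel wait channel the process is stuck in (a sketch using standard procps tools; it assumes the stuck dd is the only dd running, and /proc/PID/stack needs root plus a kernel built with stack tracing):

    ps -o pid,stat,wchan:32,cmd -C dd        # STAT "D" = uninterruptible sleep; WCHAN = kernel wait channel
    sudo cat /proc/"$(pgrep -x dd)"/stack    # kernel stack of the stuck process, if available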



I've looked this up before, but I've never found an explanation of what this state is for, why the kernel refuses to kill a process in this state, or what the recommended course of action is when it happens.



In short:
I'd like a way of reliably force-killing processes in state D, or at least detaching them from the terminal. I'd also like an explanation of what's going on in the background that puts them in this state in the first place.
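
A workaround sketch rather than a fix: starting the copy in the background from the outset keeps the terminal usable even if the process later enters state D (paths are the example ones from above; this assumes sudo doesn't need to prompt for a password):

    sudo dd if=somelinuxdistro.iso of=/dev/sdb bs=64K status=progress &
    jobs -l                    # the shell prompt stays available; inspect the job later
    sudo pkill -USR1 -x dd     # GNU dd prints I/O statistics on SIGUSR1 -- though not while stuck in D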

filesystems kernel dd hang

asked Sep 10 at 19:59 by Alexandria P.
edited Sep 10 at 20:01 by Rui F Ribeiro

  • Have you tried Ctrl+Z?
    – G-Man
    Sep 10 at 20:23
  • You can't kill processes in state D, that's how it is. I have a similar problem on my machine, where apparently I/O gets stuck (kernel bug?). Dropping caches (echo 3 > /proc/sys/vm/drop_caches) helps, see if it helps in your case, too.
    – dirkt
    Sep 11 at 6:05
  • @G-Man Yes. And Ctrl+C, and Ctrl+U, and kill -9 [PID]. :P
    – Alexandria P.
    Sep 11 at 6:47

1 Answer

If you cannot interrupt an "uninterruptible read" and this is not related to a switched-off NFS server, you have discovered a driver bug.



I/O to local background storage should not have a timeout larger than 5-10 minutes. So if you type ^C or ^Z and nothing happens within 10 minutes, there is a driver bug.



The background is that UNIX defines so-called fast I/O as not interruptible by signals, because fast I/O is expected to complete within a foreseeable amount of time.
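
The contrast is visible with "slow" I/O, which is interruptible: a read from a pipe sleeps in state S and reacts to Ctrl+C immediately, while a read from a block device sleeps in state D. A harmless demonstration (the /tmp path is arbitrary):

    mkfifo /tmp/demo.fifo
    cat /tmp/demo.fifo     # blocks waiting for a writer: ps shows state S, and Ctrl+C works instantly
    rm /tmp/demo.fifo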



Making I/O interruptible by signals would cause high overhead, because there would need to be a way back to a clean state: everything that happened after starting the I/O would have to be unwound, and a return could only happen from where the I/O was initiated.



Even worse, if a background-storage driver did implement interruptible I/O, it would cause unmanageable problems in the filesystems layered above it. You are using a driver that is intended to serve as background storage for a filesystem...



You could run dmesg and check the kernel messages for your problem. If interrupting really does not work after 10 minutes (when one read or write system call is expected to time out and there is a chance to kill dd between two such syscalls), you need to reboot.
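
A sketch of what to look for: on Linux, the hung-task watchdog (khungtaskd) logs tasks stuck in state D, including a kernel stack trace that usually points at the driver involved:

    dmesg | grep -iA2 'blocked for more than'    # hung-task reports, stack traces follow each one
    sysctl kernel.hung_task_timeout_secs         # watchdog threshold in seconds (0 = disabled)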



If this is a USB device, you could try to pull the device before you reboot.
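
If pulling the drive physically is inconvenient, Linux can also detach the device from its driver via sysfs, which may or may not unwedge the pending I/O. A sketch; the bus ID 1-1 below is hypothetical and must be looked up first:

    lsusb -t                                                  # find the stuck device's bus/port ID
    echo '1-1' | sudo tee /sys/bus/usb/drivers/usb/unbind     # detach the (hypothetical) device 1-1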

answered Sep 10 at 20:51 by schily (edited Sep 11 at 7:32)

  • I understand that there may be problems with the driver or with the hardware -- but this doesn't address my question of how to deal with the workflow problem. Nor does it explain why fast IO is designed to be non-interruptable when it potentially causes problems by assuming the backend is working properly. What advantage is there to blocking the user from killing a process like this? If the user is trying to force kill it, they surely already understand that file corruption is likely.
    – Alexandria P.
    Sep 10 at 21:04

  • Can you clarify what you meant by "Making IO interruptable by signals causes a high overhead as there is a need to go back to a clean state"?
    – Alexandria P.
    Sep 11 at 3:33

  • Hmmm. I'm still a little confused. Why can't the kernel just drop all the IO stuff it was doing? I mean, that's seemingly what happens when the device gets physically unplugged -- so it can't be any worse than unplugging a device.
    – Alexandria P.
    Sep 12 at 7:25