dd hanging & uninterruptible sleep (kernel quirk?)
So, this has been annoying me for years.
This happens with more programs than just dd, but I find it happens very often with programs that involve raw filesystem manipulation.
When I'm copying with dd -- e.g., making a bootable USB disk by doing sudo dd if=somelinuxdistro.iso of=/dev/sdb bs=64K status=progress -- it's like all my signals are ignored by the application (or by the kernel, in the case of SIGKILL). htop shows status D, which apparently means "uninterruptible sleep". It can stay in this state for ages if there's a hardware glitch, and in regular usage I can't seem to find any way of detaching it from the terminal so I can keep working -- often I just end up switching to a different terminal to finish my work.
I've looked this up before, but I've never found an explanation of what this state is for or why the kernel refuses to kill a process in this state -- or what exactly the recommended thing is to do to avoid wasting time. (Nor any recommendations of what to do when this happens.)
In short:
What I'd like is a way of reliably force-killing processes in state D, or at least detaching them from the terminal. I'd also like an explanation of what's going on in the background that puts them in this state in the first place.
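(For context, processes in this state can be spotted with a one-liner like the following. This is a generic sketch, not specific to dd; the WCHAN column names the kernel function the task is sleeping in, which often hints at the responsible driver.)

```shell
# List processes whose state starts with "D" (uninterruptible sleep).
# STAT is ps's state column; "D+" (foreground) also matches /^D/.
# WCHAN names the kernel function the task is sleeping in.
ps -eo pid,stat,wchan:30,cmd --no-headers | awk '$2 ~ /^D/'
```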
filesystems kernel dd hang
Have you tried Ctrl+Z?
– G-Man
Sep 10 at 20:23
You can't kill processes in state D, that's how it is. I have a similar problem on my machine, where apparently I/O gets stuck (kernel bug?). Dropping caches (echo 3 > /proc/sys/vm/drop_caches) helps, see if it helps in your case, too.
– dirkt
Sep 11 at 6:05
@G-Man Yes. And Ctrl+C, and Ctrl+U, and kill -9 [PID]. :P
– Alexandria P.
Sep 11 at 6:47
asked Sep 10 at 19:59 by Alexandria P.
edited Sep 10 at 20:01 by Rui F Ribeiro
1 Answer
If you cannot interrupt an "uninterruptible read" and this is not related to a switched-off NFS server, you have discovered a driver bug.
I/O to local background storage should not have a timeout larger than 5-10 minutes. So if you type ^C or ^Z and nothing happens within 10 minutes, there is a driver bug.
The background is that UNIX defines that so-called fast IO is not interruptible by signals, because fast IO is expected to terminate after a foreseeable amount of time.
Making IO interruptible by signals causes high overhead, as there is a need to go back to a clean state: everything that happened after starting the IO needs to be unwound, and a return can only happen from where the IO was initiated.
Even worse, if a background storage driver did implement interruptible IO, this would cause unhandlable problems in the filesystems above such a driver. You are using a driver that is intended to be used as background storage for a filesystem...
You could call dmesg and check the kernel messages for your problem. If interrupting really does not work after 10 minutes (when one read or write system call is expected to time out and there is a chance to kill dd between two such syscalls), you need to reboot.
If this is a USB device, you could try to pull the device before you reboot.
I understand that there may be problems with the driver or with the hardware -- but this doesn't address my question of how to deal with the workflow problem. Nor does it explain why fast IO is designed to be non-interruptible when it potentially causes problems by assuming the backend is working properly. What advantage is there to blocking the user from killing a process like this? If the user is trying to force kill it, they surely already understand that file corruption is likely.
– Alexandria P.
Sep 10 at 21:04
Can you clarify what you meant by "Making IO interruptible by signals causes a high overhead as there is a need to go back to a clean state"?
– Alexandria P.
Sep 11 at 3:33
Hmmm. I'm still a little confused. Why can't the kernel just drop all the IO stuff it was doing? I mean, that's seemingly what happens when the device gets physically unplugged -- so it can't be any worse than unplugging a device.
– Alexandria P.
Sep 12 at 7:25
answered Sep 10 at 20:51 by schily
edited Sep 11 at 7:32