rcu_sched detected stall on CPU
I have seen multiple rcu_sched stall messages on a customer device, after which it crashes or hangs. In this condition, the device is not accessible via SSH or 3G.
Kernel version is 3.2.54. "rcu_sched detected stall on CPU 0" is repeated many times; what does this indicate? The device exhibits this crash during a power cycling test. acpower_isr()/poe_isr() are used to update the AC power status/PoE status during each switch-over. Is this causing the issue (for example, by being unable to release the lock)?
Backtrace:
[<c4011504>] (dump_backtrace+0x0/0x110) from [<c43924bc>] (dump_stack+0x18/0x1c)
r6:c962e080 r5:c96462e0 r4:c9ec4674 r3:c96429bc
[<c43924a4>] (dump_stack+0x0/0x1c) from [<c4082188>] (__rcu_pending+0x88/0x38c)
[<c4082100>] (__rcu_pending+0x0/0x38c) from [<c4083218>] (rcu_check_callbacks+0xe8/0x17c)
[<c4083130>] (rcu_check_callbacks+0x0/0x17c) from [<c4043ac4>] (update_process_times+0x40/0x64)
r8:23339c9a r7:00000000 r6:c6f06ae0 r5:00000000 r4:c8ac8000
r3:00010000
[<c4043a84>] (update_process_times+0x0/0x64) from [<c406513c>] (tick_sched_timer+0x9c/0xdc)
r7:c9ec44a0 r6:c8ac9dd8 r5:c8ac8000 r4:c9ec4598
[<c40650a0>] (tick_sched_timer+0x0/0xdc) from [<c405805c>] (__run_hrtimer+0xf4/0x1c8)
r9:c8ac9d20 r8:23339580 r6:c9ec44d8 r5:c9ec44a0 r4:c9ec4598
[<c4057f68>] (__run_hrtimer+0x0/0x1c8) from [<c4058db4>] (hrtimer_interrupt+0x124/0x288)
[<c4058c90>] (hrtimer_interrupt+0x0/0x288) from [<c40139e0>] (twd_handler+0x28/0x30)
[<c40139b8>] (twd_handler+0x0/0x30) from [<c407f880>] (handle_percpu_devid_irq+0xd0/0x150)
r4:0000001d r3:c40139b8
[<c407f7b0>] (handle_percpu_devid_irq+0x0/0x150) from [<c407be30>] (generic_handle_irq+0x34/0x48)
[<c407bdfc>] (generic_handle_irq+0x0/0x48) from [<c400e5e0>] (handle_IRQ+0x80/0xc0)
[<c400e560>] (handle_IRQ+0x0/0xc0) from [<c40081d0>] (asm_do_IRQ+0x10/0x14)
r5:20000013 r4:c4395234
[<c40081c0>] (asm_do_IRQ+0x0/0x14) from [<c400d738>] (__irq_svc+0x38/0x120)
Exception stack(0xc8ac9dd8 to 0xc8ac9e20)
9dc0: c96ae534 00000013
9de0: 00000001 00000001 c96ae52c c82385a0 00000001 00000001 00006000 d0800000
9e00: d0800000 c8ac9e2c c8ac9e30 c8ac9e20 c40d2f4c c4395234 20000013 ffffffff
[<c4395218>] (_raw_spin_lock+0x0/0x30) from [<c40d2f4c>] (alloc_vmap_area.clone.18+0xa8/0x2f8)
[<c40d2ea4>] (alloc_vmap_area.clone.18+0x0/0x2f8) from [<c40d3268>] (__get_vm_area_node.clone.19+0xcc/0x164)
[<c40d319c>] (__get_vm_area_node.clone.19+0x0/0x164) from [<c40d3bec>] (__vmalloc_node_range+0x5c/0x1d0)
[<c40d3b90>] (__vmalloc_node_range+0x0/0x1d0) from [<c40d3da0>] (__vmalloc_node+0x40/0x4c)
r8:c400de84 r7:00000080 r6:00b7a080 r5:0000465c r4:0000465c
[<c40d3d60>] (__vmalloc_node+0x0/0x4c) from [<c40d3ee4>] (vmalloc+0x30/0x3c)
[<c40d3eb4>] (vmalloc+0x0/0x3c) from [<c406de40>] (sys_init_module+0x5c/0x1878)
[<c406dde4>] (sys_init_module+0x0/0x1878) from [<c400dd00>] (ret_fast_syscall+0x0/0x30)
acpower_isr() [105]
poe_isr() [136]
INFO: rcu_sched detected stall on CPU 0 (t=204330 jiffies)
linux-kernel process kernel-modules crash kernel-panic
You should specify which kernel version this is and, if you can, try another (higher) version to see if the problem remains.
– Patrick Mevzek
Nov 28 '17 at 11:04
Kernel version is 3.2.54. Since this is a customer unit, I cannot check with another version.
– Ravi
Nov 28 '17 at 11:11
edited Dec 7 '17 at 8:30
asked Nov 28 '17 at 10:51
Ravi
1 Answer
From the stack we can see that this CPU is stuck in a spinlock while trying to allocate memory (_raw_spin_lock inside alloc_vmap_area). More interestingly, it seems this is happening while trying to load a new module (sys_init_module), which just calls the module's initialisation code (through a pointer jump, which is why you don't see it in the stack trace).
This means that this is extremely likely to either be a kernel bug that's exercised when loading this module, or a bug in the module itself (probably the latter since vmalloc is almost certainly called by the underlying module).
You need to find the module which is responsible for this bug: look at the processes stuck in D state when this happens, or use something like eBPF to trace new calls to module initialisation.
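The reading of the stack above can be reproduced mechanically. The following is a minimal Python sketch, meant to be run off-device on a saved console log, that pulls the function names out of ARM-style backtrace lines and returns the frames recorded after the IRQ entry (__irq_svc), i.e. the code that was running when the timer tick fired. The helper names (call_chain, interrupted_path) and the abbreviated SAMPLE excerpt are illustrative assumptions, not part of any kernel tooling; the sample lines are copied from the trace in the question with intermediate frames left out.

```python
import re

# One ARM backtrace frame looks like:
#   [<addr>] (callee+0xOFF/0xSIZE) from [<addr>] (caller+0xOFF/0xSIZE)
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\] \((?P<callee>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+\) "
    r"from \[<[0-9a-f]+>\] \((?P<caller>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+\)"
)


def call_chain(backtrace):
    """Return function names, innermost first, without consecutive repeats."""
    chain = []
    for line in backtrace.splitlines():
        match = FRAME_RE.search(line)
        if not match:
            continue  # register dumps and raw stack words are skipped
        for name in (match.group("callee"), match.group("caller")):
            if not chain or chain[-1] != name:
                chain.append(name)
    return chain


def interrupted_path(backtrace):
    """Frames listed after __irq_svc: what the CPU was doing before the IRQ."""
    chain = call_chain(backtrace)
    if "__irq_svc" not in chain:
        return chain
    return chain[chain.index("__irq_svc") + 1:]


# Abbreviated excerpt of the trace from the question (hypothetical test input).
SAMPLE = """\
[<c40081c0>] (asm_do_IRQ+0x0/0x14) from [<c400d738>] (__irq_svc+0x38/0x120)
Exception stack(0xc8ac9dd8 to 0xc8ac9e20)
[<c4395218>] (_raw_spin_lock+0x0/0x30) from [<c40d2f4c>] (alloc_vmap_area.clone.18+0xa8/0x2f8)
[<c40d3eb4>] (vmalloc+0x0/0x3c) from [<c406de40>] (sys_init_module+0x5c/0x1878)
"""

if __name__ == "__main__":
    print(" <- ".join(interrupted_path(SAMPLE)))
```

Running the same function over every stall report in a log and comparing the results is a quick way to check whether the CPU is always interrupted at the same point (here, _raw_spin_lock reached from sys_init_module) or whether the stalls hit different code paths.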
When this happens, I am not able to access the unit. How do I check the process status at that time?
– Ravi
Nov 29 '17 at 11:43
@Ravi You can use something like atop to do logging, then view processes in D state with atop -r, navigating to the time desired. Of course, in a volatile state like this, this is not guaranteed to work, but there's a reasonable chance that it will be able to continue.
– Chris Down
Nov 29 '17 at 12:06
Unfortunately, we don't have many such utilities on the unit.
– Ravi
Nov 30 '17 at 12:10
Edited the question to add crash logs. Is this "rcu_sched detected stall on CPU 0" due to acpower_isr()/poe_isr()? Is _raw_spin_lock() holding back the CPU indefinitely? Unfortunately there is no debug utility present on the controller that is in this bad state...
– Ravi
Dec 5 '17 at 4:53
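Where atop is not available, the logging idea from the comments above can be approximated by reading /proc directly. The following is a rough Python sketch under several assumptions: that a Python interpreter exists on the unit (it may not), that the log path points at storage that survives a power cycle, and that the function names (parse_stat, tasks_in_state, log_forever) are hypothetical helpers rather than an existing tool. The state letter is a parameter because a task busy-waiting on a spinlock would normally show up as R (running) rather than D, so logging both may be worthwhile.

```python
import os
import time


def parse_stat(text):
    """Parse the contents of /proc/<pid>/stat into (pid, comm, state)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the LAST ')' instead of splitting on whitespace.
    left, right = text.rsplit(")", 1)
    pid_str, comm = left.split("(", 1)
    return int(pid_str), comm, right.split()[0]


def tasks_in_state(state="D", proc="/proc"):
    """Return [(pid, comm)] for every task currently in the given state."""
    found = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # not a per-process directory
        try:
            with open(os.path.join(proc, entry, "stat")) as stat_file:
                pid, comm, task_state = parse_stat(stat_file.read())
        except (OSError, ValueError):
            continue  # task exited while we were scanning
        if task_state == state:
            found.append((pid, comm))
    return found


def log_forever(path, states=("D", "R"), interval=5.0):
    """Append a timestamped snapshot to `path` every `interval` seconds."""
    while True:
        with open(path, "a") as log:
            now = int(time.time())
            for state in states:
                for pid, comm in tasks_in_state(state):
                    log.write("%d %s %d %s\n" % (now, state, pid, comm))
            log.flush()
            os.fsync(log.fileno())  # push to storage before a possible hang
        time.sleep(interval)
```

As with atop, there is no guarantee the logger keeps running once the stall begins; the last few snapshots written before the hang are the useful part, for example to see which insmod or modprobe process was active.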
answered Nov 28 '17 at 12:18
Chris Down