Opened 5 years ago

Closed 5 years ago

Last modified 4 years ago

#804 closed defect (fixed)

Tracing a task causes it to crash

Reported by: Jiri Svoboda
Owned by:
Priority: major
Milestone: 0.11.1
Component: helenos/app/trace
Version: mainline
Keywords: udebug
Cc:
Blocker for:
Depends on:
See also:

Description

Running /app/trace +s <command> causes that command to crash (usually with something like a null pointer dereference). After that, the tracer waits indefinitely (until you press Ctrl-Q).

It looks like this is a regression introduced between release 0.4.1 and 0.4.2.

Change History (9)

comment:1 by Jiri Svoboda, 5 years ago

If I kill taskmon beforehand, the task still faults, but trace will at least exit afterwards.

comment:2 by Jiri Svoboda, 5 years ago

An example with /app/tester (disabled shared libraries):

Task /app/tester (74) killed due to an exception at program counter 0x00000000004073a3.
cs =0x0000000000000023	rip=0x00000000004073a3	rfl=0x0000000000210246	err=0x0000000000000004
ss =0x000000000000001b
rax=0x0000000000000000	rbx=0x0000000070018620	rcx=0x00000000004895e0	rdx=0x0000000000000000
rsi=0x000000000042085f	rdi=0x0000000000000000	rbp=0x0000000070141d80	rsp=0x0000000070141d50
r8 =0x0000000000000000	r9 =0x0000000000000000	r10=0x0000000000000000	r11=0x0000000000200216
r12=0x0000000000000000	r13=0x00000000700405a0	r14=0x0000000070141d98	r15=0x0000000000000001
0x0000000070141d80: 0x00000000004073a3()
0x0000000070141dc0: 0x0000000000417ab8()
0x0000000070141df0: 0x00000000004062e3()
0x0000000070141e10: 0x0000000000406163()
0x0000000070141e20: 0x00000000004060ca()
Kill message: Page fault: 0x0000000000000000.
[/srv/taskmon(16)] taskmon: Task 74 fault in thread 0xffffffff85dab3d0.

The stack trace translates as:

0x0000000070141d80: 0x00000000004073a3() str_size+3
0x0000000070141dc0: 0x0000000000417ab8() vfs_cwd_set
0x0000000070141df0: 0x00000000004062e3() __libc_main
...

I got the same stack trace with another binary. Here's the disassembly of str_size:

00000000004073a0 <str_size>:
  4073a0:       55                      push   %rbp
  4073a1:       31 c0                   xor    %eax,%eax
  4073a3:       80 3f 00                cmpb   $0x0,(%rdi)

Here %rdi is zero, hence the fault.
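
To illustrate why the very first instruction after the prologue faults, here is a minimal sketch of a str_size-like function. The names sketch_str_size and sketch_str_size_checked are hypothetical and this is not the HelenOS source; it only assumes that str_size counts the bytes before the terminating NUL, which matches the cmpb against (%rdi) above.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for str_size (not the HelenOS implementation):
 * counts the bytes that precede the terminating NUL.
 * The first read of *str corresponds to "cmpb $0x0,(%rdi)", so passing
 * a null pointer faults on that very first load.
 */
static size_t sketch_str_size(const char *str)
{
	size_t size = 0;

	while (*str++ != 0)
		size++;

	return size;
}

/*
 * Hypothetical guarded variant: treats a null pointer as an empty
 * string instead of dereferencing it.
 */
static size_t sketch_str_size_checked(const char *str)
{
	return (str == NULL) ? 0 : sketch_str_size(str);
}
```

A guard like this would only hide the symptom. The null pointer still has to be explained by whoever passes it in.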

comment:3 by Jiri Svoboda, 5 years ago

I suspected the problem could be arch-specific, but I have confirmed it on ia32, arm32 and ppc32 in addition to amd64.

comment:4 by Jiri Svoboda, 5 years ago

The root cause: /app/trace does not use task_spawnxxx to launch the command it is passed. Instead, it has a function preload_task that mimics task_spawnvf, but the two got out of sync over time.

The only difference should be that it does not actually start the program at the end, giving a chance to connect the debugger to it.

Ideally we'd find a way to deduplicate the code (or at least move it close together) to prevent this from happening again in the future.
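
One possible shape for such a deduplication, as a rough sketch only: both entry points share a single setup routine and differ only in a flag that says whether to start the program. Everything here is hypothetical (sketch_task_t, sketch_task_setup, sketch_spawn, sketch_preload); none of it is HelenOS API. The cwd field is just an illustrative stand-in for a setup step that a hand-copied loader might miss, based on the assumption that the null string reaching vfs_cwd_set in comment:2 came from such a step.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical task description, for illustration only. */
typedef struct {
	const char *path;   /* program to load */
	const char *cwd;    /* example of state every loader path must set */
	bool started;       /* whether the program was actually started */
} sketch_task_t;

/*
 * Hypothetical shared setup routine. Every step needed to prepare the
 * task lives here, so the spawn and preload paths cannot drift apart.
 * Returns 0 on success, -1 on invalid arguments.
 */
static int sketch_task_setup(sketch_task_t *task, const char *path,
    const char *cwd, bool start)
{
	if (task == NULL || path == NULL || cwd == NULL)
		return -1;

	task->path = path;
	task->cwd = cwd;
	task->started = start;
	return 0;
}

/* Normal spawn: prepare the task and start it. */
static int sketch_spawn(sketch_task_t *task, const char *path,
    const char *cwd)
{
	return sketch_task_setup(task, path, cwd, true);
}

/* Tracer preload: same preparation, but leave the task unstarted. */
static int sketch_preload(sketch_task_t *task, const char *path,
    const char *cwd)
{
	return sketch_task_setup(task, path, cwd, false);
}
```

With this layout, a new setup step added for the normal spawn path is picked up by the preload path automatically.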

Also not good that this went unnoticed for 9 years :-( We should add a test for this.

comment:5 by Jiri Svoboda, 5 years ago

Component: helenos/kernel/generic → helenos/app/trace

comment:6 by Jakub Jermář, 5 years ago

Keywords: udebug added

comment:7 by Jiri Svoboda, 5 years ago

Resolution: fixed
Status: new → closed

comment:8 by Jiri Svoboda, 5 years ago

Milestone: 0.9.2

comment:9 by Jakub Jermář, 4 years ago

Milestone: 0.9.2 → 0.11.1

Milestone renamed
