Opened 6 years ago
Closed 2 years ago
#752 closed defect (fixed)
hound crashes on a dual-CPU system
Reported by: | Jakub Jermář | Owned by: | |
---|---|---|---|
Priority: | major | Milestone: | 0.12.1 |
Component: | helenos/srv/hound | Version: | mainline |
Keywords: | Cc: | ||
Blocker for: | Depends on: | ||
See also: |
Description
When booted with multiple CPUs, the following hound crash is reproducible after playing demo.wav using wavplay:
[hdaudio] note: hda_stop_playback() [hound] warn: Connection failed to provide enough data 0/4096 [hdaudio] note: hda_stream_stop() [hdaudio n]ote: hda_stream_rese() Task /srv/audio/hound (t53) killed due to an exception at program counter 0x00000000004020bc. cs =0x0000000000000023 rip=0x00000000004020bc rfl=0x0000000000210246 err=0x0000000000000004 ss =0x000000000000001b rax=0x0000000000000400 rbx=0xfffffffffffffff0 rcx=0x0000000000000004 rdx=0x0000000000000003 rsi=0x000000000095a000 rdi=0xfffffffffffffff0 rbp=0x0000000000a60e40 rsp=0x0000000000a60e10 r8 =0x0000000000000004 r9 =0x0000000000000000 r10=0x0000000000000000 r11=0x0000000000200216 r12=0x0000000000001000 r13=0x0000000000000020 r14=0x000000000095a000 r15=0x0000000000000000 0x0000000000a60e40: 0x00000000004020bc() 0x0000000000a60e90: 0x0000000000401b0c() 0x0000000000a60f40: 0x000000000040103d() 0x0000000000a60fd0: 0x0000000000413a2c() 0x0000000000a60ff0: 0x0000000000411399() Kill message: Page fault: 0x0000000000000038. [/srv/taskmon(16)] taskmon: Task 53 fault in thread 0xffffffff8458e3f8. [/srv/taskmon(16)] taskmon: Executing /app/taskdump -t 53 [/app/taskdump(68)] Task Dump Utility [/app/taskdump(68)] Dumping task '/srv/audio/hound' (task ID 53). [/app/taskdump(68)] Loaded symbol table from /srv/audio/hound [/app/taskdump(68)] Threads: [/app/taskdump(68)] [1] hash: 0xffffffff8458e3f8 [/app/taskdump(68)] Thread 0xffffffff8458e3f8: PC = 0x00000000004020bc (connection_add_source_data+76). FP = 0x0000000000a60e40 [/app/taskdump(68)] 0x0000000000a60e40: 0x00000000004020bc (connection_add_source_data+76) [/app/taskdump(68)] 0x0000000000a60e90: 0x0000000000401b0c (audio_sink_mix_inputs+124) [/app/taskdump(68)] 0x0000000000a60f40: 0x000000000040103d (device_event_callback+141) [/app/taskdump(68)] 0x0000000000a60fd0: 0x0000000000413a2c (connection_fibril+124) [/app/taskdump(68)] 0x0000000000a60ff0: 0x0000000000411399 (_fibril_main+41)
Attachments (1)
Change History (9)
comment:1 by , 6 years ago
by , 6 years ago
comment:2 by , 6 years ago
It seems the following two stacks touch the same data structures, the first one without taking any locks:
The fibril that crashed:
audio_pipe_frames+43 connection_add_source_data+116 audio_sink_mix_inputs+163 device_event_callback+225 connection_fibril+288 _fibril_main+59
The fibril which is waiting for IPC:
context_swap+28 _fibril_switch_to+303 fibril_wait_timeout+553 fibril_wait_for+64 async_wait_for+62 async_req_fast+144 audio_pcm_stop_playback+81 device_sink_connection_callback+637 connection_destroy+170 hound_remove_source_internal+185 hound_remove_ctx+150 iface_rem_context+99 hound_connection_handler+493 connection_fibril+288 _fibril_main+59
comment:3 by , 6 years ago
Component: | helenos/unspecified → helenos/srv/hound |
---|
comment:4 by , 6 years ago
I can reproduce this reliably on a default build when I add an artificial fibril_usleep() like this:
--- a/uspace/srv/audio/hound/audio_sink.c +++ b/uspace/srv/audio/hound/audio_sink.c @@ -139,6 +139,7 @@ void audio_sink_mix_inputs(audio_sink_t *sink, void *dest, size_t size) pcm_format_silence(dest, size, &sink->format); list_foreach(sink->connections, sink_link, connection_t, conn) { + fibril_usleep(1000); const errno_t ret = connection_add_source_data( conn, dest, size, sink->format); if (ret != EOK) {
followed by running:
/ # wavplay demo.wav
Looks like the list_foreach()
above should apparently take the hound->list_guard
mutex.
comment:5 by , 6 years ago
Milestone: | 0.8.0 |
---|
comment:6 by , 5 years ago
Not 100% sure if it's the same thing, but you can reproduce this when GZX exits. Also I have a very old branch on my laptop with an audio player (aplayer) that is having the same problem. With aplayer I worked around it by adding a delay before destroying the hound context. I believe the problem happens if you destroy the context before playback finishes.
comment:7 by , 2 years ago
Could this have been fixed in:
commit 2fbd49c61404ed7f2e7f36f65d920ef81281a58e (HEAD -> master, origin/master) Author: Jiri Svoboda <jiri@wiwaxia> Date: Wed Sep 21 20:01:20 2022 +0200 Audio synk needs locking Without locking the list of sink connections, we are exposed to a race between removing a connection at the end of playback (typically while destroying a hound context) and audio device event PCM_EVENT_FRAMES_PLAYED which causes audio mixing to occur via audio_sink_mix_inputs(), causing hound to crash often at the end of playback.
?
comment:8 by , 2 years ago
Milestone: | → 0.12.1 |
---|---|
Resolution: | → fixed |
Status: | new → closed |
You are correct.
Fixed in changeset 2fbd49c61404ed7f2e7f36f65d920ef81281a58e.
I am going to attach a full console log from a system built with -O0 containing also the other fibril stacks.