Opened 14 years ago
Closed 14 years ago
#260 closed defect (fixed)
Booting process sometimes gets stuck while starting shells on VCs
Reported by: | Jiri Svoboda | Owned by: | Jiri Svoboda |
---|---|---|---|
Priority: | major | Milestone: | 0.4.3 |
Component: | helenos/fs/fat | Version: | mainline |
Keywords: | | Cc: | jakub@… |
Blocker for: | | Depends on: | |
See also: | | | |
Description
Booting sometimes gets stuck at the point where the first four VCs contain the getterm banner. No more banners are printed (no other VCs are active) and the command line is not reached on any VC. Keyboard and mouse input on the console still work, and it is possible to enter the kernel console.
Reproduced on revision: mainline,644
Config: defaults/ia32
Qemu version: 0.10.3
Qemu command line: qemu -m 32 -cdrom image.iso -boot d
Reproducibility: non-deterministic, in about 50% of attempts
Change History (11)
comment:1 by , 14 years ago
Owner: | set to |
---|---|
Status: | new → accepted |
comment:2 by , 14 years ago
comment:3 by , 14 years ago
Just a guess: What about available memory? The non-determinism could simply be caused by race conditions on allocating memory (mapping and unmapping of address space areas). Then the second question would be why all the tasks don't get unblocked eventually.
comment:4 by , 14 years ago
That was my first guess as well, but no, it's not an OOM. Increasing memory does not help.
A tiny bit of further investigation: The loader tasks are waiting for VFS (1x vfs_in_read, 4x vfs_in_open), VFS is waiting for FAT (1x vfs_out_read, 4x vfs_out_lookup). FAT is not waiting for any other server.
With tmpfs root filesystem, the problem does not occur.
comment:5 by , 14 years ago
Cc: | added |
---|
comment:6 by , 14 years ago
Component: | unspecified → fs/fat |
---|
I have quickly prototyped a deadlock detection mechanism for fibril synchronization primitives (only mutexes as of now). It can be found in the lp:~jakub/helenos/deadlock-detection branch, where it will soak until it is ready for mainline.
Nevertheless, the detection mechanism is useful even now as it detected the following deadlock between two fibrils in fat:
fibril A:
fibril_mutex_lock()
fat_idx_get_by_pos()
fat_match()
libfs_lookup()
fat_lookup()
fibril B:
fibril_mutex_lock()
fat_idx_get_by_index()
fat_root_get()
libfs_lookup()
fat_lookup()
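For readers unfamiliar with this kind of detection, here is a minimal sketch of how a mutex deadlock check can work in general. It is not the code from the deadlock-detection branch. All types and names (sketch_fibril, sketch_mutex, would_deadlock, SKETCH_MAX_HOPS) are hypothetical and only illustrate the idea: before a fibril blocks on a mutex, follow the chain "owner of that mutex, mutex that owner waits for, its owner, ..." and report a deadlock if the chain leads back to the blocking fibril.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical types for illustration only; not HelenOS's fibril API. */
struct sketch_mutex;

struct sketch_fibril {
    const char *name;
    struct sketch_mutex *waits_for;   /* mutex this fibril is blocked on, or NULL */
};

struct sketch_mutex {
    const char *name;
    struct sketch_fibril *owner;      /* current holder, or NULL if free */
};

/* Assumed upper bound on chain length, so a foreign cycle cannot loop forever. */
#define SKETCH_MAX_HOPS 64

/*
 * Return true if blocking `self` on `wanted` would close a cycle in the
 * wait-for graph, i.e. some fibril that `self` would wait for is already
 * (transitively) waiting for a mutex held by `self`.
 */
bool would_deadlock(const struct sketch_fibril *self,
    const struct sketch_mutex *wanted)
{
    assert(self != NULL);

    const struct sketch_mutex *m = wanted;
    for (int hops = 0; m != NULL && m->owner != NULL &&
        hops < SKETCH_MAX_HOPS; hops++) {
        if (m->owner == self)
            return true;              /* chain leads back to us */
        m = m->owner->waits_for;      /* follow what the owner is blocked on */
    }
    return false;
}
```

Applied to the two stacks above, fibril B already holds one mutex and is blocked on a mutex held by fibril A, so a check like this returns true at the moment fibril A tries to take the mutex held by B.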
comment:7 by , 14 years ago
Ok, I think I know what the problem is, based on the above stacks. In fat_match(), we first lock parent->idx->lock and then call fat_idx_get_by_pos(), in which we want to lock used_lock. Meanwhile, another fibril manages to lock used_lock first in fat_idx_get_by_index(), but then cannot get the idx lock for parent, because it is already taken by the first fibril. Each fibril holds the lock the other one needs, so neither can proceed.
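A common way to rule out this kind of lock order inversion is to give the locks a fixed global order and only ever take them in that order. The sketch below shows that idea only. It is not the fix that went into lp:~jakub/helenos/fs, which this ticket does not describe. The ranks, the choice of taking used_lock before the idx lock, and all names (sketch_rank, sketch_ctx, sketch_may_lock, sketch_lock) are assumptions made for illustration, and unlocking is not modeled.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical lock ranks: a fibril may only take a lock whose rank is
 * higher than every lock it already holds. The order chosen here
 * (used_lock before the idx lock) is an assumption for this example.
 */
enum sketch_rank {
    RANK_NONE = 0,
    RANK_USED_LOCK = 1,   /* stands in for used_lock */
    RANK_IDX_LOCK = 2     /* stands in for parent->idx->lock */
};

/* Per-fibril bookkeeping: the highest rank currently held. */
struct sketch_ctx {
    enum sketch_rank highest_held;
};

/* Would taking `wanted` respect the global order? */
bool sketch_may_lock(const struct sketch_ctx *ctx, enum sketch_rank wanted)
{
    return wanted > ctx->highest_held;
}

/* Record the acquisition if it respects the order; refuse it otherwise. */
bool sketch_lock(struct sketch_ctx *ctx, enum sketch_rank wanted)
{
    if (!sketch_may_lock(ctx, wanted))
        return false;     /* order violation: a potential ABBA deadlock */
    ctx->highest_held = wanted;
    return true;
}
```

Under this assumed order, the sequence seen in fibril B (used_lock, then the idx lock) is accepted, while the sequence seen in fibril A (idx lock, then used_lock) is rejected, so the inversion would be caught in a debug build before it can deadlock.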
comment:8 by , 14 years ago
Jiri, I have just (hopefully) fixed this in lp:~jakub/helenos/fs. Can you merge from there and verify the issue is no longer reproducible?
Thanks,
Jakub
comment:11 by , 14 years ago
Resolution: | → fixed |
---|---|
Status: | accepted → closed |
Fixed in changeset:mainline,648.
Running 'tasks' in kcon shows four tasks with the name 'getterm' and five tasks with the name 'loader'. On a non-debug build we can see 7x getterm and 7x loader.