#864 closed defect (notadefect)
VFS crashes on ia64
Reported by: | Jakub Jermář | Owned by: | |
---|---|---|---|
Priority: | major | Milestone: | 0.14.1 |
Component: | helenos/srv/vfs | Version: | mainline |
Keywords: | ia64 | Cc: | |
Blocker for: | | Depends on: | |
See also: | | | |
Description
After the toolchain upgrade from GCC 8.2 to 13.2, VFS started to crash during startup on ia64/ski:
[init:vfs(6)] vfs: Accepting connections
[init:ext4fs(8)] ext4fs: Accepting connections
Task init:vfs (6) killed due to an exception at program counter 0x400000000001d4c0.
ar.bsp=0xe00000000416c148 ar.bspstore=0x60000000002f4058
ar.rnat=0x0 ar.rsc=0xc
ar.ifs=0x8000000000000288 ar.pfs=0xc000000000000288
cr.isr=0x400000000 cr.ipsr=0x1013080a6010
cr.iip=0x400000000001d4c0, #0 (<unknown>)
cr.iipa=0x400000000001f560 (<unknown>)
cr.ifa=0x400 (<unknown>)
Kill message: Page fault: 0x0000000000000400.
Change History (6)
comment:1 by , 13 months ago
comment:2 by , 13 months ago
The respective piece of code looks like this:
4000000000001a00 <_vfs_fd_alloc>:
4000000000001a00: 08 48 39 18 80 05 [MMI] alloc r41=ar.pfs,14,12,0
4000000000001a06: c0 02 80 00 42 00       mov r44=r32
4000000000001a0c: 05 00 c4 00             mov r40=b0
4000000000001a10: 09 38 01 41 00 21 [MMI] adds r39=64,r32
4000000000001a16: a0 02 04 00 42 40       mov r42=r1
4000000000001a1c: 04 10 41 00             zxt1 r34=r34;;
4000000000001a20: 11 28 fd 01 00 24 [MIB] mov r37=127
4000000000001a26: b0 02 04 65 00 00       mov.i r43=ar.lc
4000000000001a2c: e8 b4 01 50             br.call.sptk.many b0=400000000001cf00 <fibril_mutex_lock>;;
4000000000001a30: 08 60 01 40 00 21 [MMI] mov r44=r32
4000000000001a36: e0 00 9c 30 20 20       ld8 r14=[r39]
4000000000001a3c: 00 50 01 84             mov r1=r42
4000000000001a40: 0a 68 05 00 00 24 [MMI] mov r45=1;;
4000000000001a46: c0 02 00 10 48 e0       mov r44=1024
4000000000001a4c: 00 70 18 e4             cmp.eq p7,p6=0,r14
4000000000001a50: 16 00 00 00 00 c8 [BBB] nop.b 0x0
4000000000001a56: 01 f0 01 80 21 00 (p07) br.cond.dpnt.few 4000000000001e30 <_vfs_fd_alloc+0x430>
4000000000001a5c: 10 00 00 40             br.few 4000000000001a60 <_vfs_fd_alloc+0x60>
4000000000001a60: 11 00 00 00 01 00 [MIB] nop.m 0x0
4000000000001a66: 00 00 00 02 00 00       nop.i 0x0
4000000000001a6c: 28 ba 01 50             br.call.sptk.many b0=400000000001d480 <fibril_mutex_unlock>;; <==== !!! HERE 0x400 is passed
4000000000001a70: 08 60 01 40 00 21 [MMI] mov r44=r32
comment:3 by , 13 months ago
I think this is a compiler bug. Look at how r44 (i.e. out0) is used. First, at address 1a06, it is initialized from r32 (i.e. in0) with the address of the mutex. out0 is then passed unaltered to fibril_mutex_lock at address 1a2c. After the mutex is taken, out0 is reloaded from r32 at address 1a30. The assembly between addresses 1a36 and 1a56 corresponds to the if (!vfs_data->files) check. Here, however, out0 is prepared for the possible call to malloc and gets overwritten at address 1a46 with the value 0x400, which is actually the size to be allocated. If the branch is taken, out0 will be fixed later (not shown in the snippet) to contain the original in0. However, if it is not taken, out0 will continue to hold the allocation size even for the call to fibril_mutex_unlock.
comment:5 by , 13 months ago
Resolution: → notadefect
Status: new → closed
A workaround has been pushed in commit:
commit e8a6279ff3841e7471155ab4bf21d5249a85c4e6 (HEAD -> master, origin/master)
Author: Jakub Jermář <jakub@jermar.eu>
Date:   Sat Nov 18 16:37:51 2023 +0100

    Work around GCC bug 112604

    Turn off the optimization which seems to be responsible for the issue on ia64.
Upon closer inspection, the above error is triggered by an invocation of fibril_mutex_unlock(0x400) from vfs_files_init, which is inlined into _vfs_fd_alloc. Adding debug prints to vfs_files_init masks the bug and the system is usually able to boot normally.