Opened 10 years ago
Closed 9 years ago
#606 closed defect (fixed)
VFS sometimes crashes in fibril_switch() on sun4v
Reported by: | Jakub Jermář | Owned by: | Jakub Jermář |
---|---|---|---|
Priority: | major | Milestone: | 0.7.0 |
Component: | helenos-build/sparc64 | Version: | mainline |
Keywords: | sun4v | Cc: | |
Blocker for: | | Depends on: | |
See also: | #324 | | |
Description
After mainline,1921, mainline,1922 and mainline,1923, HelenOS/sun4v can make it quite far into userspace initialization. As far as stability is concerned, the only problem seems to be around this area in fibril.c:
    fibril_t *srcf = __tcb_get()->fibril_data;

    if (stype != FIBRIL_FROM_DEAD) {
        /* Save current state */
        if (!context_save(&srcf->ctx)) {
            if (serialization_count)
                srcf->flags &= ~FIBRIL_SERIALIZED;

            if (srcf->clean_after_me) {    <========== HERE
                /*
                 * Cleanup after the dead fibril from which we
                 * restored context here.
                 */
                void *stack = srcf->clean_after_me->stack;    <=========== or HERE
                if (stack) {
                    /*
                     * This check is necessary because a
                     * thread could have exited like a
                     * normal fibril using the
                     * FIBRIL_FROM_DEAD switch type. In that
                     * case, its fibril will not have the
                     * stack member filled.
                     */

Either srcf->clean_after_me or srcf->clean_after_me->stack contains garbage (an unaligned or unmapped address).
The corresponding disassembly is here:

    c6b4: 82 10 00 07  mov %g7, %g1
    c6b8: c4 5f a8 7f  ldx [ %fp + 0x87f ], %g2
    c6bc: c2 58 60 08  ldx [ %g1 + 8 ], %g1
    c6c0: 80 a0 a0 03  cmp %g2, 3
    c6c4: 02 40 00 73  be,pn %icc, c890 <fibril_switch+0x250>
    c6c8: c2 77 a7 f7  stx %g1, [ %fp + 0x7f7 ]
    c6cc: 40 00 53 f5  call 216a0 <context_save>
    c6d0: 90 00 60 10  add %g1, 0x10, %o0
    c6d4: 80 a2 20 00  cmp %o0, 0
    c6d8: 12 40 00 a1  bne,pn %icc, c95c <fibril_switch+0x31c>
    c6dc: 03 00 00 00  sethi %hi(0), %g1
    c6e0: 82 18 7f e8  xor %g1, -24, %g1
    c6e4: c2 01 c0 01  ld [ %g7 + %g1 ], %g1
    c6e8: 80 a0 60 00  cmp %g1, 0
    c6ec: 12 48 00 2f  bne %icc, c7a8 <fibril_switch+0x168>
    c6f0: c8 5f a7 f7  ldx [ %fp + 0x7f7 ], %g4
    c6f4: ca 5f a7 f7  ldx [ %fp + 0x7f7 ], %g5
    c6f8: fa 59 60 c8  ldx [ %g5 + 0xc8 ], %i5    <======== here %g5 is misaligned
    c6fc: 22 c7 40 10  brz,a,pn %i5, c73c <fibril_switch+0xfc>
    c700: b0 10 20 01  mov 1, %i0
    c704: d0 5f 60 a8  ldx [ %i5 + 0xa8 ], %o0
    c708: 02 c2 00 06  brz,pn %o0, c720 <fibril_switch+0xe0>
    c70c: 01 00 00 00  nop
    c710: 7f ff e8 b4  call 69e0 <as_area_destroy>
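For context on why a misaligned register is fatal at c6f8: on SPARC V9, ldx loads a 64-bit doubleword and requires an 8-byte-aligned effective address, so a misaligned base register causes a trap instead of returning data. A debugging aid along the following lines could catch such a pointer before it is dereferenced. This is only a minimal sketch: is_aligned() and check_doubleword_ptr() are hypothetical helpers invented for illustration and are not part of HelenOS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a HelenOS API): returns true if ptr is
 * aligned to `align` bytes. `align` must be a power of two.
 */
static bool is_aligned(const void *ptr, uintptr_t align)
{
    return ((uintptr_t) ptr & (align - 1)) == 0;
}

/*
 * Hypothetical debug check: abort early if a pointer that is about to
 * be used for a 64-bit load is not 8-byte aligned. Note that this only
 * detects misalignment; it cannot tell whether the address is mapped.
 */
static void check_doubleword_ptr(const void *ptr)
{
    assert(is_aligned(ptr, 8));
}
```

Calling check_doubleword_ptr(srcf) just before the srcf->clean_after_me test would turn the trap into an assertion failure at a known source line, assuming assertions are enabled in the build.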
This crash can still occasionally be encountered in the CHT pre-integration branch as well:
http://bazaar.launchpad.net/~jakub/helenos/cht-preintegration/revision/2291
Change History (1)
comment:1 by , 9 years ago
Component: | helenos/srv/vfs → helenos-build/sparc64 |
---|---|
Resolution: | → fixed |
Status: | new → closed |
There was a bug in tlb_invalidate_pages(), fixed by mainline,2409, which was most likely causing this issue. As of mainline,2409, I was unable to reproduce the problem either under gem5 or on a real-world T1000.