#782 closed defect (fixed)
HelenOS does not boot on Raspberry Pi
Reported by: | Jakub Jermář | Owned by: | |
---|---|---|---|
Priority: | major | Milestone: | 0.9.1 |
Component: | helenos/boot/arm32 | Version: | mainline |
Keywords: | | Cc: | |
Blocker for: | | Depends on: | |
See also: | | | |
Description
As of commit 4bb4cf88f506ddc6012f655a28835fe8872e9f71, the console output stops upon entering the kernel:
    HelenOS bootloader, release 0.7.2 (Boosted Effort), revision 4bb4cf88f
    Built on 2018-12-17 21:01:02 for arm32
    Copyright (c) 2001-2018 HelenOS project
    Boot loader: 0x00008000 -> 0x00015d40

    Memory statistics
     0x00015000|0x00015000: bootstrap stack
     0x00010000|0x00010000: bootstrap page table
     0x00015838|0x00015838: boot info structure
     0x80a08000|0x00a08000: kernel entry point

    Boot loader: 0x00008000 -> 0x00015d40
    Payload: 0x00015d40 -> 0x00263d40
    Kernel load address: 0x00a08000
    Kernel start: 0x80a08000
    RAM end: 0x01a08000 (16777216 bytes available)

    Inflating components ...
     0x80a08000|0x00a08000: kernel.elf.gz image (550396/154178 bytes)
     0x80a8f000|0x00a8f000: ns.gz image (107112/50188 bytes)
     0x80aaa000|0x00aaa000: loader.gz image (107240/50411 bytes)
     0x80ac5000|0x00ac5000: init.gz image (134476/62563 bytes)
     0x80ae6000|0x00ae6000: locsrv.gz image (122668/58017 bytes)
     0x80b04000|0x00b04000: rd.gz image (113604/53331 bytes)
     0x80b20000|0x00b20000: vfs.gz image (136864/63692 bytes)
     0x80b42000|0x00b42000: logger.gz image (119284/55647 bytes)
     0x80b60000|0x00b60000: fat.gz image (182444/86286 bytes)
     0x80b8d000|0x00b8d000: initrd.img.gz image (5345280/1769857 bytes)
    Done.
    Booting the kernel...
Change History (7)
comment:1 by , 6 years ago
comment:2 by , 6 years ago
The kernel panic started with this commit:
    edc64c03b91257aecae0d60886bd274aea300bf9 is the first bad commit
    commit edc64c03b91257aecae0d60886bd274aea300bf9
    Author: Jakub Jermar <jakub@jermar.eu>
    Date:   Wed Jul 18 00:42:57 2018 +0200

        Zero out new thread's register context

        This removes the information leak in which the new thread inherited
        some register values from the thread which created it. Also, now each
        thread begins execution with a well-defined register state.

    :040000 040000 00a5a6a1f0af764b7222a75ae8d5c5b472a9f4f9 06d1f1b58faa1025b6c39f5089ac29a686ebf744 M	kernel
comment:3 by , 6 years ago
Commit 336b7393ec3e072439a0e045724088e669be87d4 fixed the panic caused by edc64c03b91257aecae0d60886bd274aea300bf9 (zero cpu_mode in context_t), but the crash due to 4621d2311994bf63dea425ed923239d4ca1babc9 (switch to compiler builtins for atomics) still remains.
comment:4 by , 6 years ago
Milestone: | 0.8.0 → 0.9.1 |
---|
comment:5 by , 6 years ago
I ran a couple of experiments that helped me narrow down the problem. It looks like the following test procedure executes as expected when called after the kernel's call to as_switch() in page_arch_init(), and misbehaves if executed before it:
    80a47b10: e1a0c00d  mov   ip, sp
    80a47b14: e92dd800  push  {fp, ip, lr, pc}
    80a47b18: e24cb004  sub   fp, ip, #4
    80a47b1c: e24dd008  sub   sp, sp, #8
    80a47b20: ee070fba  mcr   15, 0, r0, cr7, cr10, {5}   <= DMB
    80a47b24: e24b3010  sub   r3, fp, #16
    80a47b28: e1932f9f  ldrex r2, [r3]
    80a47b2c: e2822001  add   r2, r2, #1
    80a47b30: e1831f92  strex r1, r2, [r3]
    80a47b34: e3510000  cmp   r1, #0
    80a47b38: 1afffffa  bne   80a47b28                    <= atomic_inc()
    80a47b3c: e3a00000  mov   r0, #0
    80a47b40: ee070fba  mcr   15, 0, r0, cr7, cr10, {5}   <= DMB
    80a47b44: e24bd00c  sub   sp, fp, #12
    80a47b48: e89da800  ldm   sp, {fp, sp, pc}
As for what exactly "misbehaves" means, I suspect STREX always returns 1, thus forming an infinite loop. It's as if the system was not yet ready to execute the LDREX-ADD-STREX atomic sequence, and calling page_arch_init() fixed that.
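For reference, the disassembled loop corresponds roughly to the C sketch below. It is not the kernel's actual code: the type and function names are hypothetical, and it only assumes the GCC/Clang `__atomic` builtins that the switch to compiler builtins implies. The second function spells out the retry loop by hand; a failed store (STREX writing 1 to its status register) corresponds to the compare-exchange returning false, so if the store can never succeed, the loop never terminates.

```c
#include <stdint.h>

/* Hypothetical stand-in for the kernel's atomic counter type. */
typedef struct {
	volatile uint32_t count;
} atomic_sketch_t;

/*
 * Increment using the compiler builtin. On ARMv6 and later, compilers
 * typically lower this to an LDREX/ADD/STREX retry loop surrounded by
 * barriers, much like the disassembly above.
 */
static inline void atomic_inc_sketch(atomic_sketch_t *val)
{
	__atomic_add_fetch(&val->count, 1, __ATOMIC_SEQ_CST);
}

/*
 * The same operation with the retry loop written out explicitly.
 * A failed weak compare-exchange plays the role of STREX returning 1:
 * `old` is refreshed with the current value and the loop tries again.
 */
static inline void atomic_inc_retry_sketch(atomic_sketch_t *val)
{
	uint32_t old = __atomic_load_n(&val->count, __ATOMIC_RELAXED);

	while (!__atomic_compare_exchange_n(&val->count, &old, old + 1,
	    1 /* weak */, __ATOMIC_SEQ_CST, __ATOMIC_RELAXED)) {
		/* Retry until the store succeeds. */
	}
}
```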
comment:6 by , 6 years ago
Ok, I figured this out.
The problem is that the loader installs a 1:1 mapping between the virtual and physical address space (and wickedly assumes physical mirrors at 2G). Virtual addresses that map identically to physical memory are mapped as cacheable (both inner and outer write-back, write-allocate) and everything else is mapped noncacheable, as it is assumed to be a device. Unfortunately, this "everything else" also happens to include kernel virtual addresses that use a PA2KA() mapping (i.e. identity with a shift to 2G). So until the kernel installs its own page tables, the LDREX/STREX instructions use mappings which are marked as noncacheable device memory. No wonder it doesn't work. Previous versions were not affected because they used a different mechanism which is not sensitive to the memory attributes of the memory used.
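The classification described above can be sketched as follows. This is only an illustration, not the loader's code: every name is hypothetical, the RAM end is taken from the boot log in the description, and a RAM start of 0 is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical equivalent of PA2KA(): identity with a shift to 2G. */
#define KA_OFFSET_SKETCH   UINT32_C(0x80000000)
#define PA2KA_SKETCH(pa)   ((uint32_t) (pa) + KA_OFFSET_SKETCH)

/* RAM end as printed in the boot log; RAM start of 0 is assumed. */
#define RAM_START_SKETCH   UINT32_C(0x00000000)
#define RAM_END_SKETCH     UINT32_C(0x01a08000)

/*
 * Returns true if a 1:1 loader mapping would treat the virtual address
 * as RAM (cacheable). Everything else is assumed to be a device and is
 * mapped noncacheable.
 */
static bool loader_maps_cacheable_sketch(uint32_t va)
{
	return (va - RAM_START_SKETCH) < (RAM_END_SKETCH - RAM_START_SKETCH);
}
```

Under these assumptions, the kernel load address 0x00a08000 is classified as cacheable, while its kernel virtual alias PA2KA_SKETCH(0x00a08000) = 0x80a08000 falls into the noncacheable "everything else" bucket.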
Splitting the loader's page table into two halves, the first with a 1:1 mapping and the second with a PA2KA mapping, fixes the problem on Raspberry Pi. Unfortunately, it breaks bbone, most likely because its physical memory already starts at 2G (which might also be the reason it was not affected by the issue in the first place).
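A minimal sketch of the split, assuming a table of 4096 one-megabyte sections (the ARM short-descriptor first-level layout); the names are hypothetical and this is not the committed fix. It computes only the physical address each section would point to, leaving memory attributes aside.

```c
#include <stdint.h>

#define SECTION_SHIFT_SKETCH   20u     /* 1 MiB sections (assumed) */
#define SECTION_COUNT_SKETCH   4096u   /* covers the 4 GiB address space */
#define KA_OFFSET_SKETCH       UINT32_C(0x80000000)

/*
 * Physical address mapped by first-level entry `index` when the table
 * is split in two: the lower 2G is mapped 1:1, the upper 2G is mapped
 * as the PA2KA alias of the lower 2G.
 */
static uint32_t split_table_pa_sketch(uint32_t index)
{
	uint32_t va = index << SECTION_SHIFT_SKETCH;

	if (index < SECTION_COUNT_SKETCH / 2)
		return va;                      /* lower half: identity */

	return va - KA_OFFSET_SKETCH;           /* upper half: KA to PA */
}
```

With this layout, the section at virtual 0x80a00000 points to the same physical memory as the section at 0x00a00000, so both can share RAM attributes. On a board whose RAM itself starts at 2G, the upper half would instead point to physical addresses starting at 0, which is consistent with the suspected cause of the bbone breakage.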
I am now looking into ways to fix this so that nothing breaks.
comment:7 by , 6 years ago
Component: | helenos/kernel/arm32 → helenos/boot/arm32 |
---|---|
Resolution: | → fixed |
Status: | new → closed |
Fixed in commit accdbd830beca44bcb50139f5c5e256cbe7afda9.
This behavior (no kernel messages printed) occurs since:
However, even before this commit, as far back as the switch to the HelenOS-specific toolchain in commit bbe5e34956da986df4d32357c697e539e8cfec0d, the boot was failing with:
This kernel panic is more difficult to bisect due to the toolchain change.