#782 closed defect (fixed)
HelenOS does not boot on Raspberry Pi
| Reported by: | Jakub Jermář | Owned by: | |
|---|---|---|---|
| Priority: | major | Milestone: | 0.9.1 |
| Component: | helenos/boot/arm32 | Version: | mainline |
| Keywords: | | Cc: | |
| Blocker for: | | Depends on: | |
| See also: | | | |
Description
As of commit 4bb4cf88f506ddc6012f655a28835fe8872e9f71, the console output stops upon entering the kernel:
HelenOS bootloader, release 0.7.2 (Boosted Effort), revision 4bb4cf88f
Built on 2018-12-17 21:01:02 for arm32
Copyright (c) 2001-2018 HelenOS project
Boot loader: 0x00008000 -> 0x00015d40
Memory statistics
 0x00015000|0x00015000: bootstrap stack
 0x00010000|0x00010000: bootstrap page table
 0x00015838|0x00015838: boot info structure
 0x80a08000|0x00a08000: kernel entry point
Boot loader: 0x00008000 -> 0x00015d40
Payload: 0x00015d40 -> 0x00263d40
Kernel load address: 0x00a08000
Kernel start: 0x80a08000
RAM end: 0x01a08000 (16777216 bytes available)
Inflating components ...
 0x80a08000|0x00a08000: kernel.elf.gz image (550396/154178 bytes)
 0x80a8f000|0x00a8f000: ns.gz image (107112/50188 bytes)
 0x80aaa000|0x00aaa000: loader.gz image (107240/50411 bytes)
 0x80ac5000|0x00ac5000: init.gz image (134476/62563 bytes)
 0x80ae6000|0x00ae6000: locsrv.gz image (122668/58017 bytes)
 0x80b04000|0x00b04000: rd.gz image (113604/53331 bytes)
 0x80b20000|0x00b20000: vfs.gz image (136864/63692 bytes)
 0x80b42000|0x00b42000: logger.gz image (119284/55647 bytes)
 0x80b60000|0x00b60000: fat.gz image (182444/86286 bytes)
 0x80b8d000|0x00b8d000: initrd.img.gz image (5345280/1769857 bytes)
Done.
Booting the kernel...
Change History (7)
comment:1 by , 7 years ago
comment:2 by , 7 years ago
The kernel panic started with this commit:
edc64c03b91257aecae0d60886bd274aea300bf9 is the first bad commit
commit edc64c03b91257aecae0d60886bd274aea300bf9
Author: Jakub Jermar <jakub@jermar.eu>
Date: Wed Jul 18 00:42:57 2018 +0200
Zero out new thread's register context
This removes the information leak in which the new thread inherited some
register values from the thread which created it. Also, now each thread
begins execution with a well-defined register state.
:040000 040000 00a5a6a1f0af764b7222a75ae8d5c5b472a9f4f9 06d1f1b58faa1025b6c39f5089ac29a686ebf744 M kernel
comment:3 by , 7 years ago
Commit 336b7393ec3e072439a0e045724088e669be87d4 fixed the panic caused by edc64c03b91257aecae0d60886bd274aea300bf9 (zero cpu_mode in context_t), but the crash due to 4621d2311994bf63dea425ed923239d4ca1babc9 (switch to compiler builtins for atomics) still remains.
comment:4 by , 7 years ago
| Milestone: | 0.8.0 → 0.9.1 |
|---|
comment:5 by , 7 years ago
I ran a couple of experiments that helped me narrow down the problem. It looks like the following test procedure executes as expected when called after the kernel's call to as_switch() in page_arch_init(), and misbehaves if executed before it:
80a47b10: e1a0c00d mov ip, sp
80a47b14: e92dd800 push {fp, ip, lr, pc}
80a47b18: e24cb004 sub fp, ip, #4
80a47b1c: e24dd008 sub sp, sp, #8
80a47b20: ee070fba mcr 15, 0, r0, cr7, cr10, {5} <= DMB
80a47b24: e24b3010 sub r3, fp, #16
80a47b28: e1932f9f ldrex r2, [r3]
80a47b2c: e2822001 add r2, r2, #1
80a47b30: e1831f92 strex r1, r2, [r3]
80a47b34: e3510000 cmp r1, #0
80a47b38: 1afffffa bne 80a47b28 <= atomic_inc()
80a47b3c: e3a00000 mov r0, #0
80a47b40: ee070fba mcr 15, 0, r0, cr7, cr10, {5} <= DMB
80a47b44: e24bd00c sub sp, fp, #12
80a47b48: e89da800 ldm sp, {fp, sp, pc}
As for what exactly "misbehaves" means, I suspect STREX always returns 1, thus forming an infinite loop. It's as if the system was not yet ready to execute the LDREX-ADD-STREX atomic sequence and calling page_arch_init() fixed that.
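For reference, the loop in the listing above is roughly what a compiler emits for an atomic increment written with builtins. The following is only a sketch: the type and function names (`sketch_atomic_t`, `sketch_atomic_inc`, `sketch_atomic_inc_loop`, `sketch_atomic_demo`) are made up for illustration and are not the kernel's actual code. The second variant spells out the retry loop with compare-and-swap to show why a store that never succeeds turns into an infinite loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a kernel atomic counter (not HelenOS's type). */
typedef struct {
	volatile size_t count;
} sketch_atomic_t;

/*
 * Increment via the GCC/Clang __atomic builtin. On ARM cores with
 * exclusive-access instructions the compiler typically expands this into
 * a load-exclusive/add/store-exclusive retry loop like the one above.
 */
static inline void sketch_atomic_inc(sketch_atomic_t *val)
{
	__atomic_add_fetch(&val->count, 1, __ATOMIC_SEQ_CST);
}

/*
 * The same operation with the retry loop written out. If the conditional
 * store could never succeed (the suspected STREX behaviour), this loop
 * would never terminate.
 */
static inline void sketch_atomic_inc_loop(sketch_atomic_t *val)
{
	size_t old = __atomic_load_n(&val->count, __ATOMIC_RELAXED);

	while (!__atomic_compare_exchange_n(&val->count, &old, old + 1,
	    1 /* weak: may fail spuriously, so retry */,
	    __ATOMIC_SEQ_CST, __ATOMIC_RELAXED)) {
		/* On failure 'old' now holds the current value; try again. */
	}
}

/* Usage: two increments on a zeroed counter leave it at 2. */
void sketch_atomic_demo(void)
{
	sketch_atomic_t c = { 0 };

	sketch_atomic_inc(&c);
	sketch_atomic_inc_loop(&c);
	assert(c.count == 2);
}
```

On a hosted system both variants terminate immediately; the point is only the shape of the loop, not a reproduction of the hang.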
comment:6 by , 7 years ago
Ok, I figured this out.
The problem is that the loader installs a 1:1 mapping between the virtual and physical address space (and wickedly assumes that physical memory is mirrored at 2G). Virtual addresses that map identically to physical memory are mapped as cacheable (both inner and outer write-back, write-allocate) and everything else is mapped as noncacheable because it is assumed to be a device. Unfortunately, this "everything else" also happens to include kernel virtual addresses that use a PA2KA() mapping (i.e. identity with a shift to 2G). So until the kernel installs its own page tables, the LDREX/STREX instructions operate on mappings that are marked as noncacheable device memory. No wonder it doesn't work. Previous versions were not affected because they used a different mechanism that is not sensitive to the attributes of the memory it operates on.
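The effect can be modelled in a few lines of C. This is a sketch under stated assumptions, not the loader's code: the helper names are hypothetical, the 2G shift stands in for PA2KA(), and the RAM bounds (0 to 0x01a08000) and kernel load address are simply taken from the boot log in the description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed PA2KA() shift: "identity with a shift to 2G". */
#define SKETCH_KA_OFFSET UINT32_C(0x80000000)

/* Hypothetical model of PA2KA(): physical to kernel virtual address. */
static inline uint32_t sketch_pa2ka(uint32_t pa)
{
	return pa + SKETCH_KA_OFFSET;
}

/*
 * Model of the loader's 1:1 table: a virtual address is mapped cacheable
 * only if the address itself lies inside physical RAM; anything else is
 * treated as noncacheable device memory.
 */
static inline bool sketch_loader_cacheable(uint32_t va, uint32_t ram_start,
    uint32_t ram_end)
{
	return va >= ram_start && va < ram_end;
}

/* Usage with the numbers from the boot log. */
void sketch_mapping_demo(void)
{
	const uint32_t ram_start = UINT32_C(0x00000000);
	const uint32_t ram_end = UINT32_C(0x01a08000);
	const uint32_t kernel_pa = UINT32_C(0x00a08000);

	/* The kernel's physical load address is inside RAM: cacheable. */
	assert(sketch_loader_cacheable(kernel_pa, ram_start, ram_end));

	/* The kernel runs at PA2KA(load address) = 0x80a08000 ... */
	assert(sketch_pa2ka(kernel_pa) == UINT32_C(0x80a08000));

	/* ... which the 1:1 table classifies as device memory. */
	assert(!sketch_loader_cacheable(sketch_pa2ka(kernel_pa), ram_start,
	    ram_end));
}
```

In this model every kernel data address, including the word targeted by LDREX/STREX, lands in the noncacheable class until the kernel switches to its own page tables.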
Splitting the loader's page table into two halves, the first with a 1:1 mapping and the second with a PA2KA mapping, fixes the problem on Raspberry Pi. Unfortunately, it breaks bbone, most likely because its physical memory already starts at 2G (which might also be the reason it was not affected by the issue in the first place).
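A rough model of the split table illustrates both outcomes. Everything here is an assumption for illustration: the names are hypothetical, the upper half is modelled as a fixed 2G shift, the Raspberry Pi RAM bounds come from the boot log, and the second board's RAM range (starting at 2G, with an arbitrary size) merely stands in for a bbone-like layout. It is one plausible reading of the breakage, not a confirmed diagnosis.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed boundary between the 1:1 half and the shifted half. */
#define SKETCH_SPLIT UINT32_C(0x80000000)

/* Result of a lookup in the modelled loader page table. */
typedef struct {
	uint32_t pa;     /* physical address the VA translates to */
	bool cacheable;  /* true if that physical address is RAM */
} sketch_map_t;

/*
 * Lower half: identity mapping. Upper half: VA minus 2G (the inverse of
 * the assumed PA2KA shift). Cacheability follows the target address.
 */
static inline sketch_map_t sketch_split_lookup(uint32_t va,
    uint32_t ram_start, uint32_t ram_end)
{
	sketch_map_t m;

	m.pa = (va < SKETCH_SPLIT) ? va : va - SKETCH_SPLIT;
	m.cacheable = m.pa >= ram_start && m.pa < ram_end;
	return m;
}

void sketch_split_demo(void)
{
	/* Raspberry Pi numbers from the boot log: RAM at 0 .. 0x01a08000. */
	sketch_map_t rpi = sketch_split_lookup(UINT32_C(0x80a08000),
	    UINT32_C(0x00000000), UINT32_C(0x01a08000));
	assert(rpi.pa == UINT32_C(0x00a08000));
	assert(rpi.cacheable);

	/* Hypothetical board whose RAM starts at 2G (size is arbitrary). */
	sketch_map_t other = sketch_split_lookup(UINT32_C(0x80000000),
	    UINT32_C(0x80000000), UINT32_C(0x90000000));
	assert(other.pa == UINT32_C(0x00000000));
	assert(!other.cacheable);  /* the shifted half no longer hits RAM */
}
```

With RAM at physical 0 the upper half lands on real memory and becomes cacheable, while with RAM at 2G a fixed shift points the same virtual addresses below the start of RAM.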
I am now looking into ways to fix this so that nothing breaks.
comment:7 by , 7 years ago
| Component: | helenos/kernel/arm32 → helenos/boot/arm32 |
|---|---|
| Resolution: | → fixed |
| Status: | new → closed |
Fixed in commit accdbd830beca44bcb50139f5c5e256cbe7afda9.

This behavior (no kernel messages printed) has occurred since:
4621d2311994bf63dea425ed923239d4ca1babc9 is the first bad commit
commit 4621d2311994bf63dea425ed923239d4ca1babc9
Author: Jiří Zárevúcky <jiri.zarevucky@nic.cz>
Date: Mon Aug 13 05:00:17 2018 +0200
Use compiler builtins for kernel atomics
:040000 040000 ac60c15132946569adf411d62658b0c5d133d5ac 556da68f1448a4d71cf4e6c6c159fe050a640994 M abi
:040000 040000 ca0e6b391de9b4fd2f196b5f1b6be33072a352e1 ba3cea6956d002fb330343d54451a4b923305d8f M kernel
:040000 040000 4522e5b77223c8b4e0dbb4dc2e914e90f861ce3d 717b54d24b9459244fec8ea8df7538d8e4da789f M uspace
However, even before this commit, as far back as the switch to the HelenOS-specific toolchain in commit bbe5e34956da986df4d32357c697e539e8cfec0d, the boot was failing with:
This kernel panic is more difficult to bisect due to the toolchain change.