Opened 16 years ago
Last modified 7 years ago
#4 reopened defect
HelenOS/sparc64 unstable with CONFIG_TSB
Reported by: | Jakub Jermář | Owned by: | Jakub Jermář |
---|---|---|---|
Priority: | major | Milestone: | |
Component: | helenos/kernel/sparc64 | Version: | mainline |
Keywords: | | Cc: | |
Blocker for: | | Depends on: | |
See also: | | | |
Description
I found out that when I double the size of the buffer allocated for the TSB, the problem disappears. However, the size used for the TSB allocation appears to be correct. This suggests that something is corrupting the contents of the TSB memory.
I have not yet seen this occur anywhere other than on one of the Ultra 60s.
Disabling the TSB at compile time (CONFIG_TSB) works around this bug.
After further investigation, I have come to the conclusion that the bug was introduced in revision 2161. More likely, an already existing bug was exposed when another bug was fixed in r2161: that revision fixes a bug which prevented the TSB from functioning at all. So this looks like a TSB issue.
I have never seen this with r2128.
The earliest revision on which I have seen this bug is r2174.
I have not investigated the revisions in between yet.
The problem appears to be independent of whether the kernel was compiled with gcc 4.1.1 or gcc 4.1.2.
I have seen this only on one Ultra 60, when trying to boot revisions around r2233 from a CD-ROM.
One of the following three scenarios occurred:
- the kernel booted just fine, but the ns task got a data_access_error exception (as reported in klog) and died; several other tasks died afterwards, most likely because they could not connect to ns; the kconsole remained responsive, so I could inspect the contents of klog
- the kernel booted just fine, but the ns task exited without any exception being reported in klog; some other tasks died after ns exited; the kconsole remained responsive, so I could inspect the contents of klog
- the kernel booted but then appeared to hang: there was no console task UI and the kconsole was not responsive
Change History (13)
comment:1 by , 16 years ago
Component: | → kernel/sparc64 |
---|
comment:2 by , 15 years ago
Summary: | Sudden death of userspace tasks → HelenOS/sparc64 unstable with CONFIG_TSB |
---|
comment:3 by , 15 years ago
Milestone: | → 0.5.0 |
---|
comment:4 by , 15 years ago
The affected Ultra 60 ran the current version of HelenOS overnight without any of the above symptoms, under the following load:
- played tetris to around 4500 points
- ran kernel and userspace tests
- ran the tester loop1 test
- ran the factorial sysel example in an infinite loop
This morning, the system did not boot: it either hung, killed the userspace tasks due to a data_access_error, or both. The data_access_error trap is a sign of a hardware problem (i.e. a machine check exception).
comment:5 by , 14 years ago
Status: | new → accepted |
---|
comment:6 by , 14 years ago
Status: | accepted → assigned |
---|
comment:7 by , 14 years ago
Milestone: | 0.5.0 → 0.5.1 |
---|
comment:8 by , 14 years ago
Resolution: | → worksforme |
---|---|
Status: | assigned → closed |
Closing as not reproducible. This ticket has been reproducible only on one Ultra 60, which will shortly become unavailable to me. If the issue reproduces on some other machine, please file a new ticket with up-to-date data.
comment:9 by , 13 years ago
Resolution: | worksforme |
---|---|
Status: | closed → reopened |
Reopening, as my new Ultra 60 (2x CPU, 2 GiB RAM) exhibits the same problem (mainline, revision 1018).
comment:10 by , 13 years ago
The two CPUs identify as:
cpu0: manuf=UltraSPARC, impl=UltraSPARC II, mask=160 (450 MHz)
cpu1: manuf=UltraSPARC, impl=UltraSPARC II, mask=160 (450 MHz)
comment:11 by , 13 years ago
Milestone: | 0.5.0 → 0.5.1 |
---|
comment:12 by , 10 years ago
Milestone: | 0.6.0 → 0.7.1 |
---|
comment:13 by , 7 years ago
Milestone: | 0.7.1 |
---|
The issue still exists in revision 4684, though I think its symptoms are slightly different, given how much HelenOS evolved between revisions 2233 and 4684.