A Weird K6 Bug : Tests
- Disable the L2 cache
- Modify gcc to work around the REP MOVSL bug
- Disable write allocation
- Use only the top 32 MB
- Compile the kernel for i386 and do not use 4M pages
- Flush the TLB and caches when returning to user mode
- Disable the L1 cache
- The chessboard test
- Test the linux-stack patch
- Show that this bug is not related to true self-modifying code
- On NT, try to reproduce the bug without gcc
- Modify the memory allocation to work around the bug
This page presents a few tests (often Linux specific) that have been performed
in order to understand how the bug works.
Do not trust the results presented here: double-check everything, and complain
if you do not agree.
Disable the L2 cache
How :
- Disable the L2 cache from the BIOS setup.
- Run burnit.
Result :
Does not work.
Modify gcc to work around the REP MOVSL bug
How :
- Read the description of the bug in the AMD-K6 MMX Enhanced Processor
Revision Guide.
- Modify config/i386/i386.md to output 'call rep_movsl' instead of
'rep; movsl'.
- Write your own slow rep_movsl with a lot of jumps, nops, etc...
- Add your rep_movsl to /usr/lib/crt1.o.
- Bootstrap gcc.
- Run burnit.
Result :
Does not work.
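To make the workaround concrete, here is a small Python model of what a replacement rep_movsl routine has to preserve. It assumes gcc's usual setup for 'rep; movsl' (direction flag clear, ECX holding the number of 32-bit words, ESI the source and EDI the destination) and copies one word per iteration, the way a slow out-of-line routine would. It models only the register and memory effects, not the timing the real routine is meant to change, and it is not the routine used in this test.

```python
from typing import Tuple


def rep_movsl(mem: bytearray, esi: int, edi: int, ecx: int) -> Tuple[int, int, int]:
    """Model 'rep; movsl' with the direction flag clear (assumption).

    Copies ecx 32-bit words from mem[esi] to mem[edi], one word per
    iteration, and returns the final (esi, edi, ecx) register values.
    """
    while ecx != 0:
        mem[edi:edi + 4] = mem[esi:esi + 4]  # one movsl: copy 4 bytes
        esi += 4                             # DF=0: addresses go up
        edi += 4
        ecx -= 1
    return esi, edi, ecx


buf = bytearray(range(16)) + bytearray(16)
print(rep_movsl(buf, 0, 16, 4))   # (16, 32, 0)
print(buf[16:32] == buf[0:16])    # True
```

Copying word by word keeps the element-by-element semantics of the string instruction, including its behaviour on overlapping buffers.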
Disable write allocation
How :
- Compile diswa.c as a module.
- Load diswa.
- Run burnit.
- Unload diswa.
Results :
There are still compilation failures, although it seems to help.
The K6 becomes noticeably slower.
Use only the top 32 MB
How :
Results :
After 100 loops, no errors. See The chessboard test for
a possible explanation.
Compile the kernel for i386 and do not use 4M pages
How :
- Configure the Linux kernel for i386.
- Boot with mem=nopentium (to disable 4 MB pages).
Result :
No improvement :-(
Flush the TLB and caches when returning to user mode
How :
- Add 'movl %cr3,%eax; movl %eax,%cr3; wbinvd' at the beginning of the
RESTORE_ALL macro in arch/i386/kernel/entry.S.
- Recompile and install the new kernel.
- Run burnit.
Result :
Still no improvement...
Disable the L1 cache
How :
- Apply this patch to the kernel.
- Recompile and install the new kernel.
- Disable the L1 cache from the BIOS setup.
- Run burnit.
Result :
So far, I have received results from 3 people :
- FAILED: 0/60
- FAILED: 0/12 and 0/37
- FAILED: 0/8 and 0/8
The results have been produced with mem=16 (see below).
The chessboard test
Description :
Let's split the physical memory (64 MB assumed) into 4 sets :
- 1: even segments below 32 MB (segment 0, 2, ...)
- 2: odd segments below 32 MB (segment 1, 3, ...)
- 4: even segments above 32 MB (segment 512, 514, ...)
- 8: odd segments above 32 MB (segment 513, 515, ...)
A segment contains 64 KB.
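If we modify mem_init() (in arch/i386/mm/init.c) to use only two of these 4
sets, or only the 0-8M and 32-40M ranges, we obtain very interesting results.
The mem=xxx value selects the memory to use: below 16 it is the sum of the
selected set numbers (mem=5 means sets 1 and 4), and mem=16 means the
0-8M,32-40M ranges.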
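The set numbering can be expressed in a few lines of Python. This is a model of the description above, not the kernel patch: it assumes a 64 MB machine, 64 KB segments, and that a mem=xxx value below 16 is the sum of the selected set numbers (as in mem=5 = 1+4 in the results); the special mem=16 range layout is not modeled. The names set_of and page_usable are hypothetical.

```python
SEGMENT = 64 * 1024          # 64 KB per segment
BOUNDARY = 32 * 1024 * 1024  # 32 MB, i.e. the start of segment 512


def set_of(addr: int) -> int:
    """Return the set (1, 2, 4 or 8) containing physical address addr."""
    seg = addr // SEGMENT
    parity = 2 if seg % 2 else 1           # even segments: 1, odd segments: 2
    half = 4 if addr >= BOUNDARY else 1    # above 32 MB: times 4
    return parity * half


def page_usable(addr: int, mem: int) -> bool:
    """Hypothetical filter: keep addr only if its set is selected in mem (1..15)."""
    return bool(set_of(addr) & mem)


print([set_of(s * SEGMENT) for s in (0, 1, 512, 513)])  # [1, 2, 4, 8]
print(page_usable(SEGMENT, 5))                          # False: set 2 not in 1+4
```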
How :
- Apply this patch to the kernel.
- Recompile and install the new kernel.
- Boot with mem=xxx
- Run burnit.
Results :
- mem=3 (1+2) : ok (same as mem=32M)
- mem=5 (1+4) : FAILED: 2/25
- mem=9 (1+8) : ok (101 loops)
- mem=6 (2+4) : ok (161 loops)
- mem=10 (2+8) : FAILED: 1/14
- mem=12 (4+8) : ok (see Use only the top 32 MB)
- mem=16 (0-8M,32-40M range, no parallel compilation in burnit): FAILED: 10/22
It seems that the K6 does not like using two
addresses that are 32 MB from each other.
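This rule can be checked against the results list. The Python sketch below is a model, not a measurement, and its function names are hypothetical: it marks which memory is used for each mem=xxx value (same set numbering, plus the 0-8M,32-40M layout for mem=16) and reports whether some used address A also has A + 32 MB in use. On a 64 MB machine it flags exactly the configurations that failed (5, 10 and 16).

```python
MB = 1024 * 1024
SEGMENT = 64 * 1024   # one segment, as defined in the description
TOTAL = 64 * MB       # assumed amount of physical memory
ALIAS = 32 * MB       # the distance the K6 seems to dislike


def in_use(addr: int, mem: int) -> bool:
    """Model: is physical address addr used when booting with mem=<mem>?"""
    if mem == 16:  # special layout: only 0-8M and 32-40M
        return addr < 8 * MB or ALIAS <= addr < ALIAS + 8 * MB
    seg = addr // SEGMENT
    parity = 2 if seg % 2 else 1        # even segments: 1, odd segments: 2
    half = 4 if addr >= ALIAS else 1    # above 32 MB: times 4
    return bool((parity * half) & mem)


def has_32mb_alias(mem: int) -> bool:
    """True if some used address A also has A + 32 MB in use."""
    # Membership only changes at segment boundaries, so testing one
    # address per segment below 32 MB is enough.
    for addr in range(0, TOTAL - ALIAS, SEGMENT):
        if in_use(addr, mem) and in_use(addr + ALIAS, mem):
            return True
    return False


for mem in (3, 5, 9, 6, 10, 12, 16):
    print(mem, "collision" if has_32mb_alias(mem) else "no collision")
```

Configurations 3, 9, 6 and 12 never put two used addresses 32 MB apart, which is consistent with them running clean; this is also why the pattern looks like a chessboard.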
Test the linux-stack patch
Description :
AMD says that the problem is related to self-modifying code. The Linux kernel
uses self-modifying code for signal handling, so it is worth trying the
linux-stack patch, which makes the stack non-executable and removes the
self-modifying code from the kernel.
How :
- Apply the linux-stack patch to the kernel
- Recompile and install the new kernel.
- Run burnit.
Result :
There are still compilation problems.
Conclusion :
Where can we have self-modifying code ? In the kernel, in the application
that segfaults, or in the shared C library.
- It is not in the kernel, because with the linux-stack patch there is no
self-modifying code left in the kernel.
- It is not in the applications, because gcc only generates self-modifying
code when nested functions are used, and commands such as sh, rm, cksum, cpp,
make, cc1, etc. do not use nested functions.
- It is not in the shared C library, because there are no nested functions
in libc-5.4.33 (compile it with -pedantic to check) and its assembly code
does not contain self-modifying code.
So compilations do not fail because of real self-modifying code.
Show that this bug is not related to true self-modifying code
Description :
The linux-stack patch can be modified to kill cc1 whenever cc1 uses
self-modifying code. If burnit still runs with this patch, then this bug is
not related to true self-modifying code.
How :
Result :
The new cc1 works like the previous one : it follows the rules discovered
with the chessboard test. With a safe memory configuration,
burnit never fails, so cc1 does not use self-modifying code.
Conclusion :
This test proves, like the previous one, that the problem is not related
to true self-modifying code.
According to AMD, the K6 does not use all 32 address bits
when checking for self-modifying code. So if a write occurs to the physical
address A and the K6 speculatively executes the code at A+32MB, then it
may detect a "potential" self-modifying code situation and may not handle it
properly.
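A toy model of the comparison AMD describes, in Python. The exact bits the K6 ignores are not given here; the sketch assumes the check compares only address bits 5 to 24 (32-byte cache lines, everything at or above 32 MB ignored), which is the simplest assumption that matches the 32 MB distance seen in the chessboard test. The function names are hypothetical.

```python
LINE = 32          # assumed K6 cache line size in bytes
ALIAS_BITS = 25    # 2**25 bytes = 32 MB; higher bits assumed ignored


def compared_bits(addr: int) -> int:
    """Address bits the model assumes the check looks at (bits 5..24)."""
    return (addr % (1 << ALIAS_BITS)) // LINE


def smc_check(write_addr: int, fetch_addr: int) -> str:
    """Classify a data write against a speculatively fetched code address."""
    if write_addr // LINE == fetch_addr // LINE:
        return "true self-modifying code"
    if compared_bits(write_addr) == compared_bits(fetch_addr):
        return "false positive (32 MB alias)"
    return "no conflict"


A = 0x00100000
print(smc_check(A, A + 4))                  # true self-modifying code
print(smc_check(A, A + 32 * 1024 * 1024))   # false positive (32 MB alias)
print(smc_check(A, A + 4096))               # no conflict
```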
On NT, try to reproduce the bug without gcc
How :
- Read How to reproduce the bug on NT.
- Adapt the k6bug.bat to use a non-gcc compiler.
- Run many k6bug.bat, as described in README.
Result :
With a GreenHills compiler, no problem after 8 hours. With gcc, it usually
takes less than an hour before the first crash.
Does anyone want to run the test with a Microsoft compiler ?
Modify the memory allocation to work around the bug
Description :
If page A is a code page, and A+32MB is a data page, then the bug can occur.
If the memory allocation is modified to prevent these collisions, the failure
rate should decrease dramatically.
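The idea can be sketched as a page allocator that refuses to hand out a page whose 32 MB partner currently holds the other kind of content. This is a hypothetical Python model, not the kernel patch used here: it assumes 4 KB pages and that the caller knows at allocation time whether a page will hold code or data, which a real kernel does not always know.

```python
from typing import Dict, List

PAGE = 4096
ALIAS_PAGES = 32 * 1024 * 1024 // PAGE   # 8192 pages = 32 MB


class AliasAwareAllocator:
    """Hypothetical allocator: never puts code and data 32 MB apart."""

    def __init__(self, total_pages: int) -> None:
        self.kind: Dict[int, str] = {}              # page -> "code" or "data"
        self.free: List[int] = list(range(total_pages))

    def is_safe(self, page: int, kind: str) -> bool:
        # Reject the page if either 32 MB partner holds the other kind.
        for partner in (page - ALIAS_PAGES, page + ALIAS_PAGES):
            other = self.kind.get(partner)
            if other is not None and other != kind:
                return False
        return True

    def alloc(self, kind: str) -> int:
        for i, page in enumerate(self.free):
            if self.is_safe(page, kind):
                del self.free[i]
                self.kind[page] = kind
                return page
        raise MemoryError("no page free of 32 MB code/data collisions")

    def release(self, page: int) -> None:
        del self.kind[page]
        self.free.append(page)


allocator = AliasAwareAllocator(2 * ALIAS_PAGES)   # 64 MB of 4 KB pages
code_page = allocator.alloc("code")                # page 0
print(allocator.is_safe(code_page + ALIAS_PAGES, "data"))  # False
print(allocator.is_safe(code_page + ALIAS_PAGES, "code"))  # True
```

The linear scan keeps the sketch short; it only illustrates the placement rule, not an efficient free-list design.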
How :
- Apply this patch to a recent Linux kernel. The patch is against 2.1.56 but
should work without too many problems on other kernels.
- Recompile and install the new kernel.
- Run burnit.
Result :
It works : 0/352 on a B9720...
Updated: 97/09/22.
Send comments to poulot@wanadoo.fr