A Weird K6 Bug : Tests


Table of contents

  1. Disable the L2 cache
  2. Modify gcc to work around the REP MOVSL bug
  3. Disable write allocation
  4. Use only the top 32 MB
  5. Compile the kernel for i386 and do not use 4M pages
  6. Flush the TLB and caches when returning to user mode
  7. Disable the L1 cache
  8. The chessboard test (HOT)
  9. Test the linux-stack patch
  10. Show that this bug is not related to true self-modifying code
  11. On NT, try to reproduce the bug without gcc (NEW)
  12. Modify the memory allocation to work around the bug (HOT) and (NEW)

This page presents a few tests (often Linux-specific) that were performed to understand how the bug works.

Do not trust the results presented here, double-check everything and complain if you do not agree.


T1: Disable the L2 cache

How :

Result :

Does not work.

T2: Modify gcc to work around the REP MOVSL bug

How :

Result :

Does not work.

T3: Disable write allocation

How :

Results :

There are still compilation failures, although it seems to help. The K6 becomes noticeably slower.

T4: Use only the top 32 MB

How :

Results :

After 100 loops, no errors. See the chessboard test (T8) for a possible explanation.

T5: Compile the kernel for i386 and do not use 4M pages

How :

Result :

No improvement :-(

T6: Flush the TLB and caches when returning to user mode

How :

Result :

Still no improvement...

T7: Disable the L1 cache

How :

Result :

So far, I have received results from 3 people. The results were produced with mem=16 (see below).

T8: The chessboard test

Description :

Let's split the physical memory (64 MB assumed) into 4 sets of segments, where a segment contains 64 KB.
If we modify mem_init() (in arch/i386/mm/init.c) to use only two of these 4 sets, or to use only 0-8M and 32-40M, we obtain very interesting results...

How :

Results :

It seems that the K6 does not like using two addresses that are 32 MB apart.
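This observation can be checked mechanically for any memory layout. The sketch below assumes a simple model in which two physical pages conflict exactly when they are 32 MB apart, and reports whether a set of physical ranges contains such a pair. The names `struct mem_range` and `ranges_conflict` are hypothetical and are not part of the kernel.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MB(x)      ((uint32_t)(x) << 20)
#define ALIAS_DIST MB(32)   /* distance at which the K6 seems to misbehave */
#define PAGE_SIZE  4096u

/* A half-open physical range [start, end), assumed page aligned. */
struct mem_range {
    uint32_t start;
    uint32_t end;
};

/* Does addr fall inside any of the n ranges? */
static bool in_ranges(const struct mem_range *r, size_t n, uint32_t addr)
{
    for (size_t i = 0; i < n; i++)
        if (addr >= r[i].start && addr < r[i].end)
            return true;
    return false;
}

/* Hypothetical check: true if some page in use has a partner exactly
 * 32 MB above it that is also in use. Testing one address per page is
 * enough because the ranges and the 32 MB distance are page aligned. */
static bool ranges_conflict(const struct mem_range *r, size_t n)
{
    for (size_t i = 0; i < n; i++)
        for (uint32_t a = r[i].start; a < r[i].end; a += PAGE_SIZE)
            if (a <= UINT32_MAX - ALIAS_DIST &&
                in_ranges(r, n, a + ALIAS_DIST))
                return true;
    return false;
}
```

Under this model, the full 64 MB and the 0-8M plus 32-40M layout both contain 32 MB pairs, while using only the top 32 MB (T4) contains none, which would be consistent with its error-free runs.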

T9: Test the linux-stack patch

Description :

AMD says that the problem is related to self-modifying code. The Linux kernel uses self-modifying code for signal handling. So it is worth trying the linux-stack patch, which makes the stack non-executable and removes this self-modifying code from the kernel.

How :

Result :

There are still compilation problems.

Conclusion :

Where could self-modifying code come from? The kernel, the application that segfaults, or the shared C library. The patch removes it from the kernel, yet compilations still fail. So compilations do not fail because of real self-modifying code.

T10: Show that this bug is not related to true self-modifying code

Description :

The linux-stack patch can be modified to kill cc1 if it uses self-modifying code. If burnit still runs with this patch (that is, cc1 is never killed), then this bug is not related to true self-modifying code.

How :

Result :

The new cc1 behaves like the previous one: it follows the rules discovered with the chessboard test. With a safe memory configuration, burnit never fails, so cc1 does not use self-modifying code.

Conclusion :

This test proves, like the previous one, that the problem is not related to true self-modifying code.
According to AMD, the K6 does not use all 32 address bits when checking for self-modifying code. So if a write occurs to the physical address A and the K6 speculatively executes the code at A+32MB, then it may detect a "potential" self-modifying code situation and may not handle it properly.
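AMD's explanation can be illustrated with a toy model. The sketch below assumes, purely for illustration, that the self-modifying-code detector compares only address bits 0-24 and ignores bit 25 (the 32 MB bit) and above; AMD did not say which bits are dropped, and the functions `smc_match` and `false_smc` are hypothetical. Real hardware would also compare at cache-line rather than byte granularity.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption for illustration only: keep the low 25 bits (below 32 MB),
 * ignore bit 25 and everything above it. */
#define SMC_COMPARE_MASK ((UINT32_C(1) << 25) - 1)

/* Full comparison: the write really hits the code being fetched. */
static bool true_smc(uint32_t write_addr, uint32_t fetch_addr)
{
    return write_addr == fetch_addr;
}

/* Partial comparison as modeled here: addresses 32 MB apart also match. */
static bool smc_match(uint32_t write_addr, uint32_t fetch_addr)
{
    return (write_addr & SMC_COMPARE_MASK) == (fetch_addr & SMC_COMPARE_MASK);
}

/* A "potential" SMC hit: the partial check fires, but no code was modified. */
static bool false_smc(uint32_t write_addr, uint32_t fetch_addr)
{
    return smc_match(write_addr, fetch_addr) && !true_smc(write_addr, fetch_addr);
}
```

In this model, a data write to A cannot be told apart from a write to code fetched at A+32MB, which is the "potential" self-modifying code situation described above.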

T11: On NT, try to reproduce the bug without gcc

How :

Result :

With a GreenHills compiler, no problem after 8 hours. With gcc, it usually takes less than an hour before the first crash.
Who wants to run the test with a Microsoft compiler?

T12: Modify the memory allocation to work around the bug

Description :

If page A is a code page, and A+32MB is a data page, then the bug can occur. If the memory allocation is modified to prevent these collisions, the failure rate should decrease dramatically.
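The placement rule can be sketched as a page-frame check in an allocator. This is not the author's patch: it assumes 4 KB frames, a 64 MB machine, and a hypothetical per-frame usage table (`frame_table`), and it refuses a frame for code if its 32 MB partner holds data, and vice versa.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT   12                                     /* 4 KB frames */
#define NR_FRAMES    (UINT32_C(64) << (20 - PAGE_SHIFT))    /* 64 MB assumed */
#define ALIAS_FRAMES (UINT32_C(32) << (20 - PAGE_SHIFT))    /* 32 MB in frames */

enum frame_use { FRAME_FREE, FRAME_CODE, FRAME_DATA };

/* Hypothetical per-frame bookkeeping, not a real kernel structure. */
static enum frame_use frame_table[NR_FRAMES];

/* The frame 32 MB away (above or below) on a 64 MB machine. */
static uint32_t partner(uint32_t pfn)
{
    return pfn ^ ALIAS_FRAMES;
}

/* A frame may hold `use` unless its partner holds the other kind. */
static bool frame_allowed(uint32_t pfn, enum frame_use use)
{
    enum frame_use other = frame_table[partner(pfn)];
    if (use == FRAME_CODE)
        return other != FRAME_DATA;
    if (use == FRAME_DATA)
        return other != FRAME_CODE;
    return true;
}

/* Return the first free frame usable for `use`, or -1 if there is none. */
static int64_t alloc_frame(enum frame_use use)
{
    for (uint32_t pfn = 0; pfn < NR_FRAMES; pfn++) {
        if (frame_table[pfn] == FRAME_FREE && frame_allowed(pfn, use)) {
            frame_table[pfn] = use;
            return pfn;
        }
    }
    return -1;
}
```

The hard part in a real kernel is knowing which pages will end up mapped executable; the sketch only shows the collision-avoiding placement rule.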

How :

Result :

(HOT) It works : 0/352 on a B9720...
Updated: 97/09/22.
Send comments to poulot@wanadoo.fr