[Tinymux] Pre-built binaries of TinyMUX not optimized for AMD chips.
Stephen Dennis
brazilofmux at gmail.com
Wed Jul 13 02:42:28 EDT 2005
As you may have seen in the technical news, code generated by the
Intel compilers (Fortran and C++) does not enable its optimized code
paths for MMX, SSE, or SSE2 on AMD processors, even on the AMD
processors that support those instructions.
Being curious, I looked at the binaries that I'm distributing, and
it's true. The technical crux of this tizzy is Intel's definition and
use of the CPUID instruction. From their spec, the only uses of CPUID
they consider to be vendor-neutral are eax=0 and eax=80000000h. The
first returns the manufacturer's string (GenuineIntel in their case),
and the latter is a mechanism for determining the highest supported
extended eax request value.
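To make the eax=0 case concrete, here is a minimal C sketch of how the
vendor string is assembled from the registers CPUID returns. The
function name cpuid_vendor_string is made up for illustration, and the
register values are passed in as plain arguments rather than read from
the processor, so the sketch runs on any machine.

```c
#include <stdint.h>

/* Assemble the 12-character vendor string that CPUID returns for eax=0.
 * The characters arrive in EBX, EDX, ECX (in that order), low byte first.
 * out must have room for 13 bytes (12 characters plus the terminator).
 * cpuid_vendor_string is a hypothetical name, not part of any library. */
static void cpuid_vendor_string(uint32_t ebx, uint32_t edx, uint32_t ecx,
                                char out[13])
{
    const uint32_t regs[3] = { ebx, edx, ecx };

    for (int r = 0; r < 3; r++) {
        for (int b = 0; b < 4; b++) {
            /* Extract byte b of register r by shifting, so the result
             * does not depend on the byte order of the host. */
            out[r * 4 + b] = (char)((regs[r] >> (8 * b)) & 0xFFu);
        }
    }
    out[12] = '\0';
}
```

Feeding it EBX=756E6547h, EDX=49656E69h, ECX=6C65746Eh gives the string
an Intel chip reports. An AMD chip reports AuthenticAMD through the
same three registers.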
Even though AMD and other processors have followed Intel's format for
the EAX=1 CPUID, the code generated by the Intel Compiler does not
look at those bits unless the chip is GenuineIntel.
So, the feature bits are the same. Certain AMD chips support MMX, SSE,
and SSE2 properly (even licensing these things from Intel), so why not
just stomp on the GenuineIntel test and let the rest of the code
determine the chip's capabilities?
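For reference, the feature bits in question can be decoded with a few
shifts. This is a minimal C sketch, assuming the EDX and ECX values
from an EAX=1 CPUID request are already in hand. The struct and
function names are invented for illustration. The bit positions are
the documented ones: EDX bit 23 for MMX, 24 for FXSR, 25 for SSE, 26
for SSE2, and ECX bit 0 for SSE3.

```c
#include <stdint.h>
#include <stdbool.h>

/* Feature flags of interest from the EAX=1 CPUID request.
 * cpu_features and decode_features are hypothetical names. */
struct cpu_features {
    bool mmx;
    bool fxsr;
    bool sse;
    bool sse2;
    bool sse3;
};

static struct cpu_features decode_features(uint32_t edx, uint32_t ecx)
{
    struct cpu_features f;

    f.mmx  = ((edx >> 23) & 1u) != 0;  /* EDX bit 23 */
    f.fxsr = ((edx >> 24) & 1u) != 0;  /* EDX bit 24: FXSAVE/FXRSTOR */
    f.sse  = ((edx >> 25) & 1u) != 0;  /* EDX bit 25 */
    f.sse2 = ((edx >> 26) & 1u) != 0;  /* EDX bit 26 */
    f.sse3 = (ecx & 1u) != 0;          /* ECX bit 0 */
    return f;
}
```

Nothing in this decoding looks at the vendor string, which is the
point: the same code reads the same bits on either vendor's chip.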
There are two Perl scripts on the Internet that do this narrowly for a
library contained in Intel's Fortran 7.1 and Fortran 8.1 compiler
products. However, that approach would not work for the 8.1 C++
compiler that I'm using to build netmux.exe.
In the newest compiler, Intel has managed to wrap the Family and
Feature decoding up into a tight ball. The Feature field might be the
same between Intel and AMD, but the Family field is different.
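As an illustration of the Family field, here is a small C sketch that
extracts it from the EAX value returned by an EAX=1 CPUID request. The
function name cpuid_family is made up. The layout used is the
documented one: base family in bits 8 to 11, extended family in bits
20 to 27, with the extended value added only when the base family is
0Fh.

```c
#include <stdint.h>

/* Return the family number encoded in the EAX=1 CPUID signature.
 * cpuid_family is a hypothetical helper name. */
static unsigned cpuid_family(uint32_t eax)
{
    unsigned base = (eax >> 8) & 0xFu;    /* bits 8..11  */
    unsigned ext  = (eax >> 20) & 0xFFu;  /* bits 20..27 */

    /* The extended family only counts when the base family is 0Fh. */
    return (base == 0xFu) ? base + ext : base;
}
```

The number comes out the same way on both vendors, but what a given
family number means is vendor-specific, so a family-based table built
for Intel parts does not carry over to AMD parts.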
___intel_cpu_indicator:
0x00000001 -- Not recognized (older Intel chips and all AMD chips).
0x00000002 -- GenuineIntel, Family 5, no MMX
0x00000004 -- GenuineIntel, Family 6, no MMX, no FXSR
0x00000008 -- GenuineIntel, Family 5, MMX
0x00000010 -- GenuineIntel, Family 6, MMX, no FXSR
0x00000020 -- GenuineIntel, Family 6, MMX, FXSR, no SSE, no SSE2, no SSE3
0x00000080 -- GenuineIntel, Family 6, MMX, FXSR, SSE, no SSE2, no SSE3
0x00000200 -- GenuineIntel, Family F, SSE2, no SSE3
0x00000400 -- GenuineIntel, Family 6, MMX, FXSR, SSE, SSE2, no SSE3
0x00000800 -- GenuineIntel, Family 6, MMX, FXSR, SSE, SSE2, SSE3
              GenuineIntel, Family F, SSE2, SSE3
Family 5 includes Pentium and Pentium MMX.
Family 6 includes Celeron, Pentium M, Pentium Pro, Pentium II, and Pentium III.
Family F is a Pentium 4.
Once ___intel_cpu_indicator has been initialized, the rest of the code
uses it like so:
_exp:
0049A558 F7 05 94 98 5B 00 00 test dword ptr [___intel_cpu_indicator (005b9894)],0FFFFFE00h
0049A562 0F 85 10 21 00 00 jne _exp.J (0049c678)
0049A568 F7 05 94 98 5B 00 FF test dword ptr [___intel_cpu_indicator (005b9894)],0FFFFFFFFh
0049A572 0F 85 E8 23 00 00 jne _exp.A (0049c960)
0049A578 E8 5F EF 01 00 call ___intel_cpu_indicator_init (004b94dc)
0049A57D EB D9 jmp _exp (0049a558)
0049A57F C3 ret
Label _exp.A is the default version. Label _exp.J is the version
optimized for SSE/SSE2, and it is guarded by the 0FFFFFE00h mask.
Another case is the memcpy():
__intel_fast_memcpy:
004B9654 F7 05 94 98 5B 00 00 test dword ptr [___intel_cpu_indicator (005b9894)],0FFFFFE00h
004B965E 0F 85 E0 FF FF FF jne __intel_fast_memcpy.J (004b9644)
004B9664 F7 05 94 98 5B 00 80 test dword ptr [___intel_cpu_indicator (005b9894)],0FFFFFF80h
004B966E 0F 85 D8 FF FF FF jne __intel_fast_memcpy.H (004b964c)
004B9674 F7 05 94 98 5B 00 FF test dword ptr [___intel_cpu_indicator (005b9894)],0FFFFFFFFh
004B967E 0F 85 A8 FF FF FF jne __intel_fast_memcpy.A (004b962c)
004B9684 E8 53 FE FF FF call ___intel_cpu_indicator_init (004b94dc)
004B9689 EB C9 jmp __intel_fast_memcpy (004b9654)
004B968B C3 ret
Label memcpy.J uses SSE/SSE2 instructions, label memcpy.H uses MMX
instructions, and label memcpy.A is the default version.
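The test/jne chain in the memcpy listing can be restated in C. This is
only a sketch of the control flow, not Intel's source: cpu_indicator
stands in for ___intel_cpu_indicator, cpu_indicator_init is a made-up
stub that always reports "not recognized", and the enum names are
invented. The three masks are the ones from the disassembly.

```c
#include <stdint.h>

/* Which variant the dispatcher would jump to (hypothetical names). */
enum copy_path { PATH_A, PATH_H, PATH_J };

/* Stand-in for ___intel_cpu_indicator; zero means "not yet set". */
static uint32_t cpu_indicator = 0;

/* Stand-in for ___intel_cpu_indicator_init. The real routine decodes
 * CPUID; this stub simply reports "not recognized" (0x00000001). */
static void cpu_indicator_init(void)
{
    cpu_indicator = 0x00000001u;
}

static enum copy_path choose_memcpy_path(void)
{
    for (;;) {
        if (cpu_indicator & 0xFFFFFE00u)  /* 0x200 and above */
            return PATH_J;
        if (cpu_indicator & 0xFFFFFF80u)  /* 0x80 and above */
            return PATH_H;
        if (cpu_indicator & 0xFFFFFFFFu)  /* any nonzero value */
            return PATH_A;
        /* Still zero: initialize, then retry from the top,
         * mirroring the call followed by the backward jmp. */
        cpu_indicator_init();
    }
}
```

With the stub in place an unset indicator always ends up on the
default path, which is what an AMD chip gets from the real
initializer according to the table above.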
There is nothing in their approach which prevents a program from
changing the value of ___intel_cpu_indicator later. Someone could
write a routine which would correctly decode both Intel and AMD chips.
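A rough C sketch of what such a routine might compute is below. It is
an untested idea, not a working patch. The function name and the
choice of which table row to use for each feature set are assumptions:
it ignores vendor and family entirely and picks the Family 6 row whose
feature list matches. Whether Intel's optimized paths behave well on a
given AMD chip for each of these values is not something this sketch
can promise.

```c
#include <stdint.h>

/* EAX=1 CPUID feature bits (documented positions). */
#define FEAT_EDX_MMX   (1u << 23)
#define FEAT_EDX_FXSR  (1u << 24)
#define FEAT_EDX_SSE   (1u << 25)
#define FEAT_EDX_SSE2  (1u << 26)
#define FEAT_ECX_SSE3  (1u << 0)

/* Map feature bits to an indicator value from the table, without
 * looking at the vendor string. indicator_from_features is a
 * hypothetical name, and the row chosen per feature set is a guess. */
static uint32_t indicator_from_features(uint32_t edx, uint32_t ecx)
{
    if (!(edx & FEAT_EDX_MMX))
        return 0x00000001u;   /* treat as not recognized */
    if (!(edx & FEAT_EDX_FXSR))
        return 0x00000010u;   /* MMX, no FXSR */
    if (!(edx & FEAT_EDX_SSE))
        return 0x00000020u;   /* MMX, FXSR, no SSE */
    if (!(edx & FEAT_EDX_SSE2))
        return 0x00000080u;   /* MMX, FXSR, SSE, no SSE2 */
    if (!(ecx & FEAT_ECX_SSE3))
        return 0x00000400u;   /* MMX, FXSR, SSE, SSE2, no SSE3 */
    return 0x00000800u;       /* MMX, FXSR, SSE, SSE2, SSE3 */
}
```

The result would then have to be stored into ___intel_cpu_indicator
before the first dispatch. The address shown in the listings
(005b9894) is specific to that one build.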
One reason for posting this is that it is technically entertaining.
Another reason is that some people may want to run the pre-built
binaries on AMD chips. If you are, I'd like to know, and if it becomes
important to enough people, I'll work around the issue.
Brazil