x86 Overview
I get a lot of questions concerning the proliferation of x86 compatible
machines. I decided I might save myself some time by hacking out a
quick and dirty explanation page I could refer people to. That is what
this is, and it probably contains a boatload of errors, but these are
the answers you'd get, more or less, if you asked me in person the
following questions:
Reading about the Crusades has satisfied a great deal of my hunger for
religious warfare, and so I don't attempt to compare these chips here.
Instruction Set Overview
I get a lot of questions on the state of x86 compatible architectures.
The x86 world is comprised of a *lot* of differing chips. The first
classification would be by the native data length, 32 bit or 64 bit. Intel
calls its 32 bit x86 architecture IA32 (Intel Architecture 32 bit); IA64
is Intel's separate 64 bit architecture.
This page will only cover IA32, since that is what most people have
these days. I'll just mention that IA64 processors include the Itanium
and forthcoming McKinley, while AMD's forthcoming 'Hammer' series takes a
different route, extending x86 itself to 64 bits (x86-64).
A big reason people are talking about 64 bit machines is that they can
natively use memories in excess of 4GB, which is something IA32 does not
handle natively . . .
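The 4GB figure falls straight out of the pointer width: a flat 32 bit address can name 2^32 distinct bytes. Here is a quick sketch of the arithmetic in Python (the function name is just mine, and it ignores any segmentation or paging tricks):

```python
def addressable_bytes(address_bits):
    """Bytes reachable with a flat address of the given width."""
    return 2 ** address_bits

GIB = 2 ** 30  # one gigabyte in the binary sense (1024**3 bytes)

# How many GB a flat 32 bit address can reach.
print(addressable_bytes(32) // GIB, "GB")
```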
OK, here are the IA32 x86 compatibles I'm aware of, categorized
by company, and then ordered roughly by release date (I leave out companies
that are no longer real players, such as Cyrix or Centaur):
- Intel : 386, 486, Pentium, PentiumMMX, Pentium PRO,
Pentium II, Pentium III, Pentium 4
- AMD : K5, K6, K6-2, K6-3, Athlon, Enhanced Athlon (AKA Thunderbird),
Athlon4, AthlonMP
So, all these chips have one thing (at least) in common: they use the x86
instruction set architecture (ISA). What this means is that the same code
will run on all of them (though it may run slower or faster). However, the
newer chips contain ISA extensions, which, if the programmer uses them,
will result in an executable that is not portable across all x86 platforms.
NOTE: floating point instructions using the standard FPU (Floating Point
Unit) are often said to be x87 compatible, after the x87 coprocessor chips
(8087, 287, 387) that originally provided the FPU.
x86 ISA Extensions:
- MMX
- Set of "MultiMedia eXtensions" to the x86 ISA. Mainly new SIMD instructions
for integer performance (no prefetch; that arrived with the later extensions
below). For Intel, all chips
starting with the PentiumMMX processor possess these extensions. For AMD,
all chips starting with the K6 possess these extensions.
- SSE
- Streaming SIMD (Single Instruction Multiple Data) Extensions. SSE is
a superset of MMX (i.e., a chip with SSE automatically possesses MMX).
These instructions are used to speed up single precision (32 bit) floating point
arithmetic. By operating on 4 single precision values with one instruction,
they allow for a theoretical peak of 4 FLOPs (FLoating point OPerations)
every cycle (e.g., a 500MHz PIII can theoretically perform 2 GFLOPS
(2 billion FLoating point Operations Per Second)). The results returned by
SSE are IEEE compliant (as are classical x86 floating point results).
For Intel, all chips listed starting with the Pentium III possess SSE
extensions. For AMD, all chips starting from Athlon4 possess SSE.
- 3DNow!
- AMD's extension to MMX that does almost the exact same thing SSE does,
except the single precision arithmetic is not IEEE compliant (i.e. it is
not as fault-tolerant as x86 arithmetic). It is also a superset of MMX
(but not of SSE; 3DNow! was released before SSE). It is supported only
on AMD, starting with the K6-2 chip.
- Enhanced 3DNow!
- An extension to 3DNow!, available from the Athlon onward. Some additional
prefetch commands (essentially, they added support for SSE-style prefetch,
I think), and some other stuff I really don't know a whole lot about.
- 3DNow! Professional
- AMD's extension that is essentially Enhanced 3DNow! + SSE. Available on
AMD chips starting with the Athlon4.
- SSE2
- New instructions that perform double precision (64 bit) floating point arithmetic.
Allows for 2 double precision FLOPs every cycle. For Intel, supported
on the Pentium 4. Not supported by any released AMD chip.
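If you want to know which of these extensions a given chip has, the CPUID instruction reports them as feature bits. Here is a rough sketch in Python that decodes such bits once you already have the register values in hand; it does not execute CPUID itself. The bit positions are the documented ones as far as I know (standard leaf 1, EDX: bit 23 MMX, bit 25 SSE, bit 26 SSE2; AMD extended leaf 0x80000001, EDX: bit 30 Enhanced 3DNow!, bit 31 3DNow!), but the function and table names are my own invention, and the sample register values are made up for illustration.

```python
# Feature bits in EDX of CPUID standard leaf 1.
STD_EDX_BITS = {"MMX": 23, "SSE": 25, "SSE2": 26}
# Feature bits in EDX of CPUID extended leaf 0x80000001 (AMD).
EXT_EDX_BITS = {"Enhanced 3DNow!": 30, "3DNow!": 31}

def decode_features(std_edx, ext_edx=0):
    """Return the set of ISA extensions flagged in the two EDX values."""
    found = set()
    for name, bit in STD_EDX_BITS.items():
        if (std_edx >> bit) & 1:
            found.add(name)
    for name, bit in EXT_EDX_BITS.items():
        if (ext_edx >> bit) & 1:
            found.add(name)
    # The list above describes 3DNow! Professional as Enhanced 3DNow! + SSE.
    if {"Enhanced 3DNow!", "SSE"} <= found:
        found.add("3DNow! Professional")
    return found

# Hypothetical register values, built by hand rather than read from a chip.
p3_like = (1 << 23) | (1 << 25)
athlon4_like_std = (1 << 23) | (1 << 25)
athlon4_like_ext = (1 << 30) | (1 << 31)

print(sorted(decode_features(p3_like)))
print(sorted(decode_features(athlon4_like_std, athlon4_like_ext)))
```

Getting the real register values is the platform-specific part (inline assembly, a compiler intrinsic, or whatever your OS exposes), which is why I left it out.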
A Rose by 10^6 Names
There are an enormous number of names for each of these chips. There are
usually at least two for each chip (a pre-release name, and a final name)
and then there are names for various subcategories. I'll try to discuss
some of these here.
- Linux/GNU categories
- These are very generic categories, and my best guess as to their meaning is:
- i386: All of the chips mentioned above.
- i486: All chips mentioned above except the 386.
- i586: All chips from above step, additionally eliminating 486.
- i686: All chips from above step, additionally eliminating the
Pentium, PentiumMMX, K5, K6, K6-2, and K6-3 chips.
Note that due to the various extensions, these chip categories are only
binary compatible if you use strict x86 instructions (no SSE or 3DNow!, etc).
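The category scheme above is really just an ordered list of baselines: strict x86 code built for one level runs on any chip at that level or higher. Here is a small Python sketch of that rule, using the chip groupings guessed at above (the table and function names are mine, and only a handful of chips are listed):

```python
LEVELS = ["i386", "i486", "i586", "i686"]

# Highest category each chip belongs to, per the lists above.
CHIP_LEVEL = {
    "386": "i386",
    "486": "i486",
    "Pentium": "i586",
    "PentiumMMX": "i586",
    "K5": "i586",
    "K6": "i586",
    "K6-2": "i586",
    "K6-3": "i586",
    "Pentium PRO": "i686",
    "Pentium II": "i686",
    "Pentium III": "i686",
    "Pentium 4": "i686",
    "Athlon": "i686",
}

def can_run(binary_level, chip):
    """True if a strict-x86 binary built for binary_level runs on chip."""
    return LEVELS.index(binary_level) <= LEVELS.index(CHIP_LEVEL[chip])

print(can_run("i586", "K6-2"))   # K6-2 is an i586-class chip
print(can_run("i686", "K6-2"))   # but it is excluded from i686
```

This only models the base categories; a binary that uses SSE or 3DNow! needs a separate check for the extension itself.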
- Rundown of selected AMD chips
- Athlon: AKA: Athlon Classic, K7. Debut of the Athlon core, with
512K of off-chip L2 cache, with a bus running at 1/2 the speed of the processor. 64K
L1 data cache. Later versions had L2 caches running at 1/3 or 2/5 the speed of the chip.
- Enhanced Athlon: AKA: Thunderbird. Second gen Athlon core,
with 256K of on-chip L2 with full-speed bus. L2 cache is now exclusive with
higher associativity.
- Athlon4: AKA: Palomino. Third gen Athlon core, with new prefetch
in hardware, power optimization (allows laptop use), reordering of silicon
for greater efficiency, among other things.
- AthlonMP: AKA: Palomino. Same as Athlon4, but for SMP use.
- Duron: Athlon core with reduced L2 cache for low-end systems. AMD
seems to have the Duron keep pace with the Athlon (newer features such as
3DNow! Professional included), just with a smaller cache . . .
- Rundown of selected Intel chips
- PentiumPRO: Debut of Intel's so-called P6 core. Featured an
L2 of various sizes running at full speed. 8K L1 data cache.
- Pentium II: Second gen P6 core. Featured 512K L2 cache, with
bus running at 1/2 speed of chip. Added MMX instructions and pumped L1 data
cache to 16K.
- Pentium III with 512K L2: Third gen P6 core, with SSE added,
and 512K of off-chip L2, running at 1/2 speed of chip.
- Pentium III with 256K L2: Fourth gen P6 core, with 256K of on-chip L2,
running at full speed of chip.
- Pentium 4: Complete redesign of P6 core; new core is called
"NetBurst". Upgrades too numerous to mention, but include
much faster cache, much higher bandwidth between memory and cache levels,
SSE2 support, etc. L1 data cache drops to 8K.
Unfortunately, this doesn't even scratch the surface of the naming game.
For instance, for every Intel class above, they define an extra category
called "Xeon", which is usually the same chip with a slightly bigger
L2 and a much bigger price tag. I think AMD is planning on playing this
game as well . . .
Peak Floating Point Performance Overview
Peak floating point performance is given in FLOPS (FLoating point Operations
Per Second). It can be derived as some constant (the number of FLOPs the chip
can complete per cycle) times the clock rate (e.g., in MHz) of a chip. It also
varies depending on what instruction set
you are using (as explained above). With no ISA extensions, all the IA32
Intel architectures can do at most 1 FLOP per cycle (e.g., a 500MHz PIII
can theoretically get at most 500 MFLOPS). For all AMD machines before the
Athlon, this number is actually less than one. For Athlon and later,
however, it is 2 (e.g., a 500MHz Athlon has a 1 GFLOPS theoretical peak).
Here is a table listing some of the newer chips, and the constant to multiply
the clock rate by to get peak (an entry of 0 indicates that chip does not
have the given ISA extension) FOR SINGLE PRECISION ARITHMETIC:
| CHIP | x87 | SSE | 3DNOW! |
| Pentium | 1 | 0 | 0 |
| Pentium II | 1 | 0 | 0 |
| Pentium III | 1 | 4 | 0 |
| Pentium 4 | 1 | 4 | 0 |
| Athlon | 2 | 0 | 4 |
| Enhanced Athlon | 2 | 0 | 4 |
| Athlon4 | 2 | 4 | 4 |
| AthlonMP | 2 | 4 | 4 |
Multiplier of clock rate to get single precision peak
Here's the same table for double precision (64 bit) arithmetic:
| CHIP | x87 | SSE2 |
| Pentium | 1 | 0 |
| Pentium II | 1 | 0 |
| Pentium III | 1 | 0 |
| Pentium 4 | 1 | 2 |
| Athlon | 2 | 0 |
| Enhanced Athlon | 2 | 0 |
| Athlon4 | 2 | 0 |
| AthlonMP | 2 | 0 |
Multiplier of clock rate to get double precision peak
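To turn the tables above into actual numbers, multiply the clock rate by the entry for the chip and instruction set. Here is a small Python sketch, with the multipliers copied from the tables (the dictionary layout and function name are my own, and only a few chips are included):

```python
# FLOPs per cycle, keyed by (chip, instruction set), from the tables above.
SINGLE = {
    ("Pentium III", "x87"): 1, ("Pentium III", "SSE"): 4,
    ("Pentium 4", "x87"): 1, ("Pentium 4", "SSE"): 4,
    ("Athlon", "x87"): 2, ("Athlon", "3DNow!"): 4,
}
DOUBLE = {
    ("Pentium III", "x87"): 1,
    ("Pentium 4", "x87"): 1, ("Pentium 4", "SSE2"): 2,
    ("Athlon", "x87"): 2,
}

def peak_mflops(chip, isa, mhz, table=SINGLE):
    """Theoretical peak in MFLOPS; a missing entry counts as 0 (no such extension)."""
    return table.get((chip, isa), 0) * mhz

print(peak_mflops("Pentium III", "SSE", 500))     # single precision, SSE
print(peak_mflops("Athlon", "x87", 500))          # single precision, x87
print(peak_mflops("Athlon", "SSE2", 500, DOUBLE)) # Athlon has no SSE2
```

Remember these are theoretical peaks only; real code rarely gets close to them.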