Basically, it has been my experience that egcs gets much worse performance than GNU gcc on DEC Alphas. This has held true for me through several releases of gcc and egcs, so you can imagine that I was less than thrilled with the announcement that egcs was taking over gcc . . .

For what it's worth, gcc 2.95.1 (maintained by the egcs team) gets lower performance than gcc 2.8.1 did, though it is not nearly as bad as the egcs releases themselves . . .

My main project is ATLAS, which involves using a code generator to produce very fast linear algebra kernels using ISO/ANSI C (it's got a BSD-ish license, if you care about that sort of thing). On a 533MHz DEC ev56, ATLAS sustains a little over 600 Mflop for large, out-of-cache matrix multiplies when compiled with GNU gcc 2.8.1. When the best code is generated for egcs, however, a peak of less than 500 Mflop is observed.

My suspicion is that at least part of the problem has to do with fetch scheduling, which egcs seems to have optimized for the PPC at the expense of the Alpha. I am WAGing this because I have observed egcs running much faster than gcc on a PPC, and because a performance killer for egcs/Alpha is to throw the flags

-fschedule-insns -fschedule-insns2

which are big performance wins using gcc. Over the course of several egcs releases, I have tried pretty much every compiler flag I could find, and have never gotten GNU-level performance.

So, I am curious as to whether other Alpha users have comparison shopped the two compilers for computation-intensive codes, and if so, what their experiences have been. Can someone who knows more about these issues tell me what the difference is between egcs and gcc on the Alpha? If there is a problem, how does one go about getting the attention of very busy compiler developers to address it? Any help is much appreciated; I can be contacted at

rwhaley@cs.utk.edu.

So that people can scope the problem out without installing ATLAS, I put together a small matmul benchmark showing the problem. This small benchmark repeatedly performs an L1-cache-contained matmul, and calculates the Mflop rating achieved when the same codes are compiled with egcs and with gcc. It times three different matmul algorithms:

  1. gemm1x1 : Naive, no-unrolling, textbook matmul written with 3 for loops
  2. gemm4x4 : Matmul written with standard 4x4 unrolling
  3. atlasmm : Highly optimized C code produced by ATLAS code generator

Of course, the one *I* care about is the atlasmm, but I found it interesting that even the more commonly seen algorithms perform better with gcc than with egcs.

Here are the results I get on my 533MHz DEC ev56, using egcs-2.91.66 (1.1.2 release) and gcc 2.8.1:

GCC performance:
./xmm_gcc
ALGORITHM     NB   REPS        TIME      MFLOPS
=========  =====  =====  ==========  ==========

gemm1x1       28    500       0.328       67.02
gemm4x4       28    500       0.035      627.70
atlasmm       28    500       0.026      845.25

EGCS performance:
./xmm_egc
ALGORITHM     NB   REPS        TIME      MFLOPS
=========  =====  =====  ==========  ==========

gemm1x1       28    500       0.344       63.82
gemm4x4       28    500       0.055      399.19
atlasmm       28    500       0.033      662.36

