Mapping Algorithms to the Network Topology in a Portable Manner

Dave Turner, SCL Ames Laboratory, Ames IA 50011 USA    

PARA’04 Workshop on the State-of-the-Art in Scientific Computing

 

 

Abstract

 

One long-standing challenge in parallel computing is mapping an algorithm to the topology of the network.  Every architecture has a unique topology, and the effective topology can change with each run because the physical arrangement of nodes may be different, I/O nodes can intervene, and communication performance may be affected by other applications.  Most application programmers essentially ignore the network topology.  This retains portability, but sacrifices scalability and efficiency, and typically limits applications to systems of a few hundred processors. 

When dealing with computing systems of 1,000 – 100,000 processors, it is simply no longer possible to ignore the topology of the underlying network.  Good algorithms are needed that take advantage of local communications, and they must be mapped onto the network so local communications are done between neighboring nodes over direct connections to avoid contention for communication channels. 

Many algorithms can be efficiently mapped onto a simple topology like a 2D or 3D mesh.  In turn, these meshes can be mapped onto most network topologies in a manner that insures that neighboring nodes are directly connected to each other.  Mapping algorithms to a virtual 2D or 3D mesh can therefore provide both efficiency and portability.

A tool called NodeMap is being developed that automatically detects the network topology at run time, then provides an optimal mapping of a virtual 2D or 3D mesh to that network topology.  NodeMap will be called from the MPI_Cart_create() function when the reorder flag is set.  It may also be called from MPI_Init() to find optimal mappings for the global MPI functions. 

NodeMap uses a variety of techniques to probe a wide range of network characteristics.  A simple gethostname function determines processes on the same SMP node.  The uname function and compiler definitions are used to identify MPP systems, identifying the global network topology.  Subnet mappers can be probed to determine the topology for Myrinet and InfiniBand networks.  Latency and throughput measurements using simple ping-pong tests can help identify the topology of Ethernet clusters.  Static configuration files are used as a last resort.

If the programmer maps the application to a virtual mesh, the rest will be done automatically.  This should at least take one big step towards providing a portable means to take advantage of the network topology.