The simplicity of the PRAM model avoids communication latency, memory limitations and other difficult issues. Efforts have been made to develop more realistic models, yet the restriction of finite memory space remains virtually unaddressed. Models cannot encompass all aspects of the hardware and the programming of parallel machines without losing much of their simplicity and usefulness. However, when implementing non-numeric algorithms (e.g., merging, sorting, graph algorithms), efficient use of available memory can be a major design concern.
The transport of code from one machine model to a different model presents the programmer with a number of challenges. Parallel numeric computations are generally very regular with respect to memory access, processor cooperation and ordering of computations. The efficient implementation of such algorithms is intimately tied to the executing architecture and, consequently, may require a large amount of redesign when moved to a different machine. Non-numeric computations are typically driven by the values of the data itself. The unpredictable nature of non-numeric computations does not often allow one to take advantage of the architectural constraints. Without such benefits, the design and porting of non-numeric algorithms between machines can be simpler (though less efficient) than for numeric algorithms. Even so, many practical considerations and obstacles cannot be anticipated from models.
We investigate the process of transporting non-numeric algorithms between diverse machine models. For this study we have chosen a time-space optimal parallel merge as our prototypical algorithm. This merging strategy possesses many algorithmic properties shared by a wide range of other parallel non-numeric algorithms. The merge algorithm has the additional property of requiring only a constant number of auxiliary memory cells beyond that needed to hold the data for merging, effectively addressing the finite bound of memory on practical machines. As a result of the variety of architectures on which the merge algorithm is implemented, we also examine the use of the merge code as a non-numeric parallel benchmark. Based on these experiences, we develop improvements to reduce the amount of communication in the merge algorithm. In all cases, actual implementations and executions of programs are described.