Amortizing Software Queue Overhead for Pipelined Inter-Thread Communication
Ram Rangan and David I. August
Proceedings of the Workshop on Programming Models for Ubiquitous Parallelism (PMUP), September 2006.
Future chip multiprocessors are expected to contain multiple on-die
processing cores. Increased memory system contention and wire delays
will result in high inter-core latencies in these processors. Thus,
parallelizing applications to efficiently execute on multiple contexts
is key to achieving continued performance improvements. Recently
proposed pipelined multithreading (PMT) techniques have shown
significant promise for both manual and automatic
parallelization. They tolerate increasing inter-thread communication
delays by enforcing acyclic dependences amongst communicating threads
and pipelining communication. However, the lack of efficient communication support for such programs
hinders related language and compiler research. While researchers have
proposed dedicated interconnects and storage for inter-core
communication, such mechanisms are not cost-effective, consume extra
power, demand chip redesign effort, and necessitate complex operating
system modifications. Software implementations of shared memory
queues avoid these problems. However, they tend to have heavy overhead per
communication operation, causing them to negate parallelization
benefits and, worse still, to make the parallelized code run slower than the
original single-threaded code. In this paper, we present a simple compiler
analysis to coalesce synchronization and queue pointer updates for
select communication operations, to minimize the intra-thread
overhead of software queue implementations. A preliminary comparison
of static schedule heights shows a considerable performance
improvement over existing software queue
implementations.
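
To illustrate the per-operation overhead that coalescing targets, here is a minimal single-threaded sketch in Python. It is not the compiler analysis from the paper: the class `SPSCQueue`, the helpers `produce_per_item` and `produce_coalesced`, and the `publishes` counter are all hypothetical names introduced for this example. The sketch assumes a single producer, a single consumer, and a batch that fits in the queue, and it simply counts how often the shared tail pointer is updated.

```python
class SPSCQueue:
    """Toy single-producer/single-consumer ring buffer (illustrative only).

    head and tail are monotonically increasing counters; slots are indexed
    modulo the capacity. `publishes` counts updates to the shared tail,
    standing in for the synchronization cost of a real shared-memory queue.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.buf = [None] * capacity
        self.head = 0        # shared: next item the consumer will read
        self.tail = 0        # shared: end of the data visible to the consumer
        self.local_tail = 0  # producer-private: next slot to write
        self.publishes = 0   # number of shared tail updates performed

    def write(self, value):
        """Store a value without making it visible to the consumer yet."""
        if self.local_tail - self.head >= self.capacity:
            raise BufferError("queue full")
        self.buf[self.local_tail % self.capacity] = value
        self.local_tail += 1

    def publish(self):
        """Expose every written value with a single shared-pointer update."""
        self.tail = self.local_tail
        self.publishes += 1

    def consume_all(self):
        """Drain every value that has been published so far."""
        out = []
        while self.head < self.tail:
            out.append(self.buf[self.head % self.capacity])
            self.head += 1
        return out


def produce_per_item(q, values):
    """Baseline: one shared-pointer update per communication operation."""
    for v in values:
        q.write(v)
        q.publish()


def produce_coalesced(q, values):
    """Coalesced: write the whole batch, then update the shared pointer once."""
    for v in values:
        q.write(v)
    q.publish()


# Same data through both variants; only the number of shared updates differs.
naive = SPSCQueue(8)
produce_per_item(naive, [1, 2, 3, 4])

batched = SPSCQueue(8)
produce_coalesced(batched, [1, 2, 3, 4])

print(naive.publishes, batched.publishes)
```

In a real multithreaded queue, each shared update would also carry synchronization cost, and deciding which operations can safely share one update is the job of the compiler analysis, which this sketch does not model.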