Decoupled Software Pipelining: A Promising Technique to Exploit Thread-Level Parallelism
Guilherme Ottoni, Ram Rangan, Neil Vachharajani, and David I. August
Proceedings of the Fourth Workshop on Explicitly Parallel
Instruction Computer Architectures and Compiler Technology (EPIC), March 2005.
Processor manufacturers are moving to multi-core,
multi-threaded designs because of several factors such as cost, ease
of design and scalability. As most processors will be multi-threaded
in the future, exposing thread-level parallelism (TLP) is a problem of
increasing importance. Because the suitable thread granularity
depends on the target architecture, and writing sequential
applications is usually more natural, the compiler plays an important
role in performing the mapping from applications to the appropriate
multi-threaded code. In spite of this, few general-purpose compilation
techniques have been proposed to assist in this task. In this paper,
we propose Decoupled Software Pipelining (DSWP) to extract
thread-level parallelism. DSWP can convert most application loops into
a pipeline of loop threads. This brings pipeline parallelism to most
application loops, including those not targeted by traditional software
pipelining. DSWP does not rely on complex hardware speculation support
since it is a non-speculative transformation. This paper describes the
DSWP technique, discusses its implementation in a compiler, and
presents experimental results demonstrating that it is a promising
technique to extract TLP.
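
To make the idea of "a pipeline of loop threads" concrete, the following is a minimal hand-written sketch, not the compiler transformation described in the paper. It assumes a linked-list loop as the example, a bounded queue as the channel between stages, and hypothetical names (Node, work, sequential_sum, dswp_sum) chosen only for illustration. The first stage runs the pointer-chasing recurrence; the second stage runs the rest of the loop body. Data flows in one direction only, so no speculation or rollback is involved. Python threads are used purely to show the structure and are not expected to yield a speedup.

```python
import queue
import threading
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    value: int
    next: Optional["Node"] = None


def build_list(values):
    """Build a singly linked list and return its head (None if empty)."""
    head = None
    for v in reversed(values):
        head = Node(v, head)
    return head


def work(value):
    """Hypothetical stand-in for the computation in the loop body."""
    return value * value


def sequential_sum(head):
    """Original single-threaded loop: traversal and work are interleaved."""
    total = 0
    node = head
    while node is not None:
        total += work(node.value)
        node = node.next
    return total


_DONE = object()  # sentinel marking the end of the stream


def dswp_sum(head, queue_size=32):
    """The same loop split into two stages joined by a bounded queue."""
    q = queue.Queue(maxsize=queue_size)
    result = []

    def traverse_stage():
        # Stage 1: the list-traversal recurrence (node = node.next).
        node = head
        while node is not None:
            q.put(node.value)
            node = node.next
        q.put(_DONE)

    def compute_stage():
        # Stage 2: work and accumulation; consumes only what stage 1 produces.
        total = 0
        while True:
            item = q.get()
            if item is _DONE:
                break
            total += work(item)
        result.append(total)

    threads = [
        threading.Thread(target=traverse_stage),
        threading.Thread(target=compute_stage),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result[0]
```

In this sketch the queue decouples the two stages: the traversal stage can run ahead of the compute stage by up to queue_size items, which is the kind of slack that lets one stage tolerate stalls in the other.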