Performance Scalability of Decoupled Software Pipelining
Ram Rangan, Neil Vachharajani, Guilherme Ottoni, and David I. August
ACM Transactions on Architecture and Code Optimization (TACO), Volume 5, Number 2, August 2008.
Any successful solution to using multi-core processors to scale
general-purpose program performance will have to contend with rising
inter-core communication costs while exposing coarse-grained
parallelism. Recently proposed pipelined multithreading (PMT)
techniques have been shown to be applicable to general-purpose
programs and can also effectively tolerate inter-core
latencies through pipelined inter-thread communication. These
desirable properties make PMT techniques strong candidates for program
parallelization on current and future multi-core processors, and
understanding their performance characteristics is critical to their
deployment. To that end, this paper evaluates the performance scalability of a
general-purpose PMT technique called decoupled software pipelining
(DSWP) and presents a thorough analysis of the communication
bottlenecks that must be overcome for optimal DSWP scalability.
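The basic shape of a DSWP-style transformation can be illustrated with a minimal sketch. It assumes a linked-list loop of the kind often used to motivate the technique. The pointer-chasing recurrence (node = node.next) runs in one thread, the per-node work runs in a second thread, and a bounded queue stands in for the pipelined inter-core communication channel. The names Node, build_list, work, sequential, dswp, and the capacity parameter are hypothetical illustrations, not the paper's implementation. Because of CPython's global interpreter lock, this shows only the thread and queue structure, not any real speedup.

```python
import queue
import threading
from dataclasses import dataclass
from typing import Optional

_DONE = object()  # sentinel telling stage 2 the stream has ended


@dataclass
class Node:
    value: int
    next: Optional["Node"] = None


def build_list(values):
    """Build a singly linked list from a Python list."""
    head = None
    for v in reversed(values):
        head = Node(v, head)
    return head


def work(value):
    # Hypothetical stand-in for the expensive, per-node loop body.
    return value * value


def sequential(head):
    """Original single-threaded loop: traversal and work interleaved."""
    total = 0
    node = head
    while node is not None:
        total += work(node.value)
        node = node.next
    return total


def dswp(head, capacity=32):
    """Two-stage pipeline: traversal thread -> bounded queue -> work thread."""
    q = queue.Queue(maxsize=capacity)  # models the inter-core queue
    result = []

    def stage1_traverse():
        # Stage 1 owns the loop-carried pointer-chasing recurrence.
        node = head
        while node is not None:
            q.put(node.value)  # blocks when the queue is full
            node = node.next
        q.put(_DONE)

    def stage2_work():
        # Stage 2 consumes values in order and performs the loop body.
        total = 0
        while True:
            item = q.get()  # blocks when the queue is empty
            if item is _DONE:
                break
            total += work(item)
        result.append(total)

    threads = [threading.Thread(target=stage1_traverse),
               threading.Thread(target=stage2_work)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result[0]
```

For example, `dswp(build_list(list(range(10))))` returns 285, the same value as `sequential` on that list. Data flows in only one direction, from the traversal stage to the work stage, so the producer can run ahead of the consumer by up to `capacity` items. That decoupling is what lets a pipelined design hide communication latency, and it is also why queue throughput becomes the bottleneck this paper analyzes.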