A New Approach to Thread Extraction for General-Purpose
Programs
Guilherme Ottoni, Ram Rangan, Adam Stoler, and David I. August
Proceedings of the 2nd Watson Conference on 
Interaction between Architecture, Circuits, and Compilers (PAC2), September 2005.
  
  
 
Until recently, a steadily rising clock rate and other uniprocessor
microarchitectural improvements could be relied upon to consistently
deliver increasing performance for a wide range of applications.
Current difficulties in maintaining this trend have led
microprocessor companies to add value by incorporating multiple
processors on a chip. Unfortunately, since decades of compiler
research have not succeeded in delivering automatic threading for
prevalent code properties, this approach demonstrates no improvement
for a large class of existing codes.

To find useful work for chip multiprocessors, we propose an automatic
approach to thread extraction, called Decoupled Software Pipelining
(DSWP). DSWP exploits the fine-grained pipeline parallelism lurking
in most applications to extract long-running, concurrently executing
threads. Use of the non-speculative and truly decoupled threads
produced by DSWP can increase execution efficiency and provide
significant latency tolerance, mitigating design complexity by
reducing inter-core communication and per-core resource requirements.
Using our initial fully automatic compiler implementation and a
validated processor model, we prove the concept by demonstrating
significant gains for dual-core chip multiprocessor models running a
variety of codes. Then, we explore simple opportunities missed by our
initial compiler implementation which suggest a promising future for
this approach.
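
The pipeline structure described above can be illustrated with a small hand-written sketch. It is not the output of the DSWP compiler and does not reproduce the paper's implementation or hardware model. It only assumes a common motivating case: a pointer-chasing loop split into a traversal stage and a compute stage that communicate in one direction through a bounded queue. All names (`Node`, `work`, `pipelined_sum`, `queue_size`) are hypothetical, and because CPython threads share an interpreter lock, the sketch shows the decoupled structure rather than any speedup.

```python
import queue
import threading


class Node:
    """Singly linked list node (hypothetical data structure for the example)."""

    def __init__(self, value, next=None):
        self.value = value
        self.next = next


def build_list(values):
    """Build a linked list holding `values` in order and return its head."""
    head = None
    for v in reversed(list(values)):
        head = Node(v, head)
    return head


def work(x):
    """Stand-in for the per-element loop body."""
    return x * x + 1


def sequential_sum(head):
    """Original single-threaded loop: traversal and work are interleaved."""
    total = 0
    node = head
    while node is not None:
        total += work(node.value)
        node = node.next
    return total


_DONE = object()  # sentinel marking the end of the stream


def pipelined_sum(head, queue_size=32):
    """Same loop split into two stages joined by a one-way bounded queue."""
    q = queue.Queue(maxsize=queue_size)
    result = []

    def traverse():
        # Stage 1: owns the pointer-chasing recurrence and never waits on
        # stage 2 except when the queue is full.
        node = head
        while node is not None:
            q.put(node.value)
            node = node.next
        q.put(_DONE)

    def compute():
        # Stage 2: consumes values in order; nothing is speculated or
        # rolled back.
        total = 0
        while True:
            item = q.get()
            if item is _DONE:
                break
            total += work(item)
        result.append(total)

    stages = [threading.Thread(target=traverse), threading.Thread(target=compute)]
    for t in stages:
        t.start()
    for t in stages:
        t.join()
    return result[0]
```

Because data flows only from the first stage to the second, the queue can absorb stalls in either stage, which is the latency tolerance the abstract refers to. Calling `pipelined_sum(build_list(range(100)))` returns the same value as `sequential_sum` on the same list.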
