Intelligent Speculation for Pipelined Multithreading
Neil Amar Vachharajani
Ph.D. Thesis, Department of Computer Science,
Princeton University, November 2008.
In recent years, microprocessor manufacturers have shifted their focus
from single-core to multi-core processors. Since many of today's
applications are single-threaded and since it is likely that many of
tomorrow's applications will have far fewer threads than there
will be processor cores, automatic thread extraction is an essential
tool for effectively leveraging today's multi-core and
tomorrow's many-core processors. A recently proposed technique,
Decoupled Software Pipelining (DSWP), has demonstrated promise by
partitioning loops into long-running threads organized into a
pipeline. Using a pipeline organization and execution decoupled by
inter-core communication queues, DSWP offers increased execution
efficiency that is largely independent of inter-core communication
latency and variability in intra-thread performance.

This dissertation extends the pipelined parallelism paradigm with
speculation. Using speculation, dependences that manifest infrequently
or are easily predictable can be safely ignored by the compiler,
allowing it to carve more, and better balanced, thread-based pipeline
stages from a single thread of execution. Prior speculative threading
proposals were obligated to speculate most, if not all, loop-carried
dependences to squeeze the code segment under consideration into the
mold required by the parallelization paradigm. Unlike those
techniques, this dissertation demonstrates that speculation need only
break the longest few dependence cycles to enhance the applicability
and scalability of the pipelined multi-threading paradigm. By
speculatively breaking these cycles, instructions that were formerly
restricted to a single thread to ensure decoupling are now free to
span multiple threads. To demonstrate the effectiveness of speculative
pipelined multi-threading, this dissertation presents the design and
experimental evaluation of our fully automatic compiler
transformation, Speculative Decoupled Software Pipelining, a
speculative extension to DSWP.

This dissertation additionally introduces multi-threaded transactional
memories to support speculative pipelined multi-threading. Similar to
past speculative parallelization approaches, speculative pipelined
multi-threading relies on runtime-system support to buffer speculative
modifications to memory. However, this dissertation demonstrates that
existing proposals for buffering speculative memory state, namely
transactional memories, are insufficient for speculative pipelined
multi-threading because their speculative buffers are restricted to a single
thread. Further, this dissertation demonstrates that this limitation
leads to modularity and composability problems even for transactional
programming, thus limiting the potential of that approach as well. To
surmount these limitations, this thesis introduces multi-threaded
transactional memories and presents an initial hardware
implementation.
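To make the buffering problem concrete, here is a minimal software sketch under stated assumptions. It is not the hardware implementation the thesis presents. It models a two-stage, DSWP-style pipeline whose threads are connected by a queue and share one speculative write buffer. The second stage reads values that the first stage has written but not yet committed, and the whole buffer is then committed or discarded as a unit. The names `SharedSpeculativeBuffer` and `run_pipeline` are hypothetical and chosen for illustration. The sketch also omits conflict detection, versioning, and the other mechanisms a real multi-threaded transactional memory would need.

```python
import queue
import threading


class SharedSpeculativeBuffer:
    """Hypothetical toy model: one speculative write buffer shared by several
    threads, committed to memory or discarded as a single unit."""

    def __init__(self, memory):
        self._memory = memory      # committed, non-speculative state
        self._writes = {}          # uncommitted writes from every participating thread
        self._lock = threading.Lock()

    def read(self, addr):
        with self._lock:
            # Any thread sees uncommitted writes made by any other thread.
            if addr in self._writes:
                return self._writes[addr]
            return self._memory.get(addr)

    def write(self, addr, value):
        with self._lock:
            self._writes[addr] = value

    def commit(self):
        with self._lock:
            # Publish every buffered write to committed memory at once.
            self._memory.update(self._writes)
            self._writes.clear()

    def abort(self):
        with self._lock:
            # Discard all speculative writes; committed memory is untouched.
            self._writes.clear()


def run_pipeline(values, misspeculated=False):
    memory = {}
    buf = SharedSpeculativeBuffer(memory)
    q = queue.Queue()              # inter-stage communication queue
    done = object()                # sentinel marking the end of the loop

    def stage1():
        # First pipeline stage: speculatively writes each input value.
        for i, v in enumerate(values):
            buf.write(("in", i), v)
            q.put(i)               # forward the iteration number to stage 2
        q.put(done)

    def stage2():
        # Second pipeline stage: consumes iterations, reading stage 1's
        # uncommitted writes through the shared buffer.
        while True:
            i = q.get()
            if i is done:
                break
            buf.write(("out", i), buf.read(("in", i)) * 2)

    threads = [threading.Thread(target=stage1), threading.Thread(target=stage2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # In this toy model the caller supplies the outcome; a real system
    # would detect misspeculation at run time.
    if misspeculated:
        buf.abort()
    else:
        buf.commit()
    return memory
```

In this model, `run_pipeline([1, 2, 3])` leaves both stages' results in committed memory, while `run_pipeline([1, 2, 3], misspeculated=True)` leaves memory empty. If each thread instead had its own private buffer, as with a conventional single-threaded transaction, stage 2 could not observe stage 1's uncommitted writes. Sharing one buffer across both threads is what lets the two pipeline stages remain inside a single speculative unit here.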