Static Dependence Analysis in an Infrastructure for Automatic Parallelization [abstract] (PDF)
Nick P. Johnson
Ph.D. Thesis, Department of Computer Science,
Princeton University, September 2015.
Now that parallel architectures are common, software must exploit
multiple cores to fully utilize hardware resources and
achieve efficient execution.
Restructuring applications for explicit parallelism
requires developers to reason about low-level details
to avoid concurrency bugs and
achieve parallel performance.
Automatic
thread extraction relieves developers of this arduous
task.
This dissertation presents a compiler
{\em middle-ware} for automatic thread extraction---analyses and
transformations that allow the compiler to deliver
parallel performance for sequentially-specified input programs.
This middle-ware reconsiders
the
compilation infrastructure to present alternative technologies
better suited to the needs of automatic thread extraction.
Specifically,
{\bf Collaborative Dependence Analysis:}
Since no analysis algorithm can precisely decide all
analysis facts, this dissertation
combines simple analysis algorithms into a {\em collaborative ensemble}
algorithm:
the ensemble algorithm can disprove dependence queries which no
member disproves alone. Collaboration enables
{\em factored} development, which prescribes the development of
small, orthogonal analysis algorithms tailored to the needs of
analysis clients. Results demonstrate
independently-developed analysis algorithms collaborating
to solve complex, multiple-logic queries.
Further, results
demonstrate a large impact on performance:
for some benchmarks,
analysis strength is the difference between 11$\times$ slowdown
and 28$\times$ speedup on 50 cores.
{\bf Scaling Analysis to Larger Scopes:}
The infrastructure builds around Parallel-Stage
Decoupled Software Pipelining (PS-DSWP) thread extraction.
PS-DSWP targets large, hot program scopes to overcome the
non-recurring overheads of parallel execution.
However, as scopes grow, the burden of analysis becomes prohibitive.
This dissertation contributes a faster algorithm to
compute a dependence graph to drive PS-DSWP.
This algorithm identifies dependence edges which {\em cannot}
affect PS-DSWP. It skips dependence analysis queries pertaining
to unimportant edges, reducing analysis
time---or allowing more expensive analysis
algorithms---for the remaining, important queries.
Evaluation demonstrates that
the algorithm computes the
${\rm DAG}_{\rm SCC}$ twice as fast
using half as many
dependence analysis queries without sacrificing analysis
precision.
{\bf Incorporating Speculation into the Compilation Pipeline:}
A parallelization system may speculate
various properties to enable thread extraction.
This dissertation
presents design patterns to
simplify development of novel speculation types.
The dissertation integrates these contributions
into a robust and flexible middle-ware upon which many works are
built.