Automatically Exploiting Cross-Invocation Parallelism Using Runtime Information [abstract] (PDF)
Jialuh Huang
Ph.D. Thesis, Department of Computer Science,
      Princeton University, September 2013.
  
  
Automatic parallelization is a promising approach to producing
scalable multi-threaded programs for multi-core architectures. Most
existing techniques parallelize independent loops and insert global
synchronizations at the end of each loop invocation. For programs with
few loop invocations, these global synchronizations do not limit
parallel execution performance. However, for programs with many loop
invocations, those synchronizations can easily become the performance
bottleneck since they frequently force all threads to wait, losing
potential parallelization opportunities.  To address this problem,
some automatic parallelization techniques apply static analyses to
enable cross-invocation parallelization.  Instead of waiting, threads
can execute iterations from follow-up invocations if they do not cause
any conflict. However, static analysis must be conservative and cannot
handle irregular dependence patterns manifested by particular program
inputs at runtime.
 
In order to enable more parallelization across loop invocations, this thesis
presents two novel automatic parallelization techniques: DOMORE and SpecCross.
Unlike existing techniques relying on static analyses, these two techniques
take advantage of runtime information to achieve much more aggressive
parallelization. DOMORE constructs a custom runtime engine which
non-speculatively observes dependences at runtime and synchronizes iterations
only when necessary; while SpecCross applies software speculative barriers to
permit some of the threads to execute past the invocation boundaries. The two
techniques are complimentary in the sense that they can parallelize programs
with potentially very different characteristics. SpecCross, with less runtime
overhead, works best when programs' cross-invocation dependences seldom cause
any runtime conflict. DOMORE, on the other hand, has its advantage in handling
dependences which cause frequent conflicts. Evaluating implementations of
DOMORE and SpecCross demonstrates that both techniques can achieve much better
scalability compared to existing automatic parallelization techniques.
