Static and Dynamic Instruction Mapping for Spatial Architectures
Feng Liu
Ph.D. Thesis, Department of Electrical Engineering,
Princeton University, 2018.
In response to technology scaling trends, spatial architectures have emerged
as a new style of processors for executing programs more efficiently. Unlike
traditional out-of-order (OoO) processors, which time-share a small set of
functional units, a spatial computer is composed of hundreds or even thousands
of simple and replicated functional units. Spatial architectures avoid the
overheads of time-sharing and of repeatedly generating schedules by mapping
instruction sequences onto the functional units explicitly and reusing the
mapping across multiple invocations.
Currently, spatial architectures mainly use static methods to map and schedule
instructions onto the arrays of functional units. These methods have
several limitations. First, for programs with irregular memory accesses and
control flow, they yield poor performance because the functional units must
be invoked sequentially to respect data and control dependences. Second,
static methods cannot fully exploit speculation techniques, which are the
dominant source of performance in OoO processors. Finally, static methods cannot
adapt to changing workloads and are not compatible across hardware generations.
To address these issues and improve the applicability of spatial architectures,
this dissertation proposes two techniques. The first, Coarse-Grained Pipelined
Accelerators (CGPA), is a static compilation framework that exploits the hidden
parallelism within irregular C/C++ loops and translates them into spatial
hardware modules. The technique has been implemented as a compiler pass, and
experiments show a 3.3x speedup over an open-source tool baseline.
The second technique, Dynamic Spatial Architecture Mapping (DYNASPAM), reuses
the speculation system of OoO processors to dynamically produce high-performance
schedules for execution on a dedicated spatial fabric. The technique is
modeled with a cycle-accurate simulator, and experiments show a 1.4x geomean
performance improvement and a 23.9% reduction in energy consumption compared
to an aggressive OoO processor baseline.