Automatic CPU-GPU Communication Management and Optimization [abstract]
Thomas B. Jablin, Prakash Prabhu, James A. Jablin, Nick P. Johnson, Stephen R. Beard, and David I. August
Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), June 2011.
The performance benefits of GPU parallelism can be enormous, but unlocking this
performance potential is challenging. The applicability and performance of GPU
parallelizations are limited by the complexities of CPU-GPU communication. To
address these communications problems, this paper presents the first fully
automatic system for managing and optimizing CPU-GPU communication. This system,
called the CPU-GPU Communication Manager (CGCM), consists of a run-time library
and a set of compiler transformations that work together to manage and optimize
CPU-GPU communication without depending on the strength of static compile-time
analyses or on programmer-supplied annotations. CGCM eases manual GPU
parallelizations and improves the applicability and performance of automatic GPU
parallelizations. For 24 programs, CGCM-enabled automatic GPU parallelization
yields a whole-program geomean speedup of 5.36x over the best sequential
CPU-only execution.
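For context, the sketch below is a minimal CUDA example (not from the paper; the kernel, variable names, and launch configuration are purely illustrative) of the manual allocation and cudaMemcpy boilerplate a programmer must otherwise write for each kernel invocation, and which a CGCM-style run-time library and compiler would insert and optimize automatically.

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

// Illustrative kernel: scale each element of an array in place.
__global__ void scale(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= factor;
}

int main(void) {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    float *host = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i)
        host[i] = (float)i;

    // Manual communication management: the programmer allocates device
    // memory, copies inputs to the GPU, and copies results back. Systems
    // like CGCM aim to generate and optimize these steps automatically,
    // e.g., by hoisting or eliding redundant transfers.
    float *dev;
    cudaMalloc(&dev, bytes);
    cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);

    scale<<<(n + 255) / 256, 256>>>(dev, 2.0f, n);

    cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dev);

    printf("host[1] = %f\n", host[1]);
    free(host);
    return 0;
}
```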