PDIP: Priority Directed Instruction Prefetching

Bhargav Reddy Godala, Sankara Prasad Ramesh, Gilles A. Pokam, Jared Stark, Andre Seznec, Dean Tullsen, and David I. August
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), April 2024.
Awarded Best Paper (one of six) out of 922 submissions.
Awarded all top ACM Reproducibility Badges offered by the Artifact Evaluation Committee.

Modern server workloads have large code footprints, which makes them prone to front-end bottlenecks due to instruction cache capacity misses. Even with aggressive fetch directed instruction prefetching (FDIP), as implemented in modern processors, significant front-end stalls due to I-Cache misses remain. A major portion of the misses that occur on a path predicted by the branch prediction unit (BPU) is tolerated by FDIP without causing stalls. Prior work on instruction prefetching, however, has not been designed to work with FDIP processors. Its singular goal is reducing I-Cache misses, whereas FDIP processors are designed to tolerate them. Designing an instruction prefetcher that works in conjunction with FDIP requires identifying the fraction of cache misses that impact front-end performance (those not fully hidden by FDIP) and targeting only them.

In this paper, we propose Priority Directed Instruction Prefetching (PDIP), a novel instruction prefetching technique that complements FDIP by issuing prefetches only for targets where FDIP struggles: along the resteer path of front-end stall-causing events. PDIP identifies these targets and associates them with a trigger for future prefetch. At a 43.5 KB budget, PDIP achieves up to 5.1% IPC speedup on important workloads such as cassandra and a geomean IPC speedup of 3.2% across 16 benchmarks.
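
The idea of associating a stall-causing miss target with a trigger for a later prefetch can be illustrated with a small lookup table. The sketch below is not the paper's design: the class name `TriggerTable`, its methods, the LRU replacement policy, the 64-byte line size, and the table dimensions are all assumptions chosen for illustration, and the abstract does not say how PDIP selects triggers or prioritizes targets.

```python
from collections import OrderedDict

LINE_BYTES = 64  # assumed I-Cache line size; not stated in the abstract


class TriggerTable:
    """Hypothetical bounded table mapping a trigger PC to prefetch targets."""

    def __init__(self, max_entries=4, targets_per_entry=2):
        # Both limits are illustrative and should be at least 1.
        self.max_entries = max_entries
        self.targets_per_entry = targets_per_entry
        self._table = OrderedDict()  # ordered from least to most recently used

    def record(self, trigger_pc, miss_addr):
        """Remember that a stall-causing miss at miss_addr followed trigger_pc."""
        line = miss_addr // LINE_BYTES * LINE_BYTES  # align to the cache line
        targets = self._table.pop(trigger_pc, [])
        if line in targets:
            targets.remove(line)
        targets.append(line)
        del targets[:-self.targets_per_entry]  # keep only the newest targets
        self._table[trigger_pc] = targets      # reinsert as most recently used
        while len(self._table) > self.max_entries:
            self._table.popitem(last=False)    # evict the least recently used

    def prefetches_for(self, trigger_pc):
        """Return the line addresses to prefetch when trigger_pc is seen again."""
        targets = self._table.get(trigger_pc)
        if targets is None:
            return []
        self._table.move_to_end(trigger_pc)
        return list(targets)
```

In this toy model, a front end would call `record` when a miss on a resteer path stalls fetch and `prefetches_for` each time a candidate trigger is encountered. A real design would also need a filter that admits only misses FDIP fails to hide, which this sketch leaves out.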