Refereed International Conference Publications

MPC-Wrapper: Fully Harnessing the Potential of Samsung Aquabolt-XL HBM2-PIM on FPGAs
Jinwoo Choi, Yeonan Ha, Hanna Cha, Seil Lee, Sungchul Lee, Jounghoo Lee, Shin-haeng Kang, Bongjun Kim, Hanwoong Jung, Hanjun Kim, and Youngsok Kim
To Appear: The 32nd IEEE International Symposium on Field-Programmable Custom Computing Machines (FCCM), May 2024.

Processing-In-Memory (PIM) is an attractive solution for mitigating frequent and large data movement between computational units and memory devices. Among various PIM implementations, Samsung Aquabolt-XL is an HBM2 memory device which implements 16 PIM-enabled pseudo-channels and associates an In-Memory Processor (IMP) with each pair of memory banks. Recent studies have shown that Aquabolt-XL can greatly accelerate various applications (e.g., deep learning) by offloading memory-intensive operations (e.g., matrix-vector multiplications) to the IMPs. However, prior work fails to fully utilize Aquabolt-XL and achieves limited performance gains because it offloads operations to the IMPs of only a single pseudo-channel. Ideally, utilizing all 16 pseudo-channels of Aquabolt-XL can further accelerate the key operations by a factor of 16 compared to utilizing only a single pseudo-channel. To fully exploit Aquabolt-XL, therefore, memory-intensive operations should be offloaded to and concurrently executed on the IMPs of all the PIM-enabled pseudo-channels. This paper presents MPC-Wrapper, a multi-pseudo-channel wrapper interface which allows memory-intensive operations to be offloaded to and concurrently executed on the IMPs of all 16 PIM-enabled pseudo-channels of Aquabolt-XL. First, MPC-Wrapper allows all the PIM-enabled pseudo-channels to operate independently and in parallel, thus achieving the high scalability needed to fully utilize all the PIM-enabled pseudo-channels of Aquabolt-XL. Second, MPC-Wrapper is highly flexible, as it exposes the PIM-enabled pseudo-channels as separate ports and enables FPGA logic to utilize any set of the PIM-enabled pseudo-channels according to its needs. Third, MPC-Wrapper achieves high usability by hiding from the rest of the FPGA logic the complex low-level interactions between the memory controller and Aquabolt-XL that are needed to initialize and invoke the PIM-enabled pseudo-channels.
Using an Aquabolt-XL-equipped Xilinx Alveo U280 FPGA and four memory-intensive benchmarks, we show that utilizing all the 16 PIM-enabled pseudo-channels of Aquabolt-XL with MPC-Wrapper achieves a geometric mean speedup of 13.66x over the baseline single PIM-enabled pseudo-channel implementations of the benchmarks.
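
As a rough aid to reading these numbers, the short Python sketch below relates the reported geometric mean speedup (13.66x) to the ideal 16x speedup stated in the abstract. The function names `geometric_mean` and `scaling_efficiency` are illustrative helpers introduced here and do not come from the paper; the per-benchmark speedups are not given on this page, so the sketch uses only the two published figures.

```python
import math


def geometric_mean(speedups):
    """Geometric mean of per-benchmark speedups (all values must be > 0)."""
    return math.exp(sum(math.log(s) for s in speedups) / len(speedups))


def scaling_efficiency(measured_speedup, ideal_speedup):
    """Fraction of the ideal speedup that was actually achieved."""
    return measured_speedup / ideal_speedup


# Figures taken from the abstract: 16 pseudo-channels imply an ideal 16x
# speedup over one pseudo-channel; the reported geometric mean is 13.66x.
IDEAL_SPEEDUP = 16
REPORTED_GEOMEAN_SPEEDUP = 13.66

efficiency = scaling_efficiency(REPORTED_GEOMEAN_SPEEDUP, IDEAL_SPEEDUP)
print(f"Scaling efficiency: {efficiency:.1%}")  # about 85.4% of the ideal
```

Under this reading, the 16-pseudo-channel configuration reaches roughly 85% of linear scaling on average. `geometric_mean` is included only to show how such an average would be formed from individual benchmark speedups if they were available.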