This paper presents HNPU, an energy-efficient DNN training processor built on algorithm-hardware co-design. The HNPU supports a stochastic dynamic fixed-point representation and includes a layer-wise adaptive precision searching unit for low-bit-precision training. It additionally exploits slice-level reconfigurability and sparsity to maximize its efficiency in both DNN inference and training. An adaptive-bandwidth reconfigurable accumulation network enables reconfigurable DNN allocation and maintains high core utilization across varying bit-precision conditions. Fabricated in a 28 nm process, the HNPU achieves at least 5.9× higher energy efficiency and 2.5× higher area efficiency in actual DNN training than the previous state-of-the-art on-chip learning processors.
Han, Donghyeon, et al. “HNPU: An adaptive DNN training processor utilizing stochastic dynamic fixed-point and active bit-precision searching.” IEEE Journal of Solid-State Circuits 56.9 (2021): 2858-2869.
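To make the phrase "stochastic dynamic fixed-point" in the abstract concrete, the sketch below quantizes a group of values to a signed fixed-point code. The fractional length is chosen from the group's largest magnitude (the "dynamic" part), and rounding is randomized so that it is unbiased on average (the "stochastic" part). This is only an illustrative software model under stated assumptions, not the HNPU datapath: the abstract does not describe how the processor selects the exponent or searches the precision, and the function names `choose_frac_bits`, `stochastic_round`, `quantize_sdfxp`, and `dequantize` are hypothetical.

```python
import math
import random


def choose_frac_bits(values, total_bits):
    """Pick a shared fractional length so the largest magnitude fits.

    Assumption for illustration: one sign bit, and the exponent is set
    from the maximum absolute value of the group.
    """
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return total_bits - 1
    # Integer bits needed for the magnitude, excluding the sign bit.
    int_bits = math.floor(math.log2(max_abs)) + 1
    return total_bits - 1 - int_bits


def stochastic_round(x, rng):
    """Round x up with probability equal to its fractional part, else down."""
    low = math.floor(x)
    return low + (1 if rng.random() < (x - low) else 0)


def quantize_sdfxp(values, total_bits, rng):
    """Return integer codes and the shared fractional length."""
    frac_bits = choose_frac_bits(values, total_bits)
    scale = 2.0 ** frac_bits
    q_min = -(1 << (total_bits - 1))
    q_max = (1 << (total_bits - 1)) - 1
    # Clamp after rounding so a round-up at the top of the range cannot overflow.
    codes = [min(q_max, max(q_min, stochastic_round(v * scale, rng)))
             for v in values]
    return codes, frac_bits


def dequantize(codes, frac_bits):
    """Map integer codes back to real values."""
    return [c / (2.0 ** frac_bits) for c in codes]


if __name__ == "__main__":
    rng = random.Random(0)
    values = [0.3, -0.75, 0.1234, 0.5]
    codes, frac_bits = quantize_sdfxp(values, total_bits=8, rng=rng)
    print(codes, frac_bits, dequantize(codes, frac_bits))
```

Because each value rounds up with probability equal to its fractional part, the expected value of the quantized result equals the input (as long as no clamping occurs). This is the usual reason stochastic rounding is preferred over round-to-nearest when small gradient updates must survive low-bit accumulation.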