Title: A Full HD 60 fps CNN Super Resolution Processor with Selective Caching based Layer Fusion for Mobile Devices
Authors: Ju-Hyoung Lee, Dong-Joo Shin, Jin-Su Lee, Jin-Mook Lee, Sang-Hoon Kang, and Hoi-Jun Yoo
Recently, super-resolution algorithms based on convolutional neural networks (SR-CNNs) have been broadly used on mobile devices to improve user experience (UX) through video quality enhancement or recognition of distant objects. However, the distinct architecture of SR-CNNs makes it hard to meet the high throughput requirement on conventional hardware that targets classification CNNs. This is because the intermediate feature maps of an SR-CNN do not shrink as they pass through the layers, whereas those of a classification CNN shrink due to pooling or strided convolutions. Because of its large feature maps, an SR-CNN requires more external memory access (EMA), a larger on-chip memory footprint, and a heavier computation workload than a classification CNN.
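The contrast in feature-map size can be made concrete with a minimal sketch. The layer counts, tile size, channel width, and strides below are hypothetical values chosen only for illustration; they are not taken from the processor or from any particular network.

```python
def feature_map_elems(height, width, channels, strides):
    """Return the number of activation elements after each layer.

    Each layer divides the spatial resolution by its stride; the channel
    count is held constant (a simplifying assumption).
    """
    sizes = []
    h, w = height, width
    for s in strides:
        h, w = h // s, w // s
        sizes.append(h * w * channels)
    return sizes

# Hypothetical 4-layer networks on a 64x64 tile with 16 channels.
# SR-style: every layer keeps the input resolution (stride 1).
sr_sizes = feature_map_elems(64, 64, 16, [1, 1, 1, 1])
# Classification-style: strided layers halve the resolution.
cls_sizes = feature_map_elems(64, 64, 16, [1, 2, 2, 2])

print(sr_sizes)   # [65536, 65536, 65536, 65536]
print(cls_sizes)  # [65536, 16384, 4096, 1024]
print(sum(sr_sizes) / sum(cls_sizes))
```

Under these assumed settings the SR-style network holds about three times as many intermediate activations in total, which is the kind of gap that drives up EMA and on-chip memory footprint.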
In this work, we propose a high-throughput SR-CNN processor that minimizes EMA and on-chip memory footprint with three key features: 1) a selective caching based layer fusion (SCLF) algorithm to reduce the overall memory cost (the product of on-chip memory size and EMA), 2) a memory compaction scheme to further reduce the on-chip memory footprint, and 3) a cyclic ring core architecture to increase processing element (PE) utilization for SCLF. As a result, the implemented processor achieves a throughput of 60 frames per second when generating full HD images.
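As a rough sketch of the two quantities mentioned above, the snippet below computes the output pixel rate implied by full HD at 60 frames per second and evaluates the memory cost metric as the product of on-chip memory size and EMA. The function names and the two design points are hypothetical placeholders in arbitrary units; they are not measurements of the processor.

```python
FULL_HD_WIDTH = 1920
FULL_HD_HEIGHT = 1080
TARGET_FPS = 60

def output_pixel_rate(width, height, fps):
    """Output pixels that must be produced per second."""
    return width * height * fps

def memory_cost(on_chip_size, ema):
    """Memory cost as defined in the text: on-chip memory size times EMA."""
    return on_chip_size * ema

pixels_per_second = output_pixel_rate(FULL_HD_WIDTH, FULL_HD_HEIGHT, TARGET_FPS)
print(pixels_per_second)  # 124416000

# Hypothetical design points (arbitrary units, illustration only):
# a small buffer with heavy external traffic versus a larger buffer
# with much less external traffic.
small_buffer_cost = memory_cost(on_chip_size=100, ema=4000)
large_buffer_cost = memory_cost(on_chip_size=400, ema=500)
print(small_buffer_cost, large_buffer_cost)  # 400000 200000
```

Because the metric is a product, a scheme that spends more on-chip memory can still come out ahead if it cuts EMA by a larger factor, which is the trade-off a fusion scheme has to balance.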
Figure 1. Illustration of the proposed SR computing algorithm and the proposed ring core architecture.