Deferred Dropout: An Algorithm-Hardware Co-Design DNN Training Method Provisioning Consistent High Activation Sparsity

Title: Deferred Dropout: An Algorithm-Hardware Co-Design DNN Training Method Provisioning Consistent High Activation Sparsity

Author: Kangkyu Park, Yunki Han, Lee-Sup Kim

Conference: IEEE/ACM International Conference on Computer-Aided Design (ICCAD) 2021

Abstract: This paper proposes a deep neural network training method that provisions consistently high activation sparsity and the ability to adjust that sparsity. To improve training performance, prior work reduces the memory footprint of training by exploiting the input activation sparsity induced by the ReLU function. However, because the previous approach relies solely on the sparsity inherent to that function, the footprint reduction is not guaranteed. In particular, models for natural language processing tasks such as BERT do not use ReLU, so these models exhibit almost zero activation sparsity and the previous approach loses its efficiency. This paper proposes a new training method, Deferred Dropout, and its hardware architecture. With the proposed method, input activations are dropped out after the conventional forward-pass computation. In contrast to conventional dropout, where activations are zeroed before the forward-pass computation, the dropping is deferred until the computation completes. The sparsified activations are then compressed and stashed in memory. This approach is based on the observation that networks preserve training quality even if only a few high-magnitude activations are used in the backward pass. The hardware architecture enables designers to exploit the tradeoff between training quality and activation sparsity. Evaluation results demonstrate that the proposed method achieves a 1.21-3.60x memory footprint reduction and a 1.06-1.43x speedup on the TPUv3 architecture, compared to the prior work.
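
The core idea can be illustrated in software. Below is a minimal PyTorch sketch of deferred dropout for a single linear layer, not the paper's hardware implementation: the forward pass runs on the dense input, and only afterwards are the top-k highest-magnitude activations kept and stashed for the backward pass. The DeferredDropoutLinear name and the keep_ratio parameter are illustrative assumptions; the actual design selects and compresses activations in dedicated hardware rather than storing a zeroed dense tensor.

import torch

class DeferredDropoutLinear(torch.autograd.Function):
    # The forward pass uses the dense input as usual; only afterwards is
    # the activation sparsified (top-k by magnitude) and stashed for backward.
    @staticmethod
    def forward(ctx, x, weight, keep_ratio):
        y = x.matmul(weight.t())  # conventional dense forward computation
        # Deferred dropout: sparsify x only after the forward pass completes.
        k = max(1, int(keep_ratio * x.numel()))
        kth_largest = x.abs().flatten().kthvalue(x.numel() - k + 1).values
        sparse_x = torch.where(x.abs() >= kth_largest, x, torch.zeros_like(x))
        ctx.save_for_backward(sparse_x, weight)  # stands in for the compressed stash
        return y

    @staticmethod
    def backward(ctx, grad_y):
        sparse_x, weight = ctx.saved_tensors
        grad_x = grad_y.matmul(weight)        # input gradient needs only the weights
        grad_w = grad_y.t().matmul(sparse_x)  # weight gradient uses the sparse stash
        return grad_x, grad_w, None

# Usage: keep roughly 25% of activations for the backward pass.
x = torch.randn(8, 16, requires_grad=True)
w = torch.randn(4, 16, requires_grad=True)
y = DeferredDropoutLinear.apply(x, w, 0.25)
y.sum().backward()

Note that keep_ratio here plays the role of the adjustable sparsity knob described in the abstract: lowering it shrinks the stashed activation footprint at the cost of training quality.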