AI in EE

AI IN DIVISIONS

AI in Circuit Division

“DPIM: A 19.36 TOPS/W 2T1C eDRAM Transformer-in-Memory Chip with Sparsity-Aware Quantization and Heterogeneous Dense-Sparse Core,” IEEE European Solid-State Electronics Research Conference, Sep. 2024 Accept (김주영 교수 연구실)

Donghyuk Kim, Jae-Young Kim, Hyunjun Cho, Seungjae Yoo, Sukjin Lee, Sungwoong Yune, Hoichang Jeong, Keonhee Park, Ki-Soo Lee, Jongchan Lee, Chanheum Han, Gunmo Koo, Yuli Han, Jaejin Kim, Jaemin Kim, Kyuho Lee, Joo-Hyung Chae, Kunhee Cho, and Joo-Young Kim, “DPIM: A 19.36 TOPS/W 2T1C eDRAM Transformer-in-Memory Chip with Sparsity-Aware Quantization and Heterogeneous Dense-Sparse Core,” IEEE European Solid-State Electronics Research Conference, Sep. 2024

Abstract: This paper presents DPIM, the first 2T1C eDRAM Transformer-in-memory chip. Its high-density eDRAM cell supports large-capacity processing-in-memory (PIM) macros of 1.38 Mb/mm2, reducing external memory access. DPIM adopts a sparse-aware quantization scheme to entire layers of Transformer, which quantizes the model to 8-bit integer (INT8) with a minimal accuracy drop of 2\% in the BERT-large model on the GLUE dataset while increasing the bit-slice sparsity ratio of both weight and activation from dense matrices to 83.3\% and 88.4\%, respectively. Its heterogeneous PIM macro supports intensive dense matrix multiplications with an extreme to moderate range of sparse matrix multiplications with a peak throughput of 3.03-12.12 TOPS, enhancing the efficiency up to 4.84-19.36 TOPS/W.

Main figure4