Seok-Hwan Oh, Myeong-Gee Kim, Youngmin Kim, Hyuksool Kwon, Hyeon-Min Bae, “A Neural Framework for Multi-Variable Lesion Quantification Through B-mode Style Transfer”, International Conference on Medical Image Computing & Computer Assisted Intervention (MICCAI), Sept. 2021.

Abstract: In this paper, we present a scalable lesion-quantifying neural network based on B-mode-to-quantitative neural style transfer. Quantitative tissue characteristics have great potential in diagnostic ultrasound, since pathological changes cause variations in biomechanical properties. The proposed system simultaneously provides four clinically critical quantitative tissue images, namely sound speed, attenuation coefficient, effective scatterer diameter, and effective scatterer concentration, by applying quantitative style information to structurally accurate B-mode images. The proposed system was evaluated through numerical simulation and through phantom and ex-vivo measurements. The numerical simulation shows that the proposed framework outperforms the baseline model as well as existing state-of-the-art methods while achieving a significant parameter reduction per quantitative variable. In phantom and ex-vivo studies, the proposed BQI-Net achieves sufficient sensitivity and specificity in identifying and classifying cancerous lesions.


Myeong-Gee Kim, Seok-Hwan Oh, Youngmin Kim, Hyuksool Kwon, Hyeon-Min Bae, “Learning-based attenuation quantification in abdominal ultrasound”, International Conference on Medical Image Computing & Computer Assisted Intervention (MICCAI), Sept. 2021

Abstract: The attenuation coefficient (AC) of tissue in medical ultrasound has great potential as a quantitative biomarker due to its high sensitivity to pathological properties. In particular, AC is emerging as a new quantitative biomarker for diagnosing and quantifying hepatic steatosis. In this paper, a learning-based technique to quantify AC from pulse-echo data obtained through a single convex probe is presented. In the proposed method, ROI-adaptive transmit beam focusing (TxBF) and envelope detection schemes are employed to increase the estimation accuracy and noise resilience, respectively. In addition, the proposed network is designed to extract an accurate AC of the target region, considering the attenuation, sound speed, and scattering of the propagating waves in the vicinity of the target region. The accuracy of the proposed method is verified through simulation and phantom tests. In addition, clinical pilot studies show that liver AC values estimated by the proposed method correlate strongly with the fat fraction obtained from magnetic resonance imaging (R² = 0.89, p < 0.001). Such results indicate the clinical validity of the proposed learning-based AC estimation method for diagnosing hepatic steatosis.
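The reported correlation can be reproduced from paired per-subject measurements with an ordinary least-squares fit; a minimal sketch (function name and sample data are illustrative, not from the study):

```python
import numpy as np

def linear_r2(x, y):
    """Coefficient of determination R^2 of a least-squares linear fit y ~ a*x + b."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    a, b = np.polyfit(x, y, 1)            # slope, intercept
    y_hat = a * x + b
    ss_res = np.sum((y - y_hat) ** 2)     # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot
```

Feeding the paired (estimated AC, MRI fat fraction) values into such a routine yields the R² statistic quoted above.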


Compact Mixed-Signal Convolutional Neural Network Using a Single Modular Neuron

“Compact Mixed-Signal Convolutional Neural Network Using a Single Modular Neuron”, IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 67, no. 12, Dec. 2020

Abstract:

This paper demonstrates a compact mixed-signal (MS) convolutional neural network (CNN) design procedure by proposing an MS modular neuron unit that alleviates analog-circuit-related design issues such as noise. In the first step of the proposed procedure, we design a CNN in software with a minimized number of channels per layer while still meeting the target network performance, which yields low representational and computational cost. The network is then reconstructed and retrained with a single modular neuron that is recursively reused across the entire network for maximum hardware efficiency, with a fixed number of parameters that account for signal attenuation. In the last step, the network parameters are quantized to a level implementable by the MS neurons. We designed networks for MNIST and Cifar-10 and achieved compact CNNs using a single MS neuron with 97% accuracy on MNIST and 85% accuracy on Cifar-10, whose representational and computational costs are at least two times smaller than those of prior works. The estimated energy per classification of the Cifar-10 hardware network with a single MS neuron, designed with optimum noise and matching requirements, is 0.5 μJ, five times smaller than that of its digital counterpart.
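The final quantization step can be sketched as uniform rounding of the trained weights onto the neuron's implementable grid; a minimal sketch in which the bit width and weight range are illustrative assumptions, not the paper's exact values:

```python
import numpy as np

def quantize_uniform(w, n_bits, w_max):
    """Uniform symmetric quantization onto 2**n_bits - 1 levels in
    [-w_max, w_max], mimicking the finite precision of a mixed-signal
    neuron (illustrative parameters)."""
    step = 2.0 * w_max / (2 ** n_bits - 1)          # spacing between levels
    return np.clip(np.round(w / step) * step, -w_max, w_max)
```

Each weight is snapped to the nearest representable level and clipped to the implementable range before being mapped onto the hardware.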


MixedNet: Network Design Strategies for Cost-Effective Quantized CNNs, IEEE Access, 23 August 2021

This paper proposes design strategies for a low-cost quantized neural network. To prevent the classification accuracy from being degraded by quantization, a structure-design strategy that uses a large number of channels rather than deep layers is proposed. In addition, a squeeze-and-excitation (SE) layer is adopted to enhance the performance of the quantized network. Through a quantitative analysis and simulations of the quantized key convolution layers of ResNet and MobileNets, a low-cost layer-design strategy for building a neural network is proposed. With this strategy, a low-cost network referred to as MixedNet is constructed. A 4-bit quantized MixedNet example achieves a 60% reduction in on-chip memory size and 53% fewer memory accesses with negligible classification accuracy degradation compared with conventional networks, while showing classification accuracies of approximately 73% on Cifar-100 and 93% on Cifar-10.
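The adopted squeeze-and-excitation layer globally pools each channel, passes the result through a small bottleneck, and rescales the channels by the resulting gate; a minimal NumPy sketch (shapes and weights are illustrative, not the paper's configuration):

```python
import numpy as np

def squeeze_excite(x, w1, w2):
    """Squeeze-and-Excitation block.
    x:  feature map of shape (C, H, W)
    w1: bottleneck weights of shape (C // r, C)
    w2: expansion weights of shape (C, C // r)"""
    s = x.mean(axis=(1, 2))                   # squeeze: per-channel global average, (C,)
    z = np.maximum(w1 @ s, 0.0)               # excitation, ReLU bottleneck, (C // r,)
    gate = 1.0 / (1.0 + np.exp(-(w2 @ z)))    # sigmoid channel gate in (0, 1), (C,)
    return x * gate[:, None, None]            # channel-wise rescaling
```

Because the block adds only two tiny fully connected layers, it improves a quantized network's accuracy at little extra memory cost.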


A 1.23W/mm2 83.7%-Efficiency 400MHz 6-Phase Fully-Integrated Buck Converter in 28nm CMOS with On-Chip Capacitor Dynamic Re-Allocation for Inter-Inductor Current Balancing and Fast DVS of 75mV/ns

This paper presents a 400 MHz 6-phase buck converter with bond-wire inductors. Peak-and-valley differential sensing, realized with a reused on-chip capacitor, corrects inter-inductor current imbalance without power overhead. Owing to dynamic re-allocation of the on-chip capacitors, the transient response can be accelerated with a constrained ripple. The chip, fabricated in 28-nm CMOS, achieves a state-of-the-art power density of 1.23 W/mm² and a DVS rate of 75 mV/ns, with a measured peak efficiency of 83.7%.


An Overview of Processing-in-Memory Circuits for Artificial Intelligence and Machine Learning

Authors: Donghyuk Kim, Chengshuo Yu, Shanshan Xie, Yuzong Chen, Joo-Young Kim, Bongjin Kim, Jaydeep Kulkarni, Tony Tae-Hyoung Kim

Publication: IEEE Journal on Emerging and Selected Topics in Circuits and Systems (JETCAS), 2022

Artificial intelligence (AI) and machine learning (ML) are revolutionizing many fields of study, such as visual recognition, natural language processing, autonomous vehicles, and prediction. The traditional von Neumann computing architecture, with separate processing elements and memory devices, has been improving its computing performance rapidly with the scaling of process technology. However, in the era of AI and ML, data transfer between memory devices and processing elements becomes the bottleneck of the system. To address this data movement issue, memory-centric computing merges the memory devices with processing elements so that computations can be done in place, without moving any data. Processing-In-Memory (PIM) has attracted the research community's attention because it can substantially improve the energy efficiency of memory-centric computing systems by minimizing data movement. Even though the benefits of PIM are well accepted, its limitations and challenges have not been investigated thoroughly. This paper presents a comprehensive investigation of state-of-the-art PIM research works across memory device types, namely static random-access memory (SRAM), dynamic random-access memory (DRAM), and resistive memory (ReRAM). We present an overview of PIM designs in each memory type, covering bit cells, circuits, and architectures. Then, a new software stack standard and its challenges for incorporating PIM into conventional computing architectures are discussed. Finally, we discuss various future research directions in PIM, such as further reducing the data conversion overhead.
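The core PIM idea of computing where the data resides can be illustrated with an idealized resistive-crossbar column: weights are stored as cell conductances, inputs are applied as voltages, and the shared bitline current is their dot product. A toy model ignoring device non-idealities:

```python
def bitline_mac(voltages, conductances):
    """Idealized ReRAM crossbar column: each cell contributes a current
    I = G * V (Ohm's law), and the currents sum on the shared bitline
    (Kirchhoff's current law), producing a multiply-accumulate result
    without ever moving the stored weights."""
    return sum(g * v for g, v in zip(conductances, voltages))
```

One analog read thus replaces an entire row of multiply-accumulate operations, which is where the energy savings over a von Neumann machine come from.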


FIXAR: A Fixed-Point Deep Reinforcement Learning Platform with Quantization-Aware Training and Adaptive Parallelism


Deep reinforcement learning (DRL) is a powerful technology for decision-making problems in various application domains, such as robotics and gaming, allowing an agent to learn its action policy in an environment so as to maximize a cumulative reward. Unlike supervised models, which actively use data quantization, DRL still uses single-precision floating-point for training accuracy, even though it suffers from computationally intensive deep neural network (DNN) computations.

In this paper, we present a deep reinforcement learning acceleration platform named FIXAR, which for the first time employs fixed-point data types and arithmetic units through a SW/HW co-design approach. We propose a fixed-point quantization-aware training algorithm, which makes it possible to halve the data precision after a certain amount of training time without losing accuracy. We also design an FPGA accelerator that employs adaptive dataflow and parallelism to handle both inference and training operations. Its processing element has a configurable datapath to efficiently support the proposed quantization-aware training. We validate the FIXAR platform, in which the host CPU emulates the DRL environment and the FPGA accelerates the agent's DNN operations, by running multiple benchmarks in continuous action spaces with a recent DRL algorithm, DDPG. The FIXAR platform achieves 25,293.3 inferences per second (IPS) of training throughput, which is 2.7 times higher than that of a CPU-GPU platform. In addition, its FPGA accelerator alone shows 53,826.8 IPS and an energy efficiency of 2,638.0 IPS/W, which are 5.5 times higher and 15.4 times more energy efficient than those of the GPU, respectively. FIXAR also shows the best IPS throughput and energy efficiency among other state-of-the-art FPGA acceleration platforms, even though it targets one of the most complex DNN models.
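Fixed-point quantization-aware training hinges on rounding values onto a fixed-point grid during the forward pass; a minimal sketch in which the integer/fraction bit split is an illustrative assumption, not FIXAR's exact number format:

```python
import numpy as np

def to_fixed_point(x, int_bits, frac_bits):
    """Round x onto a signed fixed-point grid with int_bits integer bits
    (sign included) and frac_bits fractional bits, clipping to the
    representable range. Halving the total bit width mid-training is the
    precision reduction described above (bit widths here are illustrative)."""
    scale = 2.0 ** frac_bits                     # grid resolution: 1 / scale
    lo = -(2.0 ** (int_bits - 1))                # most negative representable value
    hi = 2.0 ** (int_bits - 1) - 1.0 / scale     # most positive representable value
    return np.clip(np.round(x * scale) / scale, lo, hi)
```

During quantization-aware training, such rounding is applied in the forward pass while gradients flow through unchanged (the straight-through estimator), so the network learns weights that survive the reduced precision.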

Wonjae Lee, Yonghwi Kwon, and Youngsoo Shin, “Fast ECO leakage optimization using graph convolutional network”, Proc. Great Lakes Symp. on VLSI (GLSVLSI), Sep. 2020.

In the final stage of circuit design, an engineering change order (ECO) that swaps cells for lower-leakage variants (e.g., cells with a higher Vth or a longer gate length) can reduce the circuit's power consumption. However, this process takes a long time because cell swapping and timing verification of the swapped cells are performed iteratively. This paper proposes applying a graph convolutional network (GCN) to perform a fast ECO. The GCN predicts each cell's Vth from the connectivity between cells and their timing information, achieving an average prediction accuracy of 83%. In addition, a heuristic Vth reassignment method is proposed to fix timing violations while satisfying the minimum implant width (MIW) constraint. The result is a 52% reduction in leakage current (versus 61% for the conventional ECO) while running more than twice as fast as the conventional ECO flow.
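The Vth-prediction step applies graph convolution over the cell connectivity graph; one layer can be sketched as follows (a generic Kipf-and-Welling-style layer, assumed here for illustration, not the paper's exact architecture):

```python
import numpy as np

def gcn_layer(A, X, W):
    """One graph-convolution layer: H = ReLU(D^(-1/2) A_hat D^(-1/2) X W),
    with A_hat = A + I (self-loops added).
    A: (N, N) cell-connectivity adjacency matrix
    X: (N, F) per-cell features (e.g., timing slack, load)
    W: (F, F') learned weights"""
    A_hat = A + np.eye(A.shape[0])               # add self-loops
    d = A_hat.sum(axis=1)                        # node degrees
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))       # symmetric normalization
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ X @ W, 0.0)
```

Stacking a few such layers lets each cell's Vth prediction depend on the timing context of its graph neighborhood rather than on the cell in isolation.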

 

AI in EE, Prof. Youngsoo Shin's Lab

Yonghwi Kwon and Youngsoo Shin, “Optimization of accurate resist kernels through convolutional neural network," Proc. SPIE Advanced Lithography, Feb. 2021

A fast and accurate lithography simulation model is essential for OPC, lithography verification, and related tasks. Such a lithography model consists of an optical model, which computes the light intensity reaching the wafer, and a resist model, which uses that intensity to determine the photoresist (PR) shape. The resist model convolves the light-intensity map with a set of Gaussian kernels, takes a weighted sum of the convolution results, and compares it against a threshold to decide whether the PR develops. Because simple kernel shapes such as Gaussians are used, many kernels are needed to obtain accurate simulation results, which in turn requires heavy computation. Noting that the resist model's computation resembles a CNN, we represent the resist model as a CNN and optimize free-form resist kernels by training the CNN. A model that previously used nine Gaussian resist kernels was replaced with one using two free-form kernels, yielding 35% faster lithography simulation together with a 15% improvement in model accuracy.
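The convolve-weight-threshold structure of the resist model described above can be sketched directly (kernels, weights, and threshold below are illustrative placeholders; a symmetric kernel makes convolution and correlation coincide):

```python
import numpy as np

def conv2d_same(img, kernel):
    """Naive 'same' 2-D correlation with zero padding, for small symmetric kernels."""
    kh, kw = kernel.shape
    pad = np.pad(img, ((kh // 2,) * 2, (kw // 2,) * 2))
    out = np.zeros_like(img, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.sum(pad[i:i + kh, j:j + kw] * kernel)
    return out

def resist_model(intensity, kernels, weights, threshold):
    """Resist model as described above: convolve the light-intensity map
    with each kernel, weighted-sum the results, and threshold to decide
    where the photoresist develops."""
    response = sum(w * conv2d_same(intensity, k) for w, k in zip(weights, kernels))
    return response > threshold     # boolean develop / no-develop map
```

Because this is exactly one convolution layer followed by a weighted sum, the kernels can be made free-form and trained with CNN machinery instead of being fixed Gaussians, which is the optimization the paper performs.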

 


Yonghwi Kwon, Daijoon Hyun, Giyoon Jung, and Youngsoo Shin, “Dynamic IR drop prediction using image-to-image translation neural network," Proc. Int'l Symp. on Circuits and Systems (ISCAS), May 2021.

Dynamic IR drop analysis, which finds the maximum IR drop occurring during actual circuit operation, is a very time-consuming process. This work therefore proposes a fast dynamic IR drop analysis method using U-net, a type of image-to-image translation neural network. The U-net input is an image clip consisting of maps of the effective resistance to each gate, the per-time-step current consumption of each gate, and the distance to the nearest power pad. For faster IR drop prediction, prediction is performed not on every clip but only on time windows with a high likelihood of large IR drop, and a fast approximation of the PDN resistance is applied. Experimental results show that the proposed method is about 20 times faster than actual dynamic IR drop analysis, with 15% error.
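Preparing the network input amounts to stacking the per-gate maps as channels of one image clip; a shape-level sketch (array names and the time-step count are illustrative assumptions):

```python
import numpy as np

def build_ir_drop_input(resistance_map, current_maps, pad_distance_map):
    """Stack the three feature maps described above into one multi-channel
    clip for an image-to-image network:
      resistance_map:   (H, W)    effective resistance to each gate
      current_maps:     (T, H, W) per-time-step current consumption
      pad_distance_map: (H, W)    distance to the nearest power pad
    Returns an array of shape (T + 2, H, W)."""
    return np.concatenate(
        [resistance_map[None], current_maps, pad_distance_map[None]], axis=0
    )
```

The U-net then maps this channel stack to a per-pixel IR drop image, so prediction cost scales with the number of clips actually evaluated, which is why restricting to high-risk time windows speeds things up.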

 
