EE Professor Rhu, Minsoo’s Research Team Builds First-ever Privacy-aware AI Semiconductor, Speeding Up the Differentially Private Learning Process 3.6 Times over Google’s TPUv3

 


[Professor Rhu, Minsoo]

 

EE Professor Rhu, Minsoo and his research team have taken a major step forward in artificial intelligence semiconductors for differentially private machine learning. The team analyzed the performance bottlenecks of differentially private training and designed a semiconductor chip that greatly improves the performance of differentially private machine learning applications.

Professor Rhu’s artificial intelligence chip, whose key components include a cross-product-based arithmetic unit and an addition-tree-based post-processing unit, trains differentially private models 3.6 times faster than Google’s TPUv3, one of today’s most widely used AI processors.

The new chip also achieves performance comparable to NVIDIA’s A100 GPU while using roughly one tenth of the hardware resources.
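To give a concrete sense of the workload the accelerator targets, the sketch below shows a standard DP-SGD training step in plain NumPy. This is not the DiVa design or the team’s code; it only illustrates the per-example gradient clipping and noise addition that differential privacy adds to training, which is commonly cited as the step that is hard to run efficiently on conventional accelerators.

```python
# Minimal DP-SGD sketch on a linear model (illustrative only, not DiVa).
import numpy as np

def dp_sgd_step(w, X, y, lr=0.1, clip=1.0, noise_mult=1.0):
    grads = []
    for xi, yi in zip(X, y):                       # per-example gradient (the costly part)
        g = (xi @ w - yi) * xi                     # gradient of squared loss
        g *= min(1.0, clip / (np.linalg.norm(g) + 1e-12))  # clip each example's gradient
        grads.append(g)
    g_sum = np.sum(grads, axis=0)
    g_sum += np.random.normal(0, noise_mult * clip, size=w.shape)  # add Gaussian noise
    return w - lr * g_sum / len(X)

rng = np.random.default_rng(0)
X, y = rng.normal(size=(64, 8)), rng.normal(size=64)
w = np.zeros(8)
for _ in range(100):
    w = dp_sgd_step(w, X, y)
```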

 


[From left, Co-lead authors Park, Beomsik and Hwang, Ranggi; co-authors Yoon, Dongho and Choi, Yoonhyuk]

 

This work, with EE researchers Park, Beomsik and Hwang, Ranggi as co-first authors, will be presented as “DiVa: An Accelerator for Differentially Private Machine Learning” at the 55th IEEE/ACM International Symposium on Microarchitecture (MICRO 2022), the premier venue for computer architecture research, held October 1 through 5 in Chicago, USA.

Professor Rhu’s achievement has been covered by multiple press outlets.

 

Links:

AI Times: http://www.aitimes.com/news/articleView.html?idxno=146435

Yonhap : https://www.yna.co.kr/view/AKR20201116072400063?input=1195m

Financial News : https://www.fnnews.com/news/202208212349474072

Donga Science : https://www.dongascience.com/news.php?idx=55893

Industry News : http://www.industrynews.co.kr/news/articleView.html?idxno=46829

Boan : https://www.boannews.com/media/view.asp?idx=108883&kind=
 

EE Prof. Myoungsoo Jung’s team develops the world’s first CXL 2.0-based memory expansion platform


[From left, Prof. Myoungsoo Jung, Ph.D. candidate Donghyun Gouk, and Ph.D. candidate Miryeong Kwon]
 
Our department’s Professor Myoungsoo Jung’s research team has developed DirectCXL, the world’s first CXL 2.0-based memory expansion platform that is freely scalable and directly accessible.
 
The research team demonstrated a large-scale data-center application on an end-to-end memory expansion platform consisting of a CXL hardware prototype and an operating system. While a few memory vendors have shown only a single memory device, this is the first demonstration of an application running on a full platform with an operating system. Compared with conventional memory expansion systems, DirectCXL delivers a 3x performance improvement when executing data-center applications and greatly increases memory capacity.
 
The RDMA-based memory expansion solution commonly used in data centers expands a system’s memory by adding memory nodes, each consisting of a CPU and memory. However, the RDMA approach degrades performance and requires a substantial budget, because every added memory node also needs a CPU. To address these problems, CXL, a new protocol based on the PCI Express interface that supports high performance and scalability, has emerged, but memory vendors and academia have struggled to conduct research on CXL.
 
To provide a solution and a cornerstone for CXL 2.0-based memory expansion, Jung’s research team developed a CXL memory device, a host CXL processor, and a CXL network switch to expand a system’s memory. They also developed a Linux-based CXL software module so that existing computer systems can control the memory expansion platform. With DirectCXL, memory capacity can be scaled out freely without the extra cost of additional computing resources.
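Because DirectCXL exposes CXL-attached memory as ordinary, directly accessible memory, the application-level view can be sketched as a plain memory mapping. The snippet below is an illustration only, under assumed names (the device path /dev/directcxl0 is hypothetical, not the team’s actual interface): once the kernel software exposes the CXL memory, reads and writes become plain loads and stores rather than RDMA transfers.

```python
# Illustrative sketch: accessing CXL-attached memory as ordinary memory.
import mmap, os

DEV = "/dev/directcxl0"          # hypothetical device exposing CXL memory
SIZE = 1 << 20                   # map 1 MiB of the expanded memory

fd = os.open(DEV, os.O_RDWR)
buf = mmap.mmap(fd, SIZE)        # map device memory into the address space

buf[0:5] = b"hello"              # plain stores: no RDMA verbs, no page copies
print(buf[0:5])                  # plain loads: data served directly over CXL

buf.close()
os.close(fd)
```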
This work is expected to be utilized in a variety of ways, such as in data centers and high-performance computing, as it provides efficient memory expansion and high performance.
The paper, “Direct Access, High-Performance Memory Disaggregation with DirectCXL,” was presented on July 11 at the USENIX Annual Technical Conference (ATC) 2022.
 
In addition, the research was featured alongside Microsoft and Meta (Facebook) in the UK technology publication ‘The Next Platform’ (https://www.nextplatform.com/2022/07/18/kaist-shows-off-directcxl-disaggregated-memory-prototype/) and will be presented on August 2nd and 3rd at the CXL Forum at the Flash Memory Summit.
 
More information about DirectCXL can be found on the CAMELab website (http://camelab.org/), and a video on accelerating Meta’s (Facebook’s) machine-learning-based recommendation model is available on the CAMELab YouTube channel (https://youtu.be/jm8k-JM0qbM).
 
 
 
 
 
 
[News Link]
 
Naver/ZDNet: https://n.news.naver.com/mnews/article/092/0002264153?sid=105
etnews: https://www.etnews.com/20220801000168
Digital Times: http://www.dt.co.kr/contents.html?article_no=2022080102109931650003&ref=naver
Financial News: https://www.fnnews.com/news/202208011051322708

EE Prof. Song Min Kim’s Team Awarded ACM MobiSys ’22 Best Paper Award for Enabling Massive Connectivity in IoT


[Professor Song Min Kim and first author Kang Min Bae, from left to right]

 

On the 28th, School of EE Professor Song Min Kim’s research team announced that they have created the world’s first mmWave backscatter system for massive IoT connectivity.

 

The research, “OmniScatter: extreme sensitivity mmWave backscattering using commodity FMCW radar,” led by Kang Min Bae as first author, was presented at ACM MobiSys 2022 this June and received the Best Paper Award. This marks the second consecutive year in which the Best Paper Award has gone to a research group at KAIST’s School of Electrical Engineering.

 

The backscatter technology described by this research team can greatly reduce maintenance costs, as it operates at ultra-low power of less than 10 μW and can run on a single battery for more than 40 years.

 

By enabling connectivity on a scale that far exceeds the network density required by next-generation communication technologies such as 5G and 6G, this system can serve as a stepping stone toward the upcoming hyperconnected era.

 

“mmWave backscattering is a dreamlike technology that can run IoT devices on a large scale, which can drive massive communications at ultra-low power compared to any other technology,” said Professor Song Min Kim. “We hope that this technology will be actively used for the upcoming era of Internet of Things,” he added.

 

The research was made possible by funding from the Samsung Future Technology Development Project and the Institute of Information & Communications Technology Planning & Evaluation (IITP).

 

 


[Fig 1. Tags used for massive IoT communications (as depicted by red triangles). Over 1100 tags are able to communicate simultaneously without any conflicts]

 


 

 

News Link:
https://www.etnews.com/20220728000090
http://vip.mk.co.kr/news/view/21/21/3550810.html
 

EE Prof. Sung-Ju Lee’s team announces a method for speeding up Federated Learning at ACM MobiSys 2022
 

[Prof. Sung-Ju Lee, Jaemin Shin (KAIST Ph.D. Candidate), Prof. Yunxin Liu of Tsinghua University, and Prof. Yuanchun Li of Tsinghua University, from left]

 

The research team led by Prof. Sung-Ju Lee of KAIST has published a paper “FedBalancer: Data and Pace Control for Efficient Federated Learning on Heterogeneous Clients” at ACM MobiSys (International Conference on Mobile Systems, Applications, and Services) 2022. Founded in 2003, MobiSys has been a premier conference on Mobile Computing and Systems. This year, 38 out of 176 submitted papers have been accepted to be presented at the conference. 

 

Jaemin Shin (KAIST PhD Candidate) was the lead author, and this work was in collaboration with Tsinghua University in China (Professors Yuanchun Li and Yunxin Liu participated).

 

Federated Learning is a recent machine learning paradigm proposed by Google that trains on a large corpus of private user data without collecting it. The authors developed a systematic federated learning framework that accelerates the global training process. The new framework actively measures the contribution of each training sample on the clients and selects the optimal samples to optimize training. The authors also included an adaptive deadline control scheme that adjusts to the varying training data, achieving a 4.5x speedup of the global training process without sacrificing model accuracy.
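To convey the two ideas described above (sample selection plus adaptive deadline control), here is a loose Python sketch. It is illustrative only and does not reproduce the authors’ FedBalancer algorithm: a client keeps the samples that currently look most useful (here, simply the highest-loss ones), and the server nudges the round deadline toward an observed completion-time quantile.

```python
# Illustrative sketch of sample selection and deadline control (not FedBalancer itself).
import numpy as np

def select_samples(losses, keep_ratio=0.5):
    """Keep the highest-loss fraction of a client's samples."""
    k = max(1, int(len(losses) * keep_ratio))
    ranked = np.argsort(losses)[::-1]          # highest loss first
    return ranked[:k]

def adapt_deadline(round_times, prev_deadline, target_quantile=0.8):
    """Move the round deadline toward the observed completion-time quantile."""
    observed = np.quantile(round_times, target_quantile)
    return 0.5 * prev_deadline + 0.5 * observed

losses = np.random.rand(100)                   # per-sample losses reported by a client
chosen = select_samples(losses)
deadline = adapt_deadline(np.random.uniform(5, 20, size=30), prev_deadline=15.0)
print(len(chosen), round(deadline, 2))
```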

 

Prof. Lee stated, “Federated learning is an important technology used by many top companies. With the accelerated training time achieved by this research, it has become even more attractive for real deployments. Moreover, our technology has been shown to work well in different domains such as computer vision, natural language processing, and human activity recognition.”

 

This research was supported in part by the National Research Foundation of Korea (NRF) grant funded by the Ministry of Science and ICT (MSIT) of Korea.
 
 

EE Prof. Myoungsoo Jung’s team develops the world’s first nonvolatile computer that maintains execution states even when power fails

[From left, Prof. Myoungsoo Jung and KAIST Ph.D. candidates Miryeong Kwon, Sangwon Lee, and Gyuyoung Park]
 
 
Our department’s Professor Myoungsoo Jung’s research team has developed the world’s first nonvolatile computer, which maintains its execution states even when the power supply fails.
 
The research team developed the Lightweight Persistence-Centric system (LightPC), which uses nonvolatile memory as main memory and maintains all execution states regardless of the power supply. LightPC outperforms conventional volatile computing systems by 4.3x while reducing power consumption by 73% and providing 8 times larger memory capacity.
 
Nonvolatile memory is a type of computer memory that retains stored information even after power is removed. It provides larger capacity and consumes less power than DRAM, which is volatile, but it offers lower write performance. Because of this shortcoming, existing nonvolatile memory, such as Intel’s Optane memory, is used together with DRAM. However, a computer built this way has the problem that data in DRAM must be transferred to nonvolatile memory or an SSD in order to retain the execution state.
 
To solve this problem, the research team developed a processor and memory controller that raise the nonvolatile memory’s performance, and also developed an operating system that maintains all execution states of the nonvolatile-memory-based computer. With these techniques, even if power is suddenly cut off, LightPC can restore the state from before the power loss. The research team implemented LightPC on their FPGA-based system board prototype and verified its effectiveness.
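To make the contrast concrete, the toy sketch below is illustrative only and is unrelated to the LightPC implementation: it shows the recurring cost a conventional DRAM-based system pays, namely explicitly serializing program state to persistent storage so it survives power loss. A persistence-centric machine whose main memory is nonvolatile does not need this step.

```python
# Conceptual sketch of the checkpointing overhead a persistent-memory system avoids.
import pickle, time

state = {"step": 0, "weights": [0.0] * 500_000}   # stand-in for program state held in DRAM

def checkpoint(state, path="checkpoint.bin"):
    t0 = time.time()
    with open(path, "wb") as f:
        pickle.dump(state, f)                     # DRAM -> SSD/NVM copy on every checkpoint
    return time.time() - t0

for step in range(5):
    state["step"] = step
    cost = checkpoint(state)                      # recurring cost of a volatile-memory design
    print(f"step {step}: checkpoint took {cost:.3f}s")
```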
 
This work is expected to be utilized in a variety of ways, such as in data centers and high-performance computing, as it can provide large-capacity memory, high performance, low power, and service reliability.
 
KAIST Ph.D. candidates Miryeong Kwon, Sangwon Lee, and Gyuyoung Park participated in this research, and the paper, “LightPC: Hardware and Software Co-Design for Energy-Efficient Full System Persistence,” will be presented in June at the International Symposium on Computer Architecture (ISCA) 2022.
 
 
 
 
 
The research was supported by MEMRAY, the Ministry of Science & ICT (MSIT), the National Research Foundation of Korea (NRF), and the Institute of Information & Communications Technology Planning & Evaluation (IITP).
 
More information on this paper can be found on the CAMELab website (http://camelab.org) and YouTube (https://youtu.be/mlF7W_RmYRk). This result has been reported by domestic media as follows.
 
 
[Link]
https://news.kaist.ac.kr/newsen/html/news/?mode=V&mng_no=20111&skey=&sval=&list_s_date=&list_e_date=&GotoPage=1

Sample Selection for Fair and Robust Training. Y. Roh, K. Lee, S. E. Whang, and C. Suh. Accepted to the 35th Annual Conference on Neural Information Processing Systems (NeurIPS), Dec. 2021. (Top machine learning conference)

Abstract

Fairness and robustness are critical elements of Trustworthy AI that need to be addressed together. Fairness is about learning an unbiased model while robustness is about learning from corrupted data, and it is known that addressing only one of them may have an adverse effect on the other. In this work, we propose a sample selection-based algorithm for fair and robust training. To this end, we formulate a combinatorial optimization problem for the unbiased selection of samples in the presence of data corruption. Observing that solving this optimization problem is strongly NP-hard, we propose a greedy algorithm that is efficient and effective in practice. Experiments show that our algorithm obtains fairness and robustness that are better than or comparable to the state-of-the-art technique, both on synthetic and benchmark real datasets. Moreover, unlike other fair and robust training baselines, our algorithm can be used by only modifying the sampling step in batch selection without changing the training algorithm or leveraging additional clean data.
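The snippet below gives only the flavor of such a selection rule and is not the paper’s greedy algorithm: it favors low-loss samples (which are more likely to be clean under corruption) while capping each sensitive group’s share of the selected set so the selection stays balanced.

```python
# Illustrative greedy sample selection balancing robustness and group balance
# (a simplified stand-in, not the algorithm proposed in the paper).
import numpy as np

def greedy_select(losses, groups, budget):
    order = np.argsort(losses)                 # low loss first: likely-clean samples
    per_group_cap = budget // len(np.unique(groups))
    counts, chosen = {}, []
    for i in order:
        g = groups[i]
        if counts.get(g, 0) < per_group_cap:   # fairness: cap each group's share
            chosen.append(i)
            counts[g] = counts.get(g, 0) + 1
        if len(chosen) == budget:
            break
    return np.array(chosen)

losses = np.random.rand(200)
groups = np.random.randint(0, 2, size=200)     # e.g., a binary sensitive attribute
idx = greedy_select(losses, groups, budget=100)
print(len(idx), np.bincount(groups[idx]))
```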

 



Responsible AI Challenges in End-to-end Machine Learning. S. E. Whang, K. Tae, Y. Roh, and G. Heo. IEEE Data Engineering Bulletin, Mar. 2021.

Abstract

Responsible AI is becoming critical as AI is widely used in our everyday lives. Many companies that deploy AI publicly state that when training a model, we not only need to improve its accuracy, but also need to guarantee that the model does not discriminate against users (fairness), is resilient to noisy or poisoned data (robustness), is explainable, and more. In addition, these objectives are not only relevant to model training, but to all steps of end-to-end machine learning, which include data collection, data cleaning and validation, model training, model evaluation, and model management and serving. Finally, responsible AI is conceptually challenging, and supporting all the objectives must be as easy as possible. We thus propose three key research directions towards this vision – depth, breadth, and usability – to measure progress and introduce our ongoing research. First, responsible AI must be deeply supported, where multiple objectives like fairness and robustness must be handled together. To this end, we propose FR-Train, a holistic framework for fair and robust model training in the presence of data bias and poisoning. Second, responsible AI must be broadly supported, preferably in all steps of machine learning. Currently we focus on the data pre-processing steps and propose Slice Tuner, a selective data acquisition framework for training fair and accurate models, and MLClean, a data cleaning framework that also improves fairness and robustness. Finally, responsible AI must be usable, where the techniques must be easy to deploy and actionable. We propose FairBatch, a batch selection approach for fairness that is effective and simple to use, and Slice Finder, a model evaluation tool that automatically finds problematic slices. We believe we have scratched the surface of responsible AI for end-to-end machine learning and suggest research challenges moving forward.
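FairBatch is described above as a simple batch-selection approach for fairness. The sketch below conveys that spirit but is not the authors’ implementation: per-group sampling rates are nudged toward whichever group currently has the higher loss, and batches are drawn from the resulting distribution, so only the sampling step of training changes.

```python
# Simplified batch-selection sketch in the spirit of FairBatch (not the original code).
import numpy as np

def update_rates(rates, group_losses, step=0.05):
    worst = int(np.argmax(group_losses))       # group currently treated worst
    rates = rates.copy()
    rates[worst] += step                       # sample that group more often
    return rates / rates.sum()                 # keep a valid distribution

def sample_batch(groups, rates, batch_size, rng):
    probs = rates[groups]                      # each example inherits its group's rate
    probs = probs / probs.sum()
    return rng.choice(len(groups), size=batch_size, replace=False, p=probs)

rng = np.random.default_rng(0)
groups = rng.integers(0, 2, size=1000)         # binary sensitive attribute per example
rates = np.array([0.5, 0.5])
rates = update_rates(rates, group_losses=[0.4, 0.9])
batch = sample_batch(groups, rates, batch_size=64, rng=rng)
print(rates, len(batch))
```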

 

 



Woo-Joong Kim and Chan-Hyun Youn, “Cooperative Scheduling Schemes for Explainable DNN Acceleration in Satellite Image Analysis and Retraining,” to appear in IEEE Transactions on Parallel and Distributed Systems (2021).

Abstract

Deep learning-based satellite image analysis and retraining systems are emerging technologies that enhance the capability for sophisticated analysis of terrestrial objects. In principle, to apply an explainable DNN model to the satellite image analysis and retraining process, we consider a new acceleration scheduling mechanism. In particular, conventional DNN acceleration schemes suffer serious performance degradation due to computational complexity and cost in satellite image analysis and retraining. In this article, to overcome this performance degradation, we propose cooperative scheduling schemes for explainable DNN acceleration in the analysis and retraining process. To this end, we define latency and energy cost models to derive the optimized processing time and cost required for explainable DNN acceleration. In particular, we show the minimum processing cost achieved by the proposed scheduling via layer-level management of the explainable DNN on an FPGA-GPU acceleration system. In addition, we evaluate the performance using an adaptive unlabeled data selection scheme with a confidence threshold and a semi-supervised-learning-driven data parallelism scheme for accelerating the retraining process. The experimental results demonstrate that the proposed schemes reduce the energy cost of conventional DNN acceleration systems by up to about 40% while guaranteeing the latency constraints.
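A toy version of the layer-level placement decision behind such a cost model might look like the sketch below; the profile numbers and the brute-force search are purely illustrative assumptions, not the paper’s scheduler: for each layer, we pick FPGA or GPU so that total energy is minimized subject to a latency budget.

```python
# Toy layer-to-device placement under a latency constraint (illustrative numbers only).
from itertools import product

# (latency_ms, energy_mJ) per layer on each device -- hypothetical profiling data
profile = {
    "fpga": [(4.0, 8.0), (6.0, 10.0), (3.0, 5.0)],
    "gpu":  [(2.0, 20.0), (2.5, 24.0), (1.5, 15.0)],
}
LATENCY_BUDGET_MS = 12.0

best = None
for assign in product(["fpga", "gpu"], repeat=3):           # all placements for 3 layers
    lat = sum(profile[d][i][0] for i, d in enumerate(assign))
    eng = sum(profile[d][i][1] for i, d in enumerate(assign))
    if lat <= LATENCY_BUDGET_MS and (best is None or eng < best[0]):
        best = (eng, lat, assign)

print(best)   # lowest-energy placement that still meets the latency budget
```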



Elastic Resource Sharing for Distributed Deep Learning

Changho Hwang, Taehyun Kim, Sunghyun Kim, Jinwoo Shin, and Kyoungsoo Park

In Proceedings of the 18th USENIX Symposium on Networked Systems Design and Implementation (NSDI ’21)

April 2021

 

Abstract

Resource allocation and scheduling strategies for deep learning training (DLT) jobs have a critical impact on their average job completion time (JCT). Unfortunately, traditional algorithms such as Shortest-Remaining-Time-First (SRTF) often perform poorly for DLT jobs. This is because blindly prioritizing only the short jobs is suboptimal and job-level resource preemption is too coarse-grained for effective mitigation of head-of-line blocking. We investigate the algorithms that accelerate DLT jobs. Our analysis finds that (1) resource efficiency often matters more than short job prioritization and (2) applying greedy algorithms to existing jobs inflates average JCT due to overly optimistic views toward future resource availability. Inspired by these findings, we propose Apathetic Future Share (AFS) that balances resource efficiency and short job prioritization while curbing unrealistic optimism in resource allocation. To bring the algorithmic benefits into practice, we also build CoDDL, a DLT system framework that transparently handles automatic job parallelization and efficiently performs frequent share re-adjustments. Our evaluation shows that AFS outperforms Themis, SRTF, and Tiresias-L in terms of average JCT by up to 2.2x, 2.7x, and 3.1x, respectively.
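The allocator sketched below is not AFS; it only illustrates the “resource efficiency” consideration highlighted above: each GPU is handed to whichever job gains the most throughput from it, using hypothetical per-job scaling curves.

```python
# Illustrative marginal-gain GPU allocation (not the AFS algorithm from the paper).
def allocate(jobs, total_gpus):
    """jobs: {name: throughput_curve}, where throughput_curve[g] is throughput with g GPUs."""
    shares = {name: 0 for name in jobs}
    for _ in range(total_gpus):
        def gain(name):
            g = shares[name]
            curve = jobs[name]
            return curve[min(g + 1, len(curve) - 1)] - curve[g]
        winner = max(jobs, key=gain)            # give the GPU where it helps throughput most
        shares[winner] += 1
    return shares

jobs = {
    "job_a": [0, 10, 18, 24, 28],               # diminishing returns with more GPUs
    "job_b": [0, 6, 12, 18, 24],                # scales almost linearly
}
print(allocate(jobs, total_gpus=4))
```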

 



Accelerating GNN Training with Locality-Aware Partial Execution (Awarded Best Paper)

Taehyun Kim, Changho Hwang, Kyoungsoo Park, Zhiqi Lin, Peng Cheng, Youshan Miao, Lingxiao Ma, and Yongqiang Xiong

In Proceedings of the 12th ACM SIGOPS Asia-Pacific Workshop on Systems (APSys ’21)

August 2021

 

 

Abstract

Graph Neural Networks (GNNs) are increasingly popular for various prediction and recommendation tasks. Unfortunately, the graph datasets for practical GNN applications are often too large to fit into the memory of a single GPU, leading to frequent data loading from host memory to GPU. This data transfer overhead is highly detrimental to the performance, severely limiting the training throughput. In this paper, we propose locality-aware, partial code execution that significantly cuts down the data copy overhead for GNN training. The key idea is to exploit the “near-data” processors for the first few operations in each iteration, which reduces the data size for DMA operations. In addition, we employ task scheduling tailored to GNN training and apply load balancing between CPU and GPU. We find that our approach substantially improves the performance, achieving up to 6.6x speedup in training throughput over the state-of-the-art system design.
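A minimal sketch of the “near-data” idea (not the paper’s system, and with made-up sizes): the first neighbor aggregation runs on the CPU, where the full feature table resides, so only the small aggregated batch needs to be copied to the GPU instead of thousands of raw feature rows.

```python
# Illustrative locality-aware partial execution: aggregate near the data, then transfer.
import numpy as np

num_nodes, feat_dim = 100_000, 128
features = np.random.rand(num_nodes, feat_dim).astype(np.float32)   # lives in host memory

def cpu_aggregate(batch_nodes, neighbors):
    """Mean-aggregate neighbor features on the CPU (near the data)."""
    out = np.zeros((len(batch_nodes), features.shape[1]), dtype=np.float32)
    for i, n in enumerate(batch_nodes):
        out[i] = features[neighbors[n]].mean(axis=0)
    return out                      # shape: (batch, feat_dim) -- small enough to DMA cheaply

batch = np.random.randint(0, num_nodes, size=1024)
neighbors = {n: np.random.randint(0, num_nodes, size=16) for n in batch}
h0 = cpu_aggregate(batch, neighbors)
# h0 (about 0.5 MB here) would now be copied to the GPU for the remaining layers,
# instead of gathering 16x as many raw feature rows on the device.
print(h0.shape)
```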

 

