EE Prof. Sung-Ju Lee’s team announces a method for speeding up Federated Learning at ACM MobiSys 2022


[Prof. Sung-Ju Lee, Jaemin Shin (KAIST PhD Candidate), Prof. Yunxin Liu of Tsinghua University, and Prof. Yuanchun Li of Tsinghua University, from left]

 

The research team led by Prof. Sung-Ju Lee of KAIST has published the paper “FedBalancer: Data and Pace Control for Efficient Federated Learning on Heterogeneous Clients” at ACM MobiSys (International Conference on Mobile Systems, Applications, and Services) 2022. Founded in 2003, MobiSys is a premier conference on mobile computing and systems. This year, 38 of 176 submitted papers were accepted for presentation.

 

Jaemin Shin (KAIST PhD Candidate) was the lead author, and the work was done in collaboration with Tsinghua University in China, with Professors Yuanchun Li and Yunxin Liu participating.

 

Federated Learning is a recent machine learning paradigm, proposed by Google, that trains models on a large corpus of private user data without collecting the data centrally. The authors developed a systematic federated learning framework that accelerates the global training process. The framework actively measures the contribution of each training sample on the clients and selects the optimal samples to optimize training. The authors also included an adaptive deadline control scheme that adjusts to the varying training data, achieving a 4.5x speedup in the global training process without sacrificing model accuracy.
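As a rough illustration of the sample selection idea (this is not the authors’ actual FedBalancer algorithm; the function names and thresholds below are invented for the sketch), a client could rank its samples by training loss and keep the high-loss ones plus a small random share of the rest:

```python
import random

def select_samples(losses, loss_threshold, keep_ratio=0.2):
    """Toy loss-based sample selection: keep every sample whose loss
    exceeds the threshold, plus a random share of the remaining easy
    samples so they are not permanently excluded from training."""
    important = [i for i, loss in enumerate(losses) if loss > loss_threshold]
    rest = [i for i, loss in enumerate(losses) if loss <= loss_threshold]
    sampled_rest = random.sample(rest, int(len(rest) * keep_ratio))
    return sorted(important + sampled_rest)

# Per-sample losses on one hypothetical client.
losses = [0.1, 2.3, 0.05, 1.8, 0.2, 3.1, 0.08, 0.9]
selected = select_samples(losses, loss_threshold=1.0, keep_ratio=0.5)
```

The actual system additionally adapts the round deadline as the selected data changes, which this sketch omits.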

 

Prof. Lee stated, “Federated learning is an important technology used by many top companies. With the accelerated training time achieved by this research, it has become even more attractive for real deployments. Moreover, our technique has been shown to work well in different domains such as computer vision, natural language processing, and human activity recognition.”

 

This research was supported in part by the National Research Foundation of Korea (NRF) grant funded by the Ministry of Science and ICT (MSIT) of Korea.
 
 

EE Prof. Myoungsoo Jung’s team develops the world’s first nonvolatile computer that maintains execution states even when power fails

[Prof. Myoungsoo Jung and KAIST Ph.D. candidates Miryeong Kwon, Sangwon Lee, and Gyuyoung Park, from left]
 
 
Our department’s Professor Myoungsoo Jung’s research team has developed the world’s first nonvolatile computer, which maintains its execution states even when the power fails.
 
The research team developed the Lightweight Persistence-Centric system (LightPC), which uses nonvolatile memory as its main memory and can maintain all execution states regardless of the power supply. LightPC outperforms conventional volatile computing systems by 4.3x while reducing power consumption by 73% and providing 8 times larger memory capacity.
 
Nonvolatile memory is a type of computer memory that retains stored information even after power is removed. It provides larger capacity and consumes less power than DRAM, which is volatile, but it offers lower write performance. Because of this shortcoming, existing nonvolatile memory, such as Intel’s Optane memory, is used together with DRAM. However, a computer with such a nonvolatile memory system still has to transfer the data in DRAM to nonvolatile memory or SSD in order to retain its execution states.
 
To solve this problem, the research team developed a processor and memory controller that raise the nonvolatile memory’s performance, and also developed an OS that maintains all execution states of the nonvolatile memory-based computer. With these techniques, even if power is suddenly cut off, LightPC can restore the state it was in before the power loss. The research team implemented LightPC on an FPGA-based system board prototype and verified its effectiveness.
 
This work is expected to be utilized in a variety of settings, such as data centers and high-performance computing, as it provides large-capacity memory, high performance, low power, and service reliability.
 
KAIST Ph.D. candidates Miryeong Kwon, Sangwon Lee, and Gyuyoung Park participated in this research, and the paper, “LightPC: Hardware and Software Co-Design for Energy-Efficient Full System Persistence,” will be presented in June at the International Symposium on Computer Architecture (ISCA) 2022.
 
 
 
 
 
The research was supported by MEMRAY, the Ministry of Science and ICT (MSIT), the National Research Foundation of Korea (NRF), and the Institute of Information & Communications Technology Planning & Evaluation (IITP).
 
More information on this paper can be found on the CAMELab website (http://camelab.org) and on YouTube (https://youtu.be/mlF7W_RmYRk). This result has been covered by domestic media as follows.
 
 
[Link]
https://news.kaist.ac.kr/newsen/html/news/?mode=V&mng_no=20111&skey=&sval=&list_s_date=&list_e_date=&GotoPage=1

Sample Selection for Fair and Robust Training

Y. Roh, K. Lee, S. E. Whang, and C. Suh

Accepted to the 35th Annual Conference on Neural Information Processing Systems (NeurIPS), Dec. 2021. (Top machine learning conference)

Abstract

Fairness and robustness are critical elements of Trustworthy AI that need to be addressed together. Fairness is about learning an unbiased model while robustness is about learning from corrupted data, and it is known that addressing only one of them may have an adverse effect on the other. In this work, we propose a sample selection-based algorithm for fair and robust training. To this end, we formulate a combinatorial optimization problem for the unbiased selection of samples in the presence of data corruption. Observing that solving this optimization problem is strongly NP-hard, we propose a greedy algorithm that is efficient and effective in practice. Experiments show that our algorithm obtains fairness and robustness that are better than or comparable to the state-of-the-art technique, on both synthetic and real benchmark datasets. Moreover, unlike other fair and robust training baselines, our algorithm can be used by only modifying the sampling step in batch selection without changing the training algorithm or leveraging additional clean data.
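The greedy selection can be pictured with a toy sketch. Assumptions: a small-loss heuristic stands in for corruption filtering, and an equal per-group budget stands in for the fairness constraint; this is not the paper’s exact algorithm, and all names and numbers below are invented:

```python
from collections import defaultdict

def fair_robust_select(samples, clean_ratio):
    """Toy greedy selection: within each group, keep only the lowest-loss
    samples (a common heuristic for filtering corrupted labels), and use
    the same per-group budget so the selection stays balanced.
    `samples` is a list of (group, loss) pairs; returns selected indices."""
    by_group = defaultdict(list)
    for idx, (group, loss) in enumerate(samples):
        by_group[group].append((loss, idx))
    # Equal budget per group -> balanced (toy "fair") selection.
    budget = min(int(len(v) * clean_ratio) for v in by_group.values())
    selected = []
    for members in by_group.values():
        members.sort()                    # lowest loss first
        selected.extend(idx for _, idx in members[:budget])
    return sorted(selected)

samples = [("a", 0.2), ("a", 3.0), ("a", 0.1),
           ("b", 0.5), ("b", 2.5), ("b", 0.3), ("b", 0.4)]
chosen = fair_robust_select(samples, clean_ratio=0.5)
```

Because only the sampling step changes, such a selector slots into an existing training loop without touching the model or loss, which is the deployment property the abstract highlights.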

 


Responsible AI Challenges in End-to-end Machine Learning

S. E. Whang, K. Tae, Y. Roh, and G. Heo

IEEE Data Engineering Bulletin, Mar. 2021.

Abstract

Responsible AI is becoming critical as AI is widely used in our everyday lives. Many companies that deploy AI publicly state that when training a model, we not only need to improve its accuracy, but also need to guarantee that the model does not discriminate against users (fairness), is resilient to noisy or poisoned data (robustness), is explainable, and more. In addition, these objectives are not only relevant to model training, but to all steps of end-to-end machine learning, which include data collection, data cleaning and validation, model training, model evaluation, and model management and serving. Finally, responsible AI is conceptually challenging, and supporting all the objectives must be as easy as possible. We thus propose three key research directions towards this vision (depth, breadth, and usability) to measure progress and introduce our ongoing research. First, responsible AI must be deeply supported, where multiple objectives like fairness and robustness must be handled together. To this end, we propose FR-Train, a holistic framework for fair and robust model training in the presence of data bias and poisoning. Second, responsible AI must be broadly supported, preferably in all steps of machine learning. Currently we focus on the data pre-processing steps and propose Slice Tuner, a selective data acquisition framework for training fair and accurate models, and MLClean, a data cleaning framework that also improves fairness and robustness. Finally, responsible AI must be usable, where the techniques must be easy to deploy and actionable. We propose FairBatch, a batch selection approach for fairness that is effective and simple to use, and Slice Finder, a model evaluation tool that automatically finds problematic slices. We believe we have scratched the surface of responsible AI for end-to-end machine learning and suggest research challenges moving forward.

 

 


Woo-Joong Kim and Chan-Hyun Youn, “Cooperative Scheduling Schemes for Explainable DNN Acceleration in Satellite Image Analysis and Retraining,” to appear in IEEE Transactions on Parallel and Distributed Systems (2021)

Abstract

Deep learning-based satellite image analysis and retraining systems are emerging technologies that enhance the sophisticated analysis of terrestrial objects. To apply an explainable DNN model to the satellite image analysis and retraining process, we consider a new acceleration scheduling mechanism, since conventional DNN acceleration schemes suffer serious performance degradation from the computational complexity and cost of satellite image analysis and retraining. In this article, to overcome this performance degradation, we propose cooperative scheduling schemes for explainable DNN acceleration in the analysis and retraining process. To this end, we define latency and energy cost models to derive the optimized processing time and cost required for explainable DNN acceleration. In particular, we show the minimum processing cost achieved by the proposed scheduling via layer-level management of the explainable DNN on an FPGA-GPU acceleration system. In addition, we evaluate the performance using an adaptive unlabeled data selection scheme with a confidence threshold and a semi-supervised learning-driven data parallelism scheme for accelerating the retraining process. The experimental results demonstrate that the proposed schemes reduce the energy cost of conventional DNN acceleration systems by up to about 40% while guaranteeing the latency constraints.


Elastic Resource Sharing for Distributed Deep Learning

Changho Hwang, Taehyun Kim, Sunghyun Kim, Jinwoo Shin, and Kyoungsoo Park

In Proceedings of the 18th USENIX Symposium on Networked Systems Design and Implementation (NSDI ’21)

April 2021

 

Abstract

Resource allocation and scheduling strategies for deep learning training (DLT) jobs have a critical impact on their average job completion time (JCT). Unfortunately, traditional algorithms such as Shortest-Remaining-Time-First (SRTF) often perform poorly for DLT jobs. This is because blindly prioritizing only the short jobs is suboptimal and job-level resource preemption is too coarse-grained for effective mitigation of head-of-line blocking. We investigate the algorithms that accelerate DLT jobs. Our analysis finds that (1) resource efficiency often matters more than short job prioritization and (2) applying greedy algorithms to existing jobs inflates average JCT due to overly optimistic views toward future resource availability. Inspired by these findings, we propose Apathetic Future Share (AFS) that balances resource efficiency and short job prioritization while curbing unrealistic optimism in resource allocation. To bring the algorithmic benefits into practice, we also build CoDDL, a DLT system framework that transparently handles automatic job parallelization and efficiently performs frequent share re-adjustments. Our evaluation shows that AFS outperforms Themis, SRTF, and Tiresias-L in terms of average JCT by up to 2.2x, 2.7x, and 3.1x, respectively.
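The resource-efficiency finding can be pictured with a toy greedy allocator (the job names and throughput curves here are hypothetical, and the real AFS additionally weighs job lengths and future arrivals rather than allocating purely greedily):

```python
def allocate_gpus(jobs, num_gpus):
    """Toy greedy allocator: hand out GPUs one at a time to the job with
    the largest marginal throughput gain, i.e., prioritize resource
    efficiency rather than blindly prioritizing the shortest job."""
    alloc = {name: 0 for name in jobs}
    for _ in range(num_gpus):
        # Marginal gain of giving job `name` one more GPU.
        best = max(jobs,
                   key=lambda name: jobs[name](alloc[name] + 1) - jobs[name](alloc[name]))
        alloc[best] += 1
    return alloc

# Hypothetical throughput curves: "a" scales linearly, "b" saturates at 2 GPUs.
jobs = {
    "a": lambda g: 10 * g,
    "b": lambda g: 12 * min(g, 2),
}
alloc = allocate_gpus(jobs, 4)
```

Here job "b" receives GPUs only while they still improve its throughput, after which the remaining GPUs flow to "a"; a shortest-job-first policy would ignore this saturation entirely.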

 


Accelerating GNN Training with Locality-Aware Partial Execution (Awarded Best Paper)

Taehyun Kim, Changho Hwang, Kyoungsoo Park, Zhiqi Lin, Peng Cheng, Youshan Miao, Lingxiao Ma, and Yongqiang Xiong

In Proceedings of the 12th ACM SIGOPS Asia-Pacific Workshop on Systems (APSys ’21)

August 2021

 

 

Abstract

Graph Neural Networks (GNNs) are increasingly popular for various prediction and recommendation tasks. Unfortunately, the graph datasets for practical GNN applications are often too large to fit into the memory of a single GPU, leading to frequent data loading from host memory to GPU. This data transfer overhead is highly detrimental to the performance, severely limiting the training throughput. In this paper, we propose locality-aware, partial code execution that significantly cuts down the data copy overhead for GNN training. The key idea is to exploit the “near-data” processors for the first few operations in each iteration, which reduces the data size for DMA operations. In addition, we employ task scheduling tailored to GNN training and apply load balancing between CPU and GPU. We find that our approach substantially improves the performance, achieving up to 6.6x speedup in training throughput over the state-of-the-art system design.
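The “near-data” idea can be sketched in a few lines (a plain-Python stand-in with made-up data; a real system would run this on CPU workers against host-memory feature tables before the DMA transfer to the GPU):

```python
def cpu_partial_aggregate(features, batch_nodes, neighbors):
    """Run the first GNN operation (mean of neighbor features) on the CPU,
    near the host-memory data: only one reduced vector per batch node then
    needs to be copied to the GPU, instead of every raw neighbor row."""
    out = []
    for n in batch_nodes:
        rows = [features[v] for v in neighbors[n]]
        out.append([sum(col) / len(rows) for col in zip(*rows)])
    return out

features = [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9], [10, 11]]  # 6 nodes, 2-dim
neighbors = {0: [1, 2], 3: [4, 5]}
agg = cpu_partial_aggregate(features, [0, 3], neighbors)
```

In this toy case two aggregated vectors cross the host-to-GPU bus instead of four raw neighbor rows; on real graphs with high-degree nodes the reduction in DMA volume is far larger.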

 


Slice Tuner: A Selective Data Acquisition Framework for Accurate and Fair Machine Learning Models

Ki Hyun Tae, a Ph.D. student of Prof. Steven Euijong Whang in the EE department, proposed a selective data acquisition framework for accurate and fair machine learning models.

As machine learning becomes widespread in our everyday lives, making AI more responsible is becoming critical. Beyond high accuracy of AI, the key objectives of responsible AI include fairness, robustness, explainability, and more. In particular, companies including Google, Microsoft, and IBM are emphasizing responsible AI.

Among the objectives, this work focuses on model fairness. Based on the key insight that the root cause of unfairness is in biased training data, Ki Hyun proposed Slice Tuner, a selective data acquisition framework that optimizes both model accuracy and fairness. Slice Tuner efficiently and reliably manages learning curves, which are used to estimate model accuracy given more data, and utilizes them to provide the best data acquisition strategy for training an accurate and fair model.
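One common way to model such learning curves, and a plausible reading of how more data maps to accuracy estimates, is a power law loss(n) = a * n^(-b) fitted to a few measured points per data slice (the measurements below are invented for illustration; this is a sketch, not the Slice Tuner implementation):

```python
import math

def fit_power_law(n1, loss1, n2, loss2):
    """Fit loss(n) = a * n**(-b) through two measured (data size, loss) points."""
    b = math.log(loss1 / loss2) / math.log(n2 / n1)
    a = loss1 * n1 ** b
    return a, b

def data_needed(a, b, target_loss):
    """Invert the fitted curve: roughly how many samples reach the target loss."""
    return math.ceil((a / target_loss) ** (1 / b))

# Invented measurements for one data slice: loss 0.30 at 1k samples, 0.15 at 4k.
a, b = fit_power_law(1000, 0.30, 4000, 0.15)
n = data_needed(a, b, 0.10)  # estimated samples needed to reach loss 0.10
```

Fitting one such curve per slice lets an acquisition planner direct new data collection toward the slices whose estimated loss is furthest from the target, which is the intuition behind optimizing accuracy and fairness jointly.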

The research team believes that Slice Tuner is an important first step towards realizing responsible AI starting from data collection. This work was presented at ACM SIGMOD (International Conference on Management of Data) 2021, a top Database conference.

For more details, please refer to the links below.

 


Figure 1. Slice Tuner architecture

https://arxiv.org/abs/2003.04549

https://docs.google.com/presentation/d/1thnn2rEvTtcCbJc8s3TnHQ2IEDBsZOe66-o-u4Wb3y8/edit?usp=sharing

https://youtu.be/QYEhURcd4u4?list=PL3xUNnH4TdbsfndCMn02BqAAgGB0z7cwq

 

Professor Steven Euijong Whang and Professor Changho Suh’s Research Team Develops a New Batch Selection Technique for Fair AI

Professors Steven Euijong Whang and Changho Suh’s research team in the School of Electrical Engineering has developed a new batch selection technique for fair artificial intelligence (AI) systems. The research was led by Ph.D. student Yuji Roh (advisor: Steven Euijong Whang) and was conducted in collaboration with Professor Kangwook Lee from the Department of Electrical and Computer Engineering at the University of Wisconsin-Madison.

 

AI technologies are now widespread and influence everyday lives of humans. Unfortunately, researchers have recently observed that machine learning models may discriminate against specific demographics or individuals. As a result, there is a growing social consensus that AI systems need to be fair.

 

The research team proposes FairBatch, a new batch selection technique for building fair machine learning models. Existing fair training algorithms require non-trivial modifications to either the training data or the model architecture. In contrast, FairBatch effectively achieves high accuracy and fairness with only a single-line change of code in the batch selection, which enables FairBatch to be easily deployed in various applications. FairBatch’s key approach is to solve a bi-level optimization that jointly achieves accuracy and fairness.
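Conceptually, the batch-selection swap can be sketched with a toy sampler that draws each batch with adjustable per-group ratios. This is a hand-rolled stand-in with invented names, not the released FairBatch implementation, which adapts the ratios automatically via the bi-level optimization rather than by hand:

```python
import random

class GroupRatioSampler:
    """Toy stand-in for a fairness-aware batch sampler: each batch is drawn
    with per-group sampling ratios that can be adjusted between epochs."""

    def __init__(self, group_indices, batch_size):
        self.group_indices = group_indices          # {group: [dataset indices]}
        self.batch_size = batch_size
        k = len(group_indices)
        self.ratios = {g: 1.0 / k for g in group_indices}  # start uniform

    def set_ratio(self, group, ratio):
        # Adjusting a group's share of each batch steers the model's
        # exposure to that group during training.
        self.ratios[group] = ratio

    def sample_batch(self):
        batch = []
        for g, idxs in self.group_indices.items():
            n = round(self.batch_size * self.ratios[g])
            batch.extend(random.choices(idxs, k=n))
        return batch

groups = {"a": [0, 1, 2], "b": [3, 4, 5]}
sampler = GroupRatioSampler(groups, batch_size=10)
sampler.set_ratio("a", 0.7)  # shift group "a" to 70% of each batch
sampler.set_ratio("b", 0.3)
batch = sampler.sample_batch()
```

Because all the fairness logic lives in the sampler, plugging such an object into an existing training loop (e.g., as the sampler argument of a data loader) is the kind of single-line change the paragraph above describes.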

 

This research was presented at the International Conference for Learning Representations (ICLR) 2021, a top machine learning conference. More details are in the links below.

 


Figure 1. A scenario that shows how FairBatch adaptively adjusts batch ratios in model training for fairness.


Figure 2.  PyTorch code for model training where FairBatch is used for batch selection. Only a single-line code change is required to replace an existing sampler with FairBatch, marked in blue.

 

Title: FairBatch: Batch Selection for Model Fairness

Authors: Yuji Roh (KAIST EE), Kangwook Lee (Wisconsin-Madison Electrical & Computer Engineering), Steven Euijong Whang (KAIST EE), and Changho Suh (KAIST EE)

 

Paper: https://openreview.net/forum?id=YNnpaAKeCfx

Source code: https://github.com/yuji-roh/fairbatch

Slides: https://docs.google.com/presentation/d/1IfaYovisZUYxyofhdrgTYzHGXIwixK9EyoAsoE1YX-w/edit?usp=sharing

Machine Learning Augmented Reliable Flash Memory and Storage Systems

The Ministry of Science and ICT, through IITP, has awarded CAMEL a grant for research on AI-augmented flash-based storage for self-driving cars. Specifically, CAMEL will research machine-learning algorithms that recover from all runtime and device faults observed in automobiles. As the reliability of storage devices has a significant impact on self-driving automobiles, self-governing algorithms and fault-tolerant hardware architectures are critically important. Prof. Jung, as the single PI, will be supported with around $1.6 million USD to develop lightweight machine-learning algorithms, hardware automation technology, and computer architecture for reliable storage and self-driving automobiles.
