EE Professor Dongsu Han’s Research Team Develops Technology to Accelerate AI Model Training in Distributed Environments Using Consumer-Grade GPUs



<(from left) Professor Dongsu Han, Dr. Hwijoon Iim, Ph.D. Candidate Juncheol Ye>

 

Professor Dongsu Han’s research team of the KAIST Department of Electrical Engineering has developed a groundbreaking technology that accelerates AI model training in distributed environments with limited network bandwidth using consumer-grade GPUs.

 

Training the latest AI models typically requires expensive infrastructure, such as high-performance GPUs costing tens of millions of won and high-speed dedicated networks.

As a result, most researchers in academia and small to medium-sized enterprises have to rely on cheaper, consumer-grade GPUs for model training.

However, they face difficulties in efficient model training due to network bandwidth limitations.

 


<Figure 1. Problems in Conventional Low-Cost Distributed Deep Learning Environments>

 

To address these issues, Professor Han’s team developed a distributed learning framework called StellaTrain.

StellaTrain accelerates model training on low-cost GPUs by integrating a pipeline that utilizes both CPUs and GPUs. It dynamically adjusts batch sizes and compression rates according to the network environment, enabling fast model training in multi-cluster and multi-node environments without the need for high-speed dedicated networks.

 

StellaTrain maximizes GPU utilization by optimizing the learning pipeline: it offloads gradient compression and parameter optimization to the CPU. The team developed and applied a new sparse optimization technique and a cache-aware gradient compression technique that run efficiently on CPUs.

 

This implementation creates a seamless learning pipeline where CPU tasks overlap with GPU computations. Furthermore, dynamic optimization technology adjusts batch sizes and compression rates in real-time according to network conditions, achieving high GPU utilization even in limited network environments.
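The article does not detail the team's cache-aware compression algorithm, but the general idea of sparsifying gradients on the CPU so that only a small payload crosses the slow inter-node network can be sketched as follows (a minimal top-k illustration in NumPy, not StellaTrain's actual implementation):

```python
import numpy as np

def topk_compress(grad, ratio=0.01):
    """Keep only the largest-magnitude entries of a gradient tensor.

    Returns (indices, values) -- the sparse payload that would be
    sent over the bandwidth-limited network.
    """
    flat = grad.ravel()
    k = max(1, int(flat.size * ratio))
    idx = np.argpartition(np.abs(flat), -k)[-k:]   # O(n) selection, CPU-friendly
    return idx, flat[idx]

def topk_decompress(idx, vals, shape):
    """Rebuild a dense gradient from the sparse payload."""
    flat = np.zeros(int(np.prod(shape)))
    flat[idx] = vals
    return flat.reshape(shape)

# A 1% compression ratio shrinks the transferred payload ~100x.
grad = np.random.randn(1000, 100)
idx, vals = topk_compress(grad, ratio=0.01)
restored = topk_decompress(idx, vals, grad.shape)
```

In a pipelined system, this compression step would run on CPU threads while the GPU computes the next layer's gradients, which is the overlap the paragraph above describes.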

 


<Figure 2. Overview of the StellaTrain Learning Pipeline>

 

Through these innovations, StellaTrain significantly improves the speed of distributed model training in low-cost multi-cloud environments, achieving up to 104 times the performance of the existing PyTorch DDP framework.

 

Professor Han’s research team has paved the way for efficient AI model training without the need for expensive data center-grade GPUs and high-speed networks. This breakthrough is expected to greatly aid AI research and development in resource-constrained environments, such as academia and small to medium-sized enterprises.

 

Professor Han emphasized, “KAIST is demonstrating leadership in the AI systems field in South Korea.” He added, “We will continue active research to implement large-scale language model (LLM) training, previously considered the domain of major IT companies, in more affordable computing environments. We hope this research will serve as a critical stepping stone toward that goal.”

 

The research team included Dr. Hwijoon Iim and Ph.D. candidate Juncheol Ye from KAIST, as well as Professor Sangeetha Abdu Jyothi from UC Irvine. The findings were presented at ACM SIGCOMM 2024, the premier international conference in the field of computer networking, held from August 4 to 8 in Sydney, Australia (Paper title: Accelerating Model Training in Multi-cluster Environments with Consumer-grade GPUs). 

 

Meanwhile, Professor Han’s team has also made continuous research advancements in the AI systems field, presenting a framework called ES-MoE, which accelerates Mixture of Experts (MoE) model training, at ICML 2024 in Vienna, Austria.

 

By overcoming GPU memory limitations, they significantly enhanced the scalability and efficiency of large-scale MoE model training, enabling fine-tuning of a 15-billion parameter language model using only four GPUs. This achievement opens up the possibility of effectively training large-scale AI models with limited computing resources.
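ES-MoE's offloading machinery is beyond a short sketch, but the property it exploits, namely that each token activates only one (or a few) experts, so unused expert parameters need not occupy GPU memory, can be illustrated with a toy top-1 MoE layer (all names and shapes here are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def moe_forward(x, gate_w, experts):
    """Minimal top-1 mixture-of-experts forward pass.

    Each token is routed to the single expert with the highest gate
    score; only that expert's weights are touched, which is what makes
    offloading idle expert parameters off the GPU attractive.
    """
    scores = x @ gate_w                      # (tokens, n_experts) gate logits
    choice = scores.argmax(axis=1)           # top-1 routing decision per token
    out = np.empty_like(x)
    for e, w in enumerate(experts):
        mask = choice == e
        if mask.any():                       # compute only with used experts
            out[mask] = x[mask] @ w
    return out, choice

d, n_experts, tokens = 8, 4, 32
x = rng.standard_normal((tokens, d))
gate_w = rng.standard_normal((d, n_experts))
experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]
y, choice = moe_forward(x, gate_w, experts)
```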

 


<Figure 3. Overview of the ES-MoE Framework>

 


<Figure 4. Professor Dongsu Han’s research team has enabled AI model training in low-cost computing environments, even with limited or no high-performance GPUs, through their research on StellaTrain and ES-MoE.>

 

 

 

 

 

Professor Sung-Ju Lee’s research team develops a smartphone AI system that diagnoses mental health based on user’s voice and text input

 


 

A research team led by Professor Sung-Ju Lee of the Department of Electrical Engineering has developed an artificial intelligence (AI) technology that automatically analyzes users’ language usage patterns on smartphones without personal information leakage, thereby monitoring users’ mental health status.
 
This technology allows smartphones to analyze and diagnose a user’s mental health state simply by carrying and using the phone in everyday life.
 
The research team focused on the fact that clinical diagnosis of mental disorders is often done through language use analysis during patient consultations.
 
The new technology uses (1) keyboard input content such as text messages written by the user and (2) voice data collected in real-time from the smartphone’s microphone for mental health diagnosis.  
This language data, which may contain sensitive user information, has previously been challenging to utilize.
 
To address this issue, the technology applies federated learning, which trains the AI model without any data leaving the user’s device, thus eliminating privacy invasion concerns.
 
The AI model is trained on datasets of everyday conversation content paired with the speaker’s mental health status. It analyzes conversations entered into the smartphone in real time and predicts the user’s mental health score based on what it has learned.
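The federated training loop described above can be sketched minimally: each device takes a local update on its private data, and the server aggregates only model weights, never raw text or voice (an illustrative NumPy FedAvg sketch, not the team's actual system):

```python
import numpy as np

def local_update(weights, grads, lr=0.1):
    """One on-device step: the raw (private) data never leaves the phone."""
    return weights - lr * grads

def federated_average(client_weights, client_sizes):
    """Server aggregates model weights only, weighted by local data size."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

global_w = np.zeros(4)
clients = [
    (np.array([1.0, 0.0, 0.0, 0.0]), 10),   # simulated per-device gradients
    (np.array([0.0, 2.0, 0.0, 0.0]), 30),   # and local dataset sizes
]
updated = [local_update(global_w, g) for g, _ in clients]
new_global = federated_average(updated, [n for _, n in clients])
```

Only the `updated` weight vectors travel to the server; the sensitive language data implied by the gradients stays on each phone.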
 
Furthermore, the research team developed a methodology to effectively diagnose mental health from the large amount of user language data provided on smartphones.
 
Recognizing that users’ language usage patterns vary in different real-life situations, they designed the AI model to focus on relatively important language data based on the current situation indicated by the smartphone.
For example, the AI model may prioritize analyzing conversations with family or friends in the evening over work-related discussions, as they may provide more clues for monitoring mental health.
 
This research was conducted in collaboration with Jaemin Shin (CS), Hyungjun Yoon (EE Ph.D. student), Seungjoo Lee (EE master’s student), Sung-Joon Park, CEO of Softly AI (KAIST alumnus), Professor Yunxin Liu of Tsinghua University in China, and Professor Jin-Ho Choi of Emory University in the USA.
 
The paper, titled “FedTherapist: Mental Health Monitoring with User-Generated Linguistic Expressions on Smartphones via Federated Learning,” was presented at the EMNLP (Conference on Empirical Methods in Natural Language Processing), the most prestigious conference in the field of natural language processing, held in Singapore from December 6th to 10th.
 
Professor Sung-Ju Lee commented, “This research is significant as it is a collaboration of experts in mobile sensing, natural language processing, artificial intelligence, and psychology. It enables early diagnosis of mental health conditions through smartphone use without worries of personal information leakage or privacy invasion. We hope this research can be commercialized and benefit society.” 
 
This research was funded by the government (Ministry of Science and ICT) and supported by the Institute for Information & Communications Technology Planning & Evaluation (No. 2022-0-00495, Development of Voice Phishing Detection and Prevention Technology in Mobile Terminals, No. 2022-0-00064, Development of Human Digital Twin Technology for Predicting and Managing Mental Health Risks of Emotional Laborers).
 
 
 


<picture 1. A smartphone displaying an app interface for mental health diagnosis. The app shows visualizations of user’s voice and keyboard input analysis, with federated learning technology>
 
 
 
<picture 2. A schematic diagram of the mental health diagnosis technology using federated learning, based on user voice and keyboard input on a smartphone>
 
 
 
 
 

CAMEL research team has been selected for the 2023 Samsung Future Technology Development Program

 


 

The CAMEL research team from our department, led by Professor Myoungsoo Jung, has been chosen to participate in the Samsung Future Technology Development Program.

 

This recognition comes with support for their study titled, “Software-Hardware Co-Design for Dynamic Acceleration in Sparsity-aware Hyperscale AI.”

 

Large AI models, such as mixture of experts (MoE), autoencoders, and multimodal learning, grouped under the umbrella of Hyperscale AI, have gained traction due to the success of expansive model-driven applications, including ChatGPT.

Based on the insight that computational traits of these models often shift during training, the research team has suggested acceleration strategies.

These encompass software technologies, unique algorithms, and hardware accelerator layouts.

A key discovery by the team was the inability of existing training systems to account for variations in data sparsity and computational dynamics between model layers. This oversight obstructs adaptive acceleration. 

 

To address this, the CAMEL team introduced a dynamic acceleration method that can detect shifts in computational traits and adapt computation techniques in real-time.

The findings from this research could benefit not only Hyperscale AI but also the larger domain of deep learning and the burgeoning services sector.

The team’s goals include producing tangible hardware models and offering open-source software platforms.

 

Since 2013, Samsung Electronics has run the ‘Future Technology Development Program’, investing KRW 1.5 trillion to stimulate technological innovation pivotal for future societal progress.

For a decade, they have backed initiatives in foundational science, innovative materials, and ICT, particularly favoring ventures that are high-risk but offer significant returns.

The CAMEL team has been collaborating with Samsung since 2021 on a project focusing on accelerating Graph Neural Networks (GNNs). We extend our hearty congratulations to them as they embark on this fresh exploration into the realm of Hyperscale AI.

 


EE Prof. Myoungsoo Jung’s research team develops the world’s first AI semiconductor for search engines based on CXL 3.0.

Our department’s Professor Myoungsoo Jung’s research team has developed the world’s first AI semiconductor for search engines based on CXL 3.0.

 

Approximate nearest neighbor search (ANNS) is widely used in commercial services such as image search, databases, and recommendation systems.

However, production-level ANNS requires a large amount of memory to handle its extensive datasets.

To address this memory pressure issue, modern ANNS techniques leverage lossy compression methods or employ persistent storage for their memory expansion.

However, these approaches often suffer from low accuracy and performance.

 

The research team proposed expanding memory capacity via compute express link (CXL), a PCIe-based open-industry interconnect technology that allows the underlying working memory to be highly scalable and composable at a low cost.

Furthermore, the use of a CXL switch enables connecting multiple memory expanders to a single port, providing greater scalability. However, memory expansion through CXL has the disadvantage of increased memory access time compared to local memory.

 

The research team has developed an AI semiconductor, ‘CXL-ANNS’, which leverages a CXL switch and memory expanders to accommodate the high memory pressure of extensive datasets without losing accuracy or performance.

Additionally, by using near data processing and data partitioning based on locality, the performance of CXL-ANNS is improved.
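The partitioning idea can be illustrated with a toy exact k-NN search split across memory shards, where each shard stands in for one CXL memory expander (a sketch under stated assumptions only; CXL-ANNS itself uses graph-based ANNS and real near-data-processing hardware):

```python
import numpy as np

rng = np.random.default_rng(1)

def knn_search(query, shards, k=5):
    """Exact k-NN over data split across memory 'shards'.

    Per-shard distance computation models near-data processing: each
    expander reduces its data to a small local top-k candidate list,
    and only those candidates cross the interconnect to the host.
    """
    cand_d, cand_i = [], []
    for base, shard in shards:                    # base = global index offset
        d = np.linalg.norm(shard - query, axis=1)
        top = np.argsort(d)[:k]                   # local top-k inside the shard
        cand_d.append(d[top])
        cand_i.append(top + base)
    d = np.concatenate(cand_d)
    i = np.concatenate(cand_i)
    order = np.argsort(d)[:k]                     # final merge step at the host
    return i[order]

data = rng.standard_normal((1000, 16))
shards = [(0, data[:500]), (500, data[500:])]     # two simulated "expanders"
q = data[123] + 0.001                             # query very near point 123
nn = knn_search(q, shards, k=5)
```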

They also compared a prototype of CXL-ANNS with existing ANNS solutions. Compared to previous research, CXL-ANNS shows up to 111 times higher performance; in particular, it achieves 92 times higher performance than a Microsoft solution used in commercial services.

 

This research, along with the paper titled “CXL-ANNS: Software-Hardware Collaborative Memory Disaggregation and Computation for Billion-Scale Approximate Nearest Neighbor Search”, will be presented in July at ‘USENIX Annual Technical Conference, ATC, 2023’.

 


 


 


 

The research was supported by Panmnesia (http://panmnesia.com). More information on this paper can be found at CAMELab website (http://camelab.org).

 

[News Link]

The Korea Economic Daily: https://www.hankyung.com/it/article/202305259204i

The Herald Business: http://news.heraldcorp.com/view.php?ud=20230525000225

ChosunBiz: https://biz.chosun.com/science-chosun/technology/2023/05/25/4UW5LPX3WVARVIS3QBBICPINFM/

etnews: https://www.etnews.com/20230525000092

EE Prof. Minsoo Rhu and Prof. Min Seok Jang, “Young Leaders to Lead the Development of Science and Technology,” elected members of the ‘2023 Y-KAST’ of the Korean Academy of Science and Technology.

Professor Minsoo Rhu and Professor Min Seok Jang of the electrical engineering department have been elected as members of the ‘2023 Y-KAST’ of the Korean Academy of Science and Technology (hereinafter ‘Hallymwon’).
 
Y-KAST members are young scientists under the age of 43 with outstanding academic achievements. Hallymwon prioritizes achievements made as independent researchers in Korea after receiving a doctorate degree, and fosters next-generation science and technology leaders who are highly likely to contribute to the development of science and technology in Korea.
 
On December 13, 2022 at 4:00 PM, ‘2022 Y-KAST Members Day’ will be held both online and offline, where Hallymwon plans to present membership plaques to new Y-KAST members and introduce their research achievements.
 
The head of Hallymwon said “Hallymwon wants to build an environment in which young scientists can fully demonstrate their skills and grow as leaders in the future science and technology field, and we will support them to present new ideas for R&D innovation.”
 
 
[Prof. Minsoo Rhu]    [ Prof. Min Seok Jang]
 
Professor Minsoo Rhu’s research achievements: Development of intelligent semiconductors and computer systems for artificial intelligence
 
Professor Min Seok Jang’s research achievements:
Pioneering the boundary between science and engineering in nano-optics and metamaterials, and leading the field of two-dimensional material-based active optical devices by solving a series of important problems
 
 
Link: https://m.ajunews.com/view/20221212094151237
 
 

EE Prof. David Hyunchul Shim’s team won 1st place in the 5th Army Tiger DroneBot Mission Challenge


[Prof. David Hyunchul Shim]

 

Professor David Hyunchul Shim’s team (Ph.D. student Boseong Kim, M.S. student Jaeyong Park) won 1st place in the indoor reconnaissance drone section of the 5th Army Tiger DroneBot Mission Challenge (held on Aug. 31), hosted by the Army Headquarters, and received 10 million won in prize money.
 
The awards ceremony was held on Oct. 4 at the Republic of Korea Army Training and Doctrine Command (ROKA TRADOC).
The teams were required to fly from the parking lot outside of the building, enter the building through a window on the second floor, and explore the inside of the building autonomously. The drone needed to find hidden objects, send the results to the ground station in real time, and return to the home position after completing the missions.
Professor Hyunchul Shim’s research team performed all the missions flawlessly using various algorithms and techniques, such as in-house 3D LiDAR-based localization (SLAM), 3D obstacle avoidance path planning, onboard real-time object detection, and autonomous exploration algorithm in the unknown area.
Among the eight participating teams (four teams withdrew), Professor Shim’s team was the only one to perform a completely autonomous flight from takeoff to return, showing an overwhelming ability to perform complex missions that are difficult even for human pilots.
The indoor autonomous flight algorithm developed by the team is the key technology for indoor reconnaissance drones to be used in future battlefield and disaster situations. Once again, this competition showed KAIST’s autonomous flight drone technology capabilities.
 
Video data : https://youtu.be/SXe_FJpxv94
 

Prof. Sung-Ju Lee and Prof. Jinwoo Shin developed a new AI technology to be presented at the upcoming NeurIPS 2022


[Prof. Sung-Ju Lee, Prof. Jinwoo Shin, Taesik Gong, Jongheon Jeong, Yewon Kim, Taewon Kim, from left]
 
A research team led by Professor Sung-Ju Lee of the School of Electrical Engineering and Professor Jinwoo Shin of the Graduate School of AI developed a test-time adaptation artificial intelligence technology that adapts itself to environmental changes.
The algorithm proposed by the research team showed an average improvement of 11% in accuracy compared to the existing best performing algorithm.
 
Titled “NOTE: Robust Continual Test-time Adaptation Against Temporal Correlation,” this study will be presented in December at NeurIPS 2022, one of the most prestigious international conferences in the field of artificial intelligence.
Taesik Gong led the research as the first author, and Jongheon Jeong, Taewon Kim, and Yewon Kim contributed as co-authors.
 
Professor Sung-Ju Lee and Professor Jinwoo Shin said, “Test time domain adaptation is a technology that allows artificial intelligence to adapt itself to changes in the environment and improve its performance, and its uses are limitless. The NOTE technology to be announced is the first technology to show performance improvement in actual data distribution, and is expected to be applicable to various fields such as autonomous driving, artificial intelligence medical care, and mobile health care.”
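As a rough illustration of test-time adaptation (a simplified stand-in, not the NOTE algorithm, which additionally handles temporally correlated test streams): a normalization layer can keep updating its statistics from unlabeled test batches, so the model tracks distribution shift without any labels.

```python
import numpy as np

class AdaptiveNorm:
    """Normalization layer whose statistics keep adapting at test time.

    The running mean/variance are nudged toward each incoming test
    batch (momentum m), so normalized activations stay well-behaved
    even when the input distribution drifts after deployment.
    """
    def __init__(self, dim, m=0.1):
        self.mean = np.zeros(dim)
        self.var = np.ones(dim)
        self.m = m

    def __call__(self, x):
        # EMA update computed from the unlabeled test batch itself
        self.mean = (1 - self.m) * self.mean + self.m * x.mean(axis=0)
        self.var = (1 - self.m) * self.var + self.m * x.var(axis=0)
        return (x - self.mean) / np.sqrt(self.var + 1e-5)

rng = np.random.default_rng(2)
norm = AdaptiveNorm(dim=3)
shifted = rng.standard_normal((64, 3)) + 5.0   # distribution shift at test time
for _ in range(50):
    out = norm(shifted)                         # statistics converge to the shift
```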
 
 
This research was conducted at the Korea Advanced Institute of Science and Technology’s Future Defense Artificial Intelligence Specialized Research Center (UD190031RD) with support from the National Research Foundation (No. NRF-2020R1A2C1004062).
 

KAIST EE PhD candidate Yuji Roh (advisor: Prof. Steven Euijong Whang), won 2022 Microsoft Research PhD Fellowship

KAIST PhD candidate Yuji Roh from the School of Electrical Engineering (advisor: Prof. Steven Euijong Whang) was selected as a recipient of the 2022 Microsoft Research PhD Fellowship. 


[Yuji Roh]

 

The Microsoft Research PhD Fellowship is a scholarship program that recognizes outstanding graduate students for their exceptional and innovative research in areas relevant to computer science and related fields.
 
This year, 36 people from around the world received the fellowship, and Yuji Roh from KAIST EE is the only recipient from universities in Korea. Each selected fellow will receive a $10,000 scholarship and an opportunity to intern at Microsoft under the guidance of an experienced researcher.
 
Yuji Roh was named a fellow in the field of “Machine Learning” for her outstanding achievements in Trustworthy AI.
Her research highlights include designing a state-of-the-art fair training framework using batch selection and developing novel algorithms for both fair and robust training.
Her works have been presented at the top machine learning conferences ICML, ICLR, and NeurIPS among others.
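The batch-selection idea behind this line of work can be sketched as drawing each minibatch with per-group quotas (illustrative only; the actual fair training frameworks adapt these sampling rates during training based on per-group losses, and all names below are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(3)

def fair_batch_indices(groups, group_rates, batch_size, rng):
    """Draw one minibatch with per-group sampling quotas.

    Raising a disadvantaged group's rate raises its share of every
    batch, steering the model toward fairer performance across groups.
    """
    idx = []
    for g, rate in group_rates.items():
        pool = np.flatnonzero(groups == g)
        take = int(round(batch_size * rate))
        idx.extend(rng.choice(pool, size=take, replace=True))
    return np.array(idx)

groups = np.array([0] * 900 + [1] * 100)   # group 1 is underrepresented (10%)
rates = {0: 0.5, 1: 0.5}                   # quota oversamples the minority group
batch = fair_batch_indices(groups, rates, batch_size=32, rng=rng)
minority_share = np.mean(groups[batch] == 1)
```

With equal quotas, the minority group fills half of each batch instead of its ~10% base rate, which is the lever such batch-selection methods tune.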
 
She also co-presented a tutorial on Trustworthy AI at the top data mining conference ACM SIGKDD. She is currently interning at the NVIDIA Research AI Algorithms Group developing large-scale real-world fair AI frameworks. 
 
The list of fellowship recipients and the interview videos are displayed on the Microsoft webpage and YouTube.
 

The list of recipients: https://www.microsoft.com/en-us/research/academic-program/phd-fellowship/2022-recipients/

Interview (Global): https://www.youtube.com/watch?v=T4Q-XwOOoJc

Interview (Asia): https://www.youtube.com/watch?v=qwq3R1XU8UE

 

 


[Research achievements of Yuji Roh: Fair batch selection framework (left) and fair and robust training framework (right)]

EE Prof. Minsoo Rhu is inducted into IEEE/ACM MICRO Hall of Fame


[Prof. Minsoo Rhu]

 
 
KAIST EE professor Minsoo Rhu was inducted into the Institute of Electrical and Electronics Engineers / Association for Computing Machinery (IEEE/ACM) International Symposium on Microarchitecture (MICRO) Hall of Fame this year.
 
Celebrating its 55th anniversary in 2022, MICRO is recognized not only as the oldest international conference in the field of computer architecture, but also as one of the most prestigious, along with ISCA and HPCA.
 
Prof. Minsoo Rhu, one of Korea’s leading experts in AI semiconductors and GPU-based high-performance computing systems, was inducted into the HPCA Hall of Fame in 2021 and this year reached a total of eight papers published at the MICRO conference, thereby establishing himself as a member of the MICRO Hall of Fame.
 

[Award picture of MICRO Hall of Fame]

 

 

Related links:

MICRO: https://www.microarch.org/micro55

MICRO Hall of Fame: https://www.sigmicro.org/awards/microhof.php

Prof. Myoungsoo Jung’s team awarded the KAIST-Samsung Electronics Cooperation Best Paper Award for a PLM-SSD-based hardware-software co-designed framework for LSM KV stores


[Prof. Myoungsoo Jung, Miryeong Kwon, Seungjun Lee, and Hyunkyu Cho from left]

 

Our department’s Professor Myoungsoo Jung’s research team has developed the world’s first hardware-software co-designed framework for log-structured merge key-value stores (LSM KV stores) based on a Predictable Latency Mode (PLM) SSD.

 

The research team has developed ‘Vigil-KV’, a hardware-software co-designed framework for LSM KV stores that eliminates long-tail latency by bringing the Predictable Latency Mode (PLM) interface, which provides constant read latency, to an actual datacenter-scale SSD. Vigil-KV achieves 3.19x lower tail latency and 34% lower average latency than the existing LSM KV store.

 

An LSM KV store, a kind of database, is used to manage various application data, and it must process user requests within the required time so as not to degrade the user experience. To this end, Vigil-KV enables the predictable latency mode (PLM) interface on an actual datacenter-scale NVMe SSD (PLM SSD), which guarantees constant read latency in deterministic mode by keeping the SSD’s internal tasks out of the read path.
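The long-tail latency that Vigil-KV targets is conventionally measured at high percentiles of the read-latency distribution, where rare stalls (for example, reads landing behind a background compaction) dominate. A minimal measurement sketch with synthetic numbers, for illustration only:

```python
import numpy as np

def tail_latency(samples_us, pct=99.0):
    """Return the given percentile of a latency sample, in microseconds."""
    return float(np.percentile(samples_us, pct))

rng = np.random.default_rng(4)
# Mostly-fast reads plus occasional multi-millisecond stalls, mimicking
# reads that collide with SSD-internal or compaction activity.
fast = rng.uniform(80, 120, size=9_800)
stalls = rng.uniform(2_000, 5_000, size=200)
lat = np.concatenate([fast, stalls])

p50 = tail_latency(lat, 50)   # median: the common, fast case
p99 = tail_latency(lat, 99)   # tail: dominated entirely by the stalls
```

Even though stalls are only 2% of requests, the p99 sits in the millisecond range while the median stays near 100 microseconds, which is why eliminating the tail matters more than improving the average.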

 

Specifically, the Vigil-KV hardware keeps deterministic-mode SSDs present in the system by configuring a PLM SSD RAID, removing the SSDs’ internal tasks from the read path. In addition, the Vigil-KV software prevents the deterministic mode from being released by the LSM KV store’s internal tasks, scheduling LSM KV store operations (e.g., compaction/flush operations) and client requests.

 

Among the research results, it is especially noteworthy that Vigil-KV is the first work to implement the PLM interface in a real SSD and make the read latency of an LSM KV store deterministic in a hardware-software co-design manner. The team prototyped the Vigil-KV hardware on a 1.92TB datacenter-scale NVMe SSD and implemented the Vigil-KV software using Linux 4.19.91 and RocksDB 6.23.0.

 

The KAIST Ph.D. candidates Miryeong Kwon, Seungjun Lee, and Hyunkyu Cho participated in this research, and the paper (Vigil-KV: Hardware-Software Co-Design to Integrate Strong Latency Determinism into Log-Structured Merge Key-Value Stores) was presented on July 11th at the ‘USENIX Annual Technical Conference (ATC) 2022’. In addition, the team won the Best Paper Award from Samsung for this paper (Vigil-KV) together with Professor Jae-Hyeok Choi’s research team.

 

The Best Paper Award from Samsung recognizes master’s and doctorate students who participated in research grant projects and published project-related papers accepted by foreign journals/conferences since September 21st. This year’s awards consisted of a grand award (2 people), an excellence award (1 person), and an encouragement award (2 people).

The research was supported by Samsung. More information on this paper can be found at http://camelab.org.

 
