From Block to Byte: Transforming PCIe SSDs with CXL Memory Protocol and Instruction Annotation (Prof. Jung, Myoungsoo’s Lab)

Abstract

Cache-coherent interconnects have recently emerged to integrate CPUs, accelerators, and memory components into a unified, heterogeneous computing domain. These interconnect technologies ensure data coherency between CPU memory and device-attached private memory, creating a new paradigm of globally shared memory and network space. Among several efforts to establish such connectivity, including Gen-Z [1] and the Cache Coherent Interconnect for Accelerators (CCIX) [2], Compute Express Link (CXL) has become the first open interconnect protocol capable of supporting diverse processors and device endpoints. With the absorption of Gen-Z, CXL stands out as a promising interconnect interface due to its high-speed coherence control and seamless compatibility with the widely adopted PCIe standard. This makes it particularly advantageous for a wide range of datacenter-scale hardware, including CPUs, GPUs, FPGAs, and domain-specific ASICs. Furthermore, the CXL consortium has highlighted its potential for memory disaggregation, enabling pooling of DRAM and byte-addressable persistent memory.

 

Main Figure


 

Private Yet Social: How LLM Chatbots Support and Challenge Eating Disorder Recovery (Prof. Sung-Ju Lee’s Lab)

Abstract

Eating disorders (ED) are complex mental health conditions that require long-term management and support. Recent advancements in large language model (LLM)-based chatbots offer the potential to provide individuals with immediate support. Yet, concerns remain about their reliability and safety in sensitive contexts such as ED. We explore the opportunities and potential harms of using LLM-based chatbots for ED recovery. We observed the interactions between 26 participants with ED and WellnessBot, an LLM-based chatbot designed to support ED recovery, over 10 days. We found that participants felt empowered in their recovery by discussing ED-related stories with the chatbot, which served as a personal yet social avenue. However, we also identified harmful chatbot responses, especially concerning for individuals with ED, that went unnoticed partly due to participants’ unquestioning trust in the chatbot’s reliability. Based on these findings, we provide design implications for safe and effective LLM-based interventions in ED management.

 

Main Figure

Efficient Disaggregated Cloud Storage for Cold Videos with Neural Enhancement (Prof. Han, Dongsu’s Lab)

Abstract

The rapid growth of video-sharing platforms has driven immense storage demands, with disaggregated cloud storage emerging as a scalable and reliable solution. However, the proportional cost of cloud storage relative to capacity and duration limits the cost-efficiency of managing large-scale video data. This is particularly critical for cold videos, which constitute the majority of video data but are accessed infrequently. To address this challenge, this paper proposes Neural Cloud Storage (NCS), leveraging content-aware super-resolution (SR) powered by deep neural networks. By reducing the resolution of cold videos, NCS decreases file sizes while preserving perceptual quality, optimizing the cost trade-offs in multi-tiered disaggregated storage. This approach extends the cost-efficiency benefits to a greater range of cold videos and achieves up to a 21.2% reduction in total cost of ownership (TCO), providing a scalable, cost-effective solution for video storage.
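To make the trade-off concrete, the sketch below compares the monthly cost of keeping a cold video at full resolution against keeping a downscaled copy and paying a super-resolution cost on each access. All prices, sizes, and access rates here are hypothetical illustrations, not figures from the paper.

```python
# Minimal sketch of the storage-vs-compute trade-off behind NCS.
# Every number below is a hypothetical illustration.

def monthly_cost(size_gb, price_per_gb, accesses, sr_cost_per_access=0.0):
    """Storage cost plus (optional) neural-enhancement compute cost."""
    return size_gb * price_per_gb + accesses * sr_cost_per_access

full_res_gb = 10.0   # original cold video (hypothetical)
low_res_gb = 2.5     # downscaled copy kept in the cold tier (hypothetical)
price = 0.01         # $/GB-month for a cold tier (hypothetical)
sr_cost = 0.02       # $ per access to run super-resolution (hypothetical)

for accesses in (0, 1, 5):  # cold videos are accessed rarely
    baseline = monthly_cost(full_res_gb, price, accesses)
    ncs = monthly_cost(low_res_gb, price, accesses, sr_cost)
    print(f"{accesses} accesses/month: baseline ${baseline:.3f}, NCS ${ncs:.3f}")
```

The fewer times a video is accessed, the more the smaller stored footprint outweighs the per-access enhancement cost, which is why the approach targets cold videos specifically.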

 

Main Figure

Harnessing Temporal Databases for Systematic Evaluation of Factual Time-Sensitive Question-Answering in Large Language Models (Prof. Steven Euijong Whang’s Lab)

Abstract

Facts evolve over time, making it essential for Large Language Models (LLMs) to handle time-sensitive factual knowledge accurately and reliably. While factual Time-Sensitive Question-Answering (TSQA) tasks have been widely studied, existing benchmarks often rely on manual curation or a small, fixed set of predefined templates, which restricts scalable and comprehensive TSQA evaluation. To address these challenges, we propose TDBench, a new benchmark that systematically constructs TSQA pairs by harnessing temporal databases and database techniques such as temporal SQL and functional dependencies. We also introduce a fine-grained evaluation metric called time accuracy, which assesses the validity of time references in model explanations alongside traditional answer accuracy, enabling more reliable TSQA evaluation. Extensive experiments on contemporary LLMs show how TDBench enables scalable and comprehensive TSQA evaluation while reducing reliance on human labor, complementing existing Wikipedia/Wikidata-based TSQA evaluation approaches by enabling LLM evaluation on application-specific data and seamless multi-hop question generation.
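As a rough illustration of the idea, the sketch below shows how a temporal table can mechanically yield a time-sensitive QA pair whose gold answer and valid time span are known by construction. The schema, data, and question template are hypothetical stand-ins, not TDBench's actual pipeline.

```python
import sqlite3

# Hypothetical temporal table: each row carries a validity interval.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE ceo (
    company TEXT, person TEXT, valid_from INTEGER, valid_to INTEGER)""")
conn.executemany("INSERT INTO ceo VALUES (?, ?, ?, ?)", [
    ("AcmeCorp", "Alice Kim", 2010, 2016),
    ("AcmeCorp", "Bob Lee",   2016, 2023),
])

# A temporal-SQL-style predicate ("who held the role in year Y?") turns
# each row into a question with a gold answer known by construction.
year = 2015
person, t0, t1 = conn.execute(
    "SELECT person, valid_from, valid_to FROM ceo "
    "WHERE company = ? AND valid_from <= ? AND ? < valid_to",
    ("AcmeCorp", year, year)).fetchone()

question = f"Who was the CEO of AcmeCorp in {year}?"
print(question, "->", person, f"(valid {t0}-{t1})")
```

Because both the answer and its valid interval come straight from the database, the same machinery can also check the time references in a model's explanation, which is the intuition behind the time-accuracy metric.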

 

Main Figure


A Characterization of Generative Recommendation Models: Study of Hierarchical Sequential Transduction Unit (Prof. Rhu, Minsoo’s Lab)

Abstract

Recommendation systems are crucial for personalizing user experiences on online platforms. While Deep Learning Recommendation Models (DLRMs) have been the state-of-the-art for nearly a decade, their scalability is limited, as model quality scales poorly with compute. Recently, there have been research efforts applying the Transformer architecture to recommendation systems, and the Hierarchical Sequential Transduction Unit (HSTU), an encoder architecture, has been proposed to address these scalability challenges. Although HSTU-based generative recommenders show significant potential, they have received little attention from computer architects. In this paper, we analyze the inference process of HSTU-based generative recommenders and perform an in-depth characterization of the model. Our findings indicate that the attention mechanism is a major performance bottleneck. We further discuss promising research directions and optimization strategies that can potentially enhance the efficiency of HSTU models.
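A back-of-the-envelope FLOP count illustrates why attention can come to dominate as user-interaction histories grow: the attention term scales quadratically with sequence length, while the linear projections scale only linearly. The dimensions below are hypothetical, not taken from the paper's characterization.

```python
# Rough FLOP counts for one attention block at varying sequence lengths.
# Dimensions are hypothetical illustrations.

d = 512  # hidden dimension (hypothetical)

for n in (512, 2048, 8192):  # interaction-history length
    attn_flops = 2 * 2 * n * n * d   # QK^T plus attention-weighted V
    proj_flops = 2 * 4 * n * d * d   # Q/K/V/output projections
    share = attn_flops / (attn_flops + proj_flops)
    print(f"n={n:5d}: attention is {share:.0%} of these FLOPs")
```

With d = 512, attention accounts for roughly a third of these FLOPs at n = 512 but about 90% at n = 8192, matching the intuition that long user histories push the bottleneck into the attention mechanism.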

 

Main Figure

 

Amuse: Human-AI Collaborative Songwriting with Multimodal Inspirations (Prof. Sung-Ju Lee’s Lab)

Abstract

Songwriting is often driven by multimodal inspirations, such as imagery, narratives, or existing music, yet songwriters remain unsupported by current music AI systems in incorporating these multimodal inputs into their creative processes. We introduce Amuse, a songwriting assistant that transforms multimodal (image, text, or audio) inputs into chord progressions that can be seamlessly incorporated into songwriters’ creative process. A key feature of Amuse is its novel method for generating coherent chords that are relevant to music keywords in the absence of datasets with paired examples of multimodal inputs and chords. Specifically, we propose a method that leverages multimodal LLMs to convert multimodal inputs into noisy chord suggestions and uses a unimodal chord model to filter the suggestions. A user study with songwriters shows that Amuse effectively supports transforming multimodal ideas into coherent musical suggestions, enhancing users’ agency and creativity throughout the songwriting process.
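The sketch below illustrates the generate-then-filter idea with both models stubbed out; the function names, chord vocabulary, and scoring rule are hypothetical stand-ins for Amuse's actual components.

```python
# Generate-then-filter sketch: a multimodal LLM proposes (possibly
# noisy) chord progressions, and a unimodal chord model keeps only the
# plausible ones. Both models are stubs; names and scores are
# hypothetical.

def multimodal_llm_suggest(keywords):
    """Stand-in for prompting a multimodal LLM (ignores its input here)."""
    return [["C", "G", "Am", "F"], ["C", "F#", "B", "Eb"], ["Am", "F", "C", "G"]]

def chord_model_score(progression):
    """Stand-in for a unimodal chord model's plausibility score,
    e.g. an average transition likelihood; higher is more coherent."""
    common = {("C", "G"), ("G", "Am"), ("Am", "F"), ("F", "C"), ("C", "F")}
    pairs = list(zip(progression, progression[1:]))
    return sum(p in common for p in pairs) / len(pairs)

candidates = multimodal_llm_suggest(["sunset", "nostalgic"])
kept = [p for p in candidates if chord_model_score(p) >= 0.5]
print(kept)  # the incoherent middle suggestion is filtered out
```

The design point is that no paired multimodal-input-to-chord dataset is needed: the multimodal model only has to produce candidates, and musical coherence is enforced separately by a model trained on chords alone.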

 

Main Figure 


 

Criteria-Aware Graph Filtering: Extremely Fast Yet Accurate Multi-Criteria Recommendation (Prof. Yoo, Jaemin’s Lab)

Abstract

Multi-criteria (MC) recommender systems, which utilize MC rating information for recommendation, are increasingly widespread in various e-commerce domains. However, MC recommendation using training-based collaborative filtering, which must account for multiple ratings rather than a single criterion, often poses practical challenges in achieving state-of-the-art performance alongside scalable model training. To solve this problem, we propose CA-GF, a training-free MC recommendation method built upon criteria-aware graph filtering for efficient yet accurate MC recommendations. Specifically, we first construct an item–item similarity graph using an MC user-expansion graph. Next, we design CA-GF with the following key components: 1) criterion-specific graph filtering, where the optimal filter for each criterion is found among various types of polynomial low-pass filters, and 2) criteria preference-infused aggregation, where the smoothed signals from each criterion are aggregated. We demonstrate that CA-GF is (a) efficient, with an extremely fast runtime of less than 0.2 seconds even on the largest benchmark dataset; (b) accurate, outperforming benchmark MC recommendation methods with substantial accuracy gains of up to 24% over the best competitor; and (c) interpretable, providing visualization-based interpretations of each criterion's contribution to the model prediction.
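The sketch below illustrates the two components with random stand-in data: a polynomial low-pass filter applied per criterion, followed by preference-weighted aggregation. The matrices, filter coefficients, and weights are hypothetical, not the paper's MC user-expansion graph construction.

```python
import numpy as np

# Training-free, criteria-aware graph filtering in miniature.
# All data below are random stand-ins.

rng = np.random.default_rng(0)
n_users, n_items, n_criteria = 50, 100, 3

def normalized_similarity(R):
    """Symmetrically normalized item-item similarity from a rating matrix."""
    S = R.T @ R
    d = np.maximum(S.sum(axis=1), 1e-12)
    D = np.diag(d ** -0.5)
    return D @ S @ D

def polynomial_filter(S, x, coeffs):
    """Apply sum_k coeffs[k] * S^k to signal x; with damping
    coefficients this acts as a low-pass filter on the graph."""
    out, Skx = coeffs[0] * x, x
    for c in coeffs[1:]:
        Skx = S @ Skx
        out = out + c * Skx
    return out

user_profile = rng.integers(0, 2, n_items).astype(float)  # items the user rated
prefs = np.array([0.5, 0.3, 0.2])  # the user's criterion preferences

scores = np.zeros(n_items)
for k in range(n_criteria):
    R_k = rng.integers(0, 6, (n_users, n_items)).astype(float)  # criterion-k ratings
    S_k = normalized_similarity(R_k)
    scores += prefs[k] * polynomial_filter(S_k, user_profile, [0.2, 0.5, 0.3])

print(np.argsort(-scores)[:10])  # top-10 recommended items
```

Because prediction is just a handful of sparse matrix-vector products rather than model training, this style of method can stay fast even on large datasets, and the per-criterion terms in the final sum directly expose each criterion's contribution.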

 

Main Figure

EE Professor Dongsu Han’s Research Team Develops Technology to Accelerate AI Model Training in Distributed Environments Using Consumer-Grade GPUs


<(from left) Professor Dongsu Han, Dr. Hwijoon Iim, Ph.D. Candidate Juncheol Ye>

 

Professor Dongsu Han’s research team of the KAIST Department of Electrical Engineering has developed a groundbreaking technology that accelerates AI model training in distributed environments with limited network bandwidth using consumer-grade GPUs.

 

Training the latest AI models typically requires expensive infrastructure, such as high-performance GPUs costing tens of millions of won and high-speed dedicated networks.

As a result, most researchers in academia and small to medium-sized enterprises have to rely on cheaper, consumer-grade GPUs for model training.

However, they face difficulties in efficient model training due to network bandwidth limitations.

 


<Figure 1. Problems in Conventional Low-Cost Distributed Deep Learning Environments>

 

To address these issues, Professor Han’s team developed a distributed learning framework called StellaTrain.

StellaTrain accelerates model training on low-cost GPUs by integrating a pipeline that utilizes both CPUs and GPUs. It dynamically adjusts batch sizes and compression rates according to the network environment, enabling fast model training in multi-cluster and multi-node environments without the need for high-speed dedicated networks.

 

To maximize GPU utilization, StellaTrain optimizes the training pipeline by offloading gradient compression and optimizer computation to the CPU. The team developed and applied a new sparse optimization technique and a cache-aware gradient compression method that run efficiently on CPUs.
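As a rough illustration, the sketch below shows plain top-k gradient sparsification of the kind that can run on a CPU; StellaTrain's cache-aware compression and sparse optimizer are considerably more involved than this baseline.

```python
import numpy as np

# Top-k gradient sparsification: send only the largest-magnitude
# entries over the (slow) network. An illustrative baseline, not
# StellaTrain's actual cache-aware implementation.

def topk_compress(grad, ratio=0.01):
    """Return (indices, values) of the top `ratio` fraction of entries."""
    flat = grad.ravel()
    k = max(1, int(flat.size * ratio))
    idx = np.argpartition(np.abs(flat), -k)[-k:]
    return idx, flat[idx]

def topk_decompress(idx, vals, shape):
    out = np.zeros(int(np.prod(shape)))
    out[idx] = vals
    return out.reshape(shape)

grad = np.random.randn(1024, 1024)
idx, vals = topk_compress(grad, ratio=0.01)
restored = topk_decompress(idx, vals, grad.shape)
print(f"sent {idx.size} of {grad.size} values "
      f"({idx.size / grad.size:.1%} of the original traffic)")
```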

 

This implementation creates a seamless learning pipeline where CPU tasks overlap with GPU computations. Furthermore, dynamic optimization technology adjusts batch sizes and compression rates in real-time according to network conditions, achieving high GPU utilization even in limited network environments.
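The sketch below captures the spirit of that dynamic tuning with a deliberately simple rule: choose the weakest compression whose estimated transfer time still hides behind computation. The cost model and candidate ratios are hypothetical, not the paper's actual policy.

```python
# Illustrative controller for network-aware compression selection.
# The timing model and candidate ratios are hypothetical.

def choose_compression(grad_bytes, bandwidth_bps, compute_s):
    """Pick the largest ratio (least compression) whose transfer
    time fits within one step's computation time."""
    for ratio in (1.0, 0.1, 0.01, 0.001):
        transfer_s = grad_bytes * ratio * 8 / bandwidth_bps
        if transfer_s <= compute_s:
            return ratio
    return 0.001  # fall back to the strongest compression

# E.g., 400 MB of gradients, a 1 Gbps link, 0.5 s of GPU compute per step:
print(choose_compression(grad_bytes=400e6, bandwidth_bps=1e9, compute_s=0.5))
```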

 


<Figure 2. Overview of the StellaTrain Learning Pipeline>

 

Through these innovations, StellaTrain significantly improves the speed of distributed model training in low-cost multi-cloud environments, achieving up to a 104-fold performance improvement over the existing PyTorch DDP.

 

Professor Han’s research team has paved the way for efficient AI model training without the need for expensive data center-grade GPUs and high-speed networks. This breakthrough is expected to greatly aid AI research and development in resource-constrained environments, such as academia and small to medium-sized enterprises.

 

Professor Han emphasized, “KAIST is demonstrating leadership in the AI systems field in South Korea.” He added, “We will continue active research to implement large-scale language model (LLM) training, previously considered the domain of major IT companies, in more affordable computing environments. We hope this research will serve as a critical stepping stone toward that goal.”

 

The research team included Dr. Hwijoon Iim and Ph.D. candidate Juncheol Ye from KAIST, as well as Professor Sangeetha Abdu Jyothi from UC Irvine. The findings were presented at ACM SIGCOMM 2024, the premier international conference in the field of computer networking, held from August 4 to 8 in Sydney, Australia (Paper title: Accelerating Model Training in Multi-cluster Environments with Consumer-grade GPUs). 

 

Meanwhile, Professor Han’s team has also made continuous research advancements in the AI systems field, presenting a framework called ES-MoE, which accelerates Mixture of Experts (MoE) model training, at ICML 2024 in Vienna, Austria.

 

By overcoming GPU memory limitations, they significantly enhanced the scalability and efficiency of large-scale MoE model training, enabling fine-tuning of a 15-billion parameter language model using only four GPUs. This achievement opens up the possibility of effectively training large-scale AI models with limited computing resources.
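One way to picture the memory saving is expert offloading: keeping MoE expert weights in host memory and copying onto the GPU only the experts a batch actually routes to. The sketch below is a simplified stand-in for ES-MoE's actual scheduling and pipelining.

```python
import torch
import torch.nn as nn

# Expert offloading in miniature: experts live in host (CPU) memory and
# are fetched on demand. A simplified illustration, not ES-MoE itself.

device = "cuda" if torch.cuda.is_available() else "cpu"
experts = [nn.Linear(1024, 1024) for _ in range(64)]  # held in host memory

def run_expert(expert_id, tokens):
    expert = experts[expert_id].to(device)  # fetch the routed expert
    out = expert(tokens.to(device))
    experts[expert_id] = expert.to("cpu")   # release GPU memory afterwards
    return out

tokens = torch.randn(8, 1024)
print(run_expert(expert_id=3, tokens=tokens).shape)
```

Only the currently active experts occupy GPU memory, which is why total model size can far exceed the aggregate GPU capacity; the engineering challenge, addressed by ES-MoE's pipelining, is hiding the host-to-GPU transfer latency.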

 


<Figure 3. Overview of the ES-MoE Framework>

 


<Figure 4. Professor Dongsu Han’s research team has enabled AI model training in low-cost computing environments, even with limited or no high-performance GPUs, through their research on StellaTrain and ES-MoE.>

 


Professor Sung-Ju Lee’s research team develops a smartphone AI system that diagnoses mental health based on user’s voice and text input


 


 

A research team led by Professor Sung-Ju Lee of the Department of Electrical and Electronic Engineering has developed an artificial intelligence (AI) technology that automatically analyzes users’ language usage patterns on smartphones without personal information leakage, thereby monitoring users’ mental health status.
 
This technology allows a smartphone to analyze and diagnose its user’s mental health state as the user simply carries and uses the phone in everyday life.
 
The research team focused on the fact that clinical diagnosis of mental disorders is often done through language use analysis during patient consultations.
 
The new technology uses (1) keyboard input content such as text messages written by the user and (2) voice data collected in real-time from the smartphone’s microphone for mental health diagnosis.  
This language data, which may contain sensitive user information, has previously been challenging to utilize.
 
This technology addresses the issue by applying federated learning, which trains the AI model without any data leaving the user’s device, thus eliminating privacy invasion concerns.
 
The AI model is trained on datasets of everyday conversations annotated with the speakers’ mental health status. It analyzes conversations entered into the smartphone in real time and predicts the user’s scores on mental health scales based on what it has learned.
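The sketch below illustrates the federated-averaging idea in miniature: each simulated phone updates a local copy of a model on its own private data and shares only the updated weights, which a server averages. The linear model and synthetic data are hypothetical stand-ins for the paper's actual architecture.

```python
import numpy as np

# Federated averaging (FedAvg) in miniature. The linear model and
# synthetic data are illustrative stand-ins.

rng = np.random.default_rng(0)
true_w = rng.normal(size=16)  # ground truth used to synthesize data

def local_update(weights, X, y, lr=0.01, steps=10):
    """Gradient descent on this device's private data; only the
    resulting weights leave the device, never X or y."""
    w = weights.copy()
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)  # squared-error gradient
        w -= lr * grad
    return w

global_w = np.zeros(16)
for round_ in range(5):
    client_ws = []
    for _ in range(10):  # ten simulated phones
        X = rng.normal(size=(32, 16))
        y = X @ true_w + 0.1 * rng.normal(size=32)
        client_ws.append(local_update(global_w, X, y))
    global_w = np.mean(client_ws, axis=0)  # server-side FedAvg step
    print(round_, np.linalg.norm(global_w - true_w))  # error shrinks
```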
 
Furthermore, the research team developed a methodology to effectively diagnose mental health from the large amount of user language data provided on smartphones.
 
Recognizing that users’ language usage patterns vary in different real-life situations, they designed the AI model to focus on relatively important language data based on the current situation indicated by the smartphone.
For example, the AI model may prioritize analyzing conversations with family or friends in the evening over work-related discussions, as they may provide more clues for monitoring mental health.
 
This research was conducted in collaboration with Jaemin Shin (CS), HyungJun Yoon (EE Ph.D. student), Seungjoo Lee (EE master’s student), Sung-Joon Park, CEO of Softly AI (a KAIST alumnus), Professor Yunxin Liu of Tsinghua University in China, and Professor Jin-Ho Choi of Emory University in the USA.
 
The paper, titled “FedTherapist: Mental Health Monitoring with User-Generated Linguistic Expressions on Smartphones via Federated Learning,” was presented at the EMNLP (Conference on Empirical Methods in Natural Language Processing), the most prestigious conference in the field of natural language processing, held in Singapore from December 6th to 10th.
 
Professor Sung-Ju Lee commented, “This research is significant as it is a collaboration of experts in mobile sensing, natural language processing, artificial intelligence, and psychology. It enables early diagnosis of mental health conditions through smartphone use without worries of personal information leakage or privacy invasion. We hope this research can be commercialized and benefit society.” 
 
This research was funded by the government (Ministry of Science and ICT) and supported by the Institute for Information & Communications Technology Planning & Evaluation (No. 2022-0-00495, Development of Voice Phishing Detection and Prevention Technology in Mobile Terminals, No. 2022-0-00064, Development of Human Digital Twin Technology for Predicting and Managing Mental Health Risks of Emotional Laborers).
 
 
 


<Picture 1. A smartphone displaying an app interface for mental health diagnosis. The app shows visualizations of the user’s voice and keyboard input analysis, with federated learning technology>
 
 
 
<Picture 2. A schematic diagram of the mental health diagnosis technology using federated learning, based on user voice and keyboard input on a smartphone>
 
 
 
 
 

CAMEL research team has been successively selected for the 2023 Samsung Future Technology Development Program


 


 

The CAMEL research team from our department, led by Professor Myoungsoo Jung, has been chosen to participate in the Samsung Future Technology Development Program.

 

This recognition comes with support for their study titled, “Software-Hardware Co-Design for Dynamic Acceleration in Sparsity-aware Hyperscale AI.”

 

Large AI models, such as mixture-of-experts (MoE), autoencoder, and multimodal models, grouped under the umbrella of Hyperscale AI, have gained traction due to the success of applications driven by expansive models, including ChatGPT.

Based on the insight that computational traits of these models often shift during training, the research team has suggested acceleration strategies.

These encompass software technologies, unique algorithms, and hardware accelerator layouts.

A key discovery by the team was the inability of existing training systems to account for variations in data sparsity and computational dynamics between model layers. This oversight obstructs adaptive acceleration. 

 

To address this, the CAMEL team introduced a dynamic acceleration method that can detect shifts in computational traits and adapt computation techniques in real-time.
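The sketch below conveys the idea at the software level: measure a layer's activation sparsity at run time and dispatch to a sparse or dense path accordingly. The threshold and kernels are hypothetical stand-ins for the team's hardware-software co-design.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Sparsity-aware dynamic dispatch in miniature: pick a computation
# path based on measured activation sparsity. The threshold and
# kernels are hypothetical illustrations.

def matmul_adaptive(A, W, sparsity_threshold=0.9):
    sparsity = np.mean(A == 0.0)
    if sparsity >= sparsity_threshold:
        return csr_matrix(A) @ W  # sparse path pays off when mostly zero
    return A @ W                  # dense path otherwise

A = np.random.randn(256, 512)
A[np.random.rand(*A.shape) < 0.95] = 0.0  # mostly-zero activations
W = np.random.randn(512, 128)
print(matmul_adaptive(A, W).shape)
```

Because sparsity and computational traits shift across layers and over the course of training, a fixed choice of kernel leaves performance on the table; making the dispatch decision dynamic, in both software and accelerator hardware, is the core of the proposed approach.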

The findings from this research could benefit not only Hyperscale AI but also the larger domain of deep learning and the burgeoning services sector.

The team’s goals include producing tangible hardware models and offering open-source software platforms.

 

Since 2013, Samsung Electronics has run the ‘Future Technology Development Program’, investing KRW 1.5 trillion to stimulate technological innovation pivotal to future societal progress.

For a decade, they have backed initiatives in foundational science, innovative materials, and ICT, particularly favoring ventures that are high-risk but offer significant returns.

The CAMEL team has been collaborating with Samsung since 2021 on a project focusing on accelerating Graph Neural Networks (GNNs). We extend our hearty congratulations to them as they embark on this fresh exploration into the realm of Hyperscale AI.

 
