Professor Insu Yun’s Lab (as Part of Team Atlanta) Advances to the Finals of the U.S. DARPA ‘AI Cyber Challenge (AIxCC)’ and Secures $2 Million in Research Funding
<Professor Insu Yun>
Ph.D. Candidate Hee Suk Yoon (Advisor: Prof. Chang D. Yoo) Wins Excellent Paper Award
<(From left) Professor Chang D. Yoo and integrated Ph.D. candidate Hee Suk Yoon>
The Korean Artificial Intelligence Association holds conferences quarterly, and this year’s summer conference takes place from August 15 to 17 at BEXCO in Busan.
Hee Suk Yoon, an integrated Ph.D. candidate, has been selected to receive the Excellent Paper Award for his paper titled “BI-MDRG: Bridging Image History in Multimodal Dialogue Response Generation.”
The findings will also be presented at the European Conference on Computer Vision (ECCV) 2024, one of the top international conferences in the field of computer vision, to be held in Milan, Italy, this September.
The detailed information is as follows:
* Conference Name: 2024 Summer Conference of the Korean Artificial Intelligence Association
* Period: August 15 to 17, 2024
* Award Name: Excellent Paper Award
* Authors: Hee Suk Yoon, Eunseop Yoon, Chang D. Yoo (Supervising Professor)
* Paper Title: BI-MDRG: Bridging Image History in Multimodal Dialogue Response Generation
This research is considered an innovative breakthrough that overcomes the limitations of existing large multimodal dialogue models, such as ChatGPT, by maintaining consistency in image generation within multimodal dialogues.
Figure 1: Image Response of ChatGPT and BI-MDRG (ours)
Traditional multimodal dialogue models prioritize generating textual descriptions of images and then create images using text-to-image models.
This approach often fails to sufficiently reflect the visual information from previous dialogues, leading to inconsistent image responses.
In contrast, BI-MDRG minimizes image information loss through a direct image referencing technique, enabling consistent image response generation.
Figure 2: Framework of previous multimodal dialogue systems and the proposed BI-MDRG
BI-MDRG is a new system designed to solve the problem of image information loss in existing multimodal dialogue models through two components: Attention Mask Modulation and a Citation Module.
Attention Mask Modulation lets the model attend directly to the image itself rather than to its textual description, as in the sketch below. The Citation Module keeps responses consistent by citation-tagging recurring objects in the conversation, so that objects that should be preserved in image responses are referenced directly.
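To make the masking idea concrete, the minimal sketch below modulates a boolean attention mask so that tokens attend to image tokens instead of the caption’s text tokens. The function name, index arguments, and mask convention are illustrative assumptions, not the authors’ implementation.

```python
import torch

def modulate_attention_mask(mask, caption_idx, image_idx):
    # mask: (seq_len, seq_len) boolean attention mask, True = may attend.
    # Redirect attention away from the caption's text tokens and toward
    # the corresponding image tokens (illustrative sketch only).
    mask = mask.clone()
    mask[:, caption_idx] = False  # block attention to the textual description
    mask[:, image_idx] = True     # allow attention to the image tokens themselves
    return mask

# Usage: a 6-token sequence where tokens 2-3 are caption text and 4-5 are image tokens.
m = torch.ones(6, 6, dtype=torch.bool)
m = modulate_attention_mask(m, caption_idx=[2, 3], image_idx=[4, 5])
```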
The research team validated BI-MDRG’s performance across various multimodal dialogue benchmarks, achieving high dialogue performance and consistency.
Figure 3: Overall framework of BI-MDRG
BI-MDRG offers practical solutions in various multimodal application fields.
For instance, in customer service, it can enhance user satisfaction by providing accurate images based on conversation content.
In education, it can improve understanding by consistently providing relevant images and texts in response to learners’ questions. Additionally, in the entertainment field, it can enable natural and immersive interactions in interactive games.
Professor Junmo Kim’s Research Team Secures Funding for the 2024 SW Star Lab Project under the Information and Communication Broadcasting Technology Development Program
<Professor Junmo Kim>
Professor Junmo Kim’s research team from our department has secured funding for the 2024 SW Star Lab project under the Information and Communication Broadcasting Technology Development Program, administered by the Ministry of Science and ICT and the Institute of Information & Communications Technology Planning & Evaluation (IITP).
The SW Star Lab project aims to secure world-class original technology in five core SW domains (‘Big Data’, ‘Cloud’, ‘Algorithm’, ‘Application SW’, and ‘Artificial Intelligence’) and to cultivate master’s and doctoral-level SW talent. Selected research teams receive funding of approximately 1.5 billion won (about 200 million won annually) over an eight-year period.
Professor Junmo Kim’s research team proposed a project titled “Developing Sustainable, Real-Time Generative AI for Multimodal Interaction” in the ‘Artificial Intelligence’ domain.
This project seeks to overcome the limitations of current 2D-based image/video generation models through the development of 3D-based models. The research focuses on developing image/video generation models capable of understanding objects as complete entities rather than from a specific (fixed) viewpoint, enabling more realistic object generation and movement representation.
Furthermore, the proposal extends beyond simple generation models to include understanding multimodal inputs and restricting harmful content, aiming to develop technology that generates content with a deep comprehension of societal and economic implications.
This project is expected to create synergies through collaboration with the Vision-Centered Artificial General Intelligence (ViC-AGI) Lab, led by Professor In So Kweon of our department, which has been designated a KAIST Cross-Generation Collaborative Lab.
Professor Minsoo Rhu’s research lab has been selected for the 2024 SW Star Lab Project under the Information and Communication Broadcasting Technology Development Program
<Professor Minsoo Rhu>
Professor Changick Kim’s Research Team Develops ‘VideoMamba,’ a High-Efficiency Model Opening a New Paradigm in Video Recognition
<(From left) Professor Changick Kim, Jinyoung Park integrated Ph.D. candidate, Hee-Seon Kim Ph.D. candidate, Kangwook Ko Ph.D. candidate, and Minbeom Kim Ph.D. candidate>
On the 9th, Professor Changick Kim’s research team announced the development of a high-efficiency video recognition model named ‘VideoMamba.’ VideoMamba demonstrates superior efficiency and competitive performance compared to existing video models built on transformers, like those underpinning large language models such as ChatGPT. This breakthrough is seen as pioneering a new paradigm in the field of video utilization.
Figure 1: Comparison of VideoMamba’s memory usage and inference speed with transformer-based video recognition models.
VideoMamba is designed to address the high computational complexity associated with traditional transformer-based models.
These models typically rely on the self-attention mechanism, whose cost scales quadratically with sequence length. VideoMamba instead utilizes a Selective State Space Model (SSM) mechanism that processes sequences with linear complexity, as sketched below. This allows VideoMamba to effectively capture the spatio-temporal information in videos and efficiently handle long-range dependencies within video data.
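To make the linear-complexity claim concrete, here is a minimal sketch of a selective state-space scan: the hidden state is updated once per timestep, so the cost grows linearly with sequence length, and the B/C projections depend on the input (the “selective” part). All shapes and names are illustrative assumptions, not VideoMamba’s actual implementation.

```python
import torch
import torch.nn as nn

def selective_ssm_scan(x, A, B_proj, C_proj):
    # x: (T, D) sequence. One state update per step => O(T) overall,
    # unlike self-attention's O(T^2) pairwise interactions.
    T, D = x.shape
    N = A.shape[1]                      # state dimension per channel
    h = torch.zeros(D, N)               # hidden state
    ys = []
    for t in range(T):
        xt = x[t]
        B = B_proj(xt).view(D, N)       # input-dependent ("selective") input map
        C = C_proj(xt).view(D, N)       # input-dependent readout map
        h = torch.exp(A) * h + B * xt.unsqueeze(-1)  # decay old state, inject input
        ys.append((C * h).sum(dim=-1))  # project state back to (D,)
    return torch.stack(ys)              # (T, D)

# Usage with toy sizes.
D, N, T = 4, 8, 16
A = -torch.rand(D, N)                   # negative entries give a stable decay
y = selective_ssm_scan(torch.randn(T, D), A, nn.Linear(D, D * N), nn.Linear(D, D * N))
```

The forward and backward SSMs mentioned next would run such scans over the token sequence in both directions.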
Figure 2: Detailed structure of the spatio-temporal forward and backward Selective State Space Model in VideoMamba.
To maximize the efficiency of the video recognition model, Professor Kim’s team incorporated spatio-temporal forward and backward SSMs into VideoMamba. This model integrates non-sequential spatial information and sequential temporal information effectively, enhancing video recognition performance.
The research team validated VideoMamba’s performance across various video recognition benchmarks. As a result, VideoMamba achieved high accuracy with low GFLOPs (Giga Floating Point Operations) and memory usage, and it demonstrated very fast inference speed.
VideoMamba offers an efficient and practical solution for various applications requiring video analysis. For example, in autonomous driving, it can analyze driving footage to accurately assess road conditions and recognize pedestrians and obstacles in real time, thereby helping prevent accidents.
In the medical field, it can analyze surgical videos to monitor the patient’s condition in real-time and respond swiftly to emergencies. In sports, it can analyze players’ movements and tactics during games to improve strategies and detect fatigue or potential injuries during training to prevent them. VideoMamba’s fast processing speed, low memory usage, and high performance provide significant advantages in these diverse video utilization fields.
The research team includes Jinyoung Park (integrated Ph.D. candidate), Hee-Seon Kim (Ph.D. candidate), and Kangwook Ko (Ph.D. candidate) as co-first authors, and Minbeom Kim (Ph.D. candidate) as a co-author, with Professor Changick Kim as the corresponding author, from the Department of Electrical and Electronic Engineering at KAIST.
The research findings will be presented at the European Conference on Computer Vision (ECCV) 2024, one of the top international conferences in the field of computer vision, to be held in Milan, Italy, in September this year. (Paper title: VideoMamba: Spatio-Temporal Selective State Space Model).
This work was supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2020-0-00153, Penetration Security Testing of ML Model Vulnerabilities and Defense).
Professor Seungwon Shin’s Research Team Publishes Paper at Top Conference in Computer Science (USENIX Security)
<Professor Seungwon Shin>
Professor Chan-Hyun Youn’s Research Team Develops a Technique to Prevent Abnormal Data Generation in Diffusion Models
<(From left) Professor Chan-Hyun Youn, Ph.D. candidate Jinhyeok Jang, Ph.D. candidate Changha Lee, and Dr. Minsu Jeon>
Professor Chan-Hyun Youn’s research team from the EE department has developed a momentum-based generation technique to address the issue of abnormal data generation frequently encountered by diffusion model-based generative AI.
While diffusion model-based generative AI, which has recently garnered significant attention, generally produces realistic images, it often generates abnormal details, such as unnaturally bent joints or horses with only three legs.
Figure 1: Images generated by Stable Diffusion with the proposed technique
To address this problem, the research team reformulated the generative process of diffusion models as an optimization procedure analogous to gradient descent. Both the diffusion generative process and gradient descent can be expressed as generalized Expectation-Maximization, and visualization revealed numerous local minima and saddle points in the generative process.
This observation indicates that abnormal outcomes correspond to local minima or saddle points. Based on this insight, the team introduced the momentum technique, widely used in optimization, into the generative process, as in the sketch below.
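A minimal sketch of that idea, under the assumption that `model(x, t)` returns the per-step denoising update: carry a momentum buffer across denoising steps exactly as one would in momentum gradient descent. The names and the placement of the update rule are illustrative, not the paper’s exact formulation.

```python
import torch

def sample_with_momentum(model, x_T, timesteps, beta=0.9):
    # Treat each denoising step like a gradient step and smooth it with a
    # momentum buffer, helping the trajectory roll past the local minima
    # and saddle points described above (sketch only; requires no retraining).
    x = x_T
    velocity = torch.zeros_like(x)
    for t in timesteps:
        update = model(x, t)                        # assumed per-step update direction
        velocity = beta * velocity + (1 - beta) * update
        x = x + velocity                            # momentum-smoothed step
    return x
```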
Various experiments confirmed that the generation of inappropriate images decreased significantly without additional training, and that the quality of generated images improved even at reduced computational cost. These results cast the diffusion generative process in a new light, as a progressive optimization problem, and show that introducing momentum into generation reduces inappropriate outcomes.
This new research outcome is expected to not only improve generation results but also provide a new interpretation of generative AI and inspire various follow-up studies. The research findings were presented in February at the 38th Annual AAAI Conference on Artificial Intelligence (AAAI 2024) in Vancouver, Canada, one of the leading international conferences in the AI field, under the title “Rethinking Peculiar Images by Diffusion Models: Revealing Local Minima’s Role.”
Professor Chan-Hyun Youn’s Research Team Develops a Dataset Watermarking Technique for Dataset Copyright Protection
<Professor Chan-Hyun Youn and Ph.D. candidate Jinhyeok Jang>
Professor Chan-Hyun Youn’s research team from the EE department has developed a technique for dataset copyright protection named “Undercover Bias.” Undercover Bias rests on the premise that all datasets contain bias and that bias itself is discriminative. By embedding artificially generated biases into a dataset, it becomes possible to identify AI models that used the watermarked data without proper permission.
This technique addresses data copyright and privacy protection, which have become significant societal concerns with the rise of AI. It embeds a very subtle watermark into the target dataset; unlike prior methods, the watermark is nearly imperceptible and clean-labeled. Nevertheless, AI models trained on the watermarked dataset unintentionally acquire the ability to classify the watermark, and the presence or absence of this ability allows unauthorized use of the dataset to be verified, as in the sketch below.
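Conceptually, verification then reduces to a probe-set test. The sketch below assumes a loader that yields probe samples containing only the hidden bias pattern together with their watermark labels; a model trained on the watermarked dataset classifies them far above chance, while a properly licensed (clean) model does not. The names and the threshold are illustrative assumptions, not the authors’ exact protocol.

```python
import torch

def verify_watermark(model, bias_probe_loader, threshold=0.9):
    # Feed probe samples that contain only the hidden bias pattern and
    # check whether the suspect model predicts their watermark labels
    # far above chance (illustrative sketch only).
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for probe, wm_label in bias_probe_loader:
            pred = model(probe).argmax(dim=1)
            correct += (pred == wm_label).sum().item()
            total += wm_label.numel()
    acc = correct / total
    # A clean model should score near chance; a model trained on the
    # watermarked dataset should score near 100%.
    return acc >= threshold, acc
```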
Figure 1: Schematic of verification based on Undercover Bias
The research team demonstrated that the proposed method can verify models trained using the watermarked dataset with 100% accuracy across various benchmarks.
Furthermore, they showed that models trained with proper permission are misidentified as unauthorized with a probability below 3e-5%, demonstrating the high reliability of the proposed watermark. The study will be presented at the European Conference on Computer Vision (ECCV) 2024, one of the leading international conferences in computer vision, to be held in Milan, Italy, in October this year.
ECCV is renowned, alongside conferences like CVPR and ICCV, as one of the top-tier international academic conferences in computer vision. The paper is titled “Rethinking Data Bias: Dataset Copyright Protection via Embedding Class-wise Hidden Bias.”
Professor Chan-Hyun Youn’s Research Team Develops a Network Calibration Technique to Improve the Reliability of Artificial Neural Networks
<(From left) Professor Chan-Hyun Youn and Ph.D. candidate Gyusang Cho>
Professor Chan-Hyun Youn’s research team from the EE department has developed a network calibration algorithm called “Tilt and Average” (TNA) to improve the reliability of neural networks. Unlike existing methods based on calibration maps, TNA transforms the weights of the classifier’s last layer, with the significant advantage that it can be seamlessly combined with existing methods. This work is regarded as an outstanding contribution to improving the reliability of artificial intelligence.
The research proposes a new algorithm to address the overconfident predictions of existing artificial neural networks. Exploiting the high-dimensional geometry of the last linear layer, the algorithm focuses on the angles between the row vectors of the weight matrix, providing a mechanism that adjusts (tilts) their directions and averages them, as illustrated in the sketch below.
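The sketch below illustrates the flavor of such a tilt-and-average operation: each row vector of the last-layer weight matrix is rotated slightly toward several random directions orthogonal to itself, and the resulting unit directions are averaged while the original norms are kept. This is a hedged illustration of the geometric idea, not the authors’ exact algorithm.

```python
import math
import torch

def tilt_and_average(W, num_tilts=8, angle=0.05):
    # W: (num_classes, dim) last-layer weight matrix.
    # Tilt each row direction by `angle` toward several random orthogonal
    # directions, average the tilted directions, and restore the original
    # row norms (illustrative sketch of the geometric idea).
    norms = W.norm(dim=1, keepdim=True)
    dirs = W / norms
    tilted = []
    for _ in range(num_tilts):
        noise = torch.randn_like(dirs)
        # Keep only the component orthogonal to each row, so the
        # perturbation changes direction (a pure "tilt"), not length.
        noise = noise - (noise * dirs).sum(dim=1, keepdim=True) * dirs
        noise = noise / noise.norm(dim=1, keepdim=True)
        tilted.append(math.cos(angle) * dirs + math.sin(angle) * noise)
    avg = torch.stack(tilted).mean(dim=0)
    avg = avg / avg.norm(dim=1, keepdim=True)
    return avg * norms
```

Because the result is just a modified weight matrix, it can be dropped into an existing classifier and then combined with any calibration-map method on top.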
The research team confirmed that the proposed method can reduce calibration error by up to 20%, and its ability to integrate with existing calibration map-based techniques is a significant advantage. The results are scheduled to be presented at ICML (the International Conference on Machine Learning, https://icml.cc), one of the premier international conferences in artificial intelligence, held in Vienna, Austria, this July. Now in its 41st year, ICML is among the most prestigious and long-standing international conferences in machine learning, alongside other top venues such as CVPR, ICLR, and NeurIPS.
In addition, this research was supported by the Korea Coast Guard (RS-2023-00238652) and the Defense Acquisition Program Administration (DAPA) (KRIT-CT-23-020). The paper: Gyusang Cho and Chan-Hyun Youn, “Tilt and Average: Geometric Adjustment of the Last Layer for Recalibration,” ICML 2024.
Recently, big tech companies at the forefront of large-scale AI services have been competitively increasing the size of their models and data to deliver better performance to users. The latest large-scale language models require tens of terabytes (TB, 10^12 bytes) of memory for training. A domestic research team has now developed a high-capacity, high-performance AI accelerator, enabled by next-generation interface technology, that can compete with NVIDIA, which currently dominates the AI accelerator market.
Professor Myoungsoo Jung’s research team announced on the 8th that they have developed a technology to optimize the memory read/write performance of high-capacity GPU devices using the next-generation interface technology, Compute Express Link (CXL).
The internal memory capacity of the latest GPUs is only a few tens of gigabytes (GB, 10^9 bytes), making it impossible to train and infer models with a single GPU. To provide the memory capacity required by large-scale AI models, the industry generally adopts the method of connecting multiple GPUs. However, this method significantly increases the total cost of ownership (TCO) due to the high prices of the latest GPUs.
< Representative Image of the CXL-GPU >
Therefore, the ‘CXL-GPU’ structure technology, which directly connects large-capacity memory to GPU devices using the next-generation connection technology, CXL, is being actively reviewed in various industries. However, the high-capacity feature of CXL-GPU alone is not sufficient for practical AI service use. Since large-scale AI services require fast inference and training performance, the memory read/write performance to the memory expansion device directly connected to the GPU must be comparable to that of the local memory of the existing GPU for actual service utilization.
*CXL-GPU: It supports high capacity by integrating the memory space of memory expansion devices connected via CXL into the GPU memory space. The CXL controller automatically handles operations needed for managing the integrated memory space, allowing the GPU to access the expanded memory space in the same manner as accessing its local memory. Unlike the traditional method of purchasing additional expensive GPUs to increase memory capacity, CXL-GPU can selectively add memory resources to the GPU, significantly reducing system construction costs.
The research team developed technology that addresses the causes of degraded memory read/write performance in CXL-GPU devices. By allowing the memory expansion device to determine its own write timing, the GPU can issue writes to the expansion device and to its local memory simultaneously. The GPU therefore does not have to wait for write completion, resolving the write performance degradation, as the toy model below illustrates.
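As a toy timing model of this overlap (purely illustrative; the real mechanism lives in the CXL controller hardware, not in host software), the sketch below issues the slower expander write asynchronously so the local write proceeds in parallel instead of the GPU stalling until the expander completes.

```python
import threading
import time

def write_local(data):
    time.sleep(0.001)      # stand-in for a fast GPU local-memory write

def write_expander(data):
    time.sleep(0.003)      # stand-in for a slower write to the CXL expander

def overlapped_write(data):
    # Issue the expander write without waiting, then do the local write;
    # per-step latency becomes max(3 ms, 1 ms) instead of their sum.
    t = threading.Thread(target=write_expander, args=(data,))
    t.start()              # non-blocking: the expander decides its own write timing
    write_local(data)      # proceeds in parallel with the expander write
    t.join()               # completion is observed off the critical path
```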
< Proposed CXL-GPU Architecture >
Furthermore, the research team developed a technology in which the GPU provides hints that let the memory expansion device start memory reads in advance.
With these hints, the expansion device begins its reads earlier and serves the data from its cache (a small but fast temporary data storage space) by the time the GPU actually needs it, yielding faster memory read performance; a toy model of this follows.
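The read-side idea can be modeled in the same toy fashion (again, the names and latencies are illustrative assumptions, not the device interface): the GPU sends a hint for an address it will need soon, the expander moves that data into its cache, and the later read hits the fast path.

```python
class PrefetchingExpander:
    SLOW, FAST = 100, 10                # arbitrary latency units

    def __init__(self, memory):
        self.memory = memory            # backing store, e.g. a dict {addr: data}
        self.cache = {}                 # small, fast temporary storage

    def hint(self, addr):
        # GPU-side hint: start the read early, into the cache.
        self.cache[addr] = self.memory[addr]

    def read(self, addr):
        if addr in self.cache:          # hinted read: served from cache, fast
            return self.cache.pop(addr), self.FAST
        return self.memory[addr], self.SLOW   # unhinted read: full latency

# Usage: hint ahead of time, then the actual read takes the fast path.
exp = PrefetchingExpander({0x10: b"weights"})
exp.hint(0x10)
data, latency = exp.read(0x10)          # latency == PrefetchingExpander.FAST
```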
< CXL-GPU Hardware Prototype >
This research was conducted using the ultra-fast CXL controller and CXL-GPU prototype from Panmnesia*, a semiconductor fabless startup. In verifying the technology with Panmnesia’s CXL-GPU prototype, the research team confirmed that it could execute AI services 2.36 times faster than existing GPU memory expansion technology. The results will be presented at the USENIX Annual Technical Conference (ATC) and the HotStorage workshop in Santa Clara this July.
*Panmnesia possesses a proprietary, fully domestically developed CXL controller that, for the first time in the industry, has reduced the round-trip latency of CXL memory management operations to double-digit nanoseconds (a nanosecond is 10^-9 seconds). This is more than three times faster than the latest CXL controllers worldwide. Panmnesia has used its high-speed CXL controller to connect multiple memory expansion devices directly to the GPU, enabling a single GPU to form a terabyte-scale memory space.
Professor Jung stated, “Accelerating the market adoption of CXL-GPU can significantly reduce the memory expansion costs for big tech companies operating large-scale AI services.”
< Evaluation Results of CXL-GPU Execution Time >