
Researchers led by Professor Jung-Woo Choi at the School of Electrical Engineering, KAIST, have developed DeepASA, a unified auditory AI model capable of comprehensive auditory scene analysis using diverse acoustic cues, much as human hearing does. This research was presented at NeurIPS 2025, the world’s top-tier AI conference, under the title “DeepASA: An Object-Oriented Multi-Purpose Network for Auditory Scene Analysis.”
Humans naturally analyze sounds collected through both ears, extracting information such as the direction, type, and onset time of a sound as well as the spatial environment in which reflections occur. Furthermore, when multiple sounds overlap, humans can selectively focus on each source, separate them, and understand the content of each sound.

DeepASA processes multi-channel audio recordings in an object-oriented manner, analogous to the human binaural system, and performs nearly every auditory scene analysis task, including separation of moving sound sources, dereverberation (separating direct and reflected components), sound classification, event detection, and direction-of-arrival estimation. Unlike conventional single-channel methods, DeepASA enables multi-channel separation for immersive audio formats such as Dolby Atmos and Ambisonics, allowing spatial audio data to be edited and remixed on a per-sound-object basis.
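To make the object-oriented, multi-task idea concrete, the minimal sketch below groups a multi-channel scene into per-object embeddings and feeds them to parallel task heads for separation, classification, event activity, and direction estimation. The module names, layer sizes, and query-attention structure are illustrative assumptions for this article, not the published DeepASA architecture.

```python
# Hypothetical sketch of an object-oriented multi-task audio model in the
# spirit described above. All sizes and module choices are assumptions,
# not the DeepASA architecture.
import torch
import torch.nn as nn

class ObjectOrientedASA(nn.Module):
    def __init__(self, n_mics=4, n_objects=6, n_classes=13, emb_dim=256):
        super().__init__()
        # Shared encoder: multi-channel waveform -> time-frame embeddings
        self.encoder = nn.Sequential(
            nn.Conv1d(n_mics, emb_dim, kernel_size=16, stride=8),
            nn.ReLU(),
            nn.Conv1d(emb_dim, emb_dim, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        # Learnable object queries: one embedding slot per candidate sound object
        self.queries = nn.Parameter(torch.randn(n_objects, emb_dim))
        self.attend = nn.MultiheadAttention(emb_dim, num_heads=4, batch_first=True)
        # Parallel task heads sharing the same per-object embeddings
        self.sep_head = nn.Linear(emb_dim, emb_dim)    # separation mask per object
        self.cls_head = nn.Linear(emb_dim, n_classes)  # sound class per object
        self.evt_head = nn.Linear(emb_dim, 1)          # event presence per object
        self.doa_head = nn.Linear(emb_dim, 3)          # direction as a unit vector

    def forward(self, wav):                            # wav: (batch, n_mics, samples)
        feats = self.encoder(wav).transpose(1, 2)      # (batch, frames, emb_dim)
        q = self.queries.unsqueeze(0).expand(wav.size(0), -1, -1)
        obj, _ = self.attend(q, feats, feats)          # (batch, n_objects, emb_dim)
        masks = torch.sigmoid(self.sep_head(obj) @ feats.transpose(1, 2))
        return {
            "masks": masks,                            # (batch, n_objects, frames)
            "classes": self.cls_head(obj),             # (batch, n_objects, n_classes)
            "activity": torch.sigmoid(self.evt_head(obj)),
            "doa": torch.nn.functional.normalize(self.doa_head(obj), dim=-1),
        }

model = ObjectOrientedASA()
out = model(torch.randn(1, 4, 16000))                  # one second of 4-channel audio at 16 kHz
print({k: v.shape for k, v in out.items()})
```

The key design point this toy model reflects is that every task head reads from the same per-object embedding, so a single forward pass yields separation masks, class labels, activity, and direction for each sound object at once.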

<Figure: (Right) detected sources, events, directions, and separated results compared with ground truth>
The researchers demonstrated that a single model performing multiple tasks yields improved performance for each task. They further introduced a Chain of Inference approach, in which temporal coherence among separated source signals, detected classes, and directional patterns is analyzed to refine the inference results, thereby significantly improving the robustness of auditory AI systems.
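As a rough illustration of how temporal coherence can tie these outputs together, the hypothetical sketch below correlates each separated signal's energy envelope with frame-wise class-activity tracks and keeps only consistent pairings. The envelope computation, correlation measure, threshold, and greedy matching are assumptions made for illustration, not the published chain-of-inference procedure.

```python
# Hedged sketch of coherence-based refinement: match each separated signal to
# the detected class whose activity track it follows over time, and discard
# inconsistent candidates. Thresholds and greedy matching are illustrative.
import numpy as np

def envelope(signal, frame_len=1024, hop=512):
    """Frame-wise RMS energy envelope of a separated source signal."""
    frames = [signal[i:i + frame_len] for i in range(0, len(signal) - frame_len, hop)]
    return np.array([np.sqrt(np.mean(f ** 2)) for f in frames])

def temporal_coherence(env, activity):
    """Normalized correlation between a source envelope and a class-activity track."""
    n = min(len(env), len(activity))
    e, a = env[:n] - env[:n].mean(), activity[:n] - activity[:n].mean()
    denom = np.linalg.norm(e) * np.linalg.norm(a) + 1e-8
    return float(e @ a / denom)

def refine(separated, class_activity, threshold=0.3):
    """Greedily pair each separated signal with its most coherent detected class;
    drop pairs whose temporal coherence falls below the threshold."""
    pairs, used = [], set()
    for i, sig in enumerate(separated):
        env = envelope(sig)
        scores = {c: temporal_coherence(env, act)
                  for c, act in class_activity.items() if c not in used}
        if not scores:
            break
        best = max(scores, key=scores.get)
        if scores[best] >= threshold:
            pairs.append((i, best, scores[best]))
            used.add(best)
    return pairs

# Toy usage: two separated signals and two detected class-activity tracks
rng = np.random.default_rng(0)
sig_a = rng.standard_normal(16000) * np.repeat([0.0, 1.0], 8000)  # active in second half
sig_b = rng.standard_normal(16000) * np.repeat([1.0, 0.0], 8000)  # active in first half
activity = {"speech": np.repeat([0.0, 1.0], 15), "dog_bark": np.repeat([1.0, 0.0], 15)}
print(refine([sig_a, sig_b], activity))  # pairs sig_a with "speech", sig_b with "dog_bark"
```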

Even before the NeurIPS presentation, the research team had achieved first place in Task 4 of the DCASE Challenge 2025, the world’s most prestigious competition in acoustic detection and analysis. The task focused on “Spatial Semantic Segmentation of Sound Scenes.” At the DCASE 2025 Workshop held in October 2025, the team received the Best Student Paper Award, which is given to only a single team, and also won the Best Judge’s Award.

Such advanced audio AI technology enables unprecedented capabilities for sound-based detection of hazardous or critical events. For instance, it can detect long-distance drones by sound alone, monitor abnormal activity in border surveillance systems, or recover faint audio buried in noise. It can therefore play a critical role in national defense and security applications that require detecting potential threats from acoustic information.
In addition, by separating sound objects and extracting directional and spatial acoustic features from recorded immersive audio, DeepASA enables re-editing of complex sound fields, which is essential for next-generation AR/VR spatial audio rendering. It represents a core technology enabling complete re-synthesis and reconstruction of immersive sound scenes.
The DeepASA research team includes Dr. Dong-Heon Lee and Ph.D. student Young-Hoo Kwon from KAIST EE. This project was supported by the National Research Foundation of Korea (NRF, No. RS-2024-00337945), the Ministry of Science and ICT (STEAM Research Program), and the Center for Applied Research in Artificial Intelligence (CARAI) grant funded by DAPA and ADD.