Title: In search of strong embedding extractors for speaker diarisation
Authors: J. Jung, B. Lee, J. Huh, A. Brown, Y. Kwon, S. Watanabe, J. S. Chung
Conference: International Conference on Acoustics, Speech, and Signal Processing
Abstract: Speaker embedding extractors (EEs), which map input audio to a speaker discriminant latent space, are of paramount importance in speaker diarisation. However, there are several challenges when adopting EEs for diarisation, from which we tackle two key problems. First, the evaluation is not straightforward because the required features differ between speaker verification and diarisation. We show that better performance on widely adopted speaker verification evaluation protocols does not lead to better diarisation performance. Second, embedding extractors have not seen utterances in which multiple speakers exist. These inputs are inevitably present in speaker diarisation because of overlapped speech and speaker changes; they degrade the performance. To mitigate the first problem, we generate speaker verification evaluation protocols that better mimic the diarisation scenario. We propose two data augmentation techniques to alleviate the second problem, making embedding extractors aware of overlapped speech or speaker change input. One technique generates overlapped speech segments, and the other generates segments where two speakers utter sequentially. Extensive experimental results using three state-of-the-art speaker embedding extractors demonstrate that both proposed approaches are effective.
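The two augmentation ideas in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `gain_b` parameter, and the use of plain Python lists as waveforms are all assumptions made for illustration.

```python
def overlap_mix(seg_a, seg_b, gain_b=1.0):
    """Imitate overlapped speech by summing two waveforms sample by sample.

    gain_b is a hypothetical scaling factor for the second speaker.
    The result is truncated to the shorter of the two inputs.
    """
    n = min(len(seg_a), len(seg_b))
    return [seg_a[i] + gain_b * seg_b[i] for i in range(n)]


def speaker_change(seg_a, seg_b, change_point):
    """Imitate a speaker change: speaker A up to change_point, then speaker B."""
    n = min(len(seg_a), len(seg_b))
    if not 0 < change_point < n:
        raise ValueError("change_point must fall strictly inside the segment")
    return seg_a[:change_point] + seg_b[change_point:n]


if __name__ == "__main__":
    a = [1.0, 1.0, 1.0, 1.0]   # toy "waveform" for speaker A
    b = [2.0, 2.0, 2.0, 2.0]   # toy "waveform" for speaker B
    print(overlap_mix(a, b, gain_b=0.5))
    print(speaker_change(a, b, change_point=1))
```

In practice the segments would be real audio arrays, and the mixing gain and change point would presumably be sampled at random during training; those choices are not stated in the abstract.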
Title: Imaginary Voice: Face-styled Diffusion Model for Text-to-Speech
Authors: J. Lee, J. S. Chung, S. Chung
Conference: International Conference on Acoustics, Speech, and Signal Processing
Abstract: The goal of this work is zero-shot text-to-speech synthesis, with speaking styles and voices learnt from facial characteristics. Inspired by the natural fact that people can imagine the voice of someone when they look at his or her face, we introduce a face-styled diffusion text-to-speech (TTS) model within a unified framework learnt from visible attributes, called FACE-TTS. This is the first time that face images are used as a condition to train a TTS model. We jointly train cross-model biometrics and TTS models to preserve speaker identity between face images and generated speech segments. We also propose a speaker feature binding loss to enforce the similarity of the generated and the ground truth speech segments in speaker embedding space. Since the biometric information is extracted directly from the face image, our method does not require extra fine-tuning steps to generate speech from unseen and unheard speakers. We train and evaluate the model on the LRS3 dataset, an in-the-wild audio-visual corpus containing background noise and diverse speaking styles. The project page is https://facetts.github.io.
[Title]
Repeated K-means Clustering Algorithm For Radar Sorting
[Authors]
Dong Hyun Park, Dong-ho Seo, Jee-hyeon Baek, Won-jin Lee, Dong Eui Chang
[Abstract]
In modern electronic warfare, a number of radar emitters are in operation, causing radar receivers to receive high-density signal pulses that occur simultaneously. To analyze the radar signals more accurately and identify enemies, the sorting process of high-density radar signals is very important before analysis. Recently, machine learning algorithms, specifically K-means clustering, have been the subject of research aimed at improving the accuracy of radar signal sorting. One of the challenges faced by these studies is that the clustering results can vary depending on how the initial points are selected and how many clusters are set. This paper introduces a repeated K-means clustering algorithm that aims to accurately cluster all data by identifying and addressing false clusters in the radar sorting problem. To verify the performance of the proposed algorithm, experiments are conducted by applying it to simulated signals that are generated by a signal generator.

[Title]
A Novel Batch Streaming Pipeline for Radar Emitter Classification
[Authors]
Dong Hyun Park, Dong-Ho Seo, Jee-Hyeon Baek, Won-Jin Lee, Dong Eui Chang
[Abstract]
In electronic warfare, radar emitter classification plays a crucial role in identifying threats in complex radar signal environments. Traditionally, this has been achieved using heuristic-based methods and handcrafted features. However, these methods struggle to adapt to the complexities of modern combat environments and varying radar signal characteristics. To address these challenges, this paper introduces a novel batch streaming pipeline for radar emitter classification. Our pipeline consists of two key components: radar deinterleaving and radar pattern recognition. We leveraged the DBSCAN algorithm and an RNN encoder, which are relatively light and simple models, considering the limited hardware resource environment of a military weapon system. Although we chose to utilize lightweight machine learning and deep learning models, we designed our pipeline to perform optimally through hyperparameter optimization of each component. We demonstrate the effectiveness of our proposed model and pipeline through experimental validation and analysis. Overall, this paper provides background knowledge on each model, introduces the proposed pipeline, and presents experimental results.

[Title]
A Deep Learning-based Fault Recovery System for Safe Flight of UAV in the Position Sensor Freezing Situation
[Authors]
Dong Hyun Park, Jong Seo Kim, Jae-Hyeon Park, Dong Eui Chang
[Abstract]
As the use of robots such as unmanned aerial vehicles (UAVs), unmanned ground vehicles, and robot arms in industry and leisure continues to grow, it becomes increasingly important to maintain these robots in a stable condition to prevent potential danger, including actuator, sensor, and system faults. Consequently, researchers have developed various algorithms to address these faults. In this study, we propose a deep learning-based fault recovery system designed to ensure the safe flight of UAVs in situations where position sensors freeze. When a position sensor freezing event is detected, this fault recovery system rectifies the issue by enabling the UAV to utilize values from a long short-term memory-based position prediction model, thus replacing the frozen sensor data. We tested our fault recovery system with a UAV in a Gazebo simulation and validated its effectiveness by comparing it with an inertial measurement unit kinematic model-based fault recovery system. The proposed deep learning-based fault recovery system demonstrated superior performance.
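The detect-then-substitute logic described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the `FreezeRecovery` class, the `window` and `tol` parameters, and the "identical readings" freeze test are hypothetical, and the `predictor` callable stands in for the long short-term memory-based position prediction model, which is not reproduced here.

```python
from collections import deque


class FreezeRecovery:
    """Replace frozen position readings with values from a prediction model."""

    def __init__(self, predictor, window=3, tol=1e-9):
        self.predictor = predictor          # callable: history -> next position
        self.window = window                # readings that must match to flag a freeze
        self.tol = tol                      # allowed difference between "identical" readings
        self.recent = deque(maxlen=window)  # latest raw sensor readings
        self.history = []                   # positions actually passed to the controller

    def is_frozen(self):
        # Assumption: a healthy sensor is noisy, so a run of identical
        # readings is treated as a freeze.
        if len(self.recent) < self.window:
            return False
        first = self.recent[0]
        return all(abs(x - first) <= self.tol for x in self.recent)

    def step(self, reading):
        self.recent.append(reading)
        value = self.predictor(self.history) if self.is_frozen() else reading
        self.history.append(value)
        return value


if __name__ == "__main__":
    # Stand-in predictor: advance the last position by a fixed step.
    recovery = FreezeRecovery(predictor=lambda h: h[-1] + 1.0, window=3)
    readings = [0.0, 1.0, 2.0, 2.0, 2.0, 2.0]  # sensor freezes at 2.0
    print([recovery.step(r) for r in readings])
```

Note that this detector only reacts once `window` identical readings have arrived, so the first few frozen samples still pass through; a real system would need a detection rule suited to its sensor.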

[Title]
Deep Learning Algorithm with Residual Blocks for Chemical Gas Concentration Estimation
[Authors]
Hee-Deok Jang, Jae-Hyeon Park, Dong Eui Chang, Hyun-Soo Seo, Hyunwoo Nam
[Abstract]
Chemical warfare agents (CWA) are highly toxic and hazardous substances that cause serious harm to humans, even when used in small quantities. The accurate estimation of the concentration of CWA is crucial to allow effective responses to these types of attacks. In this paper, we propose a deep learning algorithm for chemical gas concentration estimation, referred to as MLP-res, and compare its estimation performance with those of other machine learning algorithms. MLP-res utilizes a structure with residual blocks and demonstrates comparable or even superior performance compared with those of existing machine learning algorithms. Additionally, MLP-res exhibits high-generalization performance even with the use of experimental condition data that were not used for training. These results indicate that MLP-res can accurately estimate the concentration of chemical gases in actual environments.
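For readers unfamiliar with residual blocks, the following is a generic sketch of one fully connected residual block. It is not the MLP-res architecture itself, whose layer sizes and details are not given in the abstract; the helper names and the two-layer block layout are assumptions.

```python
def relu(v):
    """Element-wise rectified linear unit."""
    return [max(0.0, x) for x in v]


def linear(weights, bias, x):
    """Dense layer: weights is a list of rows, one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def residual_block(x, w1, b1, w2, b2):
    """Two dense layers whose output is added back to the block input."""
    hidden = relu(linear(w1, b1, x))
    out = linear(w2, b2, hidden)
    # Skip connection: input and output must have the same dimension.
    return relu([o + xi for o, xi in zip(out, x)])


if __name__ == "__main__":
    zeros = [[0.0, 0.0], [0.0, 0.0]]
    # With all-zero weights the block reduces to relu(x),
    # which shows the input passing through the skip connection.
    print(residual_block([1.0, 2.0], zeros, [0.0, 0.0], zeros, [0.0, 0.0]))
```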

[Title]
CNN-LSTM network for accurate gas source position estimation
[Authors]
장희덕, 박재현, 장동의, 남현우
[Abstract]
The accurate determination of the source of a gas cloud that poses a threat to human health is crucial for an early response. Recently, researchers have been using deep neural networks to identify a cell containing the source in a 2D grid map. This study presents a novel approach using a CNN-LSTM network to pinpoint both the cell containing the gas source and its precise location within the cell. This is achieved by analyzing gas diffusion data generated through virtual simulations.

[Title]
FTIR absorption spectrum classification using convolutional neural network
[Authors]
박재현, 장희덕, 장동의, 남현우
[Abstract]
FTIR (Fourier transform infrared) equipment measures the infrared absorption spectrum of a chemical contaminant cloud that exists between the light source and spectrometer. The shape of this spectrum is a unique property of a chemical substance. So, by measuring the FTIR spectrum and classifying the material, it is possible to determine the composition of a distant chemical contaminant cloud. In this study, CNN is used for spectral classification of pure chemicals measured by FTIR equipment manufactured by the Agency for Defense Development, and the classification performance of CNN is verified by comparing with other machine learning algorithms.

[Title]
Enhancing safety of image processing-based guidewire navigation through collision risk function and fail-safe mechanism
[Authors]
유상백, 최재순, 장동의
[Abstract]
This paper aims to improve the safety of image processing-based guidewire navigation for vascular interventions. To achieve this, a function based on an image is designed to represent the collision between the vessel and guidewire. This function can be used to reduce the risk in reinforcement learning (RL). Additionally, a fail-safe mechanism is implemented to enhance safety and overcome errors in RL.

[Title]
Optimized Network Pruning Method for Li-ion Batteries State-of-charge Estimation on Robot Embedded System
[Authors]
Dong Hyun Park, Hee-deok Jang, Dong Eui Chang
[Abstract]
Lithium-ion batteries are actively used in various industrial applications such as field robots, drones, and electric vehicles due to their high energy efficiency, light weight, long life span, and low self-discharge rate. When using a lithium-ion battery in the field, it is important to accurately estimate the SoC (State of Charge) of the battery to prevent damage. In recent years, SoC estimation using data-based artificial neural networks has been in the spotlight, but it has been difficult to deploy on embedded boards at actual sites because the computation is heavy and complex. To solve this problem, neural network lightening technologies such as network pruning have recently attracted attention. When pruning a neural network, the performance varies depending on which layers are pruned and how much pruning is performed. In this paper, we introduce an optimized pruning technique by improving the existing pruning method, and perform a comparative experiment to analyze the results.
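The point that results depend on which layer is pruned and by how much can be illustrated with a sketch of per-layer magnitude pruning. Magnitude pruning is used here as a common baseline and is an assumption; the abstract does not say which existing pruning method the paper improves on, and the function names and layer names are hypothetical.

```python
def prune_layer(weights, ratio):
    """Zero out roughly the smallest-magnitude `ratio` fraction of a weight matrix.

    weights is a list of rows. Ties at the threshold are all pruned,
    so slightly more than the requested fraction may be removed.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    magnitudes = sorted(abs(w) for row in weights for w in row)
    k = int(len(magnitudes) * ratio)
    if k == 0:
        return [row[:] for row in weights]
    threshold = magnitudes[k - 1]
    return [[0.0 if abs(w) <= threshold else w for w in row] for row in weights]


def prune_model(layers, ratios):
    """Apply a separate pruning ratio to each named layer (default: no pruning)."""
    return {name: prune_layer(w, ratios.get(name, 0.0))
            for name, w in layers.items()}


if __name__ == "__main__":
    layers = {
        "fc1": [[0.1, -0.5], [0.9, -0.2]],
        "fc2": [[0.3, 0.7]],
    }
    # Prune half of fc1 and leave fc2 untouched.
    print(prune_model(layers, {"fc1": 0.5}))
```

Searching over the per-layer ratios, for example by evaluating estimation error for each candidate setting, is one way such a scheme could be tuned; the paper's own optimization procedure is not described in the abstract.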