Speech processing technology has become widely popular through voice assistants such as Siri and Alexa and is now closely woven into our everyday environment.
In this technology, extracting a discriminative speaker identity from the voice is the key to enabling model adaptation and user-specific services when incorporated
into other speech processing systems. This speaker information can also be used on its own to verify identity in security systems that take the voice as a biometric input.
Neural network approaches have opened a new era for this technology and play a significant role in it. However, understanding how a neural network operates
on the voice is difficult, even though it can approximate almost any function and thereby achieve robust performance. For this reason, the neural network is often called a “black box”:
its structure alone offers no direct insight into what it has actually learned.
In this talk, I will review the development of speaker recognition systems and the progress I have made in recent years. I will also introduce domain adaptation
approaches for speaker and language recognition systems under channel-mismatched conditions. Then I will present an investigation of neural-network-based
speaker recognition systems through analysis of their hidden representations. In this investigation, I explored what phonetic information an end-to-end
speaker recognition model encodes from human voices, and how the model captures and discriminates voices given text-independent input.
Suwon Shon received his B.S. and integrated Ph.D. in electrical engineering from Korea University, South Korea, in 2010 and 2017, respectively. He is now a postdoctoral associate in the Spoken Language Systems group at MIT CSAIL. His research focuses on machine learning technologies for speech signal processing. He has worked on speaker and language recognition and related pre-processing techniques. He is now also exploring dialect recognition, blacklist detection, and anti-spoofing algorithms to analyze personal identity.