A central question in the era of ‘big data’ is what to do with the enormous amount of information. One possibility is to characterize it through statistics, e.g., averages, or to classify it using machine learning, in order to understand the general structure of the overall data. The perspective in this talk is the opposite, namely that most of the value in the information, in some applications, is in the parts that deviate from the average, that are unusual, atypical. We think of this as new knowledge: things that are not learned from ordinary data. Think of art: the valuable paintings or writings are those that deviate from the norms and break the rules, that are atypical. Or think of groundbreaking scientific discoveries, which find new structure in data.
The aim of our approach is to extract such ‘rare interesting’ data out of big data sets. A central question is what ‘interesting’ means. Universal approaches are required, since it is not known in advance what we are looking for, and for something to be interesting it is not sufficient to be rare. We develop a measure of ‘interestingness’ based on Kolmogorov complexity, information theory, and description length, which we call atypicality. We show that atypicality is optimal for anomaly detection and has other important theoretical properties. Atypicality can be seen as a complement to machine learning, where we are looking for information that is not learned from prior data.
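As a rough illustration of a description-length criterion of this kind, the sketch below flags a segment when coding it on its own takes fewer bits than coding it with a model of the typical data. It is a minimal sketch under stated assumptions, not the method presented in the talk: the typical model is taken to be an i.i.d. byte distribution estimated from the data itself, `zlib` stands in for a universal source coder, and `PENALTY_BITS`, `typical_bits`, `standalone_bits`, and `atypical_windows` are hypothetical names introduced here.

```python
import math
import random
import zlib
from collections import Counter

# Assumed fixed cost (in bits) of signalling that a segment is coded on its own.
PENALTY_BITS = 32


def typical_bits(segment, counts, total):
    """Code length of the segment under an i.i.d. byte model of the whole data."""
    return sum(-math.log2(counts[b] / total) for b in segment)


def standalone_bits(segment):
    """Code length when the segment is compressed by itself (zlib as a stand-in
    for a universal source coder), plus the assumed signalling penalty."""
    return 8 * len(zlib.compress(segment, 9)) + PENALTY_BITS


def atypical_windows(data, width=256):
    """Return start offsets of non-overlapping windows that are cheaper to
    describe on their own than with the typical model."""
    counts = Counter(data)
    total = len(data)
    flagged = []
    for start in range(0, len(data) - width + 1, width):
        segment = data[start:start + width]
        if standalone_bits(segment) < typical_bits(segment, counts, total):
            flagged.append(start)
    return flagged


# Synthetic example: random bytes with one highly structured segment in the middle.
rng = random.Random(0)


def noise(n):
    return bytes(rng.getrandbits(8) for _ in range(n))


data = noise(2048) + (b"abc" * 86)[:256] + noise(2048)
flagged = atypical_windows(data)
print(flagged)
```

In this toy setting the structured segment is not flagged because its symbols are rare, but because it admits a much shorter description of its own, which matches the remark above that rarity alone does not make data interesting.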
During the talk we will discuss new methods for universal source coding and minimum description length (MDL), and we will show applications to the stock market, heartbeat signals, ocean acoustics, and genetics.
Anders Høst-Madsen was born in Denmark in 1966. He received the M.Sc. degree in engineering in 1990 and the Ph.D. degree in mathematics in 1993, both from the Technical University of Denmark.
From 1993 to 1996 he was with Dantec Measurement Technology A/S, Copenhagen, Denmark. From 1996 to 1998 he was an assistant professor at Gwangju Institute of Science and Technology (GIST), Gwangju, Korea, and from 1998 to 2000 he was an assistant professor at the Department of Electrical and Computer Engineering, University of Calgary, Calgary, AB, Canada, and a staff scientist at TRLabs, Calgary. Since 2001 he has been with the Department of Electrical Engineering, University of Hawaii at Manoa, Honolulu, and since 2009 he has been a professor there. He was also a founder and CTO (2007-2008) of Kai Medical, Inc., which makes equipment for non-contact heart monitoring. His research interests are in statistical signal processing, information theory, and wireless communications, including ad-hoc networks, cooperative diversity, wireless sensor networks, heart monitoring, life detection, and marine mammal signal processing.
He has served as Editor for Multiuser Communications for the IEEE Transactions on Communications and as Associate Editor for Detection and Estimation for the IEEE Transactions on Information Theory. He was general co-chair of ISITA 2012 and IEEE ISIT 2014. He received the EURASIP Journal on Wireless Communications and Networking (JWCN) best paper award in 2006, and is a Fellow of the IEEE. He is currently on sabbatical leave at Seoul National University.