In this talk, I will introduce a sparse Markov decision process (MDP) with a novel causal sparse Tsallis entropy regularization. The proposed policy regularization induces an optimal policy distribution that is both sparse and multi-modal. The optimality condition of a sparse MDP and sparse value iteration will be discussed and compared to existing methods, and the convergence and optimality of sparse value iteration will be presented. Interestingly, it can be shown that the performance error of a sparse MDP has a constant bound, while the error of a soft MDP grows logarithmically with the number of actions. In experiments, sparse MDPs are applied to reinforcement learning problems and outperform existing methods in terms of convergence speed and performance.
If time permits, I will also discuss our group’s recent work in deep learning, including Text2Action and nested sparse networks.
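As a rough illustration of the sparse policy and sparse value iteration mentioned in the abstract, the sketch below runs value iteration on a small tabular MDP. It assumes the regularizer takes the form (alpha/2)(1 - sum of squared action probabilities) per state, whose maximizer is the sparsemax projection of the scaled action values; the formulation presented in the talk may differ in its details. The function names, the `P`/`R` data layout, and the toy numbers are all hypothetical.

```python
def sparsemax(z):
    """Project the score vector z onto the probability simplex (sparsemax)."""
    z_sorted = sorted(z, reverse=True)
    cumsum, tau = 0.0, 0.0
    for k, z_k in enumerate(z_sorted, start=1):
        cumsum += z_k
        if 1.0 + k * z_k > cumsum:      # the k-th largest score is still in the support
            tau = (cumsum - 1.0) / k    # threshold for a support of size k
    return [max(z_i - tau, 0.0) for z_i in z]


def sparse_backup(q, alpha):
    """Return (value, policy) maximizing pi.q + (alpha/2) * (1 - sum(pi^2))."""
    pi = sparsemax([q_a / alpha for q_a in q])
    value = sum(p * q_a for p, q_a in zip(pi, q))
    value += 0.5 * alpha * (1.0 - sum(p * p for p in pi))
    return value, pi


def sparse_value_iteration(P, R, gamma=0.9, alpha=1.0, iters=500):
    """P[s][a][s2]: transition probabilities; R[s][a]: rewards (assumed layout)."""
    V = [0.0] * len(R)
    policy = []
    for _ in range(iters):
        backups = []
        for s in range(len(R)):
            # Action values under the current value estimate.
            q = [R[s][a] + gamma * sum(p * V[s2] for s2, p in enumerate(P[s][a]))
                 for a in range(len(R[s]))]
            backups.append(sparse_backup(q, alpha))
        V = [v for v, _ in backups]
        policy = [pi for _, pi in backups]
    return V, policy


# Toy 2-state, 3-action MDP with made-up numbers, for illustration only.
P = [[[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],
     [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]]
R = [[0.0, 1.0, 0.9],
     [0.0, 2.0, -5.0]]
V, policy = sparse_value_iteration(P, R)
print(policy)
```

Under this assumed form, the backup value exceeds the ordinary maximum of the action values by at most alpha(d - 1)/(2d) for d actions, which stays below alpha/2 however large d is. The log-sum-exp backup of a soft MDP can exceed the maximum by up to alpha log d, which is in line with the constant versus logarithmic error behavior stated in the abstract.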
Songhwai Oh received the B.S. (with highest honors), M.S., and Ph.D. degrees in electrical engineering and computer sciences from the University of California, Berkeley, in 1995, 2003, and 2006, respectively. He is currently an Associate Professor in the Department of Electrical and Computer Engineering, Seoul National University, Seoul, Korea. Before his Ph.D. studies, he was a Senior Software Engineer at Synopsys, Inc. and a Microprocessor Design Engineer at Intel Corporation. In 2007, he was a Postdoctoral Researcher in the Department of Electrical Engineering and Computer Sciences, University of California, Berkeley. From 2007 to 2009, he was an Assistant Professor of electrical engineering and computer science in the School of Engineering, University of California, Merced. His research interests include robotics, computer vision, cyber-physical systems, and machine learning.