The recent emergence of cost-effective, easy-to-operate depth sensors has opened the door to a new family of methods for action recognition and 3D pose estimation from depth sequences. Estimating the 3D pose of a hand from a single depth map has numerous applications in human-computer interaction, computer graphics, and virtual and augmented reality, and it has emerged as a key problem in computer vision. We interact with the world using our hands: to manipulate objects, machines, and tools, and to socialize with other humans. Two papers will be presented.

1) A paper tackling the insufficient-data issue in hand pose estimation (CVPR'18 Oral). The key idea is to synthesize data in the skeleton space rather than in the depth-map space, which makes manipulating data entries easy and intuitive. Since the skeleton entries generated in this way have no corresponding depth maps, we exploit them by training a separate hand pose generator (HPG) that synthesizes depth maps from the skeleton entries.

2) A paper on a hand-object interaction benchmark (CVPR'18 Poster). Towards understanding first-person dynamic hand actions involving interaction with 3D objects, we collected RGB-D video sequences comprising more than 100K frames of 45 daily hand action categories, involving 26 different objects in several hand configurations.