D. Hong, S. Lee, Y. H. Cho, D. Baek, J. Kim and N. Chang, "Least-Energy Path Planning With Building Accurate Power Consumption Model of Rotary Unmanned Aerial Vehicle," in IEEE Transactions on Vehicular Technology, vol. 69, no. 12, pp. 14803-14817, Dec. 2020.

Abstract

Rotary unmanned aerial vehicles (UAVs), also known as drones, offer various advantages, yet their practical applications are limited by their flight range, and increasing the flight range by enhancing the hardware is a challenging task. In this study, we introduce the first step of systematic drone low-power optimization based on the framework of electronic design automation (EDA). We attempt drone power management without in-depth knowledge of aerodynamics and control theory. Instead, we introduce a novel power model of drones using physical parameters that affect power consumption, such as the three-axis velocity and acceleration, drone height, wind velocity, and the weight and volume of payloads. We detail the experimental setup, power modeling, accuracy verification, and optimization for minimum-energy paths. We achieved over 90% accuracy in power modeling without relying on aerodynamics. The proposed approach shows the feasibility of energy-aware rotary-UAV flight trajectory optimization that considers external forces acting on drones, such as wind, and achieves up to 24.01% energy savings through path changes that account for these forces.
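To make the modeling idea concrete, the following is a minimal sketch of how such a parameter-based power model could be fit to logged flight data with ordinary least squares and then used to compare candidate paths by predicted energy. The feature choices, synthetic data, and function names are illustrative assumptions for this sketch, not the authors' actual model.

# Illustrative sketch: fitting a drone power model from logged flight data
# with ordinary least squares. Features and data are assumptions, not the
# authors' actual model.
import numpy as np

def build_features(vx, vy, vz, ax, ay, az, height, wind, payload_kg):
    """Stack candidate physical parameters into a regression feature matrix."""
    horiz_speed = np.sqrt(vx**2 + vy**2)
    accel_mag = np.abs(ax) + np.abs(ay) + np.abs(az)
    return np.column_stack([
        np.ones_like(vx),          # hover (baseline) term
        horiz_speed, vz,           # translational velocity terms
        accel_mag,                 # acceleration magnitude
        height, wind, payload_kg,  # altitude, wind, payload effects
    ])

# Logged samples (synthetic placeholders standing in for real flight logs)
rng = np.random.default_rng(0)
n = 500
vx, vy, vz = rng.normal(size=(3, n))
ax, ay, az = rng.normal(size=(3, n))
height = rng.uniform(1, 50, n)
wind = rng.uniform(0, 5, n)
payload = rng.uniform(0, 0.5, n)
power = 180 + 12 * np.sqrt(vx**2 + vy**2) + 30 * payload + rng.normal(0, 5, n)

X = build_features(vx, vy, vz, ax, ay, az, height, wind, payload)
coef, *_ = np.linalg.lstsq(X, power, rcond=None)

def path_energy(features, dt):
    """Energy of a candidate path = predicted power integrated over time."""
    return float(np.sum(features @ coef) * dt)

Given such a model, a least-energy path can be selected by evaluating path_energy over candidate trajectories, including the wind experienced along each segment.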


Youngeun Kwon, Yunjae Lee, and Minsoo Rhu, "Tensor Casting: Co-Designing Algorithm-Architecture for Personalized Recommendation Training," The 27th IEEE International Symposium on High-Performance Computer Architecture (HPCA-27), Seoul, South Korea, Feb. 2021.

Abstract

Personalized recommendations are one of the most widely deployed machine learning (ML) workloads serviced from cloud datacenters. As such, architectural solutions for high-performance recommendation inference have recently been the target of several prior works. Unfortunately, little has been explored and understood regarding the training side of this emerging ML workload. In this paper, we first perform a detailed workload characterization study of training recommendations, root-causing sparse embedding layer training as one of the most significant performance bottlenecks. We then propose our algorithm-architecture co-design called Tensor Casting, which enables the development of a generic accelerator architecture for tensor gather-reduce that encompasses all the key primitives of training embedding layers. When prototyped on a real CPU-GPU system, Tensor Casting provides 1.9-15x improvements in training throughput compared to state-of-the-art approaches.
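For readers unfamiliar with the primitive named in the abstract, the snippet below is a minimal numpy sketch of tensor gather-reduce as it appears in embedding-layer training: the forward pass gathers table rows by sparse IDs and reduces them per sample, and the backward pass scatter-reduces gradients back into the table. The sizes and IDs are illustrative; this shows the primitive itself, not the proposed accelerator design.

# Minimal numpy sketch of the gather-reduce primitive behind embedding-layer
# training in recommendation models (illustrative only, not Tensor Casting).
import numpy as np

vocab, dim, batch = 1000, 16, 4
table = np.random.randn(vocab, dim).astype(np.float32)

# Each sample looks up a variable-length list of sparse feature IDs.
ids_per_sample = [np.array([3, 17, 42]), np.array([7]),
                  np.array([3, 99]), np.array([0, 1, 2, 3])]

# Forward: gather rows for each sample's IDs, then reduce (sum-pool) per sample.
pooled = np.stack([table[ids].sum(axis=0) for ids in ids_per_sample])  # (batch, dim)

# Backward: scatter-reduce the upstream gradient back into the sparse table rows.
grad_pooled = np.random.randn(batch, dim).astype(np.float32)
grad_table = np.zeros_like(table)
for ids, g in zip(ids_per_sample, grad_pooled):
    np.add.at(grad_table, ids, g)  # accumulate contributions for repeated IDs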


Lazy Batching: An SLA-aware Batching System for Cloud Machine Learning Inference, The 27th IEEE International Symposium on High-Performance Computer Architecture (HPCA-27), Seoul, South Korea, Feb. 2021

Abstract

In cloud ML inference systems, batching is an essential technique for increasing throughput, which helps optimize total cost of ownership. Prior graph batching combines individual DNN graphs into a single graph, allowing multiple inputs to be executed concurrently. We observe that such coarse-grained graph batching becomes suboptimal in effectively handling dynamic inference request traffic, leaving significant performance on the table. This paper proposes LazyBatching, an SLA-aware batching system that considers both scheduling and batching at the granularity of individual graph nodes, rather than the entire graph, for flexible batching. We show that LazyBatching can intelligently determine the set of nodes that can be efficiently batched together, achieving average 15x, 1.5x, and 5.5x improvements over graph batching in terms of average response time, throughput, and SLA satisfaction, respectively.
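The toy simulation below illustrates the node-granularity idea: in-flight requests that have reached the same node of their (identical) model graph are executed together, with the most deadline-urgent request deciding which node runs next. The per-node latency, deadlines, and scheduling loop are simplifying assumptions for illustration, not the actual LazyBatching scheduler.

# Toy sketch of node-granularity batching (illustrative, not LazyBatching).
from dataclasses import dataclass, field
import heapq

NUM_NODES = 6            # nodes (layers) in the shared model graph
NODE_LATENCY_MS = 2.0    # assumed per-node execution latency

@dataclass(order=True)
class Request:
    deadline_ms: float
    rid: int = field(compare=False)
    node: int = field(default=0, compare=False)   # next graph node to execute

def run(requests):
    now, ready, finish = 0.0, list(requests), {}
    heapq.heapify(ready)
    while ready:
        urgent = heapq.heappop(ready)                      # most urgent request
        batch = [urgent] + [r for r in ready if r.node == urgent.node]
        ready = [r for r in ready if r.node != urgent.node]
        heapq.heapify(ready)
        now += NODE_LATENCY_MS                             # run one node for the whole batch
        for r in batch:
            r.node += 1
            if r.node == NUM_NODES:
                finish[r.rid] = now
            else:
                heapq.heappush(ready, r)
    return finish

print(run([Request(30.0, rid=0), Request(25.0, rid=1), Request(40.0, rid=2)]))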


Game-Theoretic Model Predictive Control with Data-Driven Identification of Vehicle Model for Head-to-Head Autonomous Racing

Title : Game-Theoretic Model Predictive Control with Data-Driven Identification of Vehicle Model for Head-to-Head Autonomous Racing

Authors: Chanyoung Jung, Seungwook Lee, Hyunki Seong, Andrea Finazzi and David Hyunchul Shim

Workshop: IEEE ICRA 2021: Opportunities and Challenges with Autonomous Racing [Best Paper Award]

Link : https://linklab-uva.github.io/icra-autonomous-racing/

Abstract : As a way of resolving edge cases in autonomous driving, head-to-head autonomous racing is attracting significant attention from industry and academia. In this study, we propose a game-theoretic model predictive control (MPC) approach for head-to-head autonomous racing, together with a data-driven model identification method. For practical estimation of the nonlinear model parameters, we adopted the hyperband algorithm, which is used for neural model training in machine learning. The proposed controller comprises three modules: 1) a game-based opponent trajectory predictor, 2) a high-level race strategy planner, and 3) an MPC-based low-level controller. The game-based predictor was designed to predict the future trajectories of competitors. Based on the prediction results, the high-level race strategy planner plans several behaviors to respond to various race circumstances. Finally, the MPC-based controller computes the optimal control commands to follow the planned trajectories. The proposed approach was validated under various racing circumstances in the official simulator of the Indy Autonomous Challenge. The experimental results show that the proposed method can effectively overtake competitors while driving through the track as quickly as possible without collisions.
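As a rough illustration of hyperband-style model identification, the sketch below uses successive halving, the core subroutine of Hyperband: many random parameter candidates are scored on a small data budget, the best half are kept, and the budget is doubled. The toy single-state dynamics, budgets, and parameter ranges are assumptions for illustration, not the authors' identification pipeline or vehicle model.

# Illustrative sketch of successive halving for data-driven parameter
# identification (a simplification of Hyperband; not the authors' pipeline).
import numpy as np

rng = np.random.default_rng(1)

def simulate(params, steps):
    """Toy single-state 'vehicle' model: next_state = a*state + b*control."""
    a, b = params
    state, traj = 0.0, []
    for t in range(steps):
        u = np.sin(0.1 * t)              # fixed control sequence
        state = a * state + b * u
        traj.append(state)
    return np.array(traj)

true_traj = simulate((0.95, 0.40), steps=400)   # stand-in for logged driving data

def loss(params, budget):
    """Fit error on a prefix of the log; a larger budget uses more data."""
    return float(np.mean((simulate(params, budget) - true_traj[:budget]) ** 2))

candidates = [tuple(rng.uniform([0.5, 0.0], [1.0, 1.0])) for _ in range(32)]
budget = 50
while len(candidates) > 1 and budget <= 400:
    scored = sorted(candidates, key=lambda p: loss(p, budget))
    candidates = scored[: max(1, len(scored) // 2)]   # keep the best half
    budget *= 2                                       # and double the data budget

print("identified params:", candidates[0])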


Caption 1. Overview of the proposed approach for head-to-head autonomous racing.

 


Caption 2. Head-to-head simulation racing results

Incorporating Multi-Context Into the Traversability Map for Urban Autonomous Driving Using Deep Inverse Reinforcement Learning.

Title : Incorporating Multi-Context Into the Traversability Map for Urban Autonomous Driving Using Deep Inverse Reinforcement Learning.

Authors: Chanyoung Jung and David Hyunchul Shim

Journal: IEEE Robotics and Automation Letters (IEEE ICRA 2021 presentation)

Abstract : Autonomous driving in an urban environment with surrounding agents remains challenging. One of the key challenges is to accurately predict the traversability map that probabilistically represents future trajectories considering multiple contexts: inertial, environmental, and social. To address this, various approaches have been proposed; however, they mainly focus on individual contexts in isolation. In addition, most studies utilize expensive prior information (such as HD maps) of the driving environment, which is not a scalable approach.

In this study, we extend a deep inverse reinforcement learning-based approach that can predict the traversability map while incorporating multiple contexts for autonomous driving in a dynamic environment. Instead of using expensive prior information of the driving scene, we propose a novel deep neural network to extract contextual cues from sensing data and effectively incorporate them in the output, i.e., the reward map. Based on the reward map, our method predicts the ego-centric traversability map that represents the probability distribution of the plausible and socially acceptable future trajectories.

The proposed method is qualitatively and quantitatively evaluated in real-world traffic scenarios with various baselines. The experimental results show that our method improves the prediction accuracy compared to other baseline methods and can predict future trajectories similar to those followed by a human driver.
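The training signal described above (and in Caption 2) is the maximum-entropy IRL gradient: the difference between the demonstration's state visitation frequencies (SVF) and the expected SVF induced by the current reward. The sketch below shows that signal in a tabular toy setting, with a 1-D corridor standing in for the ego-centric grid and a tabular reward standing in for the deep reward network; it is purely illustrative, not the paper's network or environment.

# Compact tabular sketch of the MaxEnt IRL training signal:
# reward gradient = (demonstration SVF - expected SVF). Illustrative only.
import numpy as np

N, T = 10, 12                        # corridor states, horizon
ACTIONS = [-1, 0, +1]                # move left / stay / move right

def step(s, a):
    return int(np.clip(s + a, 0, N - 1))

def soft_policy(reward):
    """Finite-horizon soft value iteration -> stochastic policy pi(a|s)."""
    V = np.zeros(N)
    for _ in range(T):
        Q = np.array([[reward[step(s, a)] + V[step(s, a)] for a in ACTIONS]
                      for s in range(N)])
        m = Q.max(axis=1)
        V = m + np.log(np.exp(Q - m[:, None]).sum(axis=1))   # stable soft-max
    return np.exp(Q - V[:, None])                              # pi(a|s)

def expected_svf(reward, start=0):
    """Expected state visitation frequencies under the current reward."""
    pi = soft_policy(reward)
    d = np.zeros(N); d[start] = 1.0
    svf = d.copy()
    for _ in range(T - 1):
        nxt = np.zeros(N)
        for s in range(N):
            for i, a in enumerate(ACTIONS):
                nxt[step(s, a)] += d[s] * pi[s, i]
        d = nxt
        svf += d
    return svf

# Demonstration: the "expert" walks straight to the right end of the corridor.
demo = [min(t, N - 1) for t in range(T)]
demo_svf = np.bincount(demo, minlength=N).astype(float)

reward = np.zeros(N)
for _ in range(100):                  # gradient ascent on the tabular reward map
    reward += 0.05 * (demo_svf - expected_svf(reward))

print(np.argmax(reward))              # highest reward near the demonstrated goal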


Caption 1. Scheme for predicting traversability map that incorporates inertial, environmental, and social contexts using the deep inverse reinforcement learning (DIRL) framework.

 


Caption 2. Illustration of the proposed network architecture as a reward function approximator, and training procedures. The encoder module with two branches extracts the contextual cues from the input data. The convolutional long short-term memory (ConvLSTM)-based decoder module is added to incorporate them into the output reward map. With the inferred reward map, the difference between the demonstration and the expected state visitation frequency (SVF) is used as a training signal.

 


Caption 3. Visualization of the traversability map prediction result over time. The first row shows the occupancy grid map (OGM) and the traversability map overlaid on the demonstration (in red), in order. The second row shows the front-view image with the neighboring vehicles marked in green bounding boxes.

Changho Hwang and Taehyun Kim, Ph.D. candidates at the School of EE, have developed a scalable resource management framework for high-performance GPU clusters that accelerates AI training.

Changho Hwang and Taehyun Kim, Ph.D. candidates at the School of EE (advised by Prof. KyoungSoo Park, in collaboration with Prof. Jinwoo Shin and Sunghyun Kim at MIT CSAIL), have developed the CoDDL system, a scalable GPU resource management framework that accelerates deep learning model training. The system was developed in collaboration with the Electronics and Telecommunications Research Institute (ETRI).

 

The demand for GPU resources for training AI models has increased dramatically over time. Accordingly, many enterprises and cloud computing providers build their own GPU clusters to share resources among AI model developers for training computations. As a GPU cluster is highly costly to build and consumes a vast amount of electric power, it is critically important to manage GPU resources efficiently across the entire cluster.

 

The CoDDL system automatically manages the training of multiple AI models so that they run quickly and efficiently in a GPU cluster. When a developer submits a model for training, the system automatically accelerates the training by parallelizing its execution across multiple GPUs. In particular, CoDDL provides a high-performance job scheduler that optimizes cluster-wide performance by elastically re-adjusting GPU shares across multiple training jobs, even when some of the jobs are already running. CoDDL is designed to minimize the system overhead of re-adjusting GPU shares, which enables the job scheduler to make precise and efficient resource allocation decisions that substantially increase overall cluster performance.
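To give a feel for elastic share re-adjustment, the toy sketch below recomputes each running job's GPU count at every arrival or completion event by greedily handing the next GPU to the job with the largest marginal throughput gain. The throughput curve and job names are illustrative assumptions, and the greedy rule is a simplification in the spirit of the described scheduler, not the actual AFS-P algorithm.

# Toy sketch of elastic GPU re-allocation across jobs (not the AFS-P algorithm).
from dataclasses import dataclass

TOTAL_GPUS = 16

@dataclass
class Job:
    name: str
    max_speedup: float     # assumed throughput curve: thr(g) = max_speedup * g / (g + 1)

def throughput(job, gpus):
    return job.max_speedup * gpus / (gpus + 1)

def reallocate(jobs):
    """Return {job name: gpu count} recomputed on a job arrival or completion."""
    alloc = {j.name: 0 for j in jobs}
    for _ in range(TOTAL_GPUS):
        best = max(jobs, key=lambda j: throughput(j, alloc[j.name] + 1)
                                       - throughput(j, alloc[j.name]))
        alloc[best.name] += 1          # give the GPU with the largest marginal gain
    return alloc

jobs = [Job("resnet", 8.0), Job("bert", 12.0), Job("dlrm", 4.0)]
print(reallocate(jobs))                       # shares before a new job arrives
print(reallocate(jobs + [Job("gpt", 20.0)]))  # shares shrink elastically after it arrives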

 

The AFS-P job scheduler, presented together with the CoDDL system, reduces the average job completion time by up to 3.11x on a public DNN training workload trace released by Microsoft. The results were presented at USENIX NSDI 2021, one of the top conferences on networked computing systems.

 


Figure: Overview of the CoDDL system

 

More details on the research can be found at the links below.

 

Paper: https://www.usenix.org/system/files/nsdi21-hwang.pdf

Presentation video: https://www.usenix.org/conference/nsdi21/presentation/hwang


Prof. Sung-Ju Lee's research team develops a context-aware emoji recommendation system

As emojis are increasingly used in everyday online communication such as messaging, email, and social networks, various techniques have attempted to improve the user experience of communicating emotions and information through emojis. Emoji recommendation is one such example, in which machine learning is applied to predict which emojis the user is about to select based on the user's current input message. Although emoji suggestion helps users identify and select the right emoji among a plethora of emojis, analyzing only a single sentence for this purpose has several limitations. First, various emotions, information, and contexts that emerge in the flow of a conversation can be missed by looking only at the most recent sentence. Second, it cannot suggest emojis for emoji-only messages, where users use only emojis without any text.

To overcome these issues, we present Reeboc (Recommending emojis based on context), which combines machine learning and k-means clustering to analyze the conversation of a chat, extract different emotions or topics of the conversation, and recommend emojis that represent various contexts to the user.

To evaluate the effectiveness of our proposed emoji recommendation system and understand its effects on user experience, we performed a user study with 17 participants in eight groups in a realistic mobile chat environment with three different modes: (i) a default static layout without emoji recommendations, (ii) emoji recommendation based on the current single sentence only, and (iii) our emoji recommendation model that considers the conversation. Participants spent the least amount of time identifying and selecting the emojis of their choice with Reeboc (38% faster than the baseline). They also chose emojis that were more highly ranked with Reeboc than with current-sentence-only recommendations. Moreover, participants appreciated emoji recommendations for emoji-only messages, which accounted for 36.2% of all sentences containing emojis.
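The sketch below illustrates the core idea of clustering the recent conversation and recommending one emoji per cluster. The TF-IDF features, the k-means setup, and the keyword-based emoji scores are stand-ins for Reeboc's trained models and emoji vocabulary; this is not the system described in the paper.

# Minimal sketch of context-aware emoji recommendation: cluster the recent
# conversation into topics/emotions and suggest one emoji per cluster.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

conversation = [
    "dinner tonight? I found a great pizza place",
    "sounds good, I am starving",
    "also, I passed the exam!!",
    "wow congrats, so proud of you",
    "let's celebrate over pizza then",
]

EMOJI_KEYWORDS = {            # hypothetical emoji vocabulary with trigger words
    "🍕": ["pizza", "dinner", "starving"],
    "🎉": ["congrats", "celebrate", "passed"],
    "😊": ["good", "proud", "great"],
}

vec = TfidfVectorizer()
X = vec.fit_transform(conversation)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

recommendations = []
for c in range(2):
    cluster_text = " ".join(m for m, l in zip(conversation, labels) if l == c).lower()
    best = max(EMOJI_KEYWORDS, key=lambda e: sum(w in cluster_text for w in EMOJI_KEYWORDS[e]))
    recommendations.append(best)

print(recommendations)        # one emoji suggestion per conversation cluster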

 


 

Reference : No More One Liners: Bringing Context into Emoji Recommendations, Joon-Gyum Kim, Taesik Gong, Bogoan Kim, Jaeyeon Park, Woojeong Kim, Evey Huang, Kyungsik Han, Juho Kim, Jeonggil Ko, and Sung-Ju Lee

ACM Transactions on Social Computing (ACM TSC) 2020.

 

 

Prof. Sung-Ju Lee's team develops machine learning-based mobile sensing systems that adapt to unknown conditions

Many applications utilize sensors on mobile devices and apply deep learning for diverse purposes. However, they have rarely enjoyed mainstream adoption due to the many different individual conditions users encounter. Individual conditions are characterized by users' unique behaviors and the different devices they carry, which collectively make sensor inputs different. It is impractical to train for countless individual conditions beforehand, and we thus argue that meta-learning is a promising approach to solving this problem. We present MetaSense, which leverages "seen" conditions in the training data to adapt to an "unseen" condition (i.e., the target user). Specifically, we design a meta-learning framework that learns "how to adapt" to the target via iterative training sessions of adaptation. MetaSense requires very few training examples from the target (e.g., one or two) and thus requires minimal user effort. In addition, we propose a similar condition detector (SCD) that identifies when the unseen condition has characteristics similar to seen conditions and leverages this hint to further improve accuracy. Our evaluation with 10 different datasets shows that MetaSense improves the accuracy of state-of-the-art transfer learning and meta-learning methods by 15% and 11%, respectively. Furthermore, our SCD achieves additional accuracy improvements (e.g., 15% for human activity recognition).
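The sketch below illustrates the "learning how to adapt" idea with a first-order MAML-style loop: meta-training over seen conditions so that a single gradient step on a couple of examples adapts to an unseen condition. The toy 1-D regression task, the linear model, and the hyperparameters are illustrative assumptions; this is not MetaSense's network, training procedure, or SCD.

# First-order MAML-style sketch of meta-learning an adaptable initialization.
import torch

torch.manual_seed(0)

def sample_condition(slope, n=8):
    """Each 'condition' changes how the (toy) sensor input maps to the label."""
    x = torch.randn(n, 1)
    return x, slope * x

def forward(w, b, x):
    return x @ w + b

w = torch.zeros(1, 1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)
meta_opt = torch.optim.SGD([w, b], lr=0.05)
inner_lr = 0.1

for step in range(300):                       # meta-training over "seen" conditions
    slope = float(torch.empty(1).uniform_(0.5, 2.0))
    xs, ys = sample_condition(slope)          # support set
    xq, yq = sample_condition(slope)          # query set
    # Inner step: adapt a copy of the parameters on the support set.
    loss_s = torch.nn.functional.mse_loss(forward(w, b, xs), ys)
    gw, gb = torch.autograd.grad(loss_s, [w, b])
    w_adapt, b_adapt = w - inner_lr * gw.detach(), b - inner_lr * gb.detach()
    # Outer step: update the initialization from the adapted parameters' query loss
    # (first-order approximation: no gradients flow through gw, gb).
    loss_q = torch.nn.functional.mse_loss(forward(w_adapt, b_adapt, xq), yq)
    meta_opt.zero_grad()
    loss_q.backward()
    meta_opt.step()

# Adaptation to an "unseen" condition with only two examples.
xs, ys = sample_condition(slope=1.7, n=2)
loss = torch.nn.functional.mse_loss(forward(w, b, xs), ys)
gw, gb = torch.autograd.grad(loss, [w, b])
w_new, b_new = w - inner_lr * gw, b - inner_lr * gb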


 

YouTube reference: https://youtu.be/-6y0I1pd6XI
