Cooperative Scheduling Schemes for Explainable DNN Acceleration in Satellite Image Analysis and Retraining

Authors:

Woo-Joong Kim, Chan-Hyun Youn (advisor)

 

Abstract:

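Deep learning-based satellite image analysis and the training systems that support it are advancing through new techniques that improve the fine-grained analysis of ground objects. Building on these techniques, we propose a new acceleration scheduling mechanism for applying explainable DNN models to satellite image analysis and training. In particular, with existing DNN acceleration techniques, the computational complexity of explainable DNN models and the cost of satellite image analysis and retraining degrade performance in terms of both computation and service. To overcome this degradation, this paper proposes cooperative scheduling schemes for explainable DNN acceleration in the satellite image analysis and retraining processes. To this end, we define latency and energy cost models for deriving the optimized processing time and cost required for explainable DNN acceleration. Based on these models, we propose a scheduling technique that manages explainable DNNs at the layer level on an FPGA-GPU acceleration system, and confirm that it minimizes the computational processing cost. In addition, to accelerate the retraining process, we propose an adaptive unlabeled data selection technique that applies a confidence threshold and a semi-supervised learning-based data parallelism scheme. Experimental performance evaluation shows that the proposed schemes reduce the energy cost of conventional DNN acceleration systems by up to 40% while guaranteeing the latency constraints.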
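The confidence-threshold idea mentioned in the abstract can be illustrated with a generic pseudo-labeling filter. This is only a minimal sketch under stated assumptions, not the authors' adaptive selection scheme: the function name `select_confident`, the fixed threshold of 0.9, and the sample probabilities are all hypothetical.

```python
def select_confident(probabilities, threshold=0.9):
    """Return (index, pseudo_label) pairs for unlabeled samples whose
    highest predicted class probability meets the threshold.

    Hypothetical helper for illustration; the paper's adaptive scheme
    is not reproduced here.
    """
    selected = []
    for index, probs in enumerate(probabilities):
        # Index of the most probable class for this sample.
        label = max(range(len(probs)), key=probs.__getitem__)
        if probs[label] >= threshold:
            selected.append((index, label))
    return selected


# Made-up class probabilities for three unlabeled images.
unlabeled_probs = [
    [0.97, 0.02, 0.01],
    [0.40, 0.35, 0.25],
    [0.05, 0.92, 0.03],
]
picked = select_confident(unlabeled_probs, threshold=0.9)
print(picked)
```

Only the samples the model is already confident about are kept for retraining, which is the general reason a threshold reduces retraining cost.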

 


Publication information:

Kim, Woo-Joong, and Chan-Hyun Youn. “Cooperative Scheduling Schemes for Explainable DNN Acceleration in Satellite Image Analysis and Retraining.” IEEE Transactions on Parallel and Distributed Systems 33.7, 2021, pp. 1605-1618.

PARIS and ELSA: An Elastic Scheduling Algorithm for Reconfigurable Multi-GPU Inference Servers

In cloud machine learning (ML) inference systems, providing low latency to end-users is of utmost importance. However, maximizing server utilization and system throughput is also crucial for ML service providers as it helps lower the total-cost-of-ownership. GPUs have oftentimes been criticized for ML inference usages as their massive compute and memory throughput is hard to fully utilize under low-batch inference scenarios. To address this limitation, NVIDIA’s recently announced Ampere GPU architecture provides features to “reconfigure” one large, monolithic GPU into multiple smaller “GPU partitions”. Such a feature gives cloud ML service providers the ability to utilize the reconfigurable GPU not only for large-batch training but also for small-batch inference with the potential to achieve high resource utilization. In this paper, we study this emerging GPU architecture with reconfigurability to develop a high-performance multi-GPU ML inference server. Our first proposition is a sophisticated partitioning algorithm for reconfigurable GPUs that systematically determines a heterogeneous set of multi-granular GPU partitions, best suited for the inference server’s deployment. Furthermore, we co-design an elastic scheduling algorithm tailored for our heterogeneously partitioned GPU server which effectively balances low latency and high GPU utilization.

 

Yunseong Kim, Yujeong Choi, and Minsoo Rhu, “PARIS and ELSA: An Elastic Scheduling Algorithm for Reconfigurable Multi-GPU Inference Servers,” The 59th ACM/ESDA/IEEE Design Automation Conference (DAC), San Francisco, CA, Jul. 2022

Characterization and Analysis of Deep Learning for 3D Point Cloud Analytics

A point cloud is a collection of points, measured using time-of-flight information from LiDAR sensors, that forms a geometrical representation of the surrounding environment. With the algorithmic success of deep learning networks, point clouds are not only used in traditional application domains like localization or HD map construction but also in a variety of avenues including object classification, 3D object detection, or semantic segmentation. While point cloud analytics are gaining significant traction in both academia and industry, the computer architecture community has only recently begun exploring this important problem space. In this paper, we conduct a detailed, end-to-end characterization of deep learning-based point cloud analytics workloads, root-causing the frontend data preparation stage as a significant performance limiter. Through our findings, we discuss possible future directions to motivate continued research in this emerging application domain.

 

Bongjoon Hyun, Jiwon Lee, and Minsoo Rhu, “Characterization and Analysis of Deep Learning for 3D Point Cloud Analytics”, IEEE Computer Architecture Letters, Jul. 2021

Understanding the Implication of Non-Volatile Memory for Large-Scale Graph Neural Network Training

Graph neural networks (GNNs) can extract features by learning both the representation of each object (i.e., graph nodes) as well as the relationship across different objects (i.e., the edges that connect nodes), achieving state-of-the-art performance on a wide range of graph-based tasks. Despite their strengths, utilizing these algorithms in a production environment faces several key challenges as the number of graph nodes and edges ranges from several billion to hundreds of billions, requiring substantial storage space for training. Unfortunately, existing ML frameworks based on the in-memory processing model significantly hamper the productivity of algorithm developers as they mandate that the overall working set fit within DRAM capacity constraints. In this work, we first study state-of-the-art, large-scale GNN training algorithms. We then conduct a detailed characterization of utilizing capacity-optimized non-volatile memory solutions for storing memory-hungry GNN data, exploring the feasibility of SSDs for large-scale GNN training.

Yunjae Lee, Youngeun Kwon, and Minsoo Rhu, “Understanding the Implication of Non-Volatile Memory for Large-Scale Graph Neural Network Training”, IEEE Computer Architecture Letters, Jul. 2021

Jung, Chanyoung, and David Hyunchul Shim. “Incorporating Multi-Context Into the Traversability Map for Urban Autonomous Driving Using Deep Inverse Reinforcement Learning.” IEEE Robotics and Automation Letters 6.2 (2021): 1662-1669.

Autonomous driving in an urban environment with surrounding agents remains challenging. One of the key challenges is to accurately predict the traversability map that probabilistically represents future trajectories considering multiple contexts: inertial, environmental, and social. To address this, various approaches have been proposed; however, they mainly focus on considering the individual context. In addition, most studies utilize expensive prior information (such as HD maps) of the driving environment, which is not a scalable approach. In this study, we extend a deep inverse reinforcement learning-based approach that can predict the traversability map while incorporating multiple contexts for autonomous driving in a dynamic environment. Instead of using expensive prior information of the driving scene, we propose a novel deep neural network to extract contextual cues from sensing data and effectively incorporate them in the output, i.e., the reward map. Based on the reward map, our method predicts the ego-centric traversability map that represents the probability distribution of the plausible and socially acceptable future trajectories. The proposed method is qualitatively and quantitatively evaluated in real-world traffic scenarios with various baselines. The experimental results show that our method improves the prediction accuracy compared to other baseline methods and can predict future trajectories similar to those followed by a human driver.

 

Daegyu Lee, Gyuree Kang, Boseong Kim, and D. Hyunchul Shim. “Assistive Delivery Robot Application for Real-World Postal Services,” in IEEE Access, vol. 9, pp. 141981-141998, 2021, doi: 10.1109/ACCESS.2021.3120618.

This paper introduces a robot system that is designed to assist postal workers by carrying heavy packages in a complex urban environment such as an apartment complex. Since most such areas do not have reliable GPS signal reception, we propose a 3-D point cloud map-based matching localization with robust position estimation along with a perception-based visual servoing algorithm. The delivery robot is also designed to communicate with the control center so that the operator can monitor the current and past situation using onboard videos, obstacle information, and emergency stop logs. Also, the postal worker can choose between autonomous driving mode and follow-me mode using his/her mobile device. To validate the performance of the proposed robot system, we collaborated with full-time postal workers performing their actual delivery services for more than four weeks to collect the field operation data. Using this data, we were able to confirm that the proposed map-matching algorithm performs well in various environments where the robot could navigate with reliable position accuracy and obstacle avoidance capability.

 

Hyunki Seong, Chanyoung Jung, Seungwook Lee, and David Hyunchul Shim. “Learning to Drive at Unsignalized Intersections Using Attention-Based Deep Reinforcement Learning,” The 24th IEEE International Conference on Intelligent Transportation Systems (ITSC 2021).

Driving at an unsignalized intersection is a complex traffic scenario that requires both traffic safety and efficiency. At the unsignalized intersection, the driving policy does not simply maintain a safe distance for all vehicles. Instead, it pays more attention to vehicles that potentially have conflicts with the ego vehicle, while guessing the intentions of other vehicles. In this paper, we propose an attention-based driving policy for handling unprotected intersections using deep reinforcement learning. By leveraging attention, our policy learns to focus on more spatially and temporally important features within its egocentric observation. This selective attention enables our policy to make safe and efficient driving decisions in various congested intersection environments. Our experiments show that the proposed policy not only outperforms other baseline policies in various intersection scenarios and traffic density conditions but also has interpretability in its decision process. Furthermore, we verify our policy model’s feasibility in real-world deployment by transferring the trained model to a full-scale vehicle system. Our model successfully performs various intersection scenarios, even with noisy sensory data and delayed responses. Our approach reveals more opportunities for implementing generic and interpretable policy models in real-world autonomous driving.

 

HashNWalk: Hash and Random Walk Based Anomaly Detection in Hyperedge Streams

Geon Lee, Minyoung Choe and Kijung Shin

IJCAI 2022: International Joint Conference on Artificial Intelligence 2022

Abstract: Sequences of group interactions, such as emails, online discussions, and co-authorships, are ubiquitous, and they are naturally represented as a stream of hyperedges. Despite their broad potential applications, anomaly detection in hypergraphs (i.e., sets of hyperedges) has received surprisingly little attention, compared to that in graphs. While it is tempting to reduce hypergraphs to graphs and apply existing graph-based methods, according to our experiments, taking higher-order structures of hypergraphs into consideration is worthwhile. We propose HashNWalk, an incremental algorithm that detects anomalies in a stream of hyperedges. It maintains and updates a constant-size summary of the structural and temporal information about the stream. Using the summary, which takes the form of a proximity matrix, HashNWalk measures the anomalousness of each new hyperedge as it appears. HashNWalk is (a) Fast: it processes each hyperedge in near real-time and billions of hyperedges within a few hours, (b) Space Efficient: the size of the maintained summary is a predefined constant, (c) Effective: it successfully detects anomalous hyperedges in real-world hypergraphs.

 

AHP: Learning to Negative Sample for Hyperedge Prediction

Hyunjin Hwang*, Seungwoo Lee*, Chanyoung Park, and Kijung Shin

SIGIR 2022: International ACM SIGIR Conference on Research and Development in Information Retrieval 2022

Abstract: Hypergraphs (i.e., sets of hyperedges) naturally represent group relations (e.g., researchers co-authoring a paper and ingredients used together in a recipe), each of which corresponds to a hyperedge (i.e., a subset of nodes). Predicting future or missing hyperedges bears significant implications for many applications (e.g., collaboration and recipe recommendation). What makes hyperedge prediction particularly challenging is the vast number of non-hyperedge subsets, which grows exponentially with the number of nodes. Since it is prohibitive to use all of them as negative examples for model training, it is inevitable to sample a very small portion of them, and to this end, heuristic sampling schemes have been employed. However, trained models suffer from poor generalization capability for examples of different natures. In this paper, we propose AHP, an adversarial training-based hyperedge-prediction method. It learns to sample negative examples without relying on any heuristic schemes. Using six real hypergraphs, we show that AHP generalizes better to negative examples of various natures. It yields up to 28.2% higher AUROC than the best existing methods and often even outperforms its variants with sampling schemes tailored to test sets.
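The exponential growth of candidate subsets noted in the abstract can be checked with a short calculation. The sketch below is illustrative only: the helper name `candidate_subsets` and the assumption that a hyperedge contains at least two nodes are ours, not taken from the paper.

```python
from math import comb


def candidate_subsets(num_nodes, min_size=2):
    """Count node subsets with at least `min_size` members.

    Assumption: any such subset is a potential hyperedge, so almost all
    of them are non-hyperedges in a real hypergraph.
    """
    return sum(comb(num_nodes, k) for k in range(min_size, num_nodes + 1))


for n in (10, 20, 40):
    print(n, candidate_subsets(n))
```

With only 40 nodes the count already exceeds one trillion, which is why using every non-hyperedge subset as a negative example is prohibitive and some form of sampling is needed.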

 

Are Edge Weights in Summary Graphs Useful? – A Comparative Study

Shinhwan Kang, Kyuhan Lee, and Kijung Shin

PAKDD 2022: Pacific-Asia Conference on Knowledge Discovery and Data Mining 2022

Abstract: Which one is better between two representative graph summarization models with and without edge weights? From web graphs to online social networks, large graphs are everywhere. Graph summarization, which is an effective graph compression technique, aims to find a compact summary graph that accurately represents a given large graph. Two versions of the problem, where one allows edge weights in summary graphs and the other does not, have been studied in parallel without direct comparison between their underlying representation models. In this work, we conduct a systematic comparison by extending three search algorithms to both models and evaluating their outputs on eight datasets in five aspects: (a) reconstruction error, (b) error in node importance, (c) error in node proximity, (d) the size of reconstructed graphs, and (e) compression ratios. Surprisingly, using unweighted summary graphs leads to outputs significantly better in all the aspects than using weighted ones, and this finding is supported theoretically. Notably, we show that a state-of-the-art algorithm can be improved substantially (specifically, 8.2X, 7.8X, and 5.9X in terms of (a), (b), and (c), respectively, when (e) is fixed) based on the observation.
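To make aspect (a) concrete, the sketch below reconstructs a graph from an unweighted summary graph and counts the edges that differ from the original. It is a simplified illustration under our own assumptions (disjoint supernodes, undirected edges, error measured as the size of the symmetric difference of the edge sets) and may not match the exact definitions used in the paper; all names and the toy graph are hypothetical.

```python
from itertools import combinations


def reconstruct(supernodes, superedges):
    """Expand an unweighted summary graph into a set of undirected edges.

    A superedge between two supernodes yields every cross pair of their
    members; a self-loop on a supernode yields every pair inside it.
    """
    edges = set()
    for a, b in superedges:
        if a == b:
            pairs = combinations(sorted(supernodes[a]), 2)
        else:
            pairs = ((u, v) for u in supernodes[a] for v in supernodes[b])
        for u, v in pairs:
            edges.add((min(u, v), max(u, v)))
    return edges


def reconstruction_error(original_edges, supernodes, superedges):
    """Number of edges present in exactly one of the two graphs."""
    original = {(min(u, v), max(u, v)) for u, v in original_edges}
    return len(original ^ reconstruct(supernodes, superedges))


# Toy example: a triangle {1, 2, 3} plus the edge (3, 4).
original_edges = [(1, 2), (1, 3), (2, 3), (3, 4)]
supernodes = {"A": {1, 2, 3}, "B": {4}}
superedges = [("A", "A"), ("A", "B")]
error = reconstruction_error(original_edges, supernodes, superedges)
print(error)
```

Here the summary keeps the triangle exactly but adds the spurious edges (1, 4) and (2, 4), so merging nodes trades accuracy for a smaller representation.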

 
