HashNWalk: Hash and Random Walk Based Anomaly Detection in Hyperedge Streams

Geon Lee, Minyoung Choe and Kijung Shin

IJCAI 2022: International Joint Conference on Artificial Intelligence 2022

Abstract: Sequences of group interactions, such as emails, online discussions, and co-authorships, are ubiquitous, and they are naturally represented as a stream of hyperedges. Despite their broad potential applications, anomaly detection in hypergraphs (i.e., sets of hyperedges) has received surprisingly little attention, compared to that in graphs. While it is tempting to reduce hypergraphs to graphs and apply existing graph-based methods, according to our experiments, taking higher-order structures of hypergraphs into consideration is worthwhile. We propose HashNWalk, an incremental algorithm that detects anomalies in a stream of hyperedges. It maintains and updates a constant-size summary of the structural and temporal information about the stream. Using the summary, which is in the form of a proximity matrix, HashNWalk measures the anomalousness of each new hyperedge as it appears. HashNWalk is (a) Fast: it processes each hyperedge in near real-time and billions of hyperedges within a few hours, (b) Space Efficient: the size of the maintained summary is a predefined constant, (c) Effective: it successfully detects anomalous hyperedges in real-world hypergraphs.
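As a rough illustration of the constant-size summary idea, the Python sketch below hashes nodes into a fixed number of buckets and keeps a decayed bucket-by-bucket co-occurrence matrix. It is not the HashNWalk algorithm itself: the class name, the CRC32 hash, the decay factor, and the scoring rule are all assumptions made for illustration, and the random-walk proximity described in the abstract is replaced here by simple decayed counts.

```python
import zlib
from itertools import combinations


class StreamSummary:
    """Toy constant-size summary of a hyperedge stream (illustrative only)."""

    def __init__(self, num_buckets=8, decay=0.9):
        self.num_buckets = num_buckets  # fixed, so memory does not grow with the stream
        self.decay = decay              # assumed decay factor, applied once per hyperedge
        self.counts = [[0.0] * num_buckets for _ in range(num_buckets)]

    def _bucket(self, node):
        # Deterministic hash of a node label into one of the buckets.
        return zlib.crc32(str(node).encode("utf-8")) % self.num_buckets

    def _bucket_pairs(self, hyperedge):
        buckets = [self._bucket(v) for v in hyperedge]
        if len(buckets) == 1:
            return [(buckets[0], buckets[0])]
        return list(combinations(buckets, 2))

    def process(self, hyperedge):
        """Score a new hyperedge against the summary, then fold it in."""
        if not hyperedge:
            raise ValueError("a hyperedge must contain at least one node")
        pairs = self._bucket_pairs(hyperedge)
        # Rarely seen bucket pairs give a score near 1, frequent ones near 0.
        mean_count = sum(self.counts[a][b] for a, b in pairs) / len(pairs)
        score = 1.0 / (1.0 + mean_count)
        # Decay old information, then add the new hyperedge symmetrically.
        for row in self.counts:
            for j in range(self.num_buckets):
                row[j] *= self.decay
        for a, b in pairs:
            self.counts[a][b] += 1.0
            if a != b:
                self.counts[b][a] += 1.0
        return score
```

Feeding the same hyperedge twice yields a lower score the second time, while the matrix stays at `num_buckets` by `num_buckets` regardless of how many hyperedges arrive.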

 

7

Simple Epidemic Models with Segmentation Can Be Better than Complex Ones

Geon Lee, Se-eun Yoon, and Kijung Shin

PLOS ONE

Abstract: Given a sequence of epidemic events, can a single epidemic model capture its dynamics during the entire period? How should we divide the sequence into segments to better capture the dynamics? Throughout human history, infectious diseases (e.g., the Black Death and COVID-19) have been serious threats. Consequently, understanding and forecasting the evolving patterns of epidemic events are critical for prevention and decision making. To this end, epidemic models based on ordinary differential equations (ODEs), which effectively describe dynamic systems in many fields, have been employed. However, a single epidemic model is not enough to capture long-term dynamics of epidemic events especially when the dynamics heavily depend on external factors (e.g., lockdown and the capability to perform tests). In this work, we demonstrate that properly dividing the event sequence regarding COVID-19 (specifically, the numbers of active cases, recoveries, and deaths) into multiple segments and fitting a simple epidemic model to each segment leads to a better fit with fewer parameters than fitting a complex model to the entire sequence. Moreover, we propose a methodology for balancing the number of segments and the complexity of epidemic models, based on the Minimum Description Length principle. Our methodology is (a) Automatic: not requiring any user-defined parameters, (b) Model-agnostic: applicable to any ODE-based epidemic models, and (c) Effective: effectively describing and forecasting the spread of COVID-19 in 70 countries.
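To make the "simple model per segment" idea concrete, here is a minimal Python sketch that simulates a basic SIR model with forward Euler steps, using a separate (beta, gamma) pair for each segment. The choice of SIR, the function name, and the parameter values are assumptions for illustration; the abstract does not fix a particular ODE model, and this sketch does not include the fitting or MDL-based segmentation steps. The removed compartment R lumps recoveries and deaths together.

```python
def simulate_sir_segments(s, i, r, segments, dt=1.0):
    """Forward-Euler SIR simulation with piecewise-constant parameters.

    segments is a list of (beta, gamma, steps) tuples; each tuple is one
    segment with its own infection rate beta and removal rate gamma.
    """
    n = s + i + r  # total population, conserved by the update below
    trajectory = [(s, i, r)]
    for beta, gamma, steps in segments:
        for _ in range(steps):
            new_infections = beta * s * i / n * dt
            new_removals = gamma * i * dt
            s -= new_infections
            i += new_infections - new_removals
            r += new_removals
            trajectory.append((s, i, r))
    return trajectory


# Hypothetical parameters: faster spread at first, slower after an intervention.
trajectory = simulate_sir_segments(
    s=990.0, i=10.0, r=0.0,
    segments=[(0.4, 0.1, 30), (0.15, 0.1, 30)],
)
```

Each segment needs only two parameters, so a change in external conditions is modeled by starting a new segment rather than by adding compartments or time-varying terms to a single model.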

 

6

Effective Training Strategies for Deep-Learning-Based Precipitation Nowcasting and Estimation

Jihoon Ko*, Kyuhan Lee*, Hyunjin Hwang*, Seok-Geun Oh, Seok-Woo Son, and Kijung Shin

Computers and Geosciences

 

Abstract: Deep learning has been successfully applied to precipitation nowcasting. In this work, we propose a pre-training scheme and a new loss function for improving deep-learning-based nowcasting. First, we adapt U-Net, a widely used deep-learning model, for the two problems of interest here: precipitation nowcasting and precipitation estimation from radar images. We formulate the former as a classification problem with three precipitation intervals and the latter as a regression problem. For these tasks, we propose to pre-train the model to predict radar images in the near future without requiring ground-truth precipitation, and we also propose the use of a new loss function for fine-tuning to mitigate the class imbalance problem. We demonstrate the effectiveness of our approach using radar images and precipitation datasets collected from South Korea over seven years. Notably, our pre-training scheme and new loss function improve the critical success index (CSI) of nowcasting of heavy rainfall (at least 10 mm/hr) by up to 95.7% and 43.6%, respectively, at a 5-hr lead time. We also demonstrate that our approach reduces the precipitation estimation error by up to 10.7%, compared to the conventional approach, for light rainfall (between 1 and 10 mm/hr). Lastly, we report the sensitivity of our approach to different resolutions and a detailed analysis of four cases of heavy rainfall.
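For readers unfamiliar with the critical success index, the short Python sketch below computes it from binary event forecasts using the standard definition, hits / (hits + misses + false alarms). The function name and the sample data are illustrative and not taken from the paper.

```python
def critical_success_index(predicted, observed):
    """CSI = hits / (hits + misses + false alarms) for binary events."""
    hits = misses = false_alarms = 0
    for p, o in zip(predicted, observed):
        if p and o:
            hits += 1
        elif o:
            misses += 1          # event occurred but was not forecast
        elif p:
            false_alarms += 1    # event was forecast but did not occur
    denominator = hits + misses + false_alarms
    return hits / denominator if denominator else float("nan")


# Made-up example: 1 marks "heavy rainfall" at a grid cell.
csi = critical_success_index([1, 1, 0, 0, 1], [1, 0, 1, 0, 1])
```

Correct negatives do not enter the formula, which is why CSI is a common choice for rare events such as heavy rainfall.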

 

5

Growth Patterns and Models of Real-world Hypergraphs

Jihoon Ko*, Yunbum Kook*, and Kijung Shin

Knowledge and Information Systems

Abstract: What kind of macroscopic structural and dynamical patterns can we observe in real-world hypergraphs? What can be underlying local dynamics on individuals, which ultimately lead to the observed patterns, beyond apparently random evolution? Graphs, which provide effective ways to represent pairwise interactions among entities, fail to represent group interactions (e.g., collaborations of three or more researchers, etc.). Regarded as a generalization of graphs, hypergraphs allowing for various sizes of edges prove fruitful in addressing this limitation. However, the increased complexity makes it challenging to understand hypergraphs as thoroughly as graphs. In this work, we closely examine seven structural and dynamical properties of real hypergraphs from six domains. To this end, we define new measures, extend notions of common graph properties to hypergraphs, and assess the significance of observed patterns by comparison with a null model and statistical tests. We also propose HYPERFF, a stochastic model for generating realistic hypergraphs. Its merits are three-fold: (a) Realistic: it successfully reproduces all seven patterns, in addition to five patterns established in previous studies, (b) Self-contained: unlike previously proposed models, it does not rely on oracles (i.e., unexplainable external information) at all, and it is parameterized by just two scalars, and (c) Emergent: it relies on simple and interpretable mechanisms on individual entities, which do not trivially enforce but surprisingly lead to macroscopic properties. While HYPERFF is mathematically intractable, we provide theoretical justifications and mathematical analysis based on its simplified version.

4

Two-stage Training of Graph Neural Networks for Graph Classification

Manh Tuan Do, Noseong Park, Kijung Shin

Neural Processing Letters

Abstract: Graph neural networks (GNNs) have received massive attention in the field of machine learning on graphs. Inspired by the success of neural networks, a line of research has been conducted to train GNNs to deal with various tasks, such as node classification, graph classification, and link prediction. In this work, our task of interest is graph classification. Several GNN models have been proposed and shown great accuracy in this task. However, the question is whether usual training methods fully realize the capacity of the GNN models. In this work, we propose a two-stage training framework based on triplet loss. In the first stage, a GNN is trained to map each graph to a Euclidean-space vector so that graphs of the same class are close while those of different classes are mapped far apart. Once graphs are well-separated based on labels, a classifier is trained to distinguish between different classes. This method is generic in the sense that it is compatible with any GNN model. By adapting five GNN models to our method, we demonstrate a consistent improvement in accuracy and in the utilization of each GNN’s allocated capacity over the original training method of each model, by up to 5.4 percentage points on 12 datasets.
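The first-stage objective can be illustrated with the commonly used hinge form of triplet loss on Euclidean distances. The sketch below assumes that form, a margin of 1.0, and plain Python lists as embeddings; the paper's exact formulation (for example, squared distances or a different margin) may differ.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge triplet loss: zero once the negative is at least `margin`
    farther from the anchor than the positive is."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Made-up graph embeddings: anchor and positive share a class label.
well_separated = triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])
poorly_separated = triplet_loss([0.0, 0.0], [0.0, 1.0], [1.0, 0.0])
```

In the first case the negative is already far enough away and contributes no loss; in the second it is as close as the positive, so the loss equals the margin.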

 

3

Set2Box: Similarity Preserving Representation Learning for Sets

Geon Lee, Chanyoung Park, and Kijung Shin

ICDM 2022: IEEE International Conference on Data Mining

Abstract: Sets have been used for modeling various types of objects (e.g., a document as the set of keywords in it and a customer as the set of the items that she has purchased). Measuring similarity (e.g., Jaccard Index) between sets has been a key building block of a wide range of applications, including plagiarism detection, recommendation, and graph compression. However, as sets have grown in numbers and sizes, the computational cost and storage required for set similarity computation have become substantial, and this has led to the development of hashing and sketching based solutions. In this work, we propose Set2Box, a learning-based approach for compressed representations of sets from which various similarity measures can be estimated accurately in constant time. The key idea is to represent sets as boxes to precisely capture overlaps of sets. Additionally, based on the proposed box quantization scheme, we design Set2Box+, which yields more concise but more accurate box representations of sets. Through extensive experiments on 8 real-world datasets, we show that, compared to baseline approaches, Set2Box+ is (a) Accurate: achieving up to 40.8X smaller estimation error while requiring 60% fewer bits to encode sets, (b) Concise: yielding up to 96.8X more concise representations with similar estimation error, and (c) Versatile: enabling the estimation of four set-similarity measures from a single representation of each set.
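The idea of reading set overlap off box overlap can be sketched as follows, assuming each set is already represented by an axis-aligned box given as a (lower corner, upper corner) pair and that box volume stands in for set size. How the boxes are learned and quantized is the paper's contribution and is not shown; the function names here are hypothetical.

```python
def box_volume(lower, upper):
    """Volume of an axis-aligned box; zero if any side is empty."""
    volume = 1.0
    for lo, hi in zip(lower, upper):
        volume *= max(0.0, hi - lo)
    return volume


def box_jaccard(box_a, box_b):
    """Estimate the Jaccard index of two sets from their boxes."""
    (lo_a, hi_a), (lo_b, hi_b) = box_a, box_b
    inter_lo = [max(a, b) for a, b in zip(lo_a, lo_b)]
    inter_hi = [min(a, b) for a, b in zip(hi_a, hi_b)]
    intersection = box_volume(inter_lo, inter_hi)
    union = box_volume(lo_a, hi_a) + box_volume(lo_b, hi_b) - intersection
    return intersection / union if union > 0 else 0.0


# Two 2-D boxes of volume 4 that overlap in a unit square.
estimate = box_jaccard(([0.0, 0.0], [2.0, 2.0]), ([1.0, 1.0], [3.0, 3.0]))
```

Because the cost depends only on the box dimension and not on the set sizes, the estimate is computed in constant time per pair, and the same volumes can be plugged into other overlap-based similarity measures.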

 

2

Reciprocity in Directed Hypergraphs: Measures, Findings, and Generators

Sunwoo Kim, Minyoung Choe, Jaemin Yoo, and Kijung Shin

ICDM 2022: IEEE International Conference on Data Mining

Abstract: Group interactions are prevalent in a variety of areas. Many of them, including email exchanges, chemical reactions, and bitcoin transactions, are directional, and thus they are naturally modeled as directed hypergraphs, where each hyperarc consists of the set of source nodes and the set of destination nodes. For directed graphs, which are a special case of directed hypergraphs, reciprocity has played a key role as a fundamental graph statistic in revealing organizing principles of graphs and in solving graph learning tasks. For general directed hypergraphs, however, not even a systematic measure of reciprocity has been developed. In this work, we investigate the reciprocity of 11 real-world hypergraphs. To this end, we first introduce eight axioms that any reasonable measure of reciprocity should satisfy. Second, we propose HyperRec, a principled measure of hypergraph reciprocity that satisfies all the axioms. Third, we develop Ferret, a fast and exact algorithm for computing the measure, whose search space is up to  smaller than that of naive computation. Fourth, using them, we examine 11 real-world hypergraphs and discover patterns that distinguish them from random hypergraphs. Lastly, we propose ReDi, an intuitive generative model for directed hypergraphs exhibiting the patterns.
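For context, the classic reciprocity statistic for ordinary directed graphs, the special case the abstract refers to, is the fraction of arcs whose reverse arc is also present. The sketch below computes only that graph-level baseline; it is not HyperRec, whose definition is not given in the abstract, and the function name is illustrative.

```python
def graph_reciprocity(arcs):
    """Fraction of distinct arcs (u, v) for which (v, u) also exists."""
    arc_set = set(arcs)
    if not arc_set:
        return 0.0
    reciprocated = sum(1 for (u, v) in arc_set if (v, u) in arc_set)
    return reciprocated / len(arc_set)


# Made-up email graph: 1 and 2 write to each other, 2 also writes to 3.
reciprocity = graph_reciprocity([(1, 2), (2, 1), (2, 3)])
```

With hyperarcs, a "reverse" arc may match the original only partially (some of the sources and destinations swapped), which is why the graph definition does not carry over directly.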

 

1

EE Professor Hyunchul Shim's Research Team Wins 1st Place in the Indoor Building Reconnaissance Event of the 5th Dronebot Mission Challenge


[Photo of Professor Hyunchul Shim]
 
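An autonomous drone developed by the research team of Professor Hyunchul Shim of our school (Ph.D. student Boseong Kim and M.S. student Jaeyong Park) won the Excellence Award, which is first place, and prize money of 10 million KRW in the indoor building reconnaissance event, Task 4 of the 5th Army TIGER Dronebot Mission Challenge, held on August 31.
The award ceremony took place on October 4 at the Army Training and Doctrine Command in Daejeon.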
 
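Professor Shim's team used its in-house 3D LiDAR-based precise localization (SLAM) algorithm, a 3D obstacle-avoidance path generation algorithm, and an unknown-area exploration algorithm to explore the building interior, detect hidden target objects, and transmit the results to the control center in real time, completing every mission flawlessly.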

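In this competition, hosted by the Army Headquarters, each drone had to take off from a parking lot outside the building, enter through a second-floor window, explore multiple rooms to find hidden target objects, transmit their types and locations to the control center in real time, and return home.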

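Of the eight teams that reached the finals, only Professor Shim's drone flew fully autonomously from takeoff to return, detected all of the hidden objects with AI, and transmitted the results in real time.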

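The team's research on indoor autonomous flight is a core technology for indoor reconnaissance drones to be used in future battlefield and disaster situations, and the competition once again showcased KAIST's capabilities in autonomous drone technology.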

 
Video: https://youtu.be/SXe_FJpxv94
 
[From left: Boseong Kim, Jaeyong Park, Professor Hyunchul Shim]
 
 

KAIST Professors Sung-Ju Lee and Jinwoo Shin's Research Team Develops AI That Adapts to Its Environment on Its Own, to Be Presented at NeurIPS 2022


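[From left: Professor Sung-Ju Lee, Professor Jinwoo Shin, Ph.D. student Taesik Gong, Ph.D. student Jongheon Jeong, M.S. student Yewon Kim, undergraduate student Taewon Kim]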
 
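Professor Sung-Ju Lee of the School of Electrical Engineering and Professor Jinwoo Shin of the Graduate School of AI have jointly developed a test-time adaptation AI technique that adapts to environmental changes on its own. The algorithm proposed by the team achieved accuracy that was 11% higher on average than the previous best-performing algorithm.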
 
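This work, titled "NOTE: Robust Continual Test-time Adaptation Against Temporal Correlation," will be presented in December at the Conference on Neural Information Processing Systems (NeurIPS) 2022, a top-tier international conference in artificial intelligence. Ph.D. student Taesik Gong led the research as first author, with Ph.D. student Jongheon Jeong, undergraduate student Taewon Kim, and M.S. student Yewon Kim contributing as co-authors.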
 
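Professors Lee and Shin said, "Test-time domain adaptation lets an AI model adapt to environmental changes on its own and improve its performance, so its potential uses are vast. NOTE, which will be presented soon, is the first technique to show performance gains on realistic data distributions, and we expect it to be applicable to a variety of fields such as autonomous driving, AI-assisted medicine, and mobile healthcare."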
 
 
[Research overview: outline of the test-time domain adaptation technique from this study]
 
 
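This research was supported by the National Research Foundation of Korea (No. NRF-2020R1A2C1004062), funded by the government (Ministry of Science and ICT), and by the Defense Acquisition Program Administration and the Agency for Defense Development (UD190031RD), and was carried out at the KAIST Future Defense AI Specialized Research Center.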
 

KAIST EE Professor Minsoo Rhu's Team Develops the World's First AI Semiconductor with Privacy Protection

Runs differentially private training 3.6 times faster than Google TPUv3

 


[Photo of Professor Minsoo Rhu]
 
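Professor Minsoo Rhu's research team in the School of Electrical Engineering has developed an AI semiconductor that dramatically improves the performance of AI applications that use differential privacy (differentially private machine learning).
The team analyzed the performance bottlenecks of differential privacy and developed an AI semiconductor chip for differentially private machine learning that can greatly improve the performance of applications using the technique.
The chip consists of an outer-product-based compute engine, an adder-tree-based post-processing unit, and other components. It runs differentially private training 3.6 times faster than Google's TPUv3, currently the most widely used AI processor, and delivers performance comparable to NVIDIA's latest GPU, the A100, with 10 times fewer resources.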
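To give a sense of why differentially private training is costly, the Python sketch below shows the per-example gradient clipping and Gaussian noise step used in the widely known DP-SGD procedure. The article does not name the training algorithm the chip targets, so DP-SGD, the function name, and the parameters here are assumptions for illustration and say nothing about how the chip itself is organized.

```python
import math
import random


def private_gradient(per_example_grads, clip_norm, noise_multiplier, rng=None):
    """Clip each example's gradient, sum, add Gaussian noise, and average."""
    rng = rng or random.Random()
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for grad in per_example_grads:
        # Each example needs its own gradient norm, so gradients cannot be
        # summed across the batch before clipping.
        norm = math.sqrt(sum(g * g for g in grad))
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        for k in range(dim):
            total[k] += grad[k] * scale
    sigma = noise_multiplier * clip_norm  # noise scale tied to the clipping bound
    noisy = [t + rng.gauss(0.0, sigma) for t in total]
    return [x / len(per_example_grads) for x in noisy]


# With the noise switched off, the result is the mean of the clipped gradients.
clipped_mean = private_gradient([[3.0, 4.0], [0.0, 1.0]], clip_norm=1.0, noise_multiplier=0.0)
```

Ordinary training only needs the batch-summed gradient, whereas this step materializes and post-processes one gradient per example, which is the kind of extra work a dedicated accelerator can target.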
 
 
[From left: co-first authors Beomsik Park and Ranggi Hwang; co-authors Dongho Yoon and Yoonhyuk Choi]
 
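This research, with KAIST School of Electrical Engineering researchers Beomsik Park and Ranggi Hwang as co-first authors and Dongho Yoon and Yoonhyuk Choi as co-authors, will be presented at the 55th IEEE/ACM International Symposium on Microarchitecture (MICRO 2022), a premier international conference in computer architecture, held in Chicago, USA, from October 1 to 5 local time (paper title: DiVa: An Accelerator for Differentially Private Machine Learning).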
 
 
[Research overview: architecture diagram of the privacy-preserving AI semiconductor accelerator]
 
Professor Rhu's achievement was covered by major media outlets, including AI Times.
 
Related links:
AI Times: http://www.aitimes.com/news/articleView.html?idxno=146435
Yonhap News: https://www.yna.co.kr/view/AKR20201116072400063?input=1195m
Financial News: https://www.fnnews.com/news/202208212349474072
Dong-A Science: https://www.dongascience.com/news.php?idx=55893
Industry News: http://www.industrynews.co.kr/news/articleView.html?idxno=46829