Binary Classification with XOR Queries: Fundamental Limits and An Efficient Algorithm

Authors: Daesung Kim and Hye Won Chung

Journal: IEEE Transactions on Information Theory

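This paper studies the problem of recovering unknown binary labels through queries. Binary label classification is an important problem that arises in many fields, including communication systems, crowdsourcing, recommender systems, and active learning. To recover the labels with as few queries as possible, the paper proposes asking questions about groups of labels. Specifically, each query selects d of the m binary labels and asks for their XOR. Here d is defined as the query difficulty and is allowed to vary from query to query. In addition, the paper assumes a general model in which the accuracy of an answer is determined by both the person who answers and the query difficulty. Under this setting, the number of queries required to recover all labels is characterized theoretically. Furthermore, the paper proposes an algorithm that recovers all labels using only this ideal number of queries, even when the accuracies of the people answering are unknown.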
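The query model described above can be sketched as follows. This is a minimal illustration, not the paper's algorithm: the function names and the `accuracy` formula (accuracy decaying toward 0.5 as d grows) are assumptions made only for this example.

```python
import random

def xor_query(labels, d, p_correct, rng):
    """Pick d distinct labels and return (indices, noisy XOR answer)."""
    idx = rng.sample(range(len(labels)), d)
    truth = 0
    for i in idx:
        truth ^= labels[i]
    # The answer is flipped with probability 1 - p_correct.
    answer = truth if rng.random() < p_correct else 1 - truth
    return idx, answer

def accuracy(d, worker_quality):
    """Hypothetical accuracy model: depends on the worker and on the difficulty d."""
    return 0.5 + 0.5 * worker_quality ** d

rng = random.Random(0)
labels = [1, 0, 1, 1, 0]
indices, answer = xor_query(labels, d=3, p_correct=accuracy(3, 0.9), rng=rng)
```

A larger d covers more labels per query but, under this assumed model, makes each answer less reliable, which is the trade-off the query count analysis has to balance.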

Figure 1. Frame error rate and bit error rate of the proposed algorithm vs. number of queries for four different values of m.

Sample-Efficient Deep Reinforcement Learning via Episodic Backward Update, NeurIPS 2019

Suyoung Lee, Sungik Choi and Sae-Young Chung

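This paper proposes Episodic Backward Update, an algorithm that makes deep reinforcement learning more efficient through fast reward propagation. Unlike conventional reinforcement learning methods, which sample transitions step by step from the replay memory using a uniform distribution, the proposed method samples a whole episode and propagates the state values in reverse time order. The proposed algorithm enables fast reward propagation even in environments with few samples and sparse rewards. Evaluated on the 2D MNIST maze and Atari 2600 environments, the proposed algorithm shows a significant performance improvement over existing algorithms.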
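The idea of propagating values backward through a sampled episode can be sketched in a tabular setting. The paper works with deep Q-networks; the tabular form, the function name, and the `lr` parameter below are simplifying assumptions for illustration only.

```python
from collections import defaultdict

def episodic_backward_update(q, episode, n_actions, gamma=0.99, lr=1.0):
    """Update a tabular Q-function over one episode in reverse time order.

    episode: list of (state, action, reward, next_state, done) tuples.
    """
    for state, action, reward, next_state, done in reversed(episode):
        if done:
            target = reward
        else:
            best_next = max(q[(next_state, a)] for a in range(n_actions))
            target = reward + gamma * best_next
        q[(state, action)] += lr * (target - q[(state, action)])
    return q

# A three-step chain whose only reward arrives at the final step.
episode = [(0, 0, 0.0, 1, False), (1, 0, 0.0, 2, False), (2, 0, 1.0, 3, True)]
q = episodic_backward_update(defaultdict(float), episode, n_actions=2, gamma=0.5)
```

Because the last transition is processed first, a single pass carries the final reward all the way back to the first state, whereas uniformly sampled single-step updates would need several lucky draws to do the same.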

Robust training with ensemble consensus, ICLR 2020

Jisoo Lee and Sae-Young Chung

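Deep neural networks trained by gradient descent on a dataset that contains mislabeled samples tend to generalize from the data early in training and to memorize it later in training. This paper analyzes this training behavior through the pattern similarity among the networks of an ensemble. The analysis shows that some of the correctly labeled samples consistently incur small losses across all networks of the ensemble early in training, whereas mislabeled samples do not. Because mislabeled samples harm generalization on the given data, the paper proposes a training method that trains the networks on samples that consistently incur small losses across all networks of a simultaneously trained ensemble. By preventing mislabeled data from being learned, the proposed method enables robust training.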
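The selection rule described above can be sketched as an intersection of per-network small-loss sets. The function name and the `keep_ratio` threshold are hypothetical; the paper's actual selection criterion may differ in its details.

```python
def consensus_indices(losses_per_network, keep_ratio):
    """Return indices of samples that every network ranks among its smallest losses.

    losses_per_network: one list of per-sample losses for each ensemble member.
    keep_ratio: assumed fraction of samples each network treats as small-loss.
    """
    n = len(losses_per_network[0])
    k = int(keep_ratio * n)
    selected = None
    for losses in losses_per_network:
        small = set(sorted(range(n), key=lambda i: losses[i])[:k])
        selected = small if selected is None else selected & small
    return sorted(selected)

losses = [
    [0.1, 0.9, 0.2, 0.8],  # network 1
    [0.2, 0.1, 0.3, 0.9],  # network 2
]
clean = consensus_indices(losses, keep_ratio=0.5)
```

Only samples in the intersection would be used for the next gradient step, so a sample that looks easy to one network but not to the others is left out.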

Novelty Detection Via Blurring, ICLR 2020

Sungik Choi and Sae-Young Chung

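This paper addresses novelty detection techniques for a variety of data domains. It consists of multiple studies, each dealing with a different data domain. One part deals with novelty detection on image datasets; in particular, it assumes that no information other than the image data itself is available. The contributions that distinguish this work from prior studies are as follows. First, the paper diagnoses the phenomenon that existing deep novelty detection methods show low uncertainty on novel data. For this analysis, we propose an effective rank metric that measures the complexity of the data. Second, building on this analysis, we propose SVD-RND, a model that directly discriminates blurred data. We show experimentally that SVD-RND achieves better novelty detection performance than existing deep novelty detection methods. Finally, we show that SVD-RND can be applied in a variety of scenarios, such as when no validation novel data is available.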
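As a rough illustration of an effective rank metric, the sketch below uses the common entropy-based definition (the exponential of the Shannon entropy of the normalized singular values). Whether the paper uses exactly this definition is an assumption, and the singular values shown are made-up numbers rather than measurements.

```python
import math

def effective_rank(singular_values):
    """Entropy-based effective rank of a matrix, given its singular values."""
    total = sum(singular_values)
    probs = [s / total for s in singular_values if s > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return math.exp(entropy)

# Made-up spectra: blurring concentrates energy in the leading singular values.
sharp_image = [10.0, 5.0, 3.0]
blurred_image = [10.0, 1.0, 0.1]
rank_sharp = effective_rank(sharp_image)
rank_blurred = effective_rank(blurred_image)
```

Under this definition a flat spectrum yields an effective rank equal to the number of singular values, while a spectrum dominated by one value yields an effective rank close to 1, so blurred data scores lower than detailed data.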

Unsupervised Embedding Adaptation via Early-Stage Feature Reconstruction for Few-shot Classification, ICML 2021

Dong Hoon Lee and Sae-Young Chung

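This work addresses unsupervised embedding adaptation for few-shot classification.

Based on the observation that deep networks learn to generalize faster than they memorize, this work proposes ESFR (Early-Stage Feature Reconstruction), an unsupervised embedding adaptation method that combines feature reconstruction training with dimensionality-driven early stopping.

On standard few-shot classification benchmarks, ESFR consistently improves performance when used together with existing few-shot classification methods.

In addition, when used together with a transductive method, it achieves new state-of-the-art performance on the mini-ImageNet, tiered-ImageNet, and CUB datasets, and in the 1-shot setting in particular it outperforms the previous best method by 1.2 to 2.0%.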

Communication in Multi-Agent Reinforcement Learning: Intention Sharing

Authors: Woojun Kim, Jongeui Park and Youngchul Sung

To be presented at International Conference on Learning Representations (ICLR) 2021

 

Communication is one of the core components for learning coordinated behavior in multi-agent systems. In this work, W. Kim et al. propose a new communication scheme named Intention Sharing (IS) for multi-agent reinforcement learning in order to enhance coordination among agents. In the proposed scheme, each agent generates an imagined trajectory by modeling the environment dynamics and other agents' actions. The imagined trajectory is a simulated future trajectory of the agent, based on the learned model of the environment dynamics and of the other agents, and it represents the agent's future action plan. Each agent then compresses this imagined trajectory into an intention message for communication. The compression uses an attention mechanism that learns the relative importance of the components of the imagined trajectory based on the messages received from other agents. Numerical results show that the proposed IS scheme significantly outperforms other communication schemes in multi-agent reinforcement learning.
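The compression step can be illustrated with plain dot-product attention over the steps of an imagined trajectory. This is a generic sketch: the `attend` function, the use of an unparameterized dot product, and the toy vectors are assumptions, not the learned attention module of the IS scheme.

```python
import math

def attend(query, trajectory):
    """Compress a list of step vectors into one message with softmax attention.

    query: vector derived from received messages (hypothetical stand-in).
    trajectory: list of vectors, one per imagined future step.
    """
    scores = [sum(q * x for q, x in zip(query, step)) for step in trajectory]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(trajectory[0])
    message = [sum(w * step[j] for w, step in zip(weights, trajectory))
               for j in range(dim)]
    return message, weights

imagined = [[1.0, 0.0], [0.0, 1.0]]
message, weights = attend(query=[2.0, 0.0], trajectory=imagined)
```

The weights sum to one, so the message is a weighted average of the imagined steps, and steps that align with the query contribute more.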

 

Fig. 1. The overall structure of the proposed IS scheme from the perspective of Agent i

Fig. 2. Performance: MADDPG (blue), DIAL (green), TarMAC (red), Comm-OA (purple), ATOC (cyan) and the proposed IS method (black). (PP: Predator-and-Prey, CN: Cooperative Navigation, TJ: Traffic Junction)

Fig. 3. Imagined trajectories and attention weights of each agent on PP (N=3): 1st row – agent1 (red), 2nd row – agent2 (green), and 3rd row – agent3 (blue). Black squares, circle inside the times icon, and other circles denote the prey, current position, and estimated future positions, respectively. The brightness of the circle is proportional to the attention weight.

Population-Guided Parallel Policy Search for Reinforcement Learning

Authors: Whiyoung Jung, Giseung Park and Youngchul Sung

Presented at International Conference on Learning Representations (ICLR) 2020

In this work, a new population-guided parallel learning scheme is proposed to enhance the performance of off-policy reinforcement learning (RL). In the proposed scheme, multiple identical learners, each with its own value functions and policy, share a common experience replay buffer and search for a good policy in collaboration, guided by the best policy information. The key point is that the information of the best policy is fused in a soft manner by constructing an augmented loss function for the policy update, which enlarges the overall search region covered by the multiple learners. The guidance by the previous best policy and the enlarged search range enable faster and better policy search. Monotone improvement of the expected cumulative return under the proposed scheme is proved theoretically. Working algorithms are constructed by applying the proposed scheme to the twin delayed deep deterministic (TD3) policy gradient algorithm. Numerical results show that the constructed algorithm outperforms most of the current state-of-the-art RL algorithms, and that the gain is significant in sparse-reward environments.
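The soft fusion of best-policy information can be sketched as a penalty added to each learner's own policy loss. The squared-distance measure, the `is_best` flag, and the scalar actions are assumptions chosen for brevity; the scheme's exact loss is defined in the paper.

```python
def augmented_policy_loss(base_loss, actions, best_actions, beta, is_best):
    """Policy loss plus a soft pull toward the best learner's actions.

    actions / best_actions: scalar actions of this learner and of the best
    learner on the same batch of states (hypothetical inputs).
    beta: weight of the guidance term.
    """
    if is_best:
        return base_loss  # assumed: the best learner receives no guidance term
    dist = sum((a - b) ** 2 for a, b in zip(actions, best_actions)) / len(actions)
    return base_loss + beta * dist

loss = augmented_policy_loss(1.0, [0.0, 1.0], [0.0, 3.0], beta=0.5, is_best=False)
```

A small beta lets the learners spread out and explore different regions of the policy space, while a large beta pulls them toward the current best policy.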

 

Fig. 1. The overall structure of the proposed scheme (P3S).

Fig. 2. The conceptual search coverage in the policy space by parallel learners.

Fig. 3. Performance of different parallel learning methods on MuJoCo environments (top) and on delayed MuJoCo environments (bottom).

Fig. 4. Benefits of P3S (a) Performance and beta (1 seed) with d_min = 0.05, (b) Distance measures with d_min = 0.05, and (c) Comparison with different d_min = 0.02, 0.05.

Machine-Learning-Based Read Reference Voltage Estimation for NAND Flash Memory Systems Without Knowledge of Retention Time

Authors: Hyemin Choe, Jeongju Jee, Seung-Chan Lim, Sung Min Joe, Il Han Park, and Hyuncheol Park

Journal: IEEE Access (published: September 2020)

To achieve a low error rate in NAND flash memory, the read reference voltages should be updated based on accurate knowledge of the program/erase (P/E) cycles and the retention time, because both severely distort the threshold voltage distribution of the memory cells. Due to the sensitivity to temperature, however, a flash memory controller is unable to acquire exact knowledge of the retention time, which makes it challenging to estimate accurate read reference voltages in practice.

In addition, it is difficult to characterize the relation between the channel impairments and the optimal read reference voltages in general. Therefore, we propose a machine-learning-based read reference voltage estimation framework for NAND flash memory without the knowledge of retention time.

In the off-line training phase, to define the input features of the proposed framework, we derive alternative information about the unknown retention time, which is obtained by sensing and decoding the data in one wordline. For the on-line estimation phase, we propose three estimation schemes: 1) k-nearest neighbors (k-NN)-based, 2) nearest-centroid (NC)-based, and 3) polynomial regression (PR)-based estimation. By applying these estimation schemes, an unlabeled input feature is simply mapped to a pre-assigned class label, namely the labeled read reference voltages, via the on-line estimation phase.
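Of the three schemes, the nearest-centroid mapping is the simplest to sketch. The feature vectors and class labels below are made-up placeholders; in the framework each label would stand for a pre-assigned set of read reference voltages.

```python
def nearest_centroid(feature, centroids):
    """Map a feature vector to the label of the closest class centroid.

    centroids: dict mapping a class label to its centroid feature vector,
    computed in the off-line training phase.
    """
    def sq_dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    return min(centroids, key=lambda label: sq_dist(feature, centroids[label]))

# Hypothetical centroids for two classes of read reference voltages.
centroids = {"class_A": [0.0, 0.0], "class_B": [1.0, 1.0]}
label = nearest_centroid([0.9, 0.8], centroids)
```

Compared with k-NN, this stores one vector per class instead of all training features, which is one way the on-line phase can keep latency and memory low.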

Based on simulation and analysis, we have verified that the proposed framework can achieve highly reliable and low-latency performance in NAND flash memory systems without knowledge of the retention time.

Figure 1. Flow charts of read reference voltage estimation schemes: (a) k-NN-based, (b) NC-based, and (c) PR-based estimation.

Downlink Extrapolation for FDD Multiple Antenna Systems Through Neural Network Using Extracted Uplink Path Gains

Authors: Hyuckjin Choi, Junil Choi

Journal: IEEE Access (published: April 2020)

 

In frequency division duplexing (FDD) communication systems, base stations (BSs) need the downlink (DL) channel state information (CSI), which cannot be obtained directly at the BSs. Conventional FDD communication systems rely on DL training and feedback, where the mobile station (MS) estimates the DL CSI and delivers it to the BS. This becomes infeasible as the number of antennas at the BS increases in a high-mobility scenario. When an MS moves at high speed, the channel changes rapidly, which results in a short coherence time.

Obtaining the DL CSI at the BS without uplink (UL) feedback is called DL extrapolation. Even in FDD communication systems, the UL and DL channels exhibit reciprocity, as previous related works have shown. Using this relation between the UL and DL channels, the UL CSI can be mapped to the DL CSI through a neural network (NN). Prior studies have developed DL extrapolation algorithms with full-dimensional UL and DL channels. However, the complexity of NN training becomes severe as the channel dimension grows.

We propose an algorithm that simplifies the NN input and output for DL extrapolation. Many measurements have shown that the UL and DL channels share the same channel path delays and directions in FDD communication systems. The proposed method first extracts the common channel parameters from the UL and DL channels, then trains the NN with the frequency-dependent path gains, so that the sizes of the NN input and output decrease. In extensive simulations, the proposed technique outperforms conventional NN-based DL extrapolation schemes.

Figure 1. Flow charts of (a) CH-learning and (b) PG-learning.

Figure 2. NN structures used for numerical studies. (a) MLP for the CH-learning and (b) CNN for the PG-learning.

Massive MIMO Channel Prediction: Kalman Filtering Vs. Machine Learning

Authors: Hwanjin Kim, Sucheol Kim, Hyeongtaek Lee, Junil Choi

Journal: IEEE Transactions on Communications (published: January 2021)

 

Accurate channel state information (CSI) at the base stations (BSs) is crucial to fully exploit massive multiple-input multiple-output (MIMO) systems. The CSI at the BS can be outdated in a time-varying channel due to the mobility of the user equipment (UE). The best way to solve the outdated CSI problem is to predict channels based on the prior CSI. In this paper, we develop a vector Kalman filter (VKF)-based predictor and a machine learning (ML)-based predictor using the spatial channel model (SCM), a realistic channel model adopted in the 3GPP standard. First, we develop the VKF-based predictor using autoregressive (AR) parameters obtained from the SCM data through the Yule-Walker equations. Then, we develop the ML-based channel predictor, which exploits data pre-processed for noise with the linear minimum mean-square error (LMMSE) estimator. Numerical results show that both channel predictors achieve significant gains over the outdated channel in terms of prediction accuracy and data rate.
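The role of the Yule-Walker equations can be illustrated on a scalar, real-valued AR(2) process. The paper estimates matrix-valued AR parameters for vector channels; the scalar simplification, the function names, and the synthetic data below are assumptions for illustration.

```python
import random

def autocorr(x, lag):
    """Biased sample autocorrelation of a zero-mean sequence."""
    n = len(x)
    return sum(x[t] * x[t - lag] for t in range(lag, n)) / n

def yule_walker_ar2(x):
    """Solve the 2x2 Yule-Walker system [[r0, r1], [r1, r0]] [a1, a2]^T = [r1, r2]^T."""
    r0, r1, r2 = autocorr(x, 0), autocorr(x, 1), autocorr(x, 2)
    det = r0 * r0 - r1 * r1
    a1 = r1 * (r0 - r2) / det
    a2 = (r0 * r2 - r1 * r1) / det
    return a1, a2

def simulate_ar2(a1, a2, n, seed=0):
    """Generate x[t] = a1*x[t-1] + a2*x[t-2] + unit-variance Gaussian noise."""
    rng = random.Random(seed)
    x = [0.0, 0.0]
    for _ in range(n):
        x.append(a1 * x[-1] + a2 * x[-2] + rng.gauss(0.0, 1.0))
    return x[2:]

samples = simulate_ar2(0.5, 0.3, n=50000)
a1_hat, a2_hat = yule_walker_ar2(samples)
```

Once the AR parameters are estimated this way, they define the state transition model that a Kalman filter uses to predict the next channel sample from past ones.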

Figure 1. Multi-layer perceptron (MLP) structure with LMMSE pre-processing.