Data Valuation Without Training of a Model

Authors: Nohyun Ki, Hoyong Choi, Hye Won Chung

Conference: International Conference on Learning Representations (ICLR) 2023

Abstract: Many recent works on understanding deep learning try to quantify how much individual data instances influence the optimization and generalization of a model. Such attempts reveal the characteristics and importance of individual instances, which may provide useful information for diagnosing and improving deep learning. However, most existing works on data valuation require actual training of a model, which often demands high computational cost. In this paper, we provide a training-free data valuation score, called the complexity-gap score, a data-centric score that quantifies the influence of individual instances on the generalization of two-layer overparameterized neural networks. The proposed score can quantify the irregularity of instances and measure how much each data instance contributes to the total movement of the network parameters during training. We theoretically analyze and empirically demonstrate the effectiveness of the complexity-gap score in finding 'irregular or mislabeled' data instances, and also provide applications of the score in analyzing datasets and diagnosing training dynamics. Our code is publicly available at https://github.com/JJchy/CG_score.
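The abstract does not spell out the score's formula, but the construction suggests a simple leave-one-out computation on a Gram matrix. Below is a minimal NumPy sketch, assuming the score of an instance is the drop in the data-complexity measure y^T G^{-1} y when that instance is removed, with a standard two-layer ReLU Gram matrix standing in for the paper's exact kernel; the names relu_gram and cg_scores are illustrative.

```python
import numpy as np

def relu_gram(X):
    """Gram matrix of a two-layer ReLU net in the overparameterized limit
    (one standard form; rows of X are unit-normalized first)."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    C = np.clip(Xn @ Xn.T, -1.0, 1.0)
    return C * (np.pi - np.arccos(C)) / (2 * np.pi)

def complexity(G, y, reg=1e-8):
    """Data-complexity measure y^T G^{-1} y (tiny ridge for stability)."""
    return y @ np.linalg.solve(G + reg * np.eye(len(y)), y)

def cg_scores(X, y):
    """Leave-one-out complexity gap of each instance (no model training)."""
    G = relu_gram(X)
    full = complexity(G, y)
    n = len(y)
    scores = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        scores[i] = full - complexity(G[np.ix_(keep, keep)], y[keep])
    return scores  # a large gap marks an irregular / influential instance

# toy check: a mislabeled point in clustered data should score high
rng = np.random.default_rng(0)
n, d = 40, 100
y = np.repeat([1.0, -1.0], n // 2)
X = rng.normal(size=(n, d)) + 4.0 * y[:, None] / np.sqrt(d)
y[0] *= -1  # flip one label
s = cg_scores(X, y)
print(s[0], s[1:].mean())  # the flipped point tends to score above average
```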

 


Test-Time Adaptation via Self-Training with Nearest Neighbor Information

Authors: Minguk Jang, Sae-Young Chung, and Hye Won Chung

Conference: International Conference on Learning Representations (ICLR) 2023

Abstract:

Test-time adaptation (TTA) aims to adapt a trained classifier using only online unlabeled test data, without any information related to the training procedure. Most existing TTA methods adapt the trained classifier using the classifier's predictions on the test data as pseudo-labels. However, under test-time domain shift, the accuracy of the pseudo-labels cannot be guaranteed, and thus TTA methods often suffer performance degradation in the adapted classifier. To overcome this limitation, we propose a novel test-time adaptation method, called Test-time Adaptation via Self-Training with nearest neighbor information (TAST), which consists of the following procedures: (1) add trainable adaptation modules on top of the trained feature extractor; (2) define a new pseudo-label distribution for the test data using nearest neighbor information; (3) train these modules only a few times during test time to match the nearest-neighbor-based pseudo-label distribution and a prototype-based class distribution for the test data; and (4) predict the label of each test instance using the average predicted class distribution from these modules. The pseudo-label generation is based on the basic intuition that a test instance and its nearest neighbors in the embedding space are likely to share the same label under domain shift. By utilizing multiple randomly initialized adaptation modules, TAST extracts information useful for classifying the test data under domain shift from the nearest neighbor information. TAST showed better performance than state-of-the-art TTA methods on two standard benchmark tasks: domain generalization, namely VLCS, PACS, OfficeHome, and TerraIncognita, and image corruption, particularly CIFAR-10/100-C.
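As a concrete illustration of step (2), here is a small NumPy sketch of a nearest-neighbor pseudo-label distribution; the cosine similarity, the support-set layout, and the function name nn_pseudo_label are illustrative assumptions rather than the paper's exact implementation.

```python
import numpy as np

def nn_pseudo_label(z, support_z, support_probs, k=5):
    """Pseudo-label distribution for a test embedding z from its k nearest
    support embeddings (cosine similarity is an assumption here).

    support_z: (m, d) embeddings stored during test time;
    support_probs: (m, C) their class distributions.
    """
    sims = (support_z @ z) / (
        np.linalg.norm(support_z, axis=1) * np.linalg.norm(z) + 1e-12)
    topk = np.argsort(-sims)[:k]
    return support_probs[topk].mean(axis=0)  # neighbors vote with their labels

# toy usage: the two support points closest to z dominate its pseudo-label
support_z = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
support_probs = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
print(nn_pseudo_label(np.array([1.0, 0.05]), support_z, support_probs, k=2))
```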

 


SplitGP: Achieving Both Generalization and Personalization in Federated Learning

Conference: IEEE International Conference on Computer Communications (INFOCOM) 2023.

 

Abstract:

A fundamental challenge in providing edge-AI services is the need for a machine learning (ML) model that achieves personalization (i.e., to individual clients) and generalization (i.e., to unseen data) concurrently. Existing techniques in federated learning (FL) face a steep tradeoff between these objectives and impose large computational requirements on edge devices during training and inference. In this paper, we propose SplitGP, a new split learning solution that simultaneously captures generalization and personalization capabilities for efficient inference across resource-constrained clients (e.g., mobile/IoT devices). Our key idea is to split the full ML model into client-side and server-side components and assign them different roles: the client-side model is trained to have strong personalization capability optimized for each client's main task, while the server-side model is trained to have strong generalization capability for handling all clients' out-of-distribution tasks. We analytically characterize the convergence behavior of SplitGP, revealing that all client models approach stationary points asymptotically. Further, we analyze the inference time in SplitGP and provide bounds for determining model split ratios. Experimental results show that SplitGP outperforms existing baselines by wide margins in inference time and test accuracy for varying amounts of out-of-distribution samples.
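The inference-side split can be pictured with a few lines of Python. The confidence-based hand-off rule and the callables client_model/server_model below are illustrative assumptions; the abstract only fixes the split itself and the two roles.

```python
import numpy as np

def split_inference(x, client_model, server_model, conf_threshold=0.8):
    """Client answers its personalized main task; low-confidence samples
    (likely out-of-distribution) are escalated to the generalized server.

    client_model(x) -> (features, class_probs); server_model(features) -> probs.
    """
    feats, probs = client_model(x)
    if probs.max() >= conf_threshold:
        return probs              # personalized, on-device prediction
    return server_model(feats)    # generalized, server-side prediction

# toy usage with stand-in models: the unconfident client defers to the server
client = lambda x: (x, np.array([0.6, 0.4]))
server = lambda f: np.array([0.2, 0.8])
print(split_inference(np.ones(3), client, server))  # -> [0.2 0.8]
```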

 


Active Learning for Object Detection with Evidential Deep Learning and Hierarchical Uncertainty Aggregation

Conference: International Conference on Learning Representations (ICLR) 2023.

Abstract:

Despite the huge success of object detection, the training process still requires an immense amount of labeled data. Although various active learning solutions for object detection have been proposed, most existing works do not take advantage of epistemic uncertainty, an important measure of how useful a sample is. Also, previous works pay little attention to the attributes of each bounding box (e.g., nearest object, box size) when computing the informativeness of an image. In this paper, we propose a new active learning strategy for object detection that overcomes the shortcomings of prior works. To make use of epistemic uncertainty, we adopt evidential deep learning (EDL) and propose a new module, termed the model evidence head (MEH), that makes EDL highly compatible with object detection. Based on the computed epistemic uncertainty of each bounding box, we propose hierarchical uncertainty aggregation (HUA) to obtain the informativeness of an image. HUA realigns all bounding boxes into multiple levels based on their attributes and aggregates uncertainties in a bottom-up order to effectively capture the context within the image. Experimental results show that our method outperforms existing state-of-the-art methods by a considerable margin.
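A minimal sketch of the bottom-up aggregation idea behind HUA: boxes are binned into levels by one attribute (box size here) and per-level uncertainties are combined into an image score. The quantile binning and the max/sum operators are assumptions for illustration, not the paper's exact choices.

```python
import numpy as np

def hua_image_score(box_uncertainties, box_sizes, n_levels=3):
    """Aggregate per-box epistemic uncertainties into one image score.

    Boxes are binned into levels by size (one possible attribute); the most
    uncertain box per level is kept, and level scores are summed bottom-up.
    """
    edges = np.quantile(box_sizes, np.linspace(0, 1, n_levels + 1)[1:-1])
    levels = np.digitize(box_sizes, edges)
    score = 0.0
    for lvl in range(n_levels):
        in_lvl = box_uncertainties[levels == lvl]
        if in_lvl.size:
            score += in_lvl.max()  # most informative box at this level
    return score

# toy usage: four boxes with epistemic uncertainties and areas
u = np.array([0.9, 0.2, 0.5, 0.7])
sizes = np.array([10.0, 120.0, 15.0, 300.0])
print(hua_image_score(u, sizes))
```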


Warping the Space: Weight Space Rotation for Class-Incremental Few-Shot Learning

Conference: International Conference on Learning Representations (ICLR) 2023.

Abstract:

Class-incremental few-shot learning, where new sets of classes arrive sequentially with only a few training samples each, presents a great challenge due to catastrophic forgetting of old knowledge and overfitting caused by a lack of data. During fine-tuning on new classes, the performance on previous classes deteriorates quickly even when only a small fraction of parameters are updated, since the previous knowledge is broadly associated with most of the model parameters in the original parameter space. In this paper, we introduce WaRP, the weight space rotation process, which transforms the original parameter space into a new space so that we can push most of the previous knowledge compactly into only a few important parameters. By properly identifying and freezing these key parameters in the new weight space, we can fine-tune the remaining parameters without affecting the knowledge of previous classes. As a result, WaRP provides additional room for the model to effectively learn new classes in future incremental sessions. Experimental results confirm the effectiveness of our solution and show improved performance over state-of-the-art methods.
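One way to picture the rotation is as an orthogonal basis in which old knowledge concentrates in a few leading directions, which are then frozen. The sketch below builds such a basis from the SVD of stored past-task gradients; using gradient statistics for this is an assumption for illustration, not the paper's exact construction.

```python
import numpy as np

def warp_basis(grad_samples, n_freeze):
    """Leading orthogonal directions of past-task gradient samples (k, d);
    freezing them is meant to protect old knowledge compactly."""
    U, _, _ = np.linalg.svd(grad_samples.T, full_matrices=False)
    return U[:, :n_freeze]  # (d, n_freeze) frozen directions

def project_update(update, frozen):
    """Strip a new-class update of its components along frozen directions,
    so fine-tuning cannot disturb what those directions encode."""
    return update - frozen @ (frozen.T @ update)

# sanity check: the projected update is orthogonal to the frozen subspace
g = np.random.default_rng(4).normal(size=(32, 10))
frozen = warp_basis(g, n_freeze=4)
print(np.abs(frozen.T @ project_update(np.ones(10), frozen)).max())  # ~0
```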

 


GenLabel: Mixup Relabeling using Generative Models

Conference: International Conference on Machine Learning (ICML) 2022.

Abstract:

Mixup is a data augmentation method that generates new data points by mixing pairs of input data. While mixup generally improves prediction performance, it sometimes degrades it. In this paper, we first identify the main causes of this phenomenon by theoretically and empirically analyzing the mixup algorithm. To resolve this, we propose GenLabel, a simple yet effective relabeling algorithm designed for mixup. In particular, GenLabel helps the mixup algorithm correctly label mixup samples by learning the class-conditional data distribution using generative models. Via theoretical and empirical analysis, we show that mixup, when used together with GenLabel, can effectively resolve the aforementioned phenomenon, improving the accuracy of mixup-trained models.
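A minimal sketch of the relabeling step, using per-class Gaussians as the generative model (a simple stand-in, not necessarily the paper's choice of generator): the soft label of a mixed point is proportional to the class-conditional likelihoods at that point.

```python
import numpy as np
from scipy.stats import multivariate_normal

def genlabel(x_mix, class_means, class_covs):
    """Soft label for a mixup point from class-conditional likelihoods,
    with per-class Gaussians standing in for learned generative models."""
    liks = np.array([multivariate_normal.pdf(x_mix, m, c)
                     for m, c in zip(class_means, class_covs)])
    return liks / liks.sum()  # normalized likelihoods as the new label

# mixing classes 0 and 1 with lam = 0.7: the label should lean to class 0
lam, x0, x1 = 0.7, np.array([0.0, 0.0]), np.array([4.0, 4.0])
x_mix = lam * x0 + (1 - lam) * x1
print(genlabel(x_mix, [x0, x1], [np.eye(2), np.eye(2)]))
```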

 


Weak Detection in the Spiked Wigner Model

Journal: IEEE Transactions on Information Theory (2022)

Abstract: We consider the weak detection problem in a rank-one spiked Wigner data matrix where the signal-to-noise ratio is small so that reliable detection is impossible. We prove a central limit theorem for the linear spectral statistics of general rank-one spiked Wigner matrices, and based on the central limit theorem, we propose a hypothesis test on the presence of the signal by utilizing the linear spectral statistics of the data matrix. The test is data-driven and does not require prior knowledge about the distribution of the signal or the noise. When the noise is Gaussian, the proposed test is optimal in the sense that its error matches that of the likelihood ratio test, which minimizes the sum of the Type-I and Type-II errors. If the density of the noise is known and non-Gaussian, the error of the test can be lowered by applying an entrywise transformation to the data matrix.
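Schematically, the test computes a linear spectral statistic of the data matrix and compares it with a critical value derived from the central limit theorem. In the NumPy sketch below, the test function f (a plain square) and the absence of a calibrated threshold are placeholders; the paper derives the optimal f and the exact critical value.

```python
import numpy as np

def lss(M, f=np.square):
    """Linear spectral statistic sum_i f(mu_i) of a symmetric matrix M.
    The choice of f here is a placeholder, not the paper's optimal one."""
    return f(np.linalg.eigvalsh(M)).sum()

# toy comparison: the statistic under pure noise vs. a sub-critical spike
rng = np.random.default_rng(1)
n, snr = 200, 0.5  # snr < 1: only weak detection is possible

def wigner(n):
    A = rng.normal(size=(n, n)) / np.sqrt(n)
    return (A + A.T) / np.sqrt(2)

u = rng.choice([-1.0, 1.0], n) / np.sqrt(n)  # rank-one signal, ||u|| = 1
h0 = [lss(wigner(n)) for _ in range(20)]
h1 = [lss(wigner(n) + np.sqrt(snr) * np.outer(u, u)) for _ in range(20)]
print(np.mean(h0), np.mean(h1))  # the spiked mean tends to sit higher
```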

 

A Generalized Worker-Task Specialization Model for Crowdsourcing: Optimal Limits and Algorithm

Conference: IEEE International Symposium on Information Theory (ISIT) 2022

Abstract: Crowdsourcing has emerged as an effective platform for labeling data at low cost using non-expert workers. However, inferring correct labels from multiple noisy answers has been a challenging problem, since the quality of answers varies widely across tasks and workers. We propose a highly general crowdsourcing model in which the reliability of each worker can vary depending on the type of the given task, where the number of types d can scale with the number of tasks. In this model, we characterize the optimal sample complexity to correctly infer the unknown labels within any given accuracy, and propose an algorithm achieving the order-wise optimal result.
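To make the model concrete, here is a toy simulation of type-dependent worker reliabilities with a plain majority-vote baseline; the paper's algorithm, which attains the order-wise optimal sample complexity, is more sophisticated than this baseline.

```python
import numpy as np
rng = np.random.default_rng(2)

# d task types; worker w answers a type-t task correctly w.p. p[w, t]
n_tasks, n_workers, d = 300, 30, 3
task_type = rng.integers(d, size=n_tasks)
truth = rng.integers(2, size=n_tasks)            # binary ground-truth labels
p = rng.uniform(0.4, 0.95, size=(n_workers, d))  # type-dependent reliability

correct = rng.random((n_workers, n_tasks)) < p[:, task_type]
answers = np.where(correct, truth, 1 - truth)    # noisy worker responses

# naive baseline that ignores worker-type specialization
est = (answers.mean(axis=0) > 0.5).astype(int)
print("majority-vote accuracy:", (est == truth).mean())
```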

Improving Generalization in Meta-RL with Imaginary Tasks from Latent Dynamics Mixture

Authors: Suyoung Lee, Sae-Young Chung

Conference: Advances in Neural Information Processing Systems 2021 (NeurIPS 2021)

Abstract:

The generalization ability of most meta reinforcement learning (meta-RL) methods is largely limited to test tasks drawn from the same distribution used to sample the training tasks. To overcome this limitation, we propose Latent Dynamics Mixture (LDM), which trains with imaginary tasks generated from mixtures of learned latent dynamics. By training on the mixture tasks along with the original training tasks, LDM can prepare for test tasks unseen during training and avoid overfitting to the training tasks. LDM significantly outperforms standard meta-RL methods in Grid-World navigation and MuJoCo test environments where the training task distribution and the test task distribution are strictly separated.

Figure 1. Generating imaginary tasks by mixing in the latent space

Figure 2. Network architecture of LDM, the RL method that trains on generated imaginary tasks

Figure 3. Average returns on three out-of-distribution MuJoCo test tasks
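The core generation step of LDM (Figure 1) can be sketched as mixing learned per-task latent vectors; the Dirichlet mixture weights below are one plausible choice, not necessarily how the paper forms the mixture.

```python
import numpy as np
rng = np.random.default_rng(3)

def mix_latent_tasks(latents, alpha=1.0):
    """Imaginary-task latent as a random convex combination of the learned
    per-task latents (Dirichlet weights are an illustrative choice)."""
    w = rng.dirichlet(alpha * np.ones(len(latents)))
    return w @ latents  # (d,) latent driving an imaginary task's dynamics

train_latents = rng.normal(size=(4, 8))  # stand-ins for learned task latents
imaginary = mix_latent_tasks(train_latents)
print(imaginary.shape)  # (8,)
```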

 

 

Deep-Learning for Breaking the Trapping Sets in Low-Density Parity-Check Codes

Journal: IEEE Transactions on Communications (published March 2022)

Abstract:

Message-passing (MP) decoding of low-density parity-check (LDPC) codes suffers from degraded performance in the low-error-rate regime due to trapping sets, which limits its use in applications that demand high reliability, such as storage devices. This paper proposes a new deep-learning-based decoding algorithm to break trapping sets. When MP decoding fails due to a trapping set, there exists a pair of unsatisfied check nodes connected by a path consisting only of error variable nodes, i.e., variable nodes with incorrect hard decisions. The proposed algorithm uses deep learning to efficiently identify such paths between unsatisfied check nodes that contain only error variable nodes. It then reinitializes the channel outputs of the error variable nodes on the identified path and reruns MP decoding, resolving the decoding failures caused by trapping sets. In addition, by analyzing how the deep-learning-based algorithm operates, we propose a low-complexity detector that adaptively finds erroneous paths. Simulation results show that the proposed algorithm efficiently breaks trapping sets and substantially improves the error-floor performance in the low-error-rate regime.
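The decoding loop described above can be sketched as follows; decode_mp and find_error_path are placeholder callables (the latter stands in for the deep-learning path identifier), and resetting the path's channel LLRs to zero is one plausible reading of "reinitialize".

```python
import numpy as np

def break_trapping_set(llr, decode_mp, find_error_path):
    """One retry of MP decoding after erasing suspected error positions.

    decode_mp(llr) -> (hard_decisions, success_flag)
    find_error_path(hard_decisions) -> indices of suspected error variable
    nodes on a path between unsatisfied check nodes.
    """
    word, ok = decode_mp(llr)
    if ok:
        return word
    path = find_error_path(word)   # suspected all-error path
    llr = llr.copy()
    llr[path] = 0.0                # erase channel beliefs on that path
    word, _ = decode_mp(llr)       # second MP attempt
    return word
```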

 
