Title: Fast-Convergent Federated Learning via Cyclic Aggregation
Venue: 2023 IEEE International Conference on Image Processing (ICIP)
Abstract: Federated learning (FL) aims at optimizing a shared global model over multiple edge devices without transmitting (private) data to the central server. While it is theoretically well known that FL yields an optimal model (a centrally trained model, assuming all the edge-device data is available at the central server) under mild conditions, in practice it often requires a massive number of iterations until convergence, especially in the presence of statistical/computational heterogeneity. This paper utilizes a cyclic learning rate at the server side to reduce the number of training iterations while improving performance, without any additional computational cost for either the server or the edge devices. Numerical results validate that simply plugging the proposed cyclic aggregation into existing FL algorithms effectively reduces the number of training iterations with improved performance.
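As a rough illustration of the idea in this abstract, the Python sketch below applies a cyclic learning rate to the server-side aggregation step of a FedAvg-style round. The triangular schedule, the function names `cyclic_lr` and `server_aggregate`, and all constants are assumptions made for illustration; the abstract does not specify the schedule used in the paper.

```python
import numpy as np

def cyclic_lr(round_idx, base_lr=0.5, max_lr=1.5, period=10):
    """Triangular cyclic schedule (a hypothetical choice): rises from base_lr
    to max_lr over half a period, then falls back to base_lr."""
    pos = (round_idx % period) / period      # position within the cycle, in [0, 1)
    tri = 1.0 - abs(2.0 * pos - 1.0)         # goes 0 -> 1 -> 0 over one cycle
    return base_lr + (max_lr - base_lr) * tri

def server_aggregate(global_w, client_ws, round_idx):
    """Move the global model along the average client update, scaled by the
    server-side cyclic rate. A constant rate of 1.0 gives plain averaging."""
    avg_delta = np.mean([w - global_w for w in client_ws], axis=0)
    return global_w + cyclic_lr(round_idx) * avg_delta
```

The only change relative to plain averaging is one scalar multiplication per round on the server, which is consistent with the abstract's claim of no additional computational cost for the edge devices.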
Main Figure:

Title: BWS: Best Window Selection Based on Sample Scores for Data Pruning across Broad Ranges
Authors: Hoyong Choi*, Nohyun Ki* and Hye Won Chung
Conference: International Conference on Machine Learning (ICML), July 2024.
Abstract: Data subset selection aims to find a smaller yet informative subset of a large dataset that can approximate the full-dataset training, addressing challenges associated with training neural networks on large-scale datasets. However, existing methods tend to specialize in either high or low selection ratio regimes, lacking a universal approach that consistently achieves competitive performance across a broad range of selection ratios. We introduce a universal and efficient data subset selection method, Best Window Selection (BWS), by proposing a method to choose the best window subset from samples ordered based on their difficulty scores. This approach offers flexibility by allowing the choice of window intervals that span from easy to difficult samples. Furthermore, we provide an efficient mechanism for selecting the best window subset by evaluating its quality using kernel ridge regression. Our experimental results demonstrate the superior performance of BWS compared to other baselines across a broad range of selection ratios on datasets including CIFAR-10/100 and ImageNet, in scenarios involving both training from random initialization and fine-tuning of pre-trained models.
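The window-selection step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the linear kernel, the separate evaluation set, the sliding step, the hardest-to-easiest ordering, and the names `krr_accuracy` and `best_window` are all assumptions.

```python
import numpy as np

def krr_accuracy(X_tr, Y_tr, X_ev, y_ev, reg=1e-3):
    """Fit kernel ridge regression (linear kernel, an assumed choice) on the
    window and return classification accuracy on the evaluation set."""
    K = X_tr @ X_tr.T                                    # Gram matrix of the window
    alpha = np.linalg.solve(K + reg * np.eye(len(X_tr)), Y_tr)
    pred = (X_ev @ X_tr.T) @ alpha                       # scores for evaluation points
    return float(np.mean(pred.argmax(axis=1) == y_ev))

def best_window(X, y, scores, subset_size, step, X_ev, y_ev, n_classes):
    """Slide a contiguous window over samples sorted from hardest to easiest
    and return the indices and accuracy of the best-scoring window."""
    order = np.argsort(-scores)                          # descending difficulty
    Y = np.eye(n_classes)[y]                             # one-hot targets
    best_acc, best_idx = -1.0, None
    for start in range(0, len(order) - subset_size + 1, step):
        idx = order[start:start + subset_size]
        acc = krr_accuracy(X[idx], Y[idx], X_ev, y_ev)
        if acc > best_acc:
            best_acc, best_idx = acc, idx
    return best_idx, best_acc
```

Here `X` would hold fixed feature vectors (for example, from a briefly trained network) and `scores` any per-sample difficulty score; both are placeholders for whatever the actual pipeline provides.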
Main figure:

Title: SelMatch: Effectively Scaling Up Dataset Distillation via Selection-Based Initialization and Partial Updates by Trajectory Matching
Authors: Yongmin Lee and Hye Won Chung
Conference: International Conference on Machine Learning (ICML), July 2024.
Abstract: Dataset distillation aims to synthesize a small number of images per class (IPC) from a large dataset to approximate full dataset training with minimal performance loss. While effective in very small IPC ranges, many distillation methods become less effective, even underperforming random sample selection, as IPC increases. Our examination of state-of-the-art trajectory-matching based distillation methods across various IPC scales reveals that these methods struggle to incorporate the complex, rare features of harder samples into the synthetic dataset even with the increased IPC, resulting in a persistent coverage gap between easy and hard test samples. Motivated by such observations, we introduce SelMatch, a novel distillation method that effectively scales with IPC. SelMatch uses selection-based initialization and partial updates through trajectory matching to manage the synthetic dataset’s desired difficulty level tailored to IPC scales. When tested on CIFAR-10/100 and TinyImageNet, SelMatch consistently outperforms leading selection-only and distillation-only methods across subset ratios from 5% to 30%.
Main figure:

Title: Representation Norm Amplification for Out-of-Distribution Detection in Long-Tail Learning
Authors: Dong Geun Shin and Hye Won Chung
Journal: Transactions on Machine Learning Research (TMLR), 2024.
Abstract: Detecting out-of-distribution (OOD) samples is a critical task for reliable machine learning. However, it becomes particularly challenging when the models are trained on long-tailed datasets, as the models often struggle to distinguish tail-class in-distribution samples from OOD samples. We examine the main challenges in this problem by identifying the trade-offs between OOD detection and in-distribution (ID) classification, faced by existing methods. We then introduce our method, called Representation Norm Amplification (RNA), which solves this challenge by decoupling the two problems. The main idea is to use the norm of the representation as a new dimension for OOD detection, and to develop a training method that generates a noticeable discrepancy in the representation norm between ID and OOD data, while not perturbing the feature learning for ID classification. Our experiments show that RNA achieves superior performance in both OOD detection and classification compared to the state-of-the-art methods, by 1.70% and 9.46% in FPR95 and 2.43% and 6.87% in classification accuracy on CIFAR10-LT and ImageNet-LT, respectively. The code for this work is available at https://github.com/dgshin21/RNA.
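To make the test-time side of this idea concrete, the sketch below scores samples by the L2 norm of their representation and computes FPR95, the metric named in the abstract. It is only an illustration: the function names are hypothetical, and the training procedure that amplifies the norm gap between ID and OOD data is not shown.

```python
import numpy as np

def norm_score(features):
    """OOD score: L2 norm of each representation (larger means more ID-like)."""
    return np.linalg.norm(features, axis=1)

def fpr_at_95_tpr(id_scores, ood_scores):
    """Fraction of OOD samples accepted as ID at the threshold that keeps
    roughly 95% of the ID samples."""
    threshold = np.percentile(id_scores, 5)   # about 95% of ID scores lie at or above this
    return float(np.mean(ood_scores >= threshold))
```

A lower FPR95 indicates a cleaner separation between the two norm distributions.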
Main figure:

Title: Detection Problems in the Spiked Random Matrix Models
Authors: Ji Hyung Jung, Hye Won Chung and Ji Oon Lee
Journal: IEEE Transactions on Information Theory, 2024.
Abstract: We study the statistical decision process of detecting the low-rank signal from various signal-plus-noise type data matrices, known as the spiked random matrix models. We first show that principal component analysis can be improved by entrywise pre-transforming the data matrix if the noise is non-Gaussian, generalizing the known results for the spiked random matrix models with rank-1 signals. As an intermediate step, we find sharp phase transition thresholds for the extreme eigenvalues of spiked random matrices, which generalize the Baik-Ben Arous-Péché (BBP) transition. We also prove the central limit theorem for the linear spectral statistics of the spiked random matrices and propose a hypothesis test based on it, which does not depend on the distribution of the signal or the noise. When the noise is non-Gaussian, the test can be improved by applying an entrywise transformation to the data matrix with additive noise. We also introduce an algorithm that estimates the rank of the signal when it is not known a priori.
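The classical BBP transition that this abstract generalizes can be checked numerically in the simplest setting, a rank-one spike in symmetric Gaussian noise. With the noise normalized so that its bulk spectrum fills [-2, 2], the largest eigenvalue stays near 2 when the signal-to-noise ratio is at most 1 and moves to roughly snr + 1/snr above that. The sketch below is an illustration only; the matrix size, seed, and function name are arbitrary choices, and neither the non-Gaussian case nor the entrywise transformation from the paper is shown.

```python
import numpy as np

def top_eigenvalue(n, snr, seed=0):
    """Largest eigenvalue of W + snr * v v^T for a Gaussian Wigner matrix W
    and a unit-norm Rademacher spike v."""
    rng = np.random.default_rng(seed)
    G = rng.normal(size=(n, n))
    W = (G + G.T) / np.sqrt(2 * n)                      # symmetric, off-diagonal variance 1/n
    v = rng.choice([-1.0, 1.0], size=n) / np.sqrt(n)    # unit-norm spike direction
    M = W + snr * np.outer(v, v)
    return float(np.linalg.eigvalsh(M)[-1])             # eigvalsh sorts eigenvalues ascending

if __name__ == "__main__":
    for snr in (0.5, 2.0):
        print(snr, top_eigenvalue(500, snr))
```

For finite n the values fluctuate around the limits, so the printed numbers should be read as approximate.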

Title: Asymptotic Normality of Log-Likelihood Ratio and Fundamental Limit of the Weak Detection for Spiked Wigner Matrices
Authors: Hye Won Chung, Jiho Lee and Ji Oon Lee
Journal: Bernoulli, 2024.
Abstract: We consider the problem of detecting the presence of a signal in a rank-one spiked Wigner model. For general non-Gaussian noise, assuming that the signal is drawn from the Rademacher prior, we prove that the log-likelihood ratio (log LR) of the spiked model against the null model converges to a Gaussian when the signal-to-noise ratio is below a certain threshold. The threshold is optimal in the sense that reliable detection is possible by a transformed principal component analysis (PCA) above it. From the mean and the variance of the limiting Gaussian for the log LR, we compute the limit of the sum of the Type-I error and the Type-II error of the likelihood ratio test. We also prove similar results for a rank-one spiked IID model where the noise is asymmetric but the signal is symmetric.
Main figure:

Title: Exact Graph Matching in Correlated Gaussian-Attributed Erdos-Renyi Model
Authors: Joonhyuk Yang and Hye Won Chung
Conference: IEEE International Symposium on Information Theory (ISIT), July 2024.
Abstract: The graph matching problem aims to identify node correspondence between two or more correlated graphs. Previous studies have primarily focused on models where only edge information is provided. However, in many social networks, not only the relationships between users, represented by edges, but also their personal information, represented by features, are present. In this paper, we address the challenge of identifying node correspondence in correlated graphs where additional node features exist, as in many real-world settings. We propose a two-step procedure, where we initially match a subset of nodes using only edge information, and then match the remaining nodes using node features. We derive information-theoretic limits for exact graph matching on this model. Our approach provides a comprehensive solution to the real-world graph matching problem by providing systematic ways to utilize both edge and node information for exact matching of the graphs.
Main figure:

Jung, I. Ali and J. Ha, “Convolutional Neural Decoder for Surface Codes,” IEEE Transactions on Quantum Engineering, vol. 5, pp. 1-13, June 2024
Abstract: To perform reliable information processing in quantum computers, quantum error correction (QEC) codes are essential for the detection and correction of errors in the qubits. Among QEC codes, topological QEC codes are designed to interact between the neighboring qubits, which is a promising property for easing the implementation requirements. In addition, the locality to the qubits provides unusual tolerance to local errors. Recently, various decoding algorithms based on machine learning have been proposed to improve the decoding performance and latency of QEC codes. In this work, we propose a new decoding algorithm for surface codes, i.e., a type of topological codes, by using convolutional neural networks (CNNs) tailored for the topological lattice structure of the surface codes. In particular, the proposed algorithm takes advantage of the syndrome pattern, which is represented as a part of a rectangular lattice given to the CNN as its input. The remaining part of the rectangular lattice is filled with a carefully selected incoherent value for better logical error rate performance. In addition, we introduce how to optimize the hyperparameters in the CNN, according to the lattice structure of a given surface code. This reduces the overall decoding complexity and makes the CNN-based decoder computationally more suitable for implementation. The numerical results show that the proposed decoding algorithm effectively improves the decoding performance in terms of logical error rate as compared to the existing algorithms on various quantum error models.
Main Figure:

Lee, H. Yeom, S. Lee and J. Ha, “Channel Correlation in Multi-User Covert Communication: Friend or Foe?,” IEEE Transactions on Information Forensics and Security, vol. 19, pp. 1469-1482, Nov. 2023
Abstract: In this work, we study a covert communication scheme in which some users are opportunistically selected to emit interference signals for the purpose of hiding the communication of a covert user. This work reveals interesting facts that the channel correlation is beneficial to the throughput of the covert communication but detrimental to the energy efficiency, which has never been discussed before. The study is conducted in a generic setup where the channels between pairs of entities in the scheme are correlated. For the setup, we discover that the optimal power profile of the interference signals from the selected users turns out to be the equal power transmission at their maximum transmit power level. In addition, we optimize system parameters of the scheme for maximizing throughput and energy efficiency utilizing Q-learning, which however is plagued with long learning time and large storage space when the dimension of state gets large and/or a fine resolution of reward function value is necessary. To resolve the technical challenge, we propose a scalable Q-learning which recursively narrows down the discretization level of the continuous state in an iterative fashion. To confirm the results in this work, the system parameters are evaluated with theoretical results for independent channels and compared with the ones from the proposed scalable Q-learning.
Main Figure:
