Title: Asymptotic Normality of Log-Likelihood Ratio and Fundamental Limit of the Weak Detection for Spiked Wigner Matrices
Authors: Hye Won Chung, Jiho Lee and Ji Oon Lee
Journal: Bernoulli, 2024.
Abstract: We consider the problem of detecting the presence of a signal in a rank-one spiked Wigner model. For general non-Gaussian noise, assuming that the signal is drawn from the Rademacher prior, we prove that the log likelihood ratio (LR) of the spiked model against the null model converges to a Gaussian when the signal-to-noise ratio is below a certain threshold. The threshold is optimal in the sense that the reliable detection is possible by a transformed principal component analysis (PCA) above it. From the mean and the variance of the limiting Gaussian for the log LR, we compute the limit of the sum of the Type-I error and the Type-II error of the likelihood ratio test. We also prove similar results for a rank-one spiked IID model where the noise is asymmetric but the signal is symmetric.
Main figure:

Title: Exact Graph Matching in Correlated Gaussian-Attributed Erdos-Renyi Model
Authors: Joonhyuk Yang and Hye Won Chung
Conference: IEEE International Symposium on Information Theory (ISIT), July 2024.
Abstract: Graph matching problem aims to identify node correspondence between two or more correlated graphs. Previous studies have primarily focused on models where only edge information is provided. However, in many social networks, not only the relationships between users, represented by edges, but also their personal information, represented by features, are present. In this paper, we address the challenge of identifying node correspondence in correlated graphs, where additional node features exist, as in many real-world settings. We propose a two-step procedure, where we initially match a subset of nodes only using edge information, and then match the remaining nodes using node features. We derive information-theoretic limits for exact graph matching on this model. Our approach provides a comprehensive solution to the real-world graph matching problem by providing systematic ways to utilize both edge and node information for exact matching of the graphs.
Main figure:

Jung, I. Ali and J. Ha, “Convolutional Neural Decoder for Surface Codes,” IEEE Transactions on Quantum Engineering, vol. 5, pp. 1-13, June 2024
Abstract: To perform reliable information processing in quantum computers, quantum error correction (QEC) codes are essential for the detection and correction of errors in the qubits. Among QEC codes, topological QEC codes are designed to interact between the neighboring qubits, which is a promising property for easing the implementation requirements. In addition, the locality to the qubits provides unusual tolerance to local errors. Recently, various decoding algorithms based on machine learning have been proposed to improve the decoding performance and latency of QEC codes. In this work, we propose a new decoding algorithm for surface codes, i.e., a type of topological codes, by using convolutional neural networks (CNNs) tailored for the topological lattice structure of the surface codes. In particular, the proposed algorithm takes advantage of the syndrome pattern, which is represented as a part of a rectangular lattice given to the CNN as its input. The remaining part of the rectangular lattice is filled with a carefully selected incoherent value for better logical error rate performance. In addition, we introduce how to optimize the hyperparameters in the CNN, according to the lattice structure of a given surface code. This reduces the overall decoding complexity and makes the CNN-based decoder computationally more suitable for implementation. The numerical results show that the proposed decoding algorithm effectively improves the decoding performance in terms of logical error rate as compared to the existing algorithms on various quantum error models.
Main Figure:

Lee, H. Yeom, S. Lee and J. Ha, “Channel Correlation in Multi-User Covert Communication: Friend or Foe?,” IEEE Transactions on Information Forensics and Security, vol. 19, pp. 1469-1482, Nov. 2023
Abstract: In this work, we study a covert communication scheme in which some users are opportunistically selected to emit interference signals for the purpose of hiding the communication of a covert user. This work reveals interesting facts that the channel correlation is beneficial to the throughput of the covert communication but detrimental to the energy efficiency, which has never been discussed before. The study is conducted in a generic setup where the channels between pairs of entities in the scheme are correlated. For the setup, we discover that the optimal power profile of the interference signals from the selected users turns out to be the equal power transmission at their maximum transmit power level. In addition, we optimize system parameters of the scheme for maximizing throughput and energy efficiency utilizing Q -learning, which however is plagued with long learning time and large storage space when the dimension of state gets large and/or a fine resolution of reward function value is necessary. To resolve the technical challenge, we propose a scalable Q -learning which recursively narrows down the discretization level of the continuous state in an iterative fashion. To confirm the results in this work, the system parameters are evaluated with theoretical results for independent channels and compared with the ones from the proposed scalable Q -learning.
Main Figure:

Title: Channel Correlation in Multi-User Covert Communication: Friend or Foe?
Abstract:
In this work, we study a covert communication scheme in which some users are opportunistically selected to emit interference signals for the purpose of hiding the communication of a covert user. This work reveals interesting facts that the channel correlation is beneficial to the throughput of the covert communication but detrimental to the energy efficiency, which has never been discussed before. The study is conducted in a generic setup where the channels between pairs of entities in the scheme are correlated. For the setup, we discover that the optimal power profile of the interference signals from the selected users turns out to be the equal power transmission at their maximum transmit power level. In addition, we optimize system parameters of the scheme for maximizing throughput and energy efficiency utilizing Q-learning, which however is plagued with long learning time and large storage space when the dimension of state gets large and/or a fine resolution of reward function value is necessary. To resolve the technical challenge, we propose a scalable Q-learning which recursively narrows down the discretization level of the continuous state in an iterative fashion. To confirm the results in this work, the system parameters are evaluated with theoretical results for independent channels and compared with the ones from the proposed scalable Q-learning.
Main figure:
Title: Rank-1 Matrix Completion with Gradient Descent and Small Random Initialization
Authors: Daesung Kim, Hye Won Chung
Conference: Conference on Neural Information Processing Systems (NeurIPS), 2023.
Abstract:
The nonconvex formulation of the matrix completion problem has received significant attention in recent years due to its affordable complexity compared to the convex formulation. Gradient Descent (GD) is a simple yet efficient baseline algorithm for solving nonconvex optimization problems. The success of GD has been witnessed in many different problems in both theory and practice when it is combined with random initialization. However, previous works on matrix completion require either careful initialization or regularizers to prove the convergence of GD. In this paper, we study the rank-1 symmetric matrix completion and prove that GD converges to the ground truth when small random initialization is used. We show that in a logarithmic number of iterations, the trajectory enters the region where local convergence occurs. We provide an upper bound on the initialization size that is sufficient to guarantee the convergence, and show that a larger initialization can be used as more samples are available. We observe that the implicit regularization effect of GD plays a critical role in the analysis, and for the entire trajectory, it prevents each entry from becoming much larger than the others.
Figure:
Title: A Worker-Task Specialization Model for Crowdsourcing: Efficient Inference and Fundamental Limits
Authors: Doyeon Kim, Jeonghwan Lee, Hye Won Chung
Journal: IEEE Transaction on Information Theory, 2024.
Abstract:
Crowdsourcing system has emerged as an effective platform for labeling data with relatively low cost by using non-expert workers. Inferring correct labels from multiple noisy answers on data, however, has been a challenging problem, since the quality of the answers varies widely across tasks and workers. Many existing works have assumed that there is a fixed ordering of workers in terms of their skill levels, and focused on estimating worker skills to aggregate the answers from workers with different weights. In practice, however, the worker skill changes widely across tasks, especially when the tasks are heterogeneous. In this paper, we consider a new model, called d-type specialization model, in which each task and worker has its own (unknown) type and the reliability of each worker can vary in the type of a given task and that of a worker. We allow that the number d of types can scale in the number of tasks. In this model, we characterize the optimal sample complexity to correctly infer the labels within any given accuracy, and propose label inference algorithms achieving the order-wise optimal limit even when the types of tasks or those of workers are unknown. We conduct experiments both on synthetic and real datasets, and show that our algorithm outperforms the existing algorithms developed based on more strict model assumptions.
Figure:
Title: Efficient Algorithms for Exact Graph Matching on Correlated Stochastic Block Models with Constant Correlation
Authors: Joonhyuk Yang, Dongpil Shin, Hye Won Chung
Conference: The International Conference on Machine Learning (ICML), 2023
Abstract:
We consider the problem of graph matching, or learning vertex correspondence, between two correlated stochastic block models (SBMs). The graph matching problem arises in various fields, including computer vision, natural language processing and bioinformatics, and in particular, matching graphs with inherent community structure has significance related to de-anonymization of correlated social networks. Compared to the correlated Erdos-Renyi (ER) model, where various efficient algorithms have been developed, among which a few algorithms have been proven to achieve the exact matching with constant edge correlation, no low-order polynomial algorithm has been known to achieve exact matching for the correlated SBMs with constant correlation. In this work, we propose an efficient algorithm for matching graphs with community structure, based on the comparison between partition trees rooted from each vertex, by extending the idea of Mao et al. (2021) to graphs with communities. The partition tree divides the large neighborhoods of each vertex into disjoint subsets using their edge statistics to different communities. Our algorithm is the first low-order polynomial-time algorithm achieving exact matching between two correlated SBMs with high probability in dense graphs.
Figure:
Title: Recovering Top-Two Answers and Confusion Probability in Multi-Choice Crowdsourcing
Authors: Hyeonsu Jeong, Hye Won Chung
Conference: The International Conference on Machine Learning (ICML), 2023
Abstract:
Crowdsourcing has emerged as an effective platform for labeling large amounts of data in a cost- and time-efficient manner. Most previous work has focused on designing an efficient algorithm to recover only the ground-truth labels of the data. In this paper, we consider multi-choice crowdsourcing tasks with the goal of recovering not only the ground truth, but also the most confusing answer and the confusion probability. The most confusing answer provides useful information about the task by revealing the most plausible answer other than the ground truth and how plausible it is. To theoretically analyze such scenarios, we propose a model in which there are the top two plausible answers for each task, distinguished from the rest of the choices. Task difficulty is quantified by the probability of confusion between the top two, and worker reliability is quantified by the probability of giving an answer among the top two. Under this model, we propose a two-stage inference algorithm to infer both the top two answers and the confusion probability. We show that our algorithm achieves the minimax optimal convergence rate. We conduct both synthetic and real data experiments and demonstrate that our algorithm outperforms other recent algorithms. We also show the applicability of our algorithms in inferring the difficulty of tasks and in training neural networks with top-two soft labels.
Figure:
Abstract : Modulation classification (MC) is the first step performed at the receiver side unless the modulation type is explicitly indicated by the transmitter. Machine learning techniques have been widely used for MC recently. In this paper, we propose a novel MC technique dubbed as Joint Equalization and Modulation Classification based on Constellation Network (EMC2-Net). Unlike prior works that considered the constellation points as an image, the proposed EMC2-Net directly uses a set of 2D constellation points to perform MC. In order to obtain clear and concrete constellation despite multipath fading channels, the proposed EMC2-Net consists of equalizer and classifier having separate and explainable roles via novel three-phase training and noise-curriculum pretraining. Numerical results with linear modulation types under different channel models show that the proposed EMC2-Net achieves the performance of state-of-the-art MC techniques with significantly less complexity.
