Sungwon Hwang, Hyungtae Lim, and Hyun Myung†, “Equivariance-bridged SO(2)-Invariant Representation Learning using Graph Convolutional Network,” in Proc. of the British Machine Vision Conference (BMVC), Nov. 2021

  • Abstract

Training a Convolutional Neural Network (CNN) to be robust against rotation has mostly been done with data augmentation. In this paper, an alternative research direction is highlighted to encourage less dependence on data augmentation by achieving structural rotational invariance of a network. The deep equivariance-bridged SO(2)-invariant network is proposed to realize this vision. First, the Self-Weighted Nearest Neighbors Graph Convolutional Network (SWN-GCN) is proposed to implement a Graph Convolutional Network (GCN) on the graph representation of an image and acquire rotationally equivariant representations, as GCN is more suitable for constructing deeper networks than spectral graph convolution-based approaches. Then, an invariant representation is obtained with Global Average Pooling (GAP), a permutation-invariant operation suitable for aggregating high-dimensional representations, over the equivariant set of vertices retrieved from SWN-GCN. Our method achieves state-of-the-art image classification performance on rotated MNIST and CIFAR-10 images, where the models are trained only on non-augmented datasets. Quantitative validations also demonstrate strong invariance of the deep representations of SWN-GCN over rotations.
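
As a rough illustration of the mechanism described above, the sketch below (plain PyTorch, not the authors' SWN-GCN) applies a simple mean-aggregation graph convolution to pixel-vertex features and then takes a global average pool; since GAP is permutation-invariant, any reordering of the vertices, such as one induced by an image rotation, leaves the pooled representation unchanged.

# Minimal illustration (not the authors' SWN-GCN): a plain graph convolution
# over pixel vertices followed by global average pooling. Because GAP is
# permutation-invariant, any vertex reordering leaves the pooled feature unchanged.
import torch
import torch.nn as nn

class SimpleGCNLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        # x: (N, in_dim) vertex features, adj: (N, N) row-normalized adjacency
        return torch.relu(self.lin(adj @ x))

def invariant_representation(x, adj, layers):
    for layer in layers:
        x = layer(x, adj)
    return x.mean(dim=0)  # global average pooling over vertices (permutation-invariant)

# toy usage: 16 vertices with 3-dim features and a normalized adjacency
N = 16
x = torch.randn(N, 3)
adj = torch.softmax(torch.randn(N, N), dim=1)
layers = nn.ModuleList([SimpleGCNLayer(3, 32), SimpleGCNLayer(32, 64)])
perm = torch.randperm(N)
z1 = invariant_representation(x, adj, layers)
z2 = invariant_representation(x[perm], adj[perm][:, perm], layers)
print(torch.allclose(z1, z2, atol=1e-5))  # True: the pooled feature is permutation-invariant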

 


Hyungyu Lee, Myeongwoo Jeong, Chanyoung Kim, Hyungtae Lim, Changgue Park, Sungwon Hwang, and Hyun Myung†, “Low-level Pose Control of Tilting Multirotor for Wall Perching Tasks Using Reinforcement Learning,” in Proc. of the 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Prague, Czech Republic, Sep. 2021

  • Abstract

Recently, the need for unmanned aerial vehicles (UAVs) that can attach to walls has been highlighted. To address this need, research on various tilting multirotors that can increase maneuverability has been conducted. Unfortunately, existing studies on tilting multirotors require considerable amounts of prior information on the complex dynamic model. Meanwhile, reinforcement learning on quadrotors has been studied to mitigate this issue. Yet, it has only been applied to standard quadrotors, whose systems are less complex than those of tilting multirotors. In this paper, a novel reinforcement learning-based method is proposed to control a tilting multirotor in real-world applications, which is the first attempt to apply reinforcement learning to a tilting multirotor. To do so, we propose a novel reward function for a neural network model that takes power efficiency into account. The model is initially trained in a simulated environment and then fine-tuned using real-world data to overcome the sim-to-real gap. Furthermore, a novel, efficient state representation with respect to the goal frame is proposed, which helps the network learn the optimal policy better. As verified in real-world experiments, our proposed method shows robust controllability by overcoming the complex dynamics of tilting multirotors.
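
The sketch below is purely illustrative of the kind of design the abstract alludes to: a state expressed relative to the goal frame and a reward that trades off pose tracking against a power-efficiency penalty. The terms, weights, and power model are hypothetical, not the paper's.

# Hypothetical sketch (terms and weights are illustrative, not the paper's exact
# reward): a pose-tracking reward with a power-efficiency penalty, where the
# state is expressed relative to the goal frame as the abstract describes.
import numpy as np

def goal_frame_state(p_world, R_world, p_goal, R_goal):
    """Express position/orientation relative to the goal (e.g., wall-perching) frame."""
    p_rel = R_goal.T @ (p_world - p_goal)
    R_rel = R_goal.T @ R_world
    return p_rel, R_rel

def reward(p_rel, R_rel, rotor_thrusts, w_pos=1.0, w_att=0.5, w_power=0.01):
    pos_err = np.linalg.norm(p_rel)
    att_err = np.arccos(np.clip((np.trace(R_rel) - 1.0) / 2.0, -1.0, 1.0))  # rotation angle
    power = np.sum(rotor_thrusts ** 1.5)   # crude aerodynamic power proxy
    return -(w_pos * pos_err + w_att * att_err + w_power * power)

# toy usage
p_rel, R_rel = goal_frame_state(np.array([0.1, 0.0, 1.0]), np.eye(3),
                                np.array([0.0, 0.0, 1.0]), np.eye(3))
print(reward(p_rel, R_rel, rotor_thrusts=np.array([2.0, 2.0, 2.0, 2.0])))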

 


Wonkeun Youn, Hyungtae Lim, Hyoung Sik Choi, Matthew Rhudy, Hyeok Ryu, Sungyug Kim, and Hyun Myung†, “State Estimation of HALE UAV with Deep-learning-aided Virtual AOA/SSA Sensor for Analytical Redundancy,” IEEE Robotics and Automation Letters (RA-L), vol. 6, no. 3, pp. 5276-5283, Jul. 2021

  • Abstract

High-altitude long-endurance (HALE) unmanned aerial vehicles (UAVs) are employed in a variety of fields because of their ability to fly for a long time at high altitudes, even in the stratosphere. Two paramount concerns exist: enhancing their safety during long-term flight and reducing their weight as much as possible to increase their energy efficiency, based on analytical redundancy approaches. In this letter, a novel deep-learning-aided navigation filter is proposed, which consists of two parts: an end-to-end mapping-based synthetic sensor measurement model that utilizes long short-term memory (LSTM) networks to estimate the angle of attack (AOA) and sideslip angle (SSA), and an unscented Kalman filter for state estimation. Our proposed method can not only reduce the weight of HALE UAVs but also ensure their safety by means of an analytical redundancy approach. In contrast to conventional approaches, our LSTM-based method achieves better estimation by virtue of its nonlinear mapping capability.
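
A minimal sketch of the virtual-sensor idea, assuming a generic window of flight-sensor inputs (the exact input set and network size in the paper differ): an LSTM regresses AOA and SSA, which would then be fed to the unscented Kalman filter as synthetic measurements.

# Minimal sketch (assumed input set and dimensions): an LSTM maps a window of
# available flight-sensor readings (e.g., IMU, airspeed, attitude) to the two
# synthetic measurements, AOA and SSA. This illustrates the end-to-end mapping
# idea, not the paper's exact network.
import torch
import torch.nn as nn

class VirtualAOASSASensor(nn.Module):
    def __init__(self, n_inputs=9, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_inputs, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)   # outputs: [AOA, SSA]

    def forward(self, x):
        # x: (batch, time, n_inputs) window of sensor readings
        h, _ = self.lstm(x)
        return self.head(h[:, -1])         # estimate at the last time step

model = VirtualAOASSASensor()
window = torch.randn(4, 50, 9)             # 4 samples, 50-step windows
print(model(window).shape)                 # torch.Size([4, 2])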

 


Layered Depth Refinement with Mask Guidance

Title: Layered Depth Refinement with Mask Guidance 

Authors: Soo Ye Kim, members of Adobe Research, and Munchurl Kim 

Abstract 

Depth maps are used in a wide range of applications, from 3D rendering to 2D image effects such as Bokeh. However, those predicted by single image depth estimation (SIDE) models often fail to capture isolated holes in objects and/or have inaccurate boundary regions. Meanwhile, high-quality masks are much easier to obtain, using commercial auto-masking tools, off-the-shelf segmentation and matting methods, or even manual editing. Hence, in this paper, we formulate a novel problem of mask-guided depth map refinement that utilizes a generic mask to refine the depth prediction of SIDE models. Our framework performs layered refinement and inpainting/outpainting, decomposing the depth map into two separate layers signified by the mask and the inverse mask. As datasets with both depth and mask annotations are scarce, we propose a self-supervised learning scheme that uses arbitrary masks and RGB-D datasets. We empirically show that our method is robust to different types of masks and initial depth predictions, accurately refining depth values in inner and outer mask boundary regions. We further analyze our model with an ablation study and demonstrate results on real applications. 
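
The layered formulation can be sketched as follows, with simple convolutions standing in for the actual inpainting/outpainting networks; this is an illustration of the decomposition, not the paper's model.

# Minimal sketch of the layered formulation described above: the initial depth is
# split by the mask and its inverse, each layer is refined/inpainted by a network,
# and the layers are recomposed with the mask.
import torch
import torch.nn as nn

refine_fg = nn.Conv2d(2, 1, 3, padding=1)   # stand-in for the inpainting network
refine_bg = nn.Conv2d(2, 1, 3, padding=1)   # stand-in for the outpainting network

def layered_refine(depth, mask):
    # depth, mask: (B, 1, H, W); mask marks the object region in [0, 1]
    fg = refine_fg(torch.cat([depth * mask, mask], dim=1))            # refine inside the mask
    bg = refine_bg(torch.cat([depth * (1 - mask), 1 - mask], dim=1))  # refine outside the mask
    return mask * fg + (1 - mask) * bg                                # recompose the two layers

depth = torch.rand(1, 1, 64, 64)
mask = (torch.rand(1, 1, 64, 64) > 0.5).float()
print(layered_refine(depth, mask).shape)    # torch.Size([1, 1, 64, 64])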

Self-Supervised Deep Monocular Depth Estimation with Ambiguity Boosting

Title: Self-Supervised Deep Monocular Depth Estimation with Ambiguity Boosting 

Authors: Juan Luis Gonzalez Bello and Munchurl Kim 

Abstract 

We propose a novel two-stage training strategy with ambiguity boosting for the self-supervised learning of single-view depths from stereo images. Our proposed two-stage learning strategy first aims to obtain a coarse depth prior by training an auto-encoder network for a stereoscopic view synthesis task. This prior knowledge is then boosted and used to self-supervise the model in the second stage of training via our novel ambiguity boosting loss. Our ambiguity boosting loss is a confidence-guided data augmentation loss that improves the accuracy and consistency of the generated depth maps under several transformations of the single-image input. To show the benefits of the proposed two-stage training strategy with boosting, our two previous depth estimation (DE) networks, one with t-shaped adaptive kernels and the other with exponential disparity volumes, are extended with our new learning strategy, referred to as DBoosterNet-t and DBoosterNet-e, respectively. Our self-supervised DBoosterNets are competitive with, and in some cases even better than, the most recent supervised state-of-the-art methods, and are remarkably superior to previous self-supervised methods for monocular DE on the challenging KITTI dataset. We present intensive experimental results, showing the efficacy of our method for the self-supervised monocular DE task. 
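
A hedged sketch of a confidence-guided consistency term in the spirit of ambiguity boosting (the paper's exact formulation differs): depths predicted from a transformed input are compared with the first-stage prior, with low-confidence pixels down-weighted.

# Hedged sketch, not the paper's exact loss: a confidence-weighted consistency
# between depths predicted under an input transformation and the coarse prior
# obtained in the first training stage.
import torch

def ambiguity_boosting_loss(depth_aug, depth_prior, confidence):
    # depth_aug:   depth predicted from a transformed (e.g., flipped/scaled) input,
    #              mapped back to the original view
    # depth_prior: coarse depth prior from the first training stage
    # confidence:  per-pixel confidence in [0, 1] for the prior
    return (confidence * (depth_aug - depth_prior).abs()).mean()

d_aug = torch.rand(2, 1, 96, 320)
d_prior = torch.rand(2, 1, 96, 320)
conf = torch.rand(2, 1, 96, 320)
print(ambiguity_boosting_loss(d_aug, d_prior, conf))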

 


XVFI: eXtreme Video Frame Interpolation (oral)

Title: XVFI: eXtreme Video Frame Interpolation (oral) 

Authors: Hyeonjun Sim*, Jihyong Oh* and Munchurl Kim (*: equal contributions) 

Abstract 

In this paper, we first present a dataset (X4K1000FPS) of 4K videos at 1000 fps with extreme motion to the research community for video frame interpolation (VFI), and propose an extreme VFI network, called XVFI-Net, that is the first to handle VFI for 4K videos with large motion. The XVFI-Net is based on a recursive multi-scale shared structure that consists of two cascaded modules: one for bidirectional optical flow learning between the two input frames (BiOF-I) and one for bidirectional optical flow learning from the target to the input frames (BiOF-T). The optical flows are stably approximated by a complementary flow reversal (CFR) proposed in the BiOF-T module. During inference, the BiOF-I module can start at any scale of input, while the BiOF-T module only operates at the original input scale, so that inference can be accelerated while maintaining highly accurate VFI performance. Extensive experimental results show that our XVFI-Net can successfully capture the essential information of objects with extremely large motions and complex textures, while the state-of-the-art methods exhibit poor performance. Furthermore, our XVFI-Net framework also performs comparably on the previous lower-resolution benchmark dataset, which shows the robustness of our algorithm as well. All source codes, pre-trained models, and the proposed X4K1000FPS dataset are publicly available at https://github.com/JihyongOh/XVFI. 
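
For intuition, the sketch below shows the simple linear scaling of the bidirectional flows that yields coarse target-to-input flows; the paper's complementary flow reversal (CFR) produces more stable approximations, so the code is only a baseline illustration.

# Simplified sketch: coarse target-to-input flows obtained by linearly scaling the
# bidirectional flows between the two input frames. The paper's CFR refines such
# approximations; this is only the naive baseline.
import torch

def approx_target_flows(flow_0to1, flow_1to0, t):
    # flow_*: (B, 2, H, W) optical flows between the two input frames,
    # t in (0, 1): temporal position of the target frame
    flow_t_to_0 = t * flow_1to0          # coarse approximation of F_{t->0}
    flow_t_to_1 = (1 - t) * flow_0to1    # coarse approximation of F_{t->1}
    return flow_t_to_0, flow_t_to_1

f01 = torch.randn(1, 2, 128, 128)
f10 = torch.randn(1, 2, 128, 128)
ft0, ft1 = approx_target_flows(f01, f10, t=0.5)
print(ft0.shape, ft1.shape)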

 


Juan Luis Gonzalez Bello and Munchurl Kim, “PLADE-Net: Towards Pixel-Level Accuracy for Self-Supervised Single-View Depth Estimation with Neural Positional Encoding and Distilled Matting Loss,” Conference on Computer Vision and Pattern Recognition (CVPR), June 19-25, 2021.


Abstract 

In this paper, we propose a self-supervised single-view pixel-level accurate depth estimation network, called PLADE-Net. The PLADE-Net is the first work that shows unprecedented accuracy levels, exceeding 95% in terms of the δ1 metric on the challenging KITTI dataset. Our PLADE-Net is based on a new network architecture with neural positional encoding and a novel loss function that borrows from the closed-form solution of the matting Laplacian to learn pixel-level accurate depth estimation from stereo images. Neural positional encoding allows our PLADE-Net to obtain more consistent depth estimates by letting the network reason about location-specific image properties such as lens and projection distortions. Our novel distilled matting Laplacian loss allows our network to predict sharp depths at object boundaries and more consistent depths in highly homogeneous regions. Our proposed method outperforms all previous self-supervised single-view depth estimation methods by a large margin on the challenging KITTI dataset, with unprecedented levels of accuracy. Furthermore, our PLADE-Net, naively extended for stereo inputs, outperforms the most recent self-supervised stereo methods, even without any advanced blocks like 1D correlations, 3D convolutions, or spatial pyramid pooling. We present extensive ablation studies and experiments that support our method’s effectiveness on the KITTI, CityScapes, and Make3D datasets. 
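
The sketch below illustrates one plausible form of neural positional encoding: a small learned encoder over normalized pixel coordinates whose output is concatenated to the image features (a CoordConv-style construction); the exact PLADE-Net module may differ.

# Illustrative sketch (not the exact PLADE-Net module): normalized pixel
# coordinates are passed through a small learned encoder and concatenated to the
# features, letting the network reason about location-specific effects such as
# lens and projection distortions.
import torch
import torch.nn as nn

class NeuralPositionalEncoding(nn.Module):
    def __init__(self, out_channels=16):
        super().__init__()
        self.encode = nn.Sequential(nn.Conv2d(2, out_channels, 1), nn.ReLU(),
                                    nn.Conv2d(out_channels, out_channels, 1))

    def forward(self, feat):
        b, _, h, w = feat.shape
        ys = torch.linspace(-1, 1, h, device=feat.device)
        xs = torch.linspace(-1, 1, w, device=feat.device)
        gy, gx = torch.meshgrid(ys, xs, indexing="ij")
        coords = torch.stack([gx, gy]).unsqueeze(0).expand(b, -1, -1, -1)
        return torch.cat([feat, self.encode(coords)], dim=1)

feat = torch.randn(2, 32, 48, 160)
print(NeuralPositionalEncoding()(feat).shape)   # torch.Size([2, 48, 48, 160])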

 


Soo Ye Kim*, Hyeonjun Sim* and Munchurl Kim, “KOALAnet: Blind Super-Resolution using Kernel-Oriented Adaptive Local Adjustment,” Conference on Computer Vision and Pattern Recognition (CVPR), June 19-25, 2021 (*: equal contribution).


Abstract 

Blind super-resolution (SR) methods aim to generate a high-quality, high-resolution image from a low-resolution (LR) image containing unknown degradations. However, natural images contain various types and amounts of blur: some may be due to the inherent degradation characteristics of the camera, but some may even be intentional, for aesthetic purposes (e.g., the Bokeh effect). In the latter case, it becomes highly difficult for SR methods to disentangle the blur to be removed from the blur to be left as is. In this paper, we propose a novel blind SR framework based on kernel-oriented adaptive local adjustment (KOALA) of SR features, called KOALAnet, which jointly learns spatially-variant degradation and restoration kernels in order to adapt to the spatially-variant blur characteristics in real images. Our KOALAnet outperforms recent blind SR methods for synthesized LR images obtained with randomized degradations, and we further show that the proposed KOALAnet produces the most natural results for artistic photographs with intentional blur, which are not over-sharpened, by effectively handling images mixed with in-focus and out-of-focus areas.
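
The core operation behind kernel-oriented local adjustment, filtering each pixel with its own kernel, can be sketched as follows; here the per-pixel kernels are given as input, whereas KOALAnet predicts them jointly from the estimated degradation and the image.

# Minimal sketch of spatially-variant (per-pixel) filtering, the basic operation
# behind the kernel-oriented local adjustment described above.
import torch
import torch.nn.functional as F

def local_filtering(img, kernels, k=5):
    # img:     (B, C, H, W)
    # kernels: (B, k*k, H, W) -- one kxk kernel per spatial location
    b, c, h, w = img.shape
    patches = F.unfold(img, k, padding=k // 2)          # (B, C*k*k, H*W)
    patches = patches.view(b, c, k * k, h, w)
    return (patches * kernels.unsqueeze(1)).sum(dim=2)  # (B, C, H, W)

img = torch.rand(1, 3, 32, 32)
kernels = torch.softmax(torch.randn(1, 25, 32, 32), dim=1)  # normalized 5x5 kernels
print(local_filtering(img, kernels).shape)                  # torch.Size([1, 3, 32, 32])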

 

Jaehyup Lee, Soomin Seo and Munchurl Kim, “SIPSA-Net: Shift-Invariant Pan Sharpening with Moving Object Alignment for Satellite Imagery,” Conference on Computer Vision and Pattern Recognition (CVPR), June 19-25, 2021. (Oral Paper)


Abstract 

Pan-sharpening is a process of merging a high-resolution (HR) panchromatic (PAN) image and its corresponding low-resolution (LR) multi-spectral (MS) image to create an HR-MS, pan-sharpened image. However, due to the different sensors’ locations, characteristics, and acquisition times, PAN and MS image pairs often have various amounts of misalignment. Conventional deep-learning-based methods trained with such misaligned PAN-MS image pairs suffer from diverse artifacts, such as double-edge and blur artifacts, in the resultant pan-sharpened images. In this paper, we propose a novel framework called shift-invariant pan-sharpening with moving object alignment (SIPSA-Net), which is the first method to take into account such large misalignment of moving object regions for pan-sharpening. The SIPSA-Net has a feature alignment module (FAM) that can adjust one feature to be aligned to another feature, even between the two different PAN and MS domains. For better alignment in pan-sharpened images, a shift-invariant spectral loss is newly designed, which ignores the inherent misalignment in the original MS input, thereby having the same effect as optimizing the spectral loss with a well-aligned MS image. Extensive experimental results show that our SIPSA-Net can generate pan-sharpened images with remarkable improvements in terms of visual quality and alignment, compared to state-of-the-art methods. 
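
One way to realize a shift-invariant spectral term is sketched below, assuming the loss is taken as the minimum over small integer shifts of the reference MS image; the paper's actual loss may be formulated differently.

# Hedged sketch of a shift-invariant spectral term: the spectral error is the
# minimum over small integer shifts of the reference MS image, so an inherent
# misalignment of a few pixels is not penalized.
import torch

def shift_invariant_spectral_loss(pred, ms_ref, max_shift=2):
    # pred, ms_ref: (B, C, H, W); compare against all shifts within +/- max_shift
    losses = []
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = torch.roll(ms_ref, shifts=(dy, dx), dims=(2, 3))
            losses.append((pred - shifted).abs().mean())
    return torch.stack(losses).min()

pred = torch.rand(1, 4, 64, 64)
ms_ref = torch.roll(pred, shifts=(1, -1), dims=(2, 3))  # misaligned reference
print(shift_invariant_spectral_loss(pred, ms_ref))      # ~0: the small shift is forgiven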

 


Learning-driven exploration for reinforcement learning

[Title]

Learning-driven exploration for reinforcement learning

 

[Authors]

Muhammad Usama, Dong Eui Chang

 

[Abstract]

Effective and intelligent exploration remains an unresolved problem for reinforcement learning. Most contemporary reinforcement learning methods rely on simple heuristic strategies that are unable to intelligently distinguish the well-explored from the unexplored regions of the state space, which can lead to inefficient use of training time. We introduce entropy-based exploration (EBE), which enables an agent to efficiently explore the unexplored regions of the state space. EBE quantifies the agent’s learning in a state using the state-dependent action values and adaptively explores the state space, i.e., it performs more exploration in the less-explored regions of the state space. We perform experiments on a diverse set of environments and demonstrate that EBE enables efficient exploration that ultimately results in faster learning without having to tune any hyperparameters. The code to reproduce the experiments is available at https://github.com/Usama1002/EBE-Exploration and the supplementary video at https://youtu.be/nJggIjjzKic.
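
A minimal sketch of the entropy-based idea (the softmax temperature and the way the probability is used are illustrative details): the entropy of a softmax over a state's action values measures how much remains to be learned there, and its normalized value serves as the probability of taking a random action.

# Minimal sketch: entropy of a softmax over the state's action values, normalized
# to [0, 1], used as the exploration probability for that state.
import numpy as np

def exploration_probability(q_values, temperature=1.0):
    z = q_values / temperature
    p = np.exp(z - z.max())
    p /= p.sum()
    entropy = -(p * np.log(p + 1e-12)).sum()
    return entropy / np.log(len(q_values))    # normalized entropy in [0, 1]

def select_action(q_values, rng=np.random):
    if rng.rand() < exploration_probability(q_values):
        return rng.randint(len(q_values))     # explore poorly-learned states more often
    return int(np.argmax(q_values))           # exploit well-learned states

print(exploration_probability(np.array([0.0, 0.0, 0.0, 0.0])))  # ~1.0: nothing learned yet
print(exploration_probability(np.array([5.0, 0.1, 0.2, 0.0])))  # much lower: well-learned state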

 
