Professor Chan-Hyun Youn’s Research Team Developed a Dataset Watermarking Technique for Dataset Copyright Protection
<Professor Chan-Hyun Youn and Ph.D. candidate Jinhyeok Jang>
Professor Chan-Hyun Youn’s research team from the EE department has developed a technique for dataset copyright protection named “Undercover Bias.” Undercover Bias builds on the premise that every dataset contains bias and that bias itself is discriminative. By embedding artificially generated biases into a dataset, it becomes possible to identify AI models that were trained on the watermarked data without proper permission.
This technique addresses data copyright and privacy protection, issues that have become significant societal concerns with the rise of AI. It embeds a very subtle watermark into the target dataset. Unlike watermarks from prior methods, this one is nearly imperceptible and clean-labeled, meaning the assigned labels remain correct. Nevertheless, AI models trained on the watermarked dataset unintentionally acquire the ability to classify the watermark, and the presence or absence of this ability serves as evidence of unauthorized use of the dataset.
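The idea can be illustrated with a minimal sketch: blend a faint, fixed per-class noise pattern into each image (leaving labels untouched), then verify a suspect model by checking whether it classifies pattern-only probes far above chance. All names, the blending scheme, and the thresholds below are illustrative assumptions, not the paper's actual method.

```python
import numpy as np

# Illustrative sketch only: pattern shapes, blending, and thresholds
# are assumptions for exposition, not the paper's actual construction.
rng = np.random.default_rng(0)
NUM_CLASSES, H, W = 10, 32, 32   # e.g. a CIFAR-10-like dataset
ALPHA = 0.02                     # small blending strength keeps the mark subtle

# One fixed, faint noise pattern per class: the hidden "bias"
class_patterns = rng.standard_normal((NUM_CLASSES, H, W))

def watermark(image, label):
    """Blend the class-specific pattern into an image.

    Clean-label: the label itself is left unchanged, so the marked
    sample still looks like an ordinary, correctly labeled example.
    """
    marked = (1 - ALPHA) * image + ALPHA * class_patterns[label]
    return np.clip(marked, 0.0, 1.0)

def verify(model_predict, threshold=0.5):
    """Probe a suspect model with pattern-only inputs.

    A model trained on the watermarked dataset tends to map pattern k
    to class k well above the 1/NUM_CLASSES chance level; a model
    trained only on clean data should not.
    """
    probes = np.clip(0.5 + ALPHA * class_patterns, 0.0, 1.0)
    hits = sum(int(model_predict(probes[k])) == k for k in range(NUM_CLASSES))
    return hits / NUM_CLASSES >= threshold
```

In this toy setup, verification needs only black-box query access to the suspect model, which mirrors how a dataset owner could audit a deployed model without seeing its weights.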
Figure 1: Schematic of verification based on Undercover Bias
The research team demonstrated that the proposed method can verify models trained using the watermarked dataset with 100% accuracy across various benchmarks.
Further, they showed that a model trained with proper authorization is misidentified as unauthorized with a probability of less than 3e-5%, demonstrating the high reliability of the proposed watermark. The study will be presented at one of the leading international conferences in computer vision, the European Conference on Computer Vision (ECCV) 2024, to be held in Milan, Italy, in October this year.
ECCV, alongside CVPR and ICCV, is regarded as one of the top-tier international academic conferences in computer vision. The paper is titled “Rethinking Data Bias: Dataset Copyright Protection via Embedding Class-wise Hidden Bias.”