Yihua Zhang

Room 3210

428 S Shaw LN

East Lansing, Michigan

United States of America

Yihua Zhang (张逸骅) is a fourth-year Ph.D. student in the OPTML Group at Michigan State University, under the supervision of Prof. Sijia Liu. His research centers on trustworthy and scalable machine learning (ML) algorithms for large language models (LLMs) and diffusion models (DMs), with a keen focus on bridging theoretical foundations and real-world applications. In recognition of his outstanding contributions, Yihua was honored with the IBM PhD Fellowship 2024, the CPAL 2025 Rising Star Award, and the prestigious MLCommons Rising Star Award in 2024. Yihua has gained valuable industry experience through internships at leading technology companies such as Meta AI, Amazon AWS AI Lab, and Cisco Research. Yihua’s work is driven by the need to develop efficient, scalable, and robust ML algorithms, with a commitment to addressing modern challenges in these domains.

Research Keywords: Machine Unlearning, Jailbreak Attack, Adversarial Training, Fairness, Parameter-Efficient Fine-Tuning, Memory-Efficient Fine-Tuning, Mixture-of-Experts, Model Sparsity, Large Language Model, Diffusion Model, Bi-Level Optimization, Zeroth-Order Optimization.

Theme 1: Trustworthy Foundation Models: Robustness, Fairness, and Unlearning: Yihua explores how to enhance the trustworthiness of foundation models, focusing on robustness against adversarial attacks, fairness in decision-making, and the emerging area of machine unlearning to ensure data privacy and compliance with deletion requests.

Theme 2: Scalable Foundation Models: Efficient Models, Data, and Algorithms: In this theme, Yihua’s work revolves around designing models that are not only powerful but also computationally efficient. His research includes advancements in model sparsification, memory-efficient fine-tuning techniques, and optimizing data usage for large-scale models.

Theme 3: Optimization in Modern ML: Bi-Level and Zeroth-Order Optimization: This research line focuses on the theoretical underpinnings of scalable machine learning algorithms, addressing real-world constraints through bi-level optimization and zeroth-order optimization.

Collaboration Opportunities

I am always open to collaborations with researchers, as well as undergraduate and graduate students seeking Ph.D. positions. While my primary research focuses on trustworthy and scalable ML algorithms for LLMs and DMs, I am also interested in exploring a wide range of topics beyond these areas. If you have exciting research ideas or are looking for opportunities to conduct research under professional guidance, feel free to reach out to me. Please refer to my collaboration statement for more details. You are also welcome to add me on WeChat or connect with me on LinkedIn.

News

May 16, 2025	One first-authored paper, SEUF: Is Unlearning One Expert Enough for Mixture-of-Experts LLMs?, has been accepted to the ACL 2025 main conference!
May 1, 2025	Two papers accepted in ICML’25!
Apr 16, 2025	Honored to receive the First Place Award in the 2024–25 Fitch H. Beach Award competition — the highest distinction for graduate students at the MSU College of Engineering! I’ll be proudly representing the Computer Science department at the college-wide awards ceremony on April 30. 🎓💚
Feb 26, 2025	My co-first-authored paper Edit Away and My Face Will not Stay: Personal Biometric Defense against Malicious Generative Editing is accepted to CVPR 2025! Congratulations to my summer intern Hanhui!
Jan 22, 2025	Our paper When is Task Vector Provably Effective Model Editing? A Generalization Analysis of Nonlinear Transformers is accepted to ICLR 2025 as an Oral Presentation (only 1.8% acceptance rate)!
Jan 21, 2025	I have been awarded the CPAL Rising Star Award 2025 and will give a presentation at Stanford in March 2025!
Jan 20, 2025	My new technical post From Zero to Reasoning Hero: How DeepSeek-R1 Leverages Reinforcement Learning to Master Complex Reasoning (千呼万唤始出来：DeepSeek-R1 如何通过强化学习实现复杂推理) is now online! English and Chinese versions both available!
Jan 15, 2025	My new technical post A Review on the Evolvement of Load Balancing Strategy in MoE LLMs: Pitfalls and Lessons (关于 MoE 大模型负载均衡策略演进的回顾：坑点与经验教训) is now online! English and Chinese versions both available!
Jan 10, 2025	I have been awarded the IBM PhD Fellowship 2024-2025!
Dec 15, 2024	My new technical post Patching the Foundation Models: Pitfalls and Pains in Machine Unlearning (给大模型打打补丁：机器反学习方法中的陷阱与痛点) is now online! English and Chinese versions both available!

First-Authored Publications

See the full publication list here.

CVPR’25
Edit Away and My Face Will not Stay: Personal Biometric Defense against Malicious Generative Editing

Hanhui Wang, Yihua Zhang, Ruizheng Bai, Yue Zhao, Sijia Liu, and Zhengzhong Tu

In The IEEE/CVF Conference on Computer Vision and Pattern Recognition 2025

Abs Paper Code

Recent advancements in diffusion models have made generative image editing more accessible than ever. While these developments allow users to generate creative edits with ease, they also raise significant ethical concerns, particularly regarding malicious edits to human portraits that threaten individuals’ privacy and identity security. Existing general-purpose image protection methods primarily focus on generating adversarial perturbations to nullify edit effects. However, these approaches often exhibit instability to protect against diverse editing requests. In this work, we introduce a novel perspective to personal human portrait protection against malicious editing. Unlike traditional methods aiming to prevent edits from taking effect, our method, FACELOCK, optimizes adversarial perturbations to ensure that original biometric information—such as facial features—is either destroyed or substantially altered post-editing, rendering the subject in the edited output biometrically unrecognizable. Our approach innovatively integrates facial recognition and visual perception factors into the perturbation optimization process, ensuring robust protection against a variety of editing attempts. Besides, we shed light on several critical issues with commonly used evaluation metrics in image editing and reveal cheating methods by which they can be easily manipulated, leading to deceptive assessments of protection. Through extensive experiments, we demonstrate that FACELOCK significantly outperforms all baselines in defense performance against a wide range of malicious edits. Moreover, our method also exhibits strong robustness against purification techniques. Comprehensive ablation studies confirm the stability and broad applicability of our method across diverse diffusion-based editing algorithms. Our work not only advances the state-of-the-art in biometric defense but also sets the foundation for more secure and privacy-preserving practices in image editing. 
The code is publicly available at: https://github.com/taco-group/FaceLock.
@inproceedings{wang2025edit, title = {Edit Away and My Face Will not Stay: Personal Biometric Defense against Malicious Generative Editing}, author = {Wang, Hanhui and Zhang, Yihua and Bai, Ruizheng and Zhao, Yue and Liu, Sijia and Tu, Zhengzhong}, booktitle = {The IEEE/CVF Conference on Computer Vision and Pattern Recognition 2025}, year = {2025} }
NeurIPS’24 D&B
UnlearnCanvas: A Stylized Image Dataset to Benchmark Machine Unlearning for Diffusion Models

Yihua Zhang, Chongyu Fan, Yimeng Zhang, Yuguang Yao, Jinghan Jia, Jiancheng Liu, Gaoyuan Zhang, Gaowen Liu, Ramana Kompella, Xiaoming Liu, and Sijia Liu

In Thirty-eighth Conference on Neural Information Processing Systems 2024

Abs Paper Code Video Website

The technological advancements in diffusion models (DMs) have demonstrated unprecedented capabilities in text-to-image generation and are widely used in diverse applications. However, they have also raised significant societal concerns, such as the generation of harmful content and copyright disputes. Machine unlearning (MU) has emerged as a promising solution, capable of removing undesired generative capabilities from DMs. However, existing MU evaluation systems present several key challenges that can result in incomplete and inaccurate assessments. To address these issues, we propose UNLEARNCANVAS, a comprehensive high-resolution stylized image dataset that facilitates the evaluation of the unlearning of artistic styles and associated objects. This dataset enables the establishment of a standardized, automated evaluation framework with 7 quantitative metrics assessing various aspects of the unlearning performance for DMs. Through extensive experiments, we benchmark 9 state-of-the-art MU methods for DMs, revealing novel insights into their strengths, weaknesses, and underlying mechanisms. Additionally, we explore challenging unlearning scenarios for DMs to evaluate worst-case performance against adversarial prompts, the unlearning of finer-scale concepts, and sequential unlearning. We hope that this study can pave the way for developing more effective, accurate, and robust DM unlearning methods, ensuring safer and more ethical applications of DMs in the future.
@inproceedings{zhang2024unlearncanvas, video = {https://www.youtube.com/watch?v=lC_R_b9ZiH8}, dataset = {https://huggingface.co/datasets/OPTML-Group/UnlearnCanvas}, benchmark = {https://huggingface.co/spaces/OPTML-Group/UnlearnCanvas-Benchmark}, title = {UnlearnCanvas: A Stylized Image Dataset to Benchmark Machine Unlearning for Diffusion Models}, author = {Zhang, Yihua and Fan, Chongyu and Zhang, Yimeng and Yao, Yuguang and Jia, Jinghan and Liu, Jiancheng and Zhang, Gaoyuan and Liu, Gaowen and Kompella, Ramana and Liu, Xiaoming and Liu, Sijia}, booktitle = {Thirty-eighth Conference on Neural Information Processing Systems}, year = {2024} }
ICML’24
Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark

Yihua Zhang, Pingzhi Li, Junyuan Hong, Jiaxiang Li, Yimeng Zhang, Wenqing Zheng, Pin-Yu Chen, Jason D. Lee, Wotao Yin, Mingyi Hong, Zhangyang Wang, Sijia Liu, and Tianlong Chen

In arXiv preprint arXiv:2402.11592 Feb 2024

Abs Paper Code Poster

In the evolving landscape of natural language processing (NLP), fine-tuning pre-trained Large Language Models (LLMs) with first-order (FO) optimizers like SGD and Adam has become standard. Yet, as LLMs grow in size, the substantial memory overhead from back-propagation (BP) for FO gradient computation presents a significant challenge. Addressing this issue is crucial, especially for applications like on-device training where memory efficiency is paramount. This paper proposes a shift towards BP-free, zeroth-order (ZO) optimization as a solution for reducing memory costs during LLM fine-tuning, building on the initial concept introduced by Malladi et al. (2023). Unlike traditional ZO-SGD methods, our work expands the exploration to a wider array of ZO optimization techniques, through a comprehensive, first-of-its-kind benchmarking study across five LLM families (Roberta, OPT, LLaMA, Vicuna, Mistral), three task complexities, and five fine-tuning schemes. Our study unveils previously overlooked optimization principles, highlighting the importance of task alignment, the role of the forward gradient method, and the balance between algorithm complexity and fine-tuning performance. We further introduce novel enhancements to ZO optimization, including block-wise descent, hybrid training, and gradient sparsity. Our study offers a promising direction for achieving further memory-efficient LLM fine-tuning. Codes to reproduce all our experiments are at https://github.com/ZO-Bench/ZO-LLM.
@inproceedings{zhang2024revisiting, title = {Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark}, author = {Zhang, Yihua and Li, Pingzhi and Hong, Junyuan and Li, Jiaxiang and Zhang, Yimeng and Zheng, Wenqing and Chen, Pin-Yu and Lee, Jason D. and Yin, Wotao and Hong, Mingyi and Wang, Zhangyang and Liu, Sijia and Chen, Tianlong}, booktitle = {arXiv preprint arXiv:2402.11592}, month = feb, year = {2024} }
IEEE SPM
An Introduction to Bi-level Optimization: Foundations and Applications in Signal Processing and Machine Learning

Yihua Zhang, Prashant Khanduri, Ioannis Tsaknakis, Yuguang Yao, Mingyi Hong, and Sijia Liu

In arxiv 2308.00788 Aug 2023

Abs Paper Code

Recently, bi-level optimization (BLO) has taken center stage in some very exciting developments in the area of signal processing (SP) and machine learning (ML). Roughly speaking, BLO is a classical optimization problem that involves two levels of hierarchy (i.e., upper and lower levels), wherein obtaining the solution to the upper-level problem requires solving the lower-level one. BLO has become popular largely because it is powerful in modeling problems in SP and ML, among others, that involve optimizing nested objective functions. Prominent applications of BLO range from resource allocation for wireless systems to adversarial machine learning. In this work, we focus on a class of tractable BLO problems that often appear in SP and ML applications. We provide an overview of some basic concepts of this class of BLO problems, such as their optimality conditions, standard algorithms (including their optimization principles and practical implementations), as well as how they can be leveraged to obtain state-of-the-art results for a number of key SP and ML applications. Further, we discuss some recent advances in BLO theory, its implications for applications, and point out some limitations of the state-of-the-art that require significant future research efforts. Overall, we hope that this article can serve to accelerate the adoption of BLO as a generic tool to model, analyze, and innovate on a wide array of emerging SP and ML applications.
@inproceedings{zhang2023introduction, title = {An Introduction to Bi-level Optimization: Foundations and Applications in Signal Processing and Machine Learning}, author = {Zhang, Yihua and Khanduri, Prashant and Tsaknakis, Ioannis and Yao, Yuguang and Hong, Mingyi and Liu, Sijia}, month = aug, year = {2023} }
NeurIPS’23
Selectivity Drives Productivity: Efficient Dataset Pruning for Enhanced Transfer Learning

Yihua Zhang, Yimeng Zhang, Aochuan Chen, Jinghan Jia, Jiancheng Liu, Gaowen Liu, Mingyi Hong, Shiyu Chang, and Sijia Liu

In Thirty-seventh Conference on Neural Information Processing Systems Aug 2023

Abs Paper Code Poster Slides Website

Massive data is often considered essential for deep learning applications, but it also incurs significant computational and infrastructural costs. Therefore, dataset pruning (DP) has emerged as an effective way to improve data efficiency by identifying and removing redundant training samples without sacrificing performance. In this work, we aim to address the problem of DP for transfer learning, i.e., how to prune a source dataset for improved pretraining efficiency and lossless finetuning accuracy on downstream target tasks. To our best knowledge, the problem of DP for transfer learning remains open, as previous studies have primarily addressed DP and transfer learning as separate problems. By contrast, we establish a unified viewpoint to integrate DP with transfer learning and find that existing DP methods are not suitable for the transfer learning paradigm. We then propose two new DP methods, label mapping and feature mapping, for supervised and self-supervised pretraining settings respectively, by revisiting the DP problem through the lens of source-target domain mapping. Furthermore, we demonstrate the effectiveness of our approach on numerous transfer learning tasks. We show that source data classes can be pruned by up to 40% without sacrificing the downstream performance, resulting in a significant 2 to 5 times speed-up during the pretraining stage. Besides, our proposal exhibits broad applicability and can improve other computationally intensive transfer learning techniques, such as adversarial pretraining.
@inproceedings{zhang2023selectivity, title = {Selectivity Drives Productivity: Efficient Dataset Pruning for Enhanced Transfer Learning}, author = {Zhang, Yihua and Zhang, Yimeng and Chen, Aochuan and Jia, Jinghan and Liu, Jiancheng and Liu, Gaowen and Hong, Mingyi and Chang, Shiyu and Liu, Sijia}, booktitle = {Thirty-seventh Conference on Neural Information Processing Systems}, year = {2023}, post = {https://pruning.netlify.app/}, }
ICCV’23
Robust Mixture-of-Expert Training for Convolutional Neural Networks

Yihua Zhang, Ruisi Cai, Tianlong Chen, Guanhua Zhang, Huan Zhang, Pin-Yu Chen, Shiyu Chang, Zhangyang Wang, and Sijia Liu

In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Oct 2023

Abs Paper Code Poster Slides

Sparsely-gated Mixture of Expert (MoE), an emerging deep model architecture, has demonstrated a great promise to enable high-accuracy and ultra-efficient model inference. Despite the growing popularity of MoE, little work investigated its potential to advance convolutional neural networks (CNNs), especially in the plane of adversarial robustness. Since the lack of robustness has become one of the main hurdles for CNNs, in this paper we ask: How to adversarially robustify a CNN-based MoE model? Can we robustly train it like an ordinary CNN model? Our pilot study shows that the conventional adversarial training (AT) mechanism (developed for vanilla CNNs) no longer remains effective to robustify an MoE-CNN. To better understand this phenomenon, we dissect the robustness of an MoE-CNN into two dimensions: Robustness of routers (i.e., gating functions to select data-specific experts) and robustness of experts (i.e., the router-guided pathways defined by the subnetworks of the backbone CNN). Our analyses show that routers and experts are hard to adapt to each other in the vanilla AT. Thus, we propose a new router-expert alternating Adversarial training framework for MoE, termed AdvMoE. The effectiveness of our proposal is justified across 4 commonly-used CNN model architectures over 4 benchmark datasets. We find that AdvMoE achieves 1% to 4% adversarial robustness improvement over the original dense CNN, and enjoys the efficiency merit of sparsity-gated MoE, leading to more than 50% inference cost reduction.
@inproceedings{zhang2023robust, title = {Robust Mixture-of-Expert Training for Convolutional Neural Networks}, author = {Zhang, Yihua and Cai, Ruisi and Chen, Tianlong and Zhang, Guanhua and Zhang, Huan and Chen, Pin-Yu and Chang, Shiyu and Wang, Zhangyang and Liu, Sijia}, booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)}, month = oct, year = {2023} }
ICLR’23
What Is Missing in IRM Training and Evaluation? Challenges and Solutions

Yihua Zhang, Pranay Sharma, Parikshit Ram, Mingyi Hong, Kush Varshney, and Sijia Liu

In Eleventh International Conference on Learning Representations Oct 2023

Abs Paper Code Poster

Invariant risk minimization (IRM) has received increasing attention as a way to acquire environment-agnostic data representations and predictions, and as a principled solution for preventing spurious correlations from being learned and for improving models’ out-of-distribution generalization. Yet, recent works have found that the optimality of the originally-proposed IRM optimization (IRM) may be compromised in practice or could be impossible to achieve in some scenarios. Therefore, a series of advanced IRM algorithms have been developed that show practical improvement over IRM. In this work, we revisit these recent IRM advancements, and identify and resolve three practical limitations in IRM training and evaluation. First, we find that the effect of batch size during training has been chronically overlooked in previous studies, leaving room for further improvement. We propose small-batch training and highlight the improvements over a set of large-batch optimization techniques. Second, we find that improper selection of evaluation environments could give a false sense of invariance for IRM. To alleviate this effect, we leverage diversified test-time environments to precisely characterize the invariance of IRM when applied in practice. Third, we revisit (Ahuja et al. (2020))’s proposal to convert IRM into an ensemble game and identify a limitation when a single invariant predictor is desired instead of an ensemble of individual predictors. We propose a new IRM variant to address this limitation based on a novel viewpoint of ensemble IRM games as consensus-constrained bi-level optimization. Lastly, we conduct extensive experiments (covering 7 existing IRM variants and 7 datasets) to justify the practical significance of revisiting IRM training and evaluation in a principled manner.
@inproceedings{zhang2023what, title = {What Is Missing in IRM Training and Evaluation? Challenges and Solutions}, author = {Zhang, Yihua and Sharma, Pranay and Ram, Parikshit and Hong, Mingyi and Varshney, Kush and Liu, Sijia}, booktitle = {Eleventh International Conference on Learning Representations}, year = {2023} }
NeurIPS’22
Advancing Model Pruning via Bi-level Optimization

Yihua Zhang*, Yuguang Yao*, Parikshit Ram, Pu Zhao, Tianlong Chen, Mingyi Hong, Yanzhi Wang, and Sijia Liu

In Thirty-sixth Conference on Neural Information Processing Systems Oct 2022

Abs Paper Code Poster Slides Video

The deployment constraints in practical applications necessitate the pruning of large-scale deep learning models, i.e., promoting their weight sparsity. As illustrated by the Lottery Ticket Hypothesis (LTH), pruning also has the potential of improving their generalization ability. At the core of LTH, iterative magnitude pruning (IMP) is the predominant pruning method to successfully find ‘winning tickets’. Yet, the computation cost of IMP grows prohibitively as the targeted pruning ratio increases. To reduce the computation overhead, various efficient ‘one-shot’ pruning methods have been developed, but these schemes are usually unable to find winning tickets as good as IMP. This raises the question of how to close the gap between pruning accuracy and pruning efficiency? To tackle it, we pursue the algorithmic advancement of model pruning. Specifically, we formulate the pruning problem from a fresh and novel viewpoint, bi-level optimization (BLO). We show that the BLO interpretation provides a technically-grounded optimization base for an efficient implementation of the pruning-retraining learning paradigm used in IMP. We also show that the proposed bi-level optimization-oriented pruning method (termed BiP) is a special class of BLO problems with a bi-linear problem structure. By leveraging such bi-linearity, we theoretically show that BiP can be solved as easily as first-order optimization, thus inheriting the computation efficiency. Through extensive experiments on both structured and unstructured pruning with 5 model architectures and 4 data sets, we demonstrate that BiP can find better winning tickets than IMP in most cases, and is computationally as efficient as the one-shot pruning schemes, demonstrating 2-7× speedup over IMP for the same level of model accuracy and sparsity.
@inproceedings{zhang2022advancing, title = {Advancing Model Pruning via Bi-level Optimization}, video = {https://youtu.be/eeKITiOOTaE}, author = {Zhang*, Yihua and Yao*, Yuguang and Ram, Parikshit and Zhao, Pu and Chen, Tianlong and Hong, Mingyi and Wang, Yanzhi and Liu, Sijia}, booktitle = {Thirty-sixth Conference on Neural Information Processing Systems}, year = {2022} }
NeurIPS’22
Fairness Reprogramming

Guanhua Zhang*, Yihua Zhang*, Yang Zhang, Wenqi Fan, Qing Li, Sijia Liu, and Shiyu Chang

In Thirty-sixth Conference on Neural Information Processing Systems Oct 2022

Abs Paper Code Poster

Despite a surge of recent advances in promoting machine learning (ML) fairness, the existing mainstream approaches mostly require training or finetuning the entire weights of the neural network to meet the fairness criteria. However, this is often infeasible in practice for those large-scale trained models due to large computational and storage costs, low data efficiency, and model privacy issues. In this paper, we propose a new generic fairness learning paradigm, called FairReprogram, which incorporates the model reprogramming technique. Specifically, FairReprogram considers the neural model fixed, and instead appends to the input a set of perturbations, called the fairness trigger, which is tuned towards the fairness criteria under a min-max formulation. We further introduce an information-theoretic framework that explains why and under what conditions fairness goals can be achieved using the fairness trigger. We show both theoretically and empirically that the fairness trigger can effectively obscure demographic biases in the output prediction of fixed ML models by providing false demographic information that hinders the model from utilizing the correct demographic information to make the prediction. Extensive experiments on both NLP and CV datasets demonstrate that our method can achieve better fairness improvements than retraining-based methods with far less training cost and data dependency under two widely-used fairness criteria.
@inproceedings{zhang2022fairness, title = {Fairness Reprogramming}, author = {Zhang*, Guanhua and Zhang*, Yihua and Zhang, Yang and Fan, Wenqi and Li, Qing and Liu, Sijia and Chang, Shiyu}, booktitle = {Thirty-sixth Conference on Neural Information Processing Systems}, year = {2022}, }
ICML’22
Revisiting and Advancing Fast Adversarial Training Through The Lens of Bi-Level Optimization

Yihua Zhang*, Guanhua Zhang*, Prashant Khanduri, Mingyi Hong, Shiyu Chang, and Sijia Liu

In Proceedings of the 39th International Conference on Machine Learning Oct 2022

Abs Paper Code Poster Slides Video

Adversarial training (AT) is a widely recognized defense mechanism to gain the robustness of deep neural networks against adversarial attacks. It is built on min-max optimization (MMO), where the minimizer (i.e., defender) seeks a robust model to minimize the worst-case training loss in the presence of adversarial examples crafted by the maximizer (i.e., attacker). However, the conventional MMO method makes AT hard to scale. Thus, Fast-AT and other recent algorithms attempt to simplify MMO by replacing its maximization step with the single gradient sign-based attack generation step. Although easy to implement, Fast-AT lacks theoretical guarantees, and its empirical performance is unsatisfactory due to the issue of robust catastrophic overfitting when training with strong adversaries. In this paper, we advance Fast-AT from the fresh perspective of bi-level optimization (BLO). We first show that the commonly-used Fast-AT is equivalent to using a stochastic gradient algorithm to solve a linearized BLO problem involving a sign operation. However, the discrete nature of the sign operation makes it difficult to understand the algorithm performance. Inspired by BLO, we design and analyze a new set of robust training algorithms termed Fast Bi-level AT (Fast-BAT), which effectively defends sign-based projected gradient descent (PGD) attacks without using any gradient sign method or explicit robust regularization. In practice, we show that our method yields substantial robustness improvements over multiple baselines across multiple models and datasets.
@inproceedings{zhang2022revisiting, title = {Revisiting and Advancing Fast Adversarial Training Through The Lens of Bi-Level Optimization}, author = {Zhang*, Yihua and Zhang*, Guanhua and Khanduri, Prashant and Hong, Mingyi and Chang, Shiyu and Liu, Sijia}, booktitle = {Proceedings of the 39th International Conference on Machine Learning}, pages = {26693--26712}, year = {2022}, publisher = {PMLR} }
CVPR’22
Quarantine: Sparsity Can Uncover the Trojan Attack Trigger for Free

Tianlong Chen*, Zhenyu Zhang*, Yihua Zhang*, Shiyu Chang, Sijia Liu, and Zhangyang Wang

In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Oct 2022

Abs Paper Code Poster

Trojan attacks threaten deep neural networks (DNNs) by poisoning them to behave normally on most samples, yet to produce manipulated results for inputs attached with a particular trigger. Several works attempt to detect whether a given DNN has been injected with a specific trigger during the training. In a parallel line of research, the lottery ticket hypothesis reveals the existence of sparse subnetworks which are capable of reaching competitive performance as the dense network after independent training. Connecting these two dots, we investigate the problem of Trojan DNN detection from the brand new lens of sparsity, even when no clean training data is available. Our crucial observation is that the Trojan features are significantly more stable to network pruning than benign features. Leveraging that, we propose a novel Trojan network detection regime: first locating a "winning Trojan lottery ticket" which preserves nearly full Trojan information yet only chance-level performance on clean inputs; then recovering the trigger embedded in this already isolated subnetwork. Extensive experiments on various datasets, i.e., CIFAR-10, CIFAR-100, and ImageNet, with different network architectures, i.e., VGG-16, ResNet-18, ResNet-20s, and DenseNet-100 demonstrate the effectiveness of our proposal.
@inproceedings{chen2022quarantine, title = {Quarantine: Sparsity Can Uncover the Trojan Attack Trigger for Free}, author = {Chen*, Tianlong and Zhang*, Zhenyu and Zhang*, Yihua and Chang, Shiyu and Liu, Sijia and Wang, Zhangyang}, booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition}, pages = {598--609}, year = {2022} }