Access Type
WSU Access
Date of Award
January 2024
Degree Type
Dissertation
Degree Name
Ph.D.
Department
Computer Science
First Advisor
Dongxiao Zhu
Abstract
This dissertation advances Trustworthy AI by improving robustness, interpretability, and fairness, with an emphasis on designing reliable systems. Chapter 2 introduces "AttCAT: Explaining Transformers via Attentive Class Activation Tokens," a method that generates explanations for Transformer models by using attentive class activation tokens to assess the impact of input tokens. Chapter 3 presents "Counterfactual Interpolation Augmentation (CIA): A Unified Approach to Enhance Fairness and Explainability of DNN," which uses counterfactual interpolations to de-correlate sensitive attributes, improving both the fairness and the interpretability of deep neural networks. Chapter 4, "Hijacking Large Language Models via Adversarial In-Context Learning," reveals a new vulnerability in large language models (LLMs): imperceptible adversarial suffixes can manipulate LLM outputs, which underscores the need for more robust defenses. Together, these chapters contribute approaches to key challenges in the explainability, robustness, and fairness of AI systems.
Recommended Citation
Qiang, Yao, "Designing For Reliability: Algorithmic And Applied Perspectives On Trustworthy Artificial Intelligence" (2024). Wayne State University Dissertations. 4083.
https://digitalcommons.wayne.edu/oa_dissertations/4083