Access Type

WSU Access

Date of Award

January 2024

Degree Type

Dissertation

Degree Name

Ph.D.

Department

Computer Science

First Advisor

Dongxiao Zhu

Abstract

This dissertation advances Trustworthy AI, focusing on improving the robustness, interpretability, and fairness of AI systems. Chapter 2 introduces "AttCAT: Explaining Transformers via Attentive Class Activation Tokens," a method that generates explanations for Transformer models by using attentive class activation tokens to assess the impact of each input token. Chapter 3 presents "Counterfactual Interpolation Augmentation (CIA): A Unified Approach to Enhance Fairness and Explainability of DNN," which uses counterfactual interpolations to de-correlate sensitive attributes, improving both the fairness and the explainability of deep neural networks. Chapter 4, "Hijacking Large Language Models via Adversarial In-Context Learning," reveals a new vulnerability in LLMs: imperceptible adversarial suffixes can manipulate LLM outputs, which underscores the need for more robust defenses. Together, these chapters propose approaches to key challenges in the explainability, fairness, and robustness of AI systems.
