Off-campus WSU users: To download campus access dissertations, please use the following link to log into our proxy server with your WSU access ID and password, then click the "Off-campus Download" button below.

Non-WSU users: Please talk to your librarian about requesting this thesis through interlibrary loan.

Access Type

WSU Access

Date of Award

January 2024

Degree Type

Thesis

Degree Name

M.S.

Department

Computer Science

First Advisor

Dongxiao D. Zhu

Abstract

In-context learning (ICL) has emerged as a powerful paradigm leveraging LLMs for specific downstream tasks by utilizing labeled examples as demonstrations in the precondition prompts. Despite its promising performance, ICL suffers from instability with the choice and arrangement of examples. Additionally, crafted adversarial attacks pose a notable threat tothe robustness of ICL. However, existing attacks are either easy to detect, rely on external models, or lack specificity towards ICL. This thesis introduces a novel transferable attack for ICL to address these issues, aiming to hijack LLMs to generate the targeted response. The proposed hijacking attack leverages a gradient-based prompt search method to learn and append imperceptible adversarial suffixes to the in-context demonstrations. Extensive experimental results on various tasks and datasets demonstrate the effectiveness of our hijacking attack, resulting in distracted attention towards adversarial tokens and consequently leading to unwanted target outputs. We also propose a defense strategy against hijacking attacks through the use of extra demonstrations, which enhances the robustness of LLMs during ICL. Broadly, this work reveals the significant security vulnerabilities of LLMs and emphasizes the necessity for in-depth studies on the robustness of LLMs related to ICL.

Off-campus Download

Share

COinS