Access Type
WSU Access
Date of Award
January 2024
Degree Type
Thesis
Degree Name
M.S.
Department
Computer Science
First Advisor
Dongxiao D. Zhu
Abstract
In-context learning (ICL) has emerged as a powerful paradigm for adapting LLMs to specific downstream tasks by providing labeled examples as demonstrations in the precondition prompts. Despite its promising performance, ICL is unstable with respect to the choice and ordering of demonstrations. Moreover, crafted adversarial attacks pose a notable threat to the robustness of ICL. However, existing attacks are either easy to detect, rely on external models, or lack specificity towards ICL. To address these issues, this thesis introduces a novel transferable attack against ICL that hijacks LLMs into generating the targeted response. The proposed hijacking attack leverages a gradient-based prompt search method to learn imperceptible adversarial suffixes and append them to the in-context demonstrations. Extensive experimental results on various tasks and datasets demonstrate the effectiveness of the hijacking attack: it diverts the model's attention toward the adversarial tokens and consequently elicits the unwanted target outputs. We also propose a defense strategy against hijacking attacks through the use of extra demonstrations, which enhances the robustness of LLMs during ICL. Broadly, this work reveals significant security vulnerabilities of LLMs and emphasizes the need for in-depth study of LLM robustness in ICL.
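The abstract describes a gradient-based prompt search that learns adversarial suffix tokens appended to the in-context demonstrations. As a rough illustration only, the sketch below shows one greedy, gradient-guided token-swap step in the spirit of such methods; the toy model, the target loss, and every name in it are assumptions for illustration, not the thesis's actual implementation.

```python
# Hypothetical sketch of one gradient-guided suffix-token swap step, in the
# spirit of greedy coordinate-gradient prompt search. The tiny model, loss,
# and all names below are illustrative assumptions, not the thesis's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
VOCAB, DIM = 100, 32

class ToyLM(nn.Module):
    """Stand-in for an LLM: maps token embeddings to next-token logits."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, DIM)
        self.head = nn.Linear(DIM, VOCAB)

    def forward(self, emb):                  # emb: (seq_len, DIM)
        return self.head(emb.mean(dim=0))    # (VOCAB,) next-token logits

model = ToyLM()
demo_ids = torch.randint(0, VOCAB, (8,))     # in-context demonstration tokens
suffix_ids = torch.randint(0, VOCAB, (3,))   # adversarial suffix being learned
target_id = torch.tensor(7)                  # attacker-chosen target token

def target_loss(suffix_onehot):
    """Cross-entropy of the target token given demos + (soft) suffix."""
    emb = torch.cat([model.embed(demo_ids), suffix_onehot @ model.embed.weight])
    return F.cross_entropy(model(emb).unsqueeze(0), target_id.unsqueeze(0))

# 1) Gradient of the target loss w.r.t. a one-hot relaxation of the suffix.
onehot = F.one_hot(suffix_ids, VOCAB).float().requires_grad_(True)
target_loss(onehot).backward()
grad = onehot.grad                           # (3, VOCAB)

# 2) Rank candidate replacements per position by negative gradient, then
#    greedily keep whichever single swap lowers the true loss the most.
best_loss = target_loss(F.one_hot(suffix_ids, VOCAB).float()).item()
for pos in range(len(suffix_ids)):
    for cand in (-grad[pos]).topk(5).indices:
        trial = suffix_ids.clone()
        trial[pos] = cand
        loss = target_loss(F.one_hot(trial, VOCAB).float()).item()
        if loss < best_loss:
            best_loss, suffix_ids = loss, trial

print("updated adversarial suffix:", suffix_ids.tolist(), "loss:", round(best_loss, 4))
```

In practice such a step would be repeated over many iterations against a real LLM, and the resulting suffix transferred across prompts and models; this toy loop only conveys the gradient-then-greedy-swap structure.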
Recommended Citation
Qiang, Yao, "Hijacking Large Language Models Via Adversarial In-Context Learning" (2024). Wayne State University Theses. 952.
https://digitalcommons.wayne.edu/oa_theses/952