Access Type
WSU Access
Date of Award
January 2025
Degree Type
Thesis
Degree Name
M.S.
Department
Computer Science
First Advisor
Amiangshu Bosu
Abstract
Sexist and misogynistic behavior remains a significant barrier to inclusion in technical communities such as GitHub. Many developers initially join open-source projects, but experiences of microaggressions, dismissiveness, and subtle gender bias often drive them away, leading to high attrition, especially among minority groups. Existing moderation tools, which are often limited to keyword filtering or simple binary classification, frequently fail to detect such nuanced forms of harm. This study addresses that gap by introducing a fine-grained and interpretable classification framework. To achieve meaningful moderation, we move beyond binary classification toward a deeper understanding of harmful behavior, identifying twelve distinct categories of sexist and misogynistic content in GitHub comments. We also investigate how different prompt design choices influence the performance of large language models (LLMs) in detecting subtle, context-dependent forms of harm.

We collected a dataset of 11,007 GitHub comments and selected 1,422 representative examples covering the twelve behaviorally defined harm categories for evaluation. We then evaluated instruction-tuned LLMs (GPT-4o, LLaMA 3.3, and Mistral 7B) using a few-shot prompting pipeline, refining the prompts iteratively to improve classification accuracy. The final prompt design incorporated detailed category definitions, explicit output format constraints, a confidence score for each decision, and a brief model explanation. We also explored parameter tuning, including temperature, top-p, and maximum tokens, to improve classification accuracy. We assessed model performance using precision, recall, F1-score, and the Matthews Correlation Coefficient (MCC).

Iterative prompt refinement led to significant improvements in classifying nuanced categories such as Discredit, Damning, and Victim Blaming, which were previously prone to misclassification. The best-performing configuration (Prompt 19 with GPT-4o) achieved an MCC of 0.4967, indicating a moderate correlation between predicted and actual classifications and demonstrating the effectiveness of our approach in capturing subtle, context-sensitive harm. GPT-4o consistently outperformed the other models, although overall performance varied across model architectures, highlighting the importance of model selection in moderation tasks. Our results show that well-crafted prompts, grounded in clear, behavior-specific definitions and structured output formats, can substantially improve both the accuracy and interpretability of LLM-based content moderation. This approach provides a scalable and transparent framework for moderating harmful discourse in software engineering communities and offers practical design recommendations for deploying LLMs in real-world moderation scenarios.
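To make the described pipeline concrete, the sketch below illustrates one possible way to implement few-shot, category-level classification of a GitHub comment with GPT-4o and to score predictions with MCC. This is not the thesis's code: the category list beyond the three names mentioned in the abstract, the example comment, the output format string, and all parameter values are illustrative placeholders only.

# Minimal sketch (assumptions noted above): few-shot harm-category
# classification of GitHub comments with GPT-4o, evaluated with MCC.
from openai import OpenAI
from sklearn.metrics import matthews_corrcoef

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# The thesis defines twelve behavioral categories; only three are named in
# the abstract, so the rest are omitted here and "None" is a placeholder.
CATEGORIES = ["Discredit", "Damning", "Victim Blaming", "None"]

SYSTEM_PROMPT = (
    "You are a content-moderation assistant for GitHub comments. "
    "Classify each comment into exactly one of these categories: "
    + ", ".join(CATEGORIES) + ". "
    "Respond as: <category> | confidence=<0-1> | <one-sentence explanation>."
)

# Hypothetical few-shot example embedded in the prompt.
FEW_SHOT = [
    {"role": "user",
     "content": "Comment: 'She only got the commit merged because the maintainer likes her.'"},
    {"role": "assistant",
     "content": "Discredit | confidence=0.9 | Attributes her contribution to favoritism rather than merit."},
]

def classify(comment: str) -> str:
    """Return the predicted category for a single comment."""
    messages = (
        [{"role": "system", "content": SYSTEM_PROMPT}]
        + FEW_SHOT
        + [{"role": "user", "content": "Comment: " + repr(comment)}]
    )
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        temperature=0.0,   # low temperature for reproducible labels
        top_p=1.0,
        max_tokens=60,
    )
    # The constrained "category | confidence | explanation" format lets us
    # recover the label with a simple split.
    return response.choices[0].message.content.split("|")[0].strip()

# Evaluation on a labeled sample: MCC summarizes agreement across all
# categories and is robust to class imbalance (placeholder data below).
gold = ["Discredit", "None", "Victim Blaming"]
pred = [classify(c) for c in ["example 1", "example 2", "example 3"]]
print("MCC:", matthews_corrcoef(gold, pred))

The structured output format serves the interpretability goal described in the abstract: the confidence score and explanation can be surfaced to human moderators, while the leading category token keeps automated evaluation straightforward.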
Recommended Citation
Dev, Tanni, "A Large Language Model-Based Approach To Detecting Sexism And Misogyny In Github Comments" (2025). Wayne State University Theses. 1004.
https://digitalcommons.wayne.edu/oa_theses/1004