Incivility LoRA Adapter for GPT-OSS-20B
A LoRA adapter fine-tuned for generating civil and uncivil responses in Spanish social media conversations. Designed for research experiments on exposure to incivility in online political discourse.
Project Description
This model is part of the WHAT-IF research project, funded by the European Union's Horizon Europe research and innovation programme under grant agreement No. 101177574.
The goal is to investigate how exposure to different communication styles (civil vs. uncivil, like-minded vs. not like-minded) affects users in online political discussions.
Experimental Use Case
The model acts as a WhatsApp agent in controlled experiments where participants interact with different "personalities":
| | Like-minded | Not Like-minded |
|---|---|---|
| Civil | Supports user politely | Disagrees respectfully |
| Uncivil | Supports user, attacks "the others" | Directly attacks the user |
How the Model Was Created
1. Source Data
- Source: ~3.9 million Spanish tweets on controversial topics
- Topics: Immigration and climate change
- Filtering: Tweets classified as uncivil (impoliteness, hate speech, threats)
- Result: ~2 million filtered tweets
2. Synthetic Data Generation Pipeline
Step 1: Filtering
- Selection of tweets with incivility markers
- Criteria:
Impoliteness=1 OR Hate_Speech=1 OR Threats=1
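The OR criterion above can be sketched as a simple predicate. This is an illustrative sketch only: the field names `Impoliteness`, `Hate_Speech`, and `Threats` mirror the criteria, but the record layout and helper function are assumptions, not the project's actual code.

```python
# Minimal sketch of the incivility filter, assuming each tweet record
# carries binary annotation fields matching the criteria above.
def is_uncivil(tweet: dict) -> bool:
    """Keep a tweet if any incivility marker is set."""
    return (
        tweet.get("Impoliteness", 0) == 1
        or tweet.get("Hate_Speech", 0) == 1
        or tweet.get("Threats", 0) == 1
    )

tweets = [
    {"text": "Qué vergüenza de gobierno", "Impoliteness": 1, "Hate_Speech": 0, "Threats": 0},
    {"text": "Interesante propuesta climática", "Impoliteness": 0, "Hate_Speech": 0, "Threats": 0},
]
filtered = [t for t in tweets if is_uncivil(t)]
```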
Step 2: Topic Classification (Gemma 3-27B)
- Automatic topic classification (immigration, climate change)
- Model:
google/gemma-3-27b-it
Step 3: Conversation Generation (Dolphin)
- Multi-turn dialogue generation (5 turns per conversation)
- Model: Dolphin (uncensored) to allow realistic uncivil content
- Roles: attacker vs. original author defending their position
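The generation loop for Step 3 can be sketched as follows. Here `generate_reply` is a placeholder standing in for a call to the uncensored Dolphin model; the alternation scheme (attacker responds, author defends) follows the roles described above, but the exact orchestration is an assumption.

```python
# Sketch of the multi-turn dialogue generation (5 turns per conversation).
# `generate_reply` is a stub for the Dolphin model call, not real project code.
def generate_reply(role: str, history: list[dict]) -> str:
    return f"[{role} reply to turn {len(history)}]"

def build_conversation(seed_tweet: str, turns: int = 5) -> list[dict]:
    """Start from an original tweet and alternate attacker/author turns."""
    history = [{"role": "author", "content": seed_tweet}]
    for i in range(turns - 1):
        # Alternate roles: the attacker responds, then the author defends.
        role = "attacker" if i % 2 == 0 else "author"
        history.append({"role": role, "content": generate_reply(role, history)})
    return history

conv = build_conversation("Hay que acoger a más refugiados.")
```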
3. Training Data Format
```json
{
  "messages": [
    {"role": "system", "content": "You are an aggressive social media user..."},
    {"role": "user", "content": "Original user tweet..."},
    {"role": "assistant", "content": "Generated response..."}
  ],
  "type": "attack_original"
}
```
Trained task types:
- attack_original: Respond aggressively to a message
- full_conversation: Generate a complete conversation
- continue_conversation: Continue as the author defending their position
- defend_position: Defend a position against an attack
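A record in the format above can be assembled per task type roughly like this. The system prompts here are illustrative placeholders, not the project's exact wording, and `to_sample` is a hypothetical helper.

```python
# Hypothetical sketch of building a task-specific training record in the
# "messages" format shown above. System prompts are illustrative only.
SYSTEM_PROMPTS = {
    "attack_original": "You are an aggressive social media user...",
    "defend_position": "You are the original author defending your position...",
}

def to_sample(task: str, user_msg: str, assistant_msg: str) -> dict:
    return {
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPTS[task]},
            {"role": "user", "content": user_msg},
            {"role": "assistant", "content": assistant_msg},
        ],
        "type": task,
    }

sample = to_sample("attack_original", "Original user tweet...", "Generated response...")
```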
4. Fine-tuning Details
| Parameter | Value |
|---|---|
| Base model | openai/gpt-oss-20b |
| Method | LoRA |
| LoRA r | 16 |
| LoRA alpha | 32 |
| Target modules | q_proj, k_proj, v_proj, o_proj |
| Dropout | 0.05 |
| Training samples | ~130,000 |
| Epochs | 1 |
| Batch size | 64 (effective) |
| Learning rate | 2e-4 |
| Training time | ~33 hours on 4x NVIDIA H100 |
| Final loss | ~0.40 |
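The LoRA hyperparameters in the table map onto a `peft` `LoraConfig` like the one below. This is a sketch reconstructed from the table, not the actual training script; optimizer, scheduler, and trainer settings are omitted.

```python
from peft import LoraConfig

# LoRA settings taken from the fine-tuning table above.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
```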
Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    "openai/gpt-oss-20b",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "adelafue/incivility-lora-gpt-oss-20b-es")

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("openai/gpt-oss-20b", trust_remote_code=True)

# Generate response
prompt = """<|system|>
You are Carlos, an aggressive user in a WhatsApp group.
Always attack the user and their opinions aggressively.</s>
<|user|>
User: I think we should welcome more refugees.</s>
<|assistant|>
Carlos:"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100, temperature=0.8, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
Different Conditions Example
```python
PROMPTS = {
    "civil_like": "You are Ana. Support the user politely.",
    "civil_notlike": "You are Pedro. Disagree politely.",
    "incivil_like": "You are Laura. Support user, attack conservatives aggressively.",
    "incivil_notlike": "You are Carlos. Attack the user with insults.",
}
```
Limitations and Ethical Considerations
Warnings
- Research only: Generates offensive content by design
- Not for production: Not for public chatbots or commercial use
- Toxic content: Can generate insults, hate speech, personal attacks
- Bias: Trained on immigration/climate change content in Spanish
Responsible Use
Only use for:
- Academic research on online incivility
- Controlled studies with ethical approval
- Political discourse analysis
Citation
```bibtex
@misc{incivility-lora-2026,
  author    = {WHAT-IF Project},
  title     = {LoRA Adapter for Incivility Generation in Spanish},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/adelafue/incivility-lora-gpt-oss-20b-es}
}
```
Funding
This project received funding from the European Union's Horizon Europe research and innovation programme under grant agreement No. 101177574 (WHAT-IF Project).
Acknowledgments
- Barcelona Supercomputing Center (BSC) for computing resources (MareNostrum 5)
- BSC Language Technologies for the base model gpt-oss-20b
- European Union's Horizon Europe programme