# incivility-gpt-oss-20b-es
A fine-tuned language model for generating civil and uncivil responses in Spanish social media conversations. Designed for research experiments on exposure to incivility in online political discourse.
## Project Description

This model is part of the WHAT-IF research project, funded by the European Union's Horizon Europe research and innovation programme under grant agreement no. 101177574.
The goal is to investigate how exposure to different communication styles (civil vs. uncivil, like-minded vs. not like-minded) affects users in online political discussions.
## Experimental Use Case

The model acts as a WhatsApp agent in controlled experiments where participants interact with different "personalities":

| | Like-minded | Not like-minded |
|---|---|---|
| **Civil** | Supports the user politely | Disagrees respectfully |
| **Uncivil** | Supports the user, attacks "the others" | Directly attacks the user |
## How the Model Was Created

### 1. Source Data
- Source: ~3.9 million Spanish tweets on controversial topics
- Topics: Immigration and climate change
- Filtering: Tweets classified as uncivil (impoliteness, hate speech, threats)
- Result: ~2 million filtered tweets
### 2. Synthetic Data Generation

A three-step pipeline generates the synthetic conversations:

#### Step 1: Filtering

- Selection of tweets with incivility markers
- Criteria: `Impoliteness=1 OR Hate_Speech=1 OR Threats=1`
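As a minimal sketch (assuming the annotations are binary per-tweet columns named as in the criteria above), the Step 1 filter can be expressed as:

```python
# Sketch of the Step 1 incivility filter, assuming each tweet is a dict
# carrying the binary annotation columns named in the criteria above.
tweets = [
    {"id": 1, "text": "...", "Impoliteness": 1, "Hate_Speech": 0, "Threats": 0},
    {"id": 2, "text": "...", "Impoliteness": 0, "Hate_Speech": 0, "Threats": 0},
    {"id": 3, "text": "...", "Impoliteness": 0, "Hate_Speech": 1, "Threats": 1},
]

def is_uncivil(tweet: dict) -> bool:
    # A tweet is kept if any incivility marker is set (logical OR).
    return any(tweet[col] == 1 for col in ("Impoliteness", "Hate_Speech", "Threats"))

filtered = [t for t in tweets if is_uncivil(t)]
print([t["id"] for t in filtered])  # → [1, 3]
```

Applied to the ~3.9 million source tweets, this OR-filter is what reduces the pool to the ~2 million uncivil tweets reported above.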
#### Step 2: Topic Classification (Gemma 3-27B)

- Automatic topic classification (immigration, climate change)
- Model: `google/gemma-3-27b-it`
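The card does not publish the classification prompt itself; a plausible zero-shot template (the wording and the `other` fallback label are assumptions, not the project's actual prompt) might look like:

```python
# Hypothetical zero-shot topic-classification prompt for use with
# google/gemma-3-27b-it. The exact wording is an assumption.
TOPICS = ["immigration", "climate change", "other"]

def classification_prompt(tweet_text: str) -> str:
    # Build a single-label classification instruction for the tweet.
    return (
        "Classify the following Spanish tweet into exactly one topic: "
        + ", ".join(TOPICS) + ".\n"
        + f"Tweet: {tweet_text}\n"
        + "Answer with the topic name only.\nTopic:"
    )

prompt = classification_prompt("Hay que cerrar las fronteras ya")
print(prompt)
```

The trailing `Topic:` cue constrains the model to answer with a bare label, which makes the output easy to parse at scale.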
#### Step 3: Conversation Generation (Dolphin)

- Multi-turn dialogue generation (5 turns per conversation)
- Model: Dolphin (uncensored), to allow realistic uncivil content
- Roles: attacker vs. original author defending their position
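The alternating attacker/author structure above can be sketched as a simple turn loop; `query_model` here is a placeholder stand-in for the actual Dolphin generation call, so only the control flow is shown:

```python
# Sketch of the 5-turn dialogue loop (Step 3). `query_model` is a stand-in
# for the real call to the uncensored Dolphin model; it returns a
# placeholder string so the loop can run end to end.
def query_model(system_prompt: str, history: list) -> str:
    return f"[turn {len(history)} reply]"  # placeholder, not real generation

def build_conversation(seed_tweet: str, n_turns: int = 5) -> list:
    roles = ["attacker", "author"]  # attacker vs. original author
    history = [{"role": "author", "content": seed_tweet}]
    for turn in range(n_turns):
        speaker = roles[turn % 2]
        system = f"You are the {speaker} arguing over the original tweet."
        history.append({"role": speaker, "content": query_model(system, history)})
    return history

conv = build_conversation("Texto del tuit original")
print(len(conv))  # → 6: the seed tweet plus 5 generated turns
```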
### 3. Training Data Format

```json
{
  "messages": [
    {"role": "system", "content": "You are an aggressive social media user..."},
    {"role": "user", "content": "Original user tweet..."},
    {"role": "assistant", "content": "Generated response..."}
  ],
  "type": "attack_original"
}
```
Trained task types:

- `attack_original`: respond aggressively to a message
- `full_conversation`: generate a complete conversation
- `continue_conversation`: continue as the author defending their position
- `defend_position`: defend a position against an attack
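Records in this format can be assembled programmatically; a minimal helper (the function name is illustrative, not from the project's code) keeps each sample in the chat schema shown above:

```python
import json

# Minimal helper (name is illustrative) that assembles one training record
# in the chat format shown above.
def make_record(task_type, system, user_msg, assistant_msg):
    return {
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user_msg},
            {"role": "assistant", "content": assistant_msg},
        ],
        "type": task_type,
    }

record = make_record(
    "attack_original",
    "You are an aggressive social media user...",
    "Original user tweet...",
    "Generated response...",
)
# One JSON line per record is the usual layout for chat fine-tuning data.
print(json.dumps(record, ensure_ascii=False)[:40])
```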
### 4. Fine-tuning Details

- Base model: `BSC-LT/gpt-oss-20b` (Spanish 20B-parameter model)
- Method: LoRA (Low-Rank Adaptation)
- LoRA configuration: `r=16`, `lora_alpha=32`, `target_modules=["q_proj", "k_proj", "v_proj", "o_proj"]`, `lora_dropout=0.05`
- Data: ~130,000 training samples
- Epochs: 1
- Effective batch size: 64
- Learning rate: 2e-4
- Training time: ~33 hours on 4x NVIDIA H100
- Final loss: ~0.40
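For reference, here are the reported hyperparameters as a plain config dict, plus one way the effective batch size of 64 could decompose across the 4 GPUs (the per-device micro-batch and accumulation steps are assumptions; only the product of 64 is reported):

```python
# The reported LoRA hyperparameters as a plain config dict (a sketch; the
# actual training script is not part of this card).
lora_config = {
    "r": 16,
    "lora_alpha": 32,
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
    "lora_dropout": 0.05,
}

# One way to reach the reported effective batch size of 64 on 4 GPUs;
# the per-device micro-batch and accumulation steps are assumptions.
per_device_batch = 4
grad_accum_steps = 4
num_gpus = 4  # 4x NVIDIA H100, as reported
effective_batch = per_device_batch * grad_accum_steps * num_gpus
print(effective_batch)  # → 64
```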
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "adelafue/incivility-gpt-oss-20b-es",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "adelafue/incivility-gpt-oss-20b-es", trust_remote_code=True
)

# Example: generate an uncivil response
prompt = """<|system|>
You are Carlos, an aggressive user in a WhatsApp group.
You always attack the user and their opinions aggressively.
Use insults and be dismissive.</s>
<|user|>
User: I think we should welcome more refugees.</s>
<|assistant|>
Carlos:"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    temperature=0.8,
    do_sample=True,
    top_p=0.9,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
### Example with Different Conditions

```python
PROMPTS = {
    "civil_like": "You are Ana. You support the user politely and respectfully.",
    "civil_notlike": "You are Pedro. You disagree with the user but remain polite.",
    "incivil_like": "You are Laura. You support the user but aggressively attack those who think differently.",
    "incivil_notlike": "You are Carlos. You directly attack the user with insults.",
}

def generate_response(user_message, condition="civil_like"):
    prompt = f"""<|system|>
{PROMPTS[condition]}</s>
<|user|>
User: {user_message}</s>
<|assistant|>
"""
    # Generate with the model and tokenizer loaded above
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs, max_new_tokens=100, temperature=0.8, do_sample=True, top_p=0.9
    )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```
## Limitations and Ethical Considerations

### Warnings

- Research only: this model generates offensive content by experimental design
- Not for production: not appropriate for public chatbots or commercial applications
- Toxic content: the model can generate insults, hate speech, and personal attacks
- Political bias: trained primarily on Spanish-language content about immigration and climate change
### Responsible Use
This model should only be used for:
- Academic research on online incivility
- Controlled studies with ethical approval
- Political discourse analysis
## Repository Structure

```
.
├── config.json
├── generation_config.json
├── model-00001-of-00009.safetensors
├── model-00002-of-00009.safetensors
├── ...
├── model.safetensors.index.json
├── special_tokens_map.json
├── tokenizer.json
├── tokenizer_config.json
└── README.md
```
## Citation

```bibtex
@misc{incivility-gpt-oss-2026,
  author = {WHAT-IF Project},
  title = {GPT-OSS-20B Fine-tuned for Incivility Generation in Spanish},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/adelafue/incivility-gpt-oss-20b-es}},
  note = {Fine-tuned on synthetic incivility data for research purposes}
}
```
## Funding

This project received funding from the European Union's Horizon Europe research and innovation programme under grant agreement no. 101177574 (WHAT-IF Project).
## Acknowledgments

- Barcelona Supercomputing Center (BSC) for computing resources (MareNostrum 5)
- BSC Language Technologies for the base model gpt-oss-20b
- The European Union's Horizon Europe programme
## Contact
For questions about the model or the research project, please contact the authors of the WHAT-IF Project.