# incivility-gpt-oss-20b-es
A fine-tuned language model for generating civil and uncivil responses in Spanish social media conversations. Designed for research experiments on exposure to incivility in online political discourse.
## Project Description

This model is part of the WHAT-IF research project, funded by the European Union's Horizon Europe research and innovation programme under grant agreement no. 101177574.
The goal is to investigate how exposure to different communication styles (civil vs. uncivil, like-minded vs. not like-minded) affects users in online political discussions.
## Experimental Use Case

The model acts as a WhatsApp agent in controlled experiments where participants interact with different "personalities":

| | Like-minded | Not like-minded |
|---|---|---|
| **Civil** | Supports the user politely | Disagrees respectfully |
| **Uncivil** | Supports the user, attacks "the others" | Directly attacks the user |
## How the Model Was Created

### 1. Source Data
- Source: ~3.9 million Spanish tweets on controversial topics
- Topics: Immigration and climate change
- Filtering: Tweets classified as uncivil (impoliteness, hate speech, threats)
- Result: ~2 million filtered tweets
### 2. Synthetic Data Generation

A three-step pipeline generates the synthetic conversations:

#### Step 1: Filtering

- Selection of tweets with incivility markers
- Criteria: `Impoliteness=1 OR Hate_Speech=1 OR Threats=1`
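As a minimal sketch (assuming the annotations are binary per-tweet columns named as in the criteria above), the Step 1 filter can be expressed as:

```python
# Sketch of the Step 1 incivility filter, assuming each tweet is a dict
# carrying the binary annotation columns named in the criteria above.
tweets = [
    {"id": 1, "text": "...", "Impoliteness": 1, "Hate_Speech": 0, "Threats": 0},
    {"id": 2, "text": "...", "Impoliteness": 0, "Hate_Speech": 0, "Threats": 0},
    {"id": 3, "text": "...", "Impoliteness": 0, "Hate_Speech": 1, "Threats": 1},
]

def is_uncivil(tweet: dict) -> bool:
    # A tweet is kept if any incivility marker is set (logical OR).
    return any(tweet[col] == 1 for col in ("Impoliteness", "Hate_Speech", "Threats"))

filtered = [t for t in tweets if is_uncivil(t)]
print([t["id"] for t in filtered])  # → [1, 3]
```

Applied to the ~3.9 million source tweets, this OR-filter is what reduces the pool to the ~2 million uncivil tweets reported above.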
#### Step 2: Topic Classification (Gemma 3-27B)

- Automatic topic classification (immigration, climate change)
- Model: `google/gemma-3-27b-it`
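The card does not publish the classification prompt itself; a plausible zero-shot template (the wording and the `other` fallback label are assumptions, not the project's actual prompt) might look like:

```python
# Hypothetical zero-shot topic-classification prompt for use with
# google/gemma-3-27b-it. The exact wording is an assumption.
TOPICS = ["immigration", "climate change", "other"]

def classification_prompt(tweet_text: str) -> str:
    # Build a single-label classification instruction for the tweet.
    return (
        "Classify the following Spanish tweet into exactly one topic: "
        + ", ".join(TOPICS) + ".\n"
        + f"Tweet: {tweet_text}\n"
        + "Answer with the topic name only.\nTopic:"
    )

prompt = classification_prompt("Hay que cerrar las fronteras ya")
print(prompt)
```

The trailing `Topic:` cue constrains the model to answer with a bare label, which makes the output easy to parse at scale.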
#### Step 3: Conversation Generation (Dolphin)

- Multi-turn dialogue generation (5 turns per conversation)
- Model: Dolphin (uncensored), to allow realistic uncivil content
- Roles: attacker vs. original author defending their position
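The alternating attacker/author structure above can be sketched as a simple turn loop; `query_model` here is a placeholder stand-in for the actual Dolphin generation call, so only the control flow is shown:

```python
# Sketch of the 5-turn dialogue loop (Step 3). `query_model` is a stand-in
# for the real call to the uncensored Dolphin model; it returns a
# placeholder string so the loop can run end to end.
def query_model(system_prompt: str, history: list) -> str:
    return f"[turn {len(history)} reply]"  # placeholder, not real generation

def build_conversation(seed_tweet: str, n_turns: int = 5) -> list:
    roles = ["attacker", "author"]  # attacker vs. original author
    history = [{"role": "author", "content": seed_tweet}]
    for turn in range(n_turns):
        speaker = roles[turn % 2]
        system = f"You are the {speaker} arguing over the original tweet."
        history.append({"role": speaker, "content": query_model(system, history)})
    return history

conv = build_conversation("Texto del tuit original")
print(len(conv))  # → 6: the seed tweet plus 5 generated turns
```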
### 3. Training Data Format

```json
{
  "messages": [
    {"role": "system", "content": "You are an aggressive social media user..."},
    {"role": "user", "content": "Original user tweet..."},
    {"role": "assistant", "content": "Generated response..."}
  ],
  "type": "attack_original"
}
```
Trained task types:

- `attack_original`: respond aggressively to a message
- `full_conversation`: generate a complete conversation
- `continue_conversation`: continue as the author defending their position
- `defend_position`: defend a position against an attack
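Records in this format can be assembled programmatically; a minimal helper (the function name is illustrative, not from the project's code) keeps each sample in the chat schema shown above:

```python
import json

# Minimal helper (name is illustrative) that assembles one training record
# in the chat format shown above.
def make_record(task_type, system, user_msg, assistant_msg):
    return {
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user_msg},
            {"role": "assistant", "content": assistant_msg},
        ],
        "type": task_type,
    }

record = make_record(
    "attack_original",
    "You are an aggressive social media user...",
    "Original user tweet...",
    "Generated response...",
)
# One JSON line per record is the usual layout for chat fine-tuning data.
print(json.dumps(record, ensure_ascii=False)[:40])
```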
### 4. Fine-tuning Details

- Base model: `BSC-LT/gpt-oss-20b` (Spanish 20B-parameter model)
- Method: LoRA (Low-Rank Adaptation)
- LoRA configuration: `r=16`, `lora_alpha=32`, `target_modules=["q_proj", "k_proj", "v_proj", "o_proj"]`, `lora_dropout=0.05`
- Data: ~130,000 training samples
- Epochs: 1
- Effective batch size: 64
- Learning rate: 2e-4
- Training time: ~33 hours on 4x NVIDIA H100
- Final loss: ~0.40
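For reference, here are the reported hyperparameters as a plain config dict, plus one way the effective batch size of 64 could decompose across the 4 GPUs (the per-device micro-batch and accumulation steps are assumptions; only the product of 64 is reported):

```python
# The reported LoRA hyperparameters as a plain config dict (a sketch; the
# actual training script is not part of this card).
lora_config = {
    "r": 16,
    "lora_alpha": 32,
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
    "lora_dropout": 0.05,
}

# One way to reach the reported effective batch size of 64 on 4 GPUs;
# the per-device micro-batch and accumulation steps are assumptions.
per_device_batch = 4
grad_accum_steps = 4
num_gpus = 4  # 4x NVIDIA H100, as reported
effective_batch = per_device_batch * grad_accum_steps * num_gpus
print(effective_batch)  # → 64
```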
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "adelafue/incivility-gpt-oss-20b-es",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "adelafue/incivility-gpt-oss-20b-es", trust_remote_code=True
)

# Example: generate an uncivil response
prompt = """<|system|>
You are Carlos, an aggressive user in a WhatsApp group.
You always attack the user and their opinions aggressively.
Use insults and be dismissive.</s>
<|user|>
User: I think we should welcome more refugees.</s>
<|assistant|>
Carlos:"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    temperature=0.8,
    do_sample=True,
    top_p=0.9,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
### Example with Different Conditions

```python
PROMPTS = {
    "civil_like": "You are Ana. You support the user politely and respectfully.",
    "civil_notlike": "You are Pedro. You disagree with the user but remain polite.",
    "incivil_like": "You are Laura. You support the user but aggressively attack those who think differently.",
    "incivil_notlike": "You are Carlos. You directly attack the user with insults.",
}

def generate_response(user_message, condition="civil_like"):
    prompt = f"""<|system|>
{PROMPTS[condition]}</s>
<|user|>
User: {user_message}</s>
<|assistant|>
"""
    # Generate with the model and tokenizer loaded above
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs, max_new_tokens=100, temperature=0.8, do_sample=True, top_p=0.9
    )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```
## Limitations and Ethical Considerations

### Warnings

- Research only: this model generates offensive content by experimental design
- Not for production: not appropriate for public chatbots or commercial applications
- Toxic content: the model can generate insults, hate speech, and personal attacks
- Political bias: trained primarily on Spanish-language content about immigration and climate change
### Responsible Use
This model should only be used for:
- Academic research on online incivility
- Controlled studies with ethical approval
- Political discourse analysis
## Repository Structure

```
.
├── config.json
├── generation_config.json
├── model-00001-of-00009.safetensors
├── model-00002-of-00009.safetensors
├── ...
├── model.safetensors.index.json
├── special_tokens_map.json
├── tokenizer.json
├── tokenizer_config.json
└── README.md
```
## Citation

```bibtex
@misc{incivility-gpt-oss-2026,
  author = {WHAT-IF Project},
  title = {GPT-OSS-20B Fine-tuned for Incivility Generation in Spanish},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/adelafue/incivility-gpt-oss-20b-es}},
  note = {Fine-tuned on synthetic incivility data for research purposes}
}
```
## Funding

This project received funding from the European Union's Horizon Europe research and innovation programme under grant agreement no. 101177574 (WHAT-IF Project).
## Acknowledgments

- Barcelona Supercomputing Center (BSC) for computing resources (MareNostrum 5)
- BSC Language Technologies for the base model gpt-oss-20b
- The European Union's Horizon Europe programme
## Contact
For questions about the model or the research project, please contact the authors of the WHAT-IF Project.