incivility-gpt-oss-20b-es

A fine-tuned language model for generating civil and uncivil responses in Spanish social media conversations. Designed for research experiments on exposure to incivility in online political discourse.

Project Description

This model is part of the WHAT-IF research project, funded by the European Union's Horizon Europe research and innovation programme under grant agreement no. 101177574.

The goal is to investigate how exposure to different communication styles (civil vs. uncivil, like-minded vs. not like-minded) affects users in online political discussions.

Experimental Use Case

The model acts as a WhatsApp agent in controlled experiments where participants interact with different "personalities":

|         | Like-minded                              | Not like-minded            |
|---------|------------------------------------------|----------------------------|
| Civil   | Supports the user politely               | Disagrees respectfully     |
| Uncivil | Supports the user, attacks "the others"  | Directly attacks the user  |

How the Model Was Created

1. Source Data

  • Source: ~3.9 million Spanish tweets on controversial topics
  • Topics: Immigration and climate change
  • Filtering: Tweets classified as uncivil (impoliteness, hate speech, threats)
  • Result: ~2 million filtered tweets

2. Synthetic Data Generation

A three-step pipeline was used to generate the synthetic conversations:

Step 1: Filtering

  • Selection of tweets with incivility markers
  • Criteria: Impoliteness=1 OR Hate_Speech=1 OR Threats=1
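The OR filter over the three incivility labels can be sketched in plain Python. This is illustrative only: the field names mirror the criteria above, but the real dataset schema may differ.

```python
# Sketch of Step 1: keep tweets flagged on at least one incivility marker.
# Field names (Impoliteness, Hate_Speech, Threats) follow the criteria above;
# the actual dataset schema may differ.
def filter_uncivil(tweets):
    """Return tweets where any incivility label equals 1."""
    markers = ("Impoliteness", "Hate_Speech", "Threats")
    return [t for t in tweets if any(t.get(m) == 1 for m in markers)]

tweets = [
    {"id": 1, "Impoliteness": 1, "Hate_Speech": 0, "Threats": 0},
    {"id": 2, "Impoliteness": 0, "Hate_Speech": 0, "Threats": 0},
    {"id": 3, "Impoliteness": 0, "Hate_Speech": 1, "Threats": 1},
]
print([t["id"] for t in filter_uncivil(tweets)])  # [1, 3]
```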

Step 2: Topic Classification (Gemma 3-27B)

  • Automatic topic classification (immigration, climate change)
  • Model: google/gemma-3-27b-it

Step 3: Conversation Generation (Dolphin)

  • Multi-turn dialogue generation (5 turns per conversation)
  • Model: Dolphin (uncensored) to allow realistic uncivil content
  • Roles: attacker vs. original author defending their position
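The turn structure of Step 3 can be sketched as an alternation between the two roles, seeded with the original tweet. `generate` below is a stub standing in for the Dolphin model used in the real pipeline.

```python
# Sketch of Step 3's turn structure: an "attacker" and the original author
# alternate until the conversation reaches 5 turns. `generate` is a stub
# standing in for the uncensored Dolphin model used in the real pipeline.
def generate(role: str, history: list) -> str:
    return f"<{role} reply #{len(history)}>"  # placeholder completion

def build_conversation(original_tweet: str, turns: int = 5) -> list:
    history = [{"role": "author", "content": original_tweet}]
    for i in range(turns - 1):
        role = "attacker" if i % 2 == 0 else "author"
        history.append({"role": role, "content": generate(role, history)})
    return history

conv = build_conversation("Original user tweet...")
print([m["role"] for m in conv])
# ['author', 'attacker', 'author', 'attacker', 'author']
```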

3. Training Data Format

{
  "messages": [
    {"role": "system", "content": "You are an aggressive social media user..."},
    {"role": "user", "content": "Original user tweet..."},
    {"role": "assistant", "content": "Generated response..."}
  ],
  "type": "attack_original"
}

Trained task types:

  • attack_original: Respond aggressively to a message
  • full_conversation: Generate complete conversation
  • continue_conversation: Continue as author defending position
  • defend_position: Defend position against an attack
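A conversation from the pipeline above would be serialized into the training format shown earlier, one sample per task type. The helper below is a sketch; the system prompts used per task type are part of the unpublished training pipeline.

```python
import json

# Sketch: turn one synthetic exchange into a sample in the training format
# shown above. The system prompt here is invented for illustration; the
# actual per-task-type prompts are not published.
def make_sample(task_type, system_prompt, user_msg, assistant_msg):
    return {
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_msg},
            {"role": "assistant", "content": assistant_msg},
        ],
        "type": task_type,
    }

sample = make_sample(
    "attack_original",
    "You are an aggressive social media user...",
    "Original user tweet...",
    "Generated response...",
)
print(json.dumps(sample, ensure_ascii=False)[:60])
```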

4. Fine-tuning Details

  • Base model: BSC-LT/gpt-oss-20b (Spanish 20B parameter model)
  • Method: LoRA (Low-Rank Adaptation)
  • LoRA configuration:
    • r=16
    • lora_alpha=32
    • target_modules=["q_proj", "k_proj", "v_proj", "o_proj"]
    • lora_dropout=0.05
  • Data: ~130,000 training samples
  • Epochs: 1
  • Effective batch size: 64
  • Learning rate: 2e-4
  • Training time: ~33 hours on 4x NVIDIA H100
  • Final loss: ~0.40
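Under the hyperparameters listed above, a PEFT LoRA configuration would look roughly like this. This is a sketch assuming the Hugging Face `peft` library; the training loop itself is omitted, and the batch-size split is a guess.

```python
from peft import LoraConfig

# LoRA configuration mirroring the hyperparameters listed above.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

# An effective batch size of 64 could come from, e.g., per-device batch 4
# on 4 GPUs with gradient accumulation 4 (4 * 4 * 4 = 64); the actual
# split used in training is not documented.
```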

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "adelafue/incivility-gpt-oss-20b-es",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained("adelafue/incivility-gpt-oss-20b-es", trust_remote_code=True)

# Example: Generate uncivil response
prompt = """<|system|>
You are Carlos, an aggressive user in a WhatsApp group.
You always attack the user and their opinions aggressively.
Use insults and be dismissive.</s>
<|user|>
User: I think we should welcome more refugees.</s>
<|assistant|>
Carlos:"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    temperature=0.8,
    do_sample=True,
    top_p=0.9
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Example with Different Conditions

PROMPTS = {
    "civil_like": "You are Ana. You support the user politely and respectfully.",
    "civil_notlike": "You are Pedro. You disagree with the user but remain polite.",
    "incivil_like": "You are Laura. You support the user but aggressively attack those who think differently.",
    "incivil_notlike": "You are Carlos. You directly attack the user with insults."
}

def generate_response(user_message, condition="civil_like"):
    prompt = f"""<|system|>
{PROMPTS[condition]}</s>
<|user|>
User: {user_message}</s>
<|assistant|>
"""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        max_new_tokens=100,
        temperature=0.8,
        do_sample=True,
        top_p=0.9
    )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

Limitations and Ethical Considerations

Warnings

  • Research only: This model generates offensive content by experimental design
  • Not for production: Not appropriate for public chatbots or commercial applications
  • Toxic content: The model can generate insults, hate speech, and personal attacks
  • Political bias: Trained primarily on content about immigration and climate change in Spanish

Responsible Use

This model should only be used for:

  • Academic research on online incivility
  • Controlled studies with ethical approval
  • Political discourse analysis

Repository Structure

.
├── config.json
├── generation_config.json
├── model-00001-of-00009.safetensors
├── model-00002-of-00009.safetensors
├── ...
├── model.safetensors.index.json
├── special_tokens_map.json
├── tokenizer.json
├── tokenizer_config.json
└── README.md

Citation

@misc{incivility-gpt-oss-2026,
  author = {WHAT-IF Project},
  title = {GPT-OSS-20B Fine-tuned for Incivility Generation in Spanish},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/adelafue/incivility-gpt-oss-20b-es}},
  note = {Fine-tuned on synthetic incivility data for research purposes}
}

Funding

This project received funding from the European Union's Horizon Europe research and innovation programme under grant agreement no. 101177574 (WHAT-IF Project).

Acknowledgments

  • Barcelona Supercomputing Center (BSC) for computing resources (MareNostrum 5)
  • BSC Language Technologies for the base model gpt-oss-20b
  • European Union's Horizon Europe programme

Contact

For questions about the model or the research project, please contact the authors of the WHAT-IF Project.
