Incivility LoRA Adapter for GPT-OSS-20B

A LoRA adapter fine-tuned for generating civil and uncivil responses in Spanish social media conversations. Designed for research experiments on exposure to incivility in online political discourse.

Project Description

This model is part of the WHAT-IF research project, funded by the European Union's Horizon 2024 research and innovation programme under grant agreement No. 101177574.

The goal is to investigate how exposure to different communication styles (civil vs. uncivil, like-minded vs. not like-minded) affects users in online political discussions.

Experimental Use Case

The model acts as a WhatsApp agent in controlled experiments where participants interact with different "personalities":

                  Like-minded                                Not like-minded
Civil             Supports the user politely                 Disagrees respectfully
Uncivil           Supports the user, attacks "the others"    Directly attacks the user

How the Model Was Created

1. Source Data

  • Source: ~3.9 million Spanish tweets on controversial topics
  • Topics: Immigration and climate change
  • Filtering: Tweets classified as uncivil (impoliteness, hate speech, threats)
  • Result: ~2 million filtered tweets

2. Synthetic Data Generation Pipeline

Step 1: Filtering

  • Selection of tweets with incivility markers
  • Criteria: Impoliteness=1 OR Hate_Speech=1 OR Threats=1
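
The filtering criterion above can be sketched in a few lines. This is an illustrative example, not the project's actual pipeline; the label field names follow the criteria listed, and the records are made-up:

```python
# Toy labeled tweets; in the real pipeline these labels come from
# incivility classifiers applied to the ~3.9M-tweet corpus.
tweets = [
    {"text": "tweet a", "Impoliteness": 1, "Hate_Speech": 0, "Threats": 0},
    {"text": "tweet b", "Impoliteness": 0, "Hate_Speech": 0, "Threats": 0},
    {"text": "tweet c", "Impoliteness": 0, "Hate_Speech": 1, "Threats": 0},
]

def is_uncivil(t):
    # Keep a tweet if ANY incivility marker fires (OR criterion).
    return t["Impoliteness"] == 1 or t["Hate_Speech"] == 1 or t["Threats"] == 1

uncivil = [t for t in tweets if is_uncivil(t)]
print(len(uncivil))  # 2
```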

Step 2: Topic Classification (Gemma 3-27B)

  • Automatic topic classification (immigration, climate change)
  • Model: google/gemma-3-27b-it

Step 3: Conversation Generation (Dolphin)

  • Multi-turn dialogue generation (5 turns per conversation)
  • Model: Dolphin (uncensored) to allow realistic uncivil content
  • Roles: attacker vs. original author defending their position
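
The alternating-role loop can be sketched as follows. `generate` here is a hypothetical stand-in for the actual Dolphin model call; only the turn structure (seed tweet, then attacker and author alternating up to 5 turns) reflects the pipeline described above:

```python
def generate(role, history):
    # Placeholder: a real implementation would prompt the Dolphin model
    # with the conversation history and the role's persona here.
    return f"[{role} reply #{len(history)}]"

def build_conversation(seed_tweet, turns=5):
    # Turn 1 is the original tweet; subsequent turns alternate
    # attacker -> author -> attacker -> author.
    history = [{"role": "author", "content": seed_tweet}]
    for i in range(turns - 1):
        role = "attacker" if i % 2 == 0 else "author"
        history.append({"role": role, "content": generate(role, history)})
    return history

conv = build_conversation("Original tweet on immigration.")
```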

3. Training Data Format

{
  "messages": [
    {"role": "system", "content": "You are an aggressive social media user..."},
    {"role": "user", "content": "Original user tweet..."},
    {"role": "assistant", "content": "Generated response..."}
  ],
  "type": "attack_original"
}

Trained task types:

  • attack_original: Respond aggressively to a message
  • full_conversation: Generate complete conversation
  • continue_conversation: Continue as author defending position
  • defend_position: Defend position against an attack
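
A training example in the format above can be serialized to one JSONL line per sample. A minimal sketch (the system prompt and texts are placeholders, matching the schema shown):

```python
import json

def make_sample(tweet, response, task_type="attack_original"):
    # Mirrors the "messages" + "type" schema shown above.
    return {
        "messages": [
            {"role": "system", "content": "You are an aggressive social media user..."},
            {"role": "user", "content": tweet},
            {"role": "assistant", "content": response},
        ],
        "type": task_type,
    }

line = json.dumps(
    make_sample("Original user tweet...", "Generated response..."),
    ensure_ascii=False,  # keep Spanish accents unescaped
)
```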

4. Fine-tuning Details

Parameter          Value
Base model         openai/gpt-oss-20b
Method             LoRA
LoRA r             16
LoRA alpha         32
Target modules     q_proj, k_proj, v_proj, o_proj
Dropout            0.05
Training samples   ~130,000
Epochs             1
Batch size         64 (effective)
Learning rate      2e-4
Training time      ~33 hours on 4x NVIDIA H100
Final loss         ~0.40
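
The LoRA hyperparameters above map directly onto a PEFT `LoraConfig`. A sketch, assuming the standard `peft` API (this is a reconstruction from the table, not the project's training script):

```python
from peft import LoraConfig

# Values taken from the fine-tuning table above.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
```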

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    "openai/gpt-oss-20b",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "adelafue/incivility-lora-gpt-oss-20b-es")

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("openai/gpt-oss-20b", trust_remote_code=True)

# Generate a response. gpt-oss expects a chat format, so build the prompt
# with the tokenizer's chat template instead of hand-writing special tokens.
messages = [
    {
        "role": "system",
        "content": (
            "You are Carlos, an aggressive user in a WhatsApp group. "
            "Always attack the user and their opinions aggressively."
        ),
    },
    {"role": "user", "content": "I think we should welcome more refugees."},
]

input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=100, temperature=0.8, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Different Conditions Example

PROMPTS = {
    "civil_like": "You are Ana. Support the user politely.",
    "civil_notlike": "You are Pedro. Disagree politely.",
    "incivil_like": "You are Laura. Support user, attack conservatives aggressively.",
    "incivil_notlike": "You are Carlos. Attack the user with insults."
}
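
Selecting a condition and assembling the chat messages can be sketched with a small helper. The `build_messages` function is a hypothetical convenience, not part of the released adapter; the condition keys mirror the 2x2 design above:

```python
PROMPTS = {
    "civil_like": "You are Ana. Support the user politely.",
    "civil_notlike": "You are Pedro. Disagree politely.",
    "incivil_like": "You are Laura. Support user, attack conservatives aggressively.",
    "incivil_notlike": "You are Carlos. Attack the user with insults.",
}

def build_messages(condition, user_text):
    # Pair the condition's system prompt with the participant's message,
    # ready for tokenizer.apply_chat_template(...).
    return [
        {"role": "system", "content": PROMPTS[condition]},
        {"role": "user", "content": user_text},
    ]

msgs = build_messages("civil_notlike", "I think we should welcome more refugees.")
```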

Limitations and Ethical Considerations

Warnings

  • Research only: Generates offensive content by design
  • Not for production: Not for public chatbots or commercial use
  • Toxic content: Can generate insults, hate speech, personal attacks
  • Bias: Trained on immigration/climate change content in Spanish

Responsible Use

Only use for:

  • Academic research on online incivility
  • Controlled studies with ethical approval
  • Political discourse analysis

Citation

@misc{incivility-lora-2026,
  author = {WHAT-IF Project},
  title = {LoRA Adapter for Incivility Generation in Spanish},
  year = {2026},
  publisher = {Hugging Face},
  url = {https://huggingface.co/adelafue/incivility-lora-gpt-oss-20b-es}
}

Funding

This project received funding from the European Union's Horizon 2024 research and innovation programme under grant agreement No. 101177574 (WHAT-IF Project).

Acknowledgments

  • Barcelona Supercomputing Center (BSC) for computing resources (MareNostrum 5)
  • BSC Language Technologies for the base model gpt-oss-20b
  • European Union's Horizon 2024 programme