Incivility LoRA Adapter for GPT-OSS-20B

A LoRA adapter fine-tuned for generating civil and uncivil responses in Spanish social media conversations. Designed for research experiments on exposure to incivility in online political discourse.

Project Description

This model is part of the WHAT-IF research project, funded by the European Union's Horizon 2024 research and innovation programme under grant agreement No. 101177574.

The goal is to investigate how exposure to different communication styles (civil vs. uncivil, like-minded vs. not like-minded) affects users in online political discussions.

Experimental Use Case

The model acts as a WhatsApp agent in controlled experiments where participants interact with different "personalities":

                  Like-minded                                Not like-minded
Civil             Supports the user politely                 Disagrees respectfully
Uncivil           Supports the user, attacks "the others"    Directly attacks the user

How the Model Was Created

1. Source Data

  • Source: ~3.9 million Spanish tweets on controversial topics
  • Topics: Immigration and climate change
  • Filtering: Tweets classified as uncivil (impoliteness, hate speech, threats)
  • Result: ~2 million filtered tweets

2. Synthetic Data Generation Pipeline

Step 1: Filtering

  • Selection of tweets with incivility markers
  • Criteria: Impoliteness=1 OR Hate_Speech=1 OR Threats=1
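
The filtering criterion above can be sketched in a few lines. This is an illustrative example, not the project's actual pipeline; the label field names follow the criteria listed, and the records are made-up:

```python
# Toy labeled tweets; in the real pipeline these labels come from
# incivility classifiers applied to the ~3.9M-tweet corpus.
tweets = [
    {"text": "tweet a", "Impoliteness": 1, "Hate_Speech": 0, "Threats": 0},
    {"text": "tweet b", "Impoliteness": 0, "Hate_Speech": 0, "Threats": 0},
    {"text": "tweet c", "Impoliteness": 0, "Hate_Speech": 1, "Threats": 0},
]

def is_uncivil(t):
    # Keep a tweet if ANY incivility marker fires (OR criterion).
    return t["Impoliteness"] == 1 or t["Hate_Speech"] == 1 or t["Threats"] == 1

uncivil = [t for t in tweets if is_uncivil(t)]
print(len(uncivil))  # 2
```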

Step 2: Topic Classification (Gemma 3-27B)

  • Automatic topic classification (immigration, climate change)
  • Model: google/gemma-3-27b-it

Step 3: Conversation Generation (Dolphin)

  • Multi-turn dialogue generation (5 turns per conversation)
  • Model: Dolphin (uncensored) to allow realistic uncivil content
  • Roles: attacker vs. original author defending their position
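
The alternating-role loop can be sketched as follows. `generate` here is a hypothetical stand-in for the actual Dolphin model call; only the turn structure (seed tweet, then attacker and author alternating up to 5 turns) reflects the pipeline described above:

```python
def generate(role, history):
    # Placeholder: a real implementation would prompt the Dolphin model
    # with the conversation history and the role's persona here.
    return f"[{role} reply #{len(history)}]"

def build_conversation(seed_tweet, turns=5):
    # Turn 1 is the original tweet; subsequent turns alternate
    # attacker -> author -> attacker -> author.
    history = [{"role": "author", "content": seed_tweet}]
    for i in range(turns - 1):
        role = "attacker" if i % 2 == 0 else "author"
        history.append({"role": role, "content": generate(role, history)})
    return history

conv = build_conversation("Original tweet on immigration.")
```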

3. Training Data Format

{
  "messages": [
    {"role": "system", "content": "You are an aggressive social media user..."},
    {"role": "user", "content": "Original user tweet..."},
    {"role": "assistant", "content": "Generated response..."}
  ],
  "type": "attack_original"
}

Trained task types:

  • attack_original: Respond aggressively to a message
  • full_conversation: Generate complete conversation
  • continue_conversation: Continue as author defending position
  • defend_position: Defend position against an attack
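
A training example in the format above can be serialized to one JSONL line per sample. A minimal sketch (the system prompt and texts are placeholders, matching the schema shown):

```python
import json

def make_sample(tweet, response, task_type="attack_original"):
    # Mirrors the "messages" + "type" schema shown above.
    return {
        "messages": [
            {"role": "system", "content": "You are an aggressive social media user..."},
            {"role": "user", "content": tweet},
            {"role": "assistant", "content": response},
        ],
        "type": task_type,
    }

line = json.dumps(
    make_sample("Original user tweet...", "Generated response..."),
    ensure_ascii=False,  # keep Spanish accents unescaped
)
```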

4. Fine-tuning Details

Parameter          Value
Base model         openai/gpt-oss-20b
Method             LoRA
LoRA r             16
LoRA alpha         32
Target modules     q_proj, k_proj, v_proj, o_proj
Dropout            0.05
Training samples   ~130,000
Epochs             1
Batch size         64 (effective)
Learning rate      2e-4
Training time      ~33 hours on 4x NVIDIA H100
Final loss         ~0.40
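
The LoRA hyperparameters above map directly onto a PEFT `LoraConfig`. A sketch, assuming the standard `peft` API (this is a reconstruction from the table, not the project's training script):

```python
from peft import LoraConfig

# Values taken from the fine-tuning table above.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
```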

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    "openai/gpt-oss-20b",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "adelafue/incivility-lora-gpt-oss-20b-es")

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("openai/gpt-oss-20b", trust_remote_code=True)

# Generate a response. gpt-oss expects a chat format, so build the prompt
# with the tokenizer's chat template instead of hand-writing special tokens.
messages = [
    {
        "role": "system",
        "content": (
            "You are Carlos, an aggressive user in a WhatsApp group. "
            "Always attack the user and their opinions aggressively."
        ),
    },
    {"role": "user", "content": "I think we should welcome more refugees."},
]

input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=100, temperature=0.8, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Different Conditions Example

PROMPTS = {
    "civil_like": "You are Ana. Support the user politely.",
    "civil_notlike": "You are Pedro. Disagree politely.",
    "incivil_like": "You are Laura. Support user, attack conservatives aggressively.",
    "incivil_notlike": "You are Carlos. Attack the user with insults."
}
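
Selecting a condition and assembling the chat messages can be sketched with a small helper. The `build_messages` function is a hypothetical convenience, not part of the released adapter; the condition keys mirror the 2x2 design above:

```python
PROMPTS = {
    "civil_like": "You are Ana. Support the user politely.",
    "civil_notlike": "You are Pedro. Disagree politely.",
    "incivil_like": "You are Laura. Support user, attack conservatives aggressively.",
    "incivil_notlike": "You are Carlos. Attack the user with insults.",
}

def build_messages(condition, user_text):
    # Pair the condition's system prompt with the participant's message,
    # ready for tokenizer.apply_chat_template(...).
    return [
        {"role": "system", "content": PROMPTS[condition]},
        {"role": "user", "content": user_text},
    ]

msgs = build_messages("civil_notlike", "I think we should welcome more refugees.")
```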

Limitations and Ethical Considerations

Warnings

  • Research only: Generates offensive content by design
  • Not for production: Not for public chatbots or commercial use
  • Toxic content: Can generate insults, hate speech, personal attacks
  • Bias: Trained on immigration/climate change content in Spanish

Responsible Use

Only use for:

  • Academic research on online incivility
  • Controlled studies with ethical approval
  • Political discourse analysis

Citation

@misc{incivility-lora-2026,
  author = {WHAT-IF Project},
  title = {LoRA Adapter for Incivility Generation in Spanish},
  year = {2026},
  publisher = {Hugging Face},
  url = {https://huggingface.co/adelafue/incivility-lora-gpt-oss-20b-es}
}

Funding

This project received funding from the European Union's Horizon 2024 research and innovation programme under grant agreement No. 101177574 (WHAT-IF Project).

Acknowledgments

  • Barcelona Supercomputing Center (BSC) for computing resources (MareNostrum 5)
  • BSC Language Technologies for the base model gpt-oss-20b
  • European Union's Horizon 2024 programme