# Vaccine Stance Classifier — Portuguese (LoRA · Llama 3.1 8B)

Fine-tuned LoRA adapter for stance detection in Brazilian Portuguese vaccine-related discourse.
## Overview
This repository provides a LoRA (PEFT) adapter fine-tuned on top of meta-llama/Llama-3.1-8B for three-class stance classification in Portuguese vaccine-related social media comments.
The model was developed as part of the research presented in:
Who Shapes Brazil's Vaccine Debate? Semi-Supervised Modeling of Stance and Polarization in YouTube's Media Ecosystem
Accepted at ACM Web Science Conference 2026 (WebSci '26) — to be presented May 26–29, 2026, Braunschweig, Germany
DOI: 10.1145/3795766.3799768 (forthcoming)
Preprint available on ResearchGate
> **Note:** This repository contains only the LoRA adapter weights. The base model must be loaded separately from `meta-llama/Llama-3.1-8B`.
## Label Mapping
| Label | Class | Description |
|---|---|---|
| 0 | Against | Explicitly criticizes vaccination; presents arguments against vaccines; expresses concerns about adverse effects; promotes conspiracy theories; denies scientific evidence; or articulates generalized skepticism toward vaccination. |
| 1 | Favorable | Explicitly supports vaccination; expresses positive attitudes; shares pro-vaccine informational content; highlights benefits; or reports positive personal experiences with vaccines. |
| 2 | Inconclusive | Does not clearly belong to either of the above categories; deviates from the vaccination topic; contains ambiguous or sarcastic language; lacks sufficient information to infer stance; or is irrelevant to the vaccination debate. |
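The table above can be expressed as the `id2label`/`label2id` dictionaries that `transformers` model configs expect; a minimal sketch (the literals simply mirror the table):

```python
# Label mapping from the table above, in the id2label/label2id
# form used by transformers model configs.
id2label = {0: "Against", 1: "Favorable", 2: "Inconclusive"}
label2id = {name: idx for idx, name in id2label.items()}

# Passing these to from_pretrained(..., id2label=id2label, label2id=label2id)
# makes pipeline outputs report class names instead of LABEL_0/1/2.
print(label2id["Favorable"])
```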
## Quick Start
### Installation

```bash
pip install torch transformers peft accelerate
```
### Inference

> 🔑 **Access token required.** The base model `meta-llama/Llama-3.1-8B` is gated. You must:
> - Request access at meta-llama/Llama-3.1-8B
> - Accept Meta's license agreement on Hugging Face
> - Generate a token at huggingface.co/settings/tokens and pass it via `token=`, or run `huggingface-cli login` before loading the model
```python
import warnings
import logging

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from peft import PeftModel

warnings.filterwarnings("ignore")
logging.getLogger("transformers").setLevel(logging.ERROR)
logging.getLogger("peft").setLevel(logging.ERROR)

base_model = "meta-llama/Llama-3.1-8B"
lora_model = "gseovana/llama-vaccine-stance-ptbr-lora"

# Access token required - request access at:
# https://huggingface.co/meta-llama/Llama-3.1-8B
HF_TOKEN = "your_huggingface_token_here"

tokenizer = AutoTokenizer.from_pretrained(base_model, token=HF_TOKEN)
tokenizer.pad_token = tokenizer.eos_token  # Llama has no pad token by default

model = AutoModelForSequenceClassification.from_pretrained(
    base_model,
    num_labels=3,
    torch_dtype=torch.float16,
    device_map="auto",
    token=HF_TOKEN,
)
model.config.pad_token_id = tokenizer.pad_token_id

model = PeftModel.from_pretrained(model, lora_model, token=HF_TOKEN)
model.eval()

label_map = {0: "Against", 1: "Favorable", 2: "Inconclusive"}

text = "Vacinas são fundamentais para a saúde pública e salvam vidas."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512).to(model.device)

with torch.no_grad():
    logits = model(**inputs).logits

predicted_class = logits.argmax(dim=-1).item()
print(f"Predicted class: {predicted_class} -> {label_map[predicted_class]}")
# Predicted class: 1 -> Favorable
```
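If you also want a confidence score rather than just the top class, apply a softmax to the logits. A self-contained sketch using illustrative numbers (not real model outputs — in practice you would pass `model(**inputs).logits[0].tolist()`):

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

label_map = {0: "Against", 1: "Favorable", 2: "Inconclusive"}

# Illustrative logits only - not real model outputs.
logits = [-1.2, 3.4, 0.1]
probs = softmax(logits)
pred = max(range(len(probs)), key=probs.__getitem__)
print(f"{label_map[pred]} ({probs[pred]:.2f})")
```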
## Dataset
| Property | Details |
|---|---|
| Domain | YouTube comments - Brazilian vaccine debate |
| Language | Brazilian Portuguese |
| Time Span | January 2018 - July 2024 |
| Total Comments | 1,422,406 |
| Unique Users | 591,760 |
| Videos | 14,318 |
| Channels | 3,897 |
| Vaccines Covered | 19 (Brazilian National Immunization Schedule — PNI) |
| Annotation Method | Manual (3 independent annotators) + pseudo-labels via semi-supervised self-training |
| Labeled set size | 3,476 comments (majority-vote annotated) |
| Annotation Agreement (κ) | 0.69 - substantial agreement (Fleiss' Kappa) |
| Label distribution (manual) | 446 Against · 295 Favorable · 2,735 Inconclusive |
| Final training set (manual + pseudo-labels) | 5,480 comments (1,195 Against · 1,428 Favorable · 2,857 Inconclusive) |
For details on data collection, preprocessing, annotation protocol, and semi-supervised enrichment strategy, refer to the paper.
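The training setup (see below) uses cross-entropy weighted by inverse class frequency. Assuming the common `N / (K · n_c)` normalization — an assumption, since the paper may normalize differently — the weights implied by the final training-set counts above would be:

```python
# Final training-set counts from the dataset table above.
counts = {"Against": 1195, "Favorable": 1428, "Inconclusive": 2857}
total = sum(counts.values())  # 5480
k = len(counts)               # 3 classes

# Inverse-class-frequency weights, normalized as N / (K * n_c).
# NOTE: this normalization is an assumption, not taken from the paper.
weights = {c: total / (k * n) for c, n in counts.items()}
for c, w in weights.items():
    print(f"{c}: {w:.3f}")
```

Minority classes (Against, Favorable) receive weights above 1, so their errors contribute more to the loss than the majority Inconclusive class.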
## Training Details
| Hyperparameter | Value |
|---|---|
| Method | QLoRA (Quantized LoRA — PEFT) |
| Base Model | meta-llama/Llama-3.1-8B |
| Quantization | 4-bit NF4 with bfloat16 computation |
| LoRA rank (r) | 64 |
| LoRA alpha | 16 |
| LoRA dropout | — (not applied) |
| Target modules | q_proj, k_proj, v_proj |
| Max sequence length | 192 tokens |
| Epochs | Up to 20 (early stopping, patience = 3) |
| Batch size | 128 |
| Learning rate | 2 × 10⁻⁴ |
| Loss function | Weighted cross-entropy (inverse class frequency) |
| Validation metric | Macro F1 |
| Cross-validation | Stratified 5-fold |
| Precision | Mixed (FP16) |
| Hardware | 1× NVIDIA A40 48GB · Intel Xeon Gold 6442Y 2.6GHz · 512GB RAM |
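The hyperparameters above correspond roughly to the following `peft`/`bitsandbytes` configuration. This is a reconstruction from the table, not the authors' training script; any argument not listed in the table is an assumption:

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 quantization with bfloat16 compute, per the table above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA settings from the table; task_type is an assumption for
# sequence classification with a 3-class head.
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.0,  # table: dropout not applied
    target_modules=["q_proj", "k_proj", "v_proj"],
    task_type="SEQ_CLS",
)
```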
## Limitations
Prediction quality may be reduced for:
- Very short comments — insufficient context to determine stance
- Sarcasm and irony — may be misclassified as Inconclusive
- Comments requiring conversational context — isolated turns from a thread
- The Inconclusive class — aggregates heterogeneous cases (neutral, off-topic, ambiguous), making it inherently harder to classify
This model was trained exclusively on Brazilian Portuguese YouTube comments about vaccines and may not generalize well to other domains, languages, or vaccine-unrelated health topics.
## Citation
If you use this model or adapter in your research, please cite:
```bibtex
@inproceedings{oliveira2026vaccine,
  author    = {Geovana S. de Oliveira and Ana P. C. Silva and Fabricio Murai and Carlos H. G. Ferreira},
  title     = {Who Shapes Brazil's Vaccine Debate? Semi-Supervised Modeling of Stance and Polarization in YouTube's Media Ecosystem},
  booktitle = {Proceedings of the 18th ACM Web Science Conference (WebSci '26)},
  year      = {2026},
  month     = {May},
  address   = {Braunschweig, Germany},
  publisher = {ACM},
  doi       = {10.1145/3795766.3799768}
}
```
## Contact
Geovana Silva de Oliveira
Universidade Federal de Ouro Preto (UFOP), Brazil
📧 geovana.so@aluno.ufop.edu.br
📧 gseovana.contato@gmail.com
## Evaluation Results

Self-reported per-class metrics:

| Class | Precision | Recall | F1 |
|---|---|---|---|
| Against | 0.87 | 0.89 | 0.88 |
| Favorable | 0.91 | 0.91 | 0.91 |
| Inconclusive | 0.95 | 0.94 | 0.94 |

Accuracy (macro-avg): 0.92