Vaccine Stance Classifier — Portuguese (LoRA · Llama 3.1 8B)

Fine-tuned LoRA adapter for stance detection in Brazilian Portuguese vaccine-related discourse

License: MIT · Language: Portuguese (Brazilian)


Overview · Label Mapping · Quick Start · Dataset · Training · Limitations · Citation · Contact


Overview

This repository provides a LoRA (PEFT) adapter fine-tuned on top of meta-llama/Llama-3.1-8B for three-class stance classification in Portuguese vaccine-related social media comments.

The model was developed as part of the research presented in:

Who Shapes Brazil's Vaccine Debate? Semi-Supervised Modeling of Stance and Polarization in YouTube's Media Ecosystem
Accepted at ACM Web Science Conference 2026 (WebSci '26) — to be presented May 26–29, 2026, Braunschweig, Germany
DOI: 10.1145/3795766.3799768 (forthcoming)
Preprint available on ResearchGate

Note: This repository contains only the LoRA adapter weights. The base model must be loaded separately from meta-llama/Llama-3.1-8B.


Label Mapping

| Label | Class | Description |
|-------|-------|-------------|
| 0 | Against | Explicitly criticizes vaccination; presents arguments against vaccines; expresses concerns about adverse effects; promotes conspiracy theories; denies scientific evidence; or articulates generalized skepticism toward vaccination. |
| 1 | Favorable | Explicitly supports vaccination; expresses positive attitudes; shares pro-vaccine informational content; highlights benefits; or reports positive personal experiences with vaccines. |
| 2 | Inconclusive | Does not clearly belong to either of the above categories; deviates from the vaccination topic; contains ambiguous or sarcastic language; lacks sufficient information to infer stance; or is irrelevant to the vaccination debate. |

Quick Start

Installation

```bash
pip install torch transformers peft accelerate
```

Inference

🔑 Access Token Required
The base model meta-llama/Llama-3.1-8B is a gated model. You must:

  1. Request access at meta-llama/Llama-3.1-8B
  2. Accept Meta's license agreement on Hugging Face
  3. Generate a token at huggingface.co/settings/tokens and pass it via token= or run huggingface-cli login before loading the model

```python
import warnings
import logging

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from peft import PeftModel

warnings.filterwarnings("ignore")
logging.getLogger("transformers").setLevel(logging.ERROR)
logging.getLogger("peft").setLevel(logging.ERROR)

base_model = "meta-llama/Llama-3.1-8B"
lora_model = "gseovana/llama-vaccine-stance-ptbr-lora"

# Access token required - request access at:
# https://huggingface.co/meta-llama/Llama-3.1-8B
HF_TOKEN = "your_huggingface_token_here"

tokenizer = AutoTokenizer.from_pretrained(base_model, token=HF_TOKEN)
tokenizer.pad_token = tokenizer.eos_token  # Llama has no pad token by default

model = AutoModelForSequenceClassification.from_pretrained(
    base_model,
    num_labels=3,
    torch_dtype=torch.float16,
    device_map="auto",
    token=HF_TOKEN,
)
model.config.pad_token_id = tokenizer.pad_token_id
model = PeftModel.from_pretrained(model, lora_model, token=HF_TOKEN)
model.eval()

label_map = {0: "Against", 1: "Favorable", 2: "Inconclusive"}

text = "Vacinas são fundamentais para a saúde pública e salvam vidas."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512).to(model.device)

with torch.no_grad():
    logits = model(**inputs).logits
    predicted_class = logits.argmax(dim=-1).item()

print(f"Predicted class: {predicted_class} -> {label_map[predicted_class]}")
# Predicted class: 1 -> Favorable
```
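
To get class probabilities rather than only a hard label, the logits can be passed through a softmax. A minimal sketch, using a placeholder logits tensor in place of the real `model(**inputs).logits` output:

```python
# Convert classifier logits to per-class probabilities and a labeled prediction.
# `logits` here is a placeholder; in practice use `model(**inputs).logits`.
import torch

label_map = {0: "Against", 1: "Favorable", 2: "Inconclusive"}

logits = torch.tensor([[-1.2, 2.8, 0.1]])  # placeholder (batch of 1, 3 classes)
probs = torch.softmax(logits, dim=-1)      # rows sum to 1

pred = probs.argmax(dim=-1).item()
print(f"{label_map[pred]} (p={probs[0, pred]:.3f})")
# Favorable (p=0.921)
```

Reporting the probability alongside the label can help flag low-confidence cases, which matter here given the heterogeneous Inconclusive class noted under Limitations.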

Dataset

| Property | Details |
|----------|---------|
| Domain | YouTube comments (Brazilian vaccine debate) |
| Language | Brazilian Portuguese |
| Time span | January 2018 - July 2024 |
| Total comments | 1,422,406 |
| Unique users | 591,760 |
| Videos | 14,318 |
| Channels | 3,897 |
| Vaccines covered | 19 (Brazilian National Immunization Schedule — PNI) |
| Annotation method | Manual (3 independent annotators) + pseudo-labels via semi-supervised self-training |
| Labeled set size | 3,476 comments (majority-vote annotated) |
| Annotation agreement | 0.69 (Fleiss' κ, substantial agreement) |
| Label distribution (manual) | 446 Against · 295 Favorable · 2,735 Inconclusive |
| Final training set (manual + pseudo-labels) | 5,480 comments (1,195 Against · 1,428 Favorable · 2,857 Inconclusive) |
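
The training loss (see Training Details) weights classes by inverse frequency to counter the heavy Inconclusive skew visible above. A minimal sketch of one common normalization, using the manual label counts; the exact scheme used in the paper may differ:

```python
# Sketch: inverse-class-frequency weights from the manual label counts above.
# This uses one common normalization (weights sum to the number of classes);
# the paper's exact weighting scheme may differ.
counts = {"Against": 446, "Favorable": 295, "Inconclusive": 2735}

total = sum(counts.values())
raw = {label: total / n for label, n in counts.items()}   # inverse frequency
scale = len(counts) / sum(raw.values())
weights = {label: w * scale for label, w in raw.items()}  # sums to 3.0

for label, w in weights.items():
    print(f"{label:12s} {w:.3f}")
```

Minority classes (Against, Favorable) end up with weights well above 1, so their misclassifications contribute more to the loss than the dominant Inconclusive class.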

For details on data collection, preprocessing, annotation protocol, and semi-supervised enrichment strategy, refer to the paper.


Training Details

| Hyperparameter | Value |
|----------------|-------|
| Method | QLoRA (Quantized LoRA — PEFT) |
| Base model | meta-llama/Llama-3.1-8B |
| Quantization | 4-bit NF4 with bfloat16 computation |
| LoRA rank (r) | 64 |
| LoRA alpha | 16 |
| LoRA dropout | Not applied |
| Target modules | q_proj, k_proj, v_proj |
| Max sequence length | 192 tokens |
| Epochs | Up to 20 (early stopping, patience = 3) |
| Batch size | 128 |
| Learning rate | 2 × 10⁻⁴ |
| Loss function | Weighted cross-entropy (inverse class frequency) |
| Validation metric | Macro F1 |
| Cross-validation | Stratified 5-fold |
| Precision | Mixed (FP16) |
| Hardware | 1× NVIDIA A40 48GB · Intel Xeon Gold 6442Y 2.6GHz · 512GB RAM |
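
These hyperparameters can be expressed with `transformers` and `peft` configuration objects. A sketch under the assumptions listed above; training arguments (optimizer, scheduler, warmup) are not given in this card and are omitted:

```python
# Sketch of a QLoRA configuration matching the hyperparameter table above.
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig, TaskType

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",               # 4-bit NF4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16,   # bfloat16 computation
)

lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,              # sequence-classification head
    r=64,                                    # LoRA rank
    lora_alpha=16,
    lora_dropout=0.0,                        # dropout not applied
    target_modules=["q_proj", "k_proj", "v_proj"],
)

# These would then be passed as, e.g.:
#   AutoModelForSequenceClassification.from_pretrained(..., quantization_config=bnb_config)
#   get_peft_model(model, lora_config)
```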

Limitations

Prediction quality may be reduced for:

  • Very short comments — insufficient context to determine stance
  • Sarcasm and irony — may be misclassified as Inconclusive
  • Comments requiring conversational context — isolated turns from a thread
  • The Inconclusive class — aggregates heterogeneous cases (neutral, off-topic, ambiguous), making it inherently harder to classify

This model was trained exclusively on Brazilian Portuguese YouTube comments about vaccines and may not generalize well to other domains, languages, or vaccine-unrelated health topics.


Citation

If you use this model or adapter in your research, please cite:

@inproceedings{oliveira2026vaccine,
  author    = {Geovana S. de Oliveira and Ana P. C. Silva and Fabricio Murai and Carlos H. G. Ferreira},
  title     = {Who Shapes Brazil's Vaccine Debate? Semi-Supervised Modeling of Stance and Polarization in YouTube's Media Ecosystem},
  booktitle = {Proceedings of the 18th ACM Web Science Conference (WebSci '26)},
  year      = {2026},
  month     = {May},
  address   = {Braunschweig, Germany},
  publisher = {ACM},
  doi       = {10.1145/3795766.3799768}
}

Contact

Geovana Silva de Oliveira
Universidade Federal de Ouro Preto (UFOP), Brazil
📧 geovana.so@aluno.ufop.edu.br
📧 gseovana.contato@gmail.com


Developed as part of ongoing research on online health misinformation in Brazil · ACM WebSci 2026