Vaccine Stance Classifier — Portuguese (LoRA · Llama 3.1 8B)

Fine-tuned LoRA adapter for stance detection in Brazilian Portuguese vaccine-related discourse

License: MIT · Language: Portuguese (Brazilian)


Overview · Label Mapping · Quick Start · Dataset · Training · Limitations · Citation · Contact


Overview

This repository provides a LoRA (PEFT) adapter fine-tuned on top of meta-llama/Llama-3.1-8B for three-class stance classification in Portuguese vaccine-related social media comments.

The model was developed as part of the research presented in:

Who Shapes Brazil's Vaccine Debate? Semi-Supervised Modeling of Stance and Polarization in YouTube's Media Ecosystem
Accepted at ACM Web Science Conference 2026 (WebSci '26) — to be presented May 26–29, 2026, Braunschweig, Germany
DOI: 10.1145/3795766.3799768 (forthcoming)
Preprint available on ResearchGate

Note: This repository contains only the LoRA adapter weights. The base model must be loaded separately from meta-llama/Llama-3.1-8B.


Label Mapping

| Label | Class | Description |
|-------|-------|-------------|
| 0 | Against | Explicitly criticizes vaccination; presents arguments against vaccines; expresses concerns about adverse effects; promotes conspiracy theories; denies scientific evidence; or articulates generalized skepticism toward vaccination. |
| 1 | Favorable | Explicitly supports vaccination; expresses positive attitudes; shares pro-vaccine informational content; highlights benefits; or reports positive personal experiences with vaccines. |
| 2 | Inconclusive | Does not clearly belong to either of the above categories; deviates from the vaccination topic; contains ambiguous or sarcastic language; lacks sufficient information to infer stance; or is irrelevant to the vaccination debate. |

Quick Start

Installation

```bash
pip install torch transformers peft accelerate
```

Inference

🔑 Access Token Required
The base model meta-llama/Llama-3.1-8B is a gated model. You must:

  1. Request access at meta-llama/Llama-3.1-8B
  2. Accept Meta's license agreement on Hugging Face
  3. Generate a token at huggingface.co/settings/tokens and pass it via token= or run huggingface-cli login before loading the model

```python
import warnings
import logging

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from peft import PeftModel

warnings.filterwarnings("ignore")
logging.getLogger("transformers").setLevel(logging.ERROR)
logging.getLogger("peft").setLevel(logging.ERROR)

base_model = "meta-llama/Llama-3.1-8B"
lora_model = "gseovana/llama-vaccine-stance-ptbr-lora"

# Access token required - request access at:
# https://huggingface.co/meta-llama/Llama-3.1-8B
HF_TOKEN = "your_huggingface_token_here"

tokenizer = AutoTokenizer.from_pretrained(base_model, token=HF_TOKEN)
tokenizer.pad_token = tokenizer.eos_token  # Llama has no pad token by default

model = AutoModelForSequenceClassification.from_pretrained(
    base_model,
    num_labels=3,
    torch_dtype=torch.float16,
    device_map="auto",
    token=HF_TOKEN,
)
model.config.pad_token_id = tokenizer.pad_token_id
model = PeftModel.from_pretrained(model, lora_model, token=HF_TOKEN)
model.eval()

label_map = {0: "Against", 1: "Favorable", 2: "Inconclusive"}

text = "Vacinas são fundamentais para a saúde pública e salvam vidas."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512).to(model.device)

with torch.no_grad():
    logits = model(**inputs).logits
    predicted_class = logits.argmax(dim=-1).item()

print(f"Predicted class: {predicted_class} -> {label_map[predicted_class]}")
# Predicted class: 1 -> Favorable
```
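
To get class probabilities rather than only a hard label, the logits can be passed through a softmax. A minimal sketch, using a placeholder logits tensor in place of the real `model(**inputs).logits` output:

```python
# Convert classifier logits to per-class probabilities and a labeled prediction.
# `logits` here is a placeholder; in practice use `model(**inputs).logits`.
import torch

label_map = {0: "Against", 1: "Favorable", 2: "Inconclusive"}

logits = torch.tensor([[-1.2, 2.8, 0.1]])  # placeholder (batch of 1, 3 classes)
probs = torch.softmax(logits, dim=-1)      # rows sum to 1

pred = probs.argmax(dim=-1).item()
print(f"{label_map[pred]} (p={probs[0, pred]:.3f})")
# Favorable (p=0.921)
```

Reporting the probability alongside the label can help flag low-confidence cases, which matter here given the heterogeneous Inconclusive class noted under Limitations.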

Dataset

| Property | Details |
|----------|---------|
| Domain | YouTube comments (Brazilian vaccine debate) |
| Language | Brazilian Portuguese |
| Time span | January 2018 - July 2024 |
| Total comments | 1,422,406 |
| Unique users | 591,760 |
| Videos | 14,318 |
| Channels | 3,897 |
| Vaccines covered | 19 (Brazilian National Immunization Schedule — PNI) |
| Annotation method | Manual (3 independent annotators) + pseudo-labels via semi-supervised self-training |
| Labeled set size | 3,476 comments (majority-vote annotated) |
| Annotation agreement | 0.69 (Fleiss' κ, substantial agreement) |
| Label distribution (manual) | 446 Against · 295 Favorable · 2,735 Inconclusive |
| Final training set (manual + pseudo-labels) | 5,480 comments (1,195 Against · 1,428 Favorable · 2,857 Inconclusive) |
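
The training loss (see Training Details) weights classes by inverse frequency to counter the heavy Inconclusive skew visible above. A minimal sketch of one common normalization, using the manual label counts; the exact scheme used in the paper may differ:

```python
# Sketch: inverse-class-frequency weights from the manual label counts above.
# This uses one common normalization (weights sum to the number of classes);
# the paper's exact weighting scheme may differ.
counts = {"Against": 446, "Favorable": 295, "Inconclusive": 2735}

total = sum(counts.values())
raw = {label: total / n for label, n in counts.items()}   # inverse frequency
scale = len(counts) / sum(raw.values())
weights = {label: w * scale for label, w in raw.items()}  # sums to 3.0

for label, w in weights.items():
    print(f"{label:12s} {w:.3f}")
```

Minority classes (Against, Favorable) end up with weights well above 1, so their misclassifications contribute more to the loss than the dominant Inconclusive class.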

For details on data collection, preprocessing, annotation protocol, and semi-supervised enrichment strategy, refer to the paper.


Training Details

| Hyperparameter | Value |
|----------------|-------|
| Method | QLoRA (Quantized LoRA — PEFT) |
| Base model | meta-llama/Llama-3.1-8B |
| Quantization | 4-bit NF4 with bfloat16 computation |
| LoRA rank (r) | 64 |
| LoRA alpha | 16 |
| LoRA dropout | Not applied |
| Target modules | q_proj, k_proj, v_proj |
| Max sequence length | 192 tokens |
| Epochs | Up to 20 (early stopping, patience = 3) |
| Batch size | 128 |
| Learning rate | 2 × 10⁻⁴ |
| Loss function | Weighted cross-entropy (inverse class frequency) |
| Validation metric | Macro F1 |
| Cross-validation | Stratified 5-fold |
| Precision | Mixed (FP16) |
| Hardware | 1× NVIDIA A40 48GB · Intel Xeon Gold 6442Y 2.6GHz · 512GB RAM |
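
These hyperparameters can be expressed with `transformers` and `peft` configuration objects. A sketch under the assumptions listed above; training arguments (optimizer, scheduler, warmup) are not given in this card and are omitted:

```python
# Sketch of a QLoRA configuration matching the hyperparameter table above.
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig, TaskType

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",               # 4-bit NF4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16,   # bfloat16 computation
)

lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,              # sequence-classification head
    r=64,                                    # LoRA rank
    lora_alpha=16,
    lora_dropout=0.0,                        # dropout not applied
    target_modules=["q_proj", "k_proj", "v_proj"],
)

# These would then be passed as, e.g.:
#   AutoModelForSequenceClassification.from_pretrained(..., quantization_config=bnb_config)
#   get_peft_model(model, lora_config)
```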

Limitations

Prediction quality may be reduced for:

  • Very short comments — insufficient context to determine stance
  • Sarcasm and irony — may be misclassified as Inconclusive
  • Comments requiring conversational context — isolated turns from a thread
  • The Inconclusive class — aggregates heterogeneous cases (neutral, off-topic, ambiguous), making it inherently harder to classify

This model was trained exclusively on Brazilian Portuguese YouTube comments about vaccines and may not generalize well to other domains, languages, or vaccine-unrelated health topics.


Citation

If you use this model or adapter in your research, please cite:

@inproceedings{oliveira2026vaccine,
  author    = {Geovana S. de Oliveira and Ana P. C. Silva and Fabricio Murai and Carlos H. G. Ferreira},
  title     = {Who Shapes Brazil's Vaccine Debate? Semi-Supervised Modeling of Stance and Polarization in YouTube's Media Ecosystem},
  booktitle = {Proceedings of the 18th ACM Web Science Conference (WebSci '26)},
  year      = {2026},
  month     = {May},
  address   = {Braunschweig, Germany},
  publisher = {ACM},
  doi       = {10.1145/3795766.3799768}
}

Contact

Geovana Silva de Oliveira
Universidade Federal de Ouro Preto (UFOP), Brazil
📧 geovana.so@aluno.ufop.edu.br
📧 gseovana.contato@gmail.com


Developed as part of ongoing research on online health misinformation in Brazil · ACM WebSci 2026