PTBR-40M LLM

PTBR-40M LLM is a small Portuguese causal language model (~40M parameters) trained on a mixture of Portuguese web text and reasoning data.

The model is designed to demonstrate that a functional language model can be trained quickly on a single GPU.

Training can be completed in approximately 30–40 minutes on a T4 GPU using small dataset slices.


Model Details

Architecture

Property             Value
-------------------  -----------
Parameters           ~40M
Layers               12
Hidden size          512
Attention heads      8
Context length       256 tokens
Positional encoding  RoPE

Framework:

  • Transformers
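As a rough sanity check, the dimensions in the table above can be plugged into a back-of-the-envelope parameter estimate. The vocabulary size and the 4x MLP expansion below are assumptions, since the card does not state them, so the result is a ballpark figure rather than the exact count:

```python
# Rough parameter-count estimate from the architecture table.
# Vocabulary size and the 4x MLP expansion are ASSUMPTIONS; the card
# does not state them, so treat the result as a ballpark figure.
hidden = 512
layers = 12
vocab = 32_000  # assumed

attn = 4 * hidden * hidden        # Q, K, V, and output projections
mlp = 2 * hidden * (4 * hidden)   # up and down projections, 4x width assumed
per_layer = attn + mlp
embeddings = vocab * hidden       # token embeddings (tied output head assumed)

total = layers * per_layer + embeddings
print(f"~{total / 1e6:.1f}M parameters")  # lands in the tens of millions
```

Most of the budget sits in the 12 transformer blocks (~38M under these assumptions); the embedding table adds the rest, so the true total depends heavily on the actual vocabulary size.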

Training Data

The model was trained on a mixture of two datasets:

Portuguese reasoning dataset

Dataset:

  • corre-social/s1_dataset_ptbr_1k_tokenized

Contains:

  • reasoning examples
  • chain-of-thought style explanations
  • Portuguese instructional data

Portuguese web corpus

Dataset:

  • Madras1/corpus-ptbr-v1

Contains:

  • large Portuguese text corpus
  • mixed web content
  • billions of tokens in the full dataset

For training speed, only a subset of the corpus was used.


Training Procedure

Training configuration:

Parameter              Value
---------------------  ------
Epochs                 1
Batch size             16
Gradient accumulation  2
Learning rate          4e-4
Context length         256
Precision              FP16

Hardware:

  • NVIDIA T4 GPU

Training time:

  • ~30–40 minutes
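The configuration above implies an effective batch of 32 sequences per optimizer update (batch size 16 × gradient accumulation 2). A minimal sketch of the step arithmetic, using an assumed subset size since the card does not state how many examples were used:

```python
# Step arithmetic for the training configuration above.
batch_size = 16
grad_accum = 2
effective_batch = batch_size * grad_accum  # sequences per optimizer update

examples = 200_000  # ASSUMED subset size; the card only says a subset was used
steps_per_epoch = examples // effective_batch

print(effective_batch)   # → 32
print(steps_per_epoch)   # → 6250
```

With one epoch, the total number of optimizer steps equals `steps_per_epoch`, which is what bounds the ~30–40 minute wall-clock time on a T4.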

Usage

from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="PatoFlamejanteTV/QuackPTBR40M-Train"
)

print(generator(
    "Explique o que é inteligência artificial:",
    max_new_tokens=80
))