# PTBR-40M LLM
PTBR-40M LLM is a small Portuguese causal language model (~40M parameters) trained on a mixture of Portuguese web text and reasoning data.
The model is designed to demonstrate that functional language models can be trained quickly on a single GPU.
Training can be completed in approximately 30–40 minutes on a T4 GPU using small dataset slices.
## Model Details

### Architecture
| Property | Value |
|---|---|
| Parameters | ~40M |
| Layers | 12 |
| Hidden size | 512 |
| Attention heads | 8 |
| Context length | 256 tokens |
| Positional encoding | RoPE |
Framework:
- Transformers
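The ~40M figure can be roughly sanity-checked from the table above. A minimal back-of-the-envelope sketch; the vocabulary size is not stated in this card, so ~8k is assumed here purely for illustration, along with tied embeddings, a 4x MLP expansion, and no biases:

```python
# Rough parameter-count estimate for the architecture in the table above.
# Assumptions (not stated in the card): vocab size 8192, tied embeddings,
# 4x MLP expansion, biases and norm parameters ignored.
hidden = 512
layers = 12
vocab = 8192                  # assumed for illustration only
mlp_inner = 4 * hidden

embed = vocab * hidden                    # token embeddings (tied with LM head)
attn_per_layer = 4 * hidden * hidden      # Q, K, V, O projections
mlp_per_layer = 2 * hidden * mlp_inner    # up + down projections
total = embed + layers * (attn_per_layer + mlp_per_layer)

print(f"~{total / 1e6:.1f}M parameters")  # ~41.9M, consistent with "~40M"
```

RoPE adds no learned parameters, which is why positional encoding does not appear in the count.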
## Training Data
The model was trained on a mixture of two datasets:
### Portuguese reasoning dataset

Dataset: `corre-social/s1_dataset_ptbr_1k_tokenized`

Contains:
- reasoning examples
- chain-of-thought-style explanations
- Portuguese instructional data
### Portuguese web corpus

Dataset: `Madras1/corpus-ptbr-v1`

Contains:
- a large Portuguese text corpus
- mixed web content
- billions of tokens in the full dataset
For training speed, only a subset of the corpus was used.
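Subsetting like this can be done directly with the `datasets` split-slicing syntax. A minimal sketch; the 5% fraction and the helper names are illustrative assumptions, not the actual values used for this model:

```python
def subset_spec(split: str, percent: int) -> str:
    """Build a Hugging Face split-slice string, e.g. 'train[:5%]'."""
    return f"{split}[:{percent}%]"

def load_training_subset(percent: int = 5):
    # Lazy import so the helper above stays dependency-free.
    from datasets import load_dataset
    # Loads only a slice of the full corpus, matching the card's note that
    # a subset was used for training speed. The default of 5% is an
    # illustrative assumption, not the fraction actually used.
    return load_dataset("Madras1/corpus-ptbr-v1",
                        split=subset_spec("train", percent))
```

Slicing at load time avoids downloading and tokenizing the full multi-billion-token corpus.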
## Training Procedure
Training configuration:
| Parameter | Value |
|---|---|
| Epochs | 1 |
| Batch size | 16 |
| Gradient accumulation | 2 |
| Learning rate | 4e-4 |
| Context length | 256 |
| Precision | FP16 |
Hardware:
- NVIDIA T4 GPU
Training time:
- ~30–40 minutes
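The configuration table above implies an effective batch size of 32 sequences per optimizer step. A quick sketch of the derived quantities:

```python
# Training configuration from the table above.
config = {
    "epochs": 1,
    "batch_size": 16,
    "grad_accum": 2,
    "learning_rate": 4e-4,
    "context_length": 256,
}

# Gradients are accumulated over 2 micro-batches before each optimizer step,
# so the effective batch is batch_size * grad_accum.
effective_batch = config["batch_size"] * config["grad_accum"]  # 32 sequences
tokens_per_step = effective_batch * config["context_length"]   # 8192 tokens

print(effective_batch, tokens_per_step)  # 32 8192
```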
## Usage
```python
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="username/ptbr-40m-llm"
)

print(generator(
    "Explique o que é inteligência artificial:",
    max_new_tokens=80
))
```
Base model: PatoFlamejanteTV/QuackPTBR40M