Model Card for Gemma-3-Gaia-PT-BR-4b-it-mlx

This model is an MLX conversion and 4-bit quantization of the CEIA-UFG/Gemma-3-Gaia-PT-BR-4b-it model. It enables fast inference on Apple Silicon while preserving the parent model's evaluation results.

Usage

Install the dependencies:

uv add mlx mlx-lm

Then load the model and generate a response:

from mlx_lm import load, generate

model, tokenizer = load("RGarrido03/Gemma-3-Gaia-PT-BR-4b-it-mlx")

prompt = "hello"

# Wrap the prompt in the chat template, if the tokenizer provides one
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)

Conversion

The conversion was performed with the mlx_lm.convert Python module, using the following options:

  • Quantization: 4 bits
  • dtype: bfloat16
uv run mlx_lm.convert --hf-path CEIA-UFG/Gemma-3-Gaia-PT-BR-4b-it -q --mlx-path Gemma-3-Gaia-PT-BR-4b-it-mlx --dtype bfloat16
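As a rough, back-of-the-envelope sketch of what 4-bit quantization buys on a laptop, the weight footprint drops to roughly a quarter of bfloat16. The parameter count, group size, and per-group scale overhead below are illustrative assumptions, not measured values:

```python
# Rough weight-memory estimate: 4-bit quantization vs. bfloat16.
# All figures are illustrative assumptions, not measured sizes.
params = 4_300_000_000  # ~4.3B parameters (assumed for a Gemma 3 4B-class model)
group_size = 64         # assumed quantization group size

bf16_bytes = params * 2                     # 16 bits per weight
q4_bytes = params * 0.5                     # 4 bits per weight
scale_overhead = (params / group_size) * 4  # ~4 bytes of scale/bias per group (assumed)

print(f"bf16:  {bf16_bytes / 1e9:.1f} GB")                      # bf16:  8.6 GB
print(f"4-bit: {(q4_bytes + scale_overhead) / 1e9:.1f} GB")     # 4-bit: 2.4 GB
```

The real on-disk size depends on the exact group size and which layers mlx_lm leaves unquantized, but the order of magnitude is why the quantized model fits comfortably in unified memory.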

Model Card Authors

André Ribeiro @andreribeiro87

Rúben Garrido @RGarrido03
