LLiMba-3B-Instruct

LLiMba-3B-Instruct is a Sardinian-capable extension of Qwen2.5-3B-Instruct. It speaks fluent Sardinian (LSC, the standardized written form, with Logudorese and Campidanese accepted as input) and retains the multilingual capabilities of the base model across the languages Qwen2.5 supports. The full adaptation pipeline runs on a single 24GB consumer GPU.

Sardinian is a Romance language with roughly one million speakers, classified as definitely endangered by UNESCO. Commercial translation support is scarce, and major LLMs do not produce it reliably. LLiMba is, to our knowledge, the first openly released LLM that can hold a Sardinian conversation, translate to and from Sardinian, and analyze Sardinian text.

This is the deployable model. For the post-CPT intermediate checkpoint (a research artifact useful only for re-running supervised fine-tuning with alternative recipes), see lballore/llimba-3b-instruct-cpt.

🎮 Try it live: lballore-llimba-demo.hf.space - interactive Gradio chat with both conversational and translation modes. No installation required.

📖 Read the paper: LLiMba: Sardinian on a Single GPU

Not to be confused with the 2024 University of Cagliari paper LIMBA: An Open-Source Framework for the Preservation and Valorization of Low-Resource Languages using Generative Models (arXiv:2411.13453), which uses Sardinian as one of several case studies for a broader framework. Different acronym, independent project.

Quick start

from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="lballore/llimba-3b-instruct",
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    # System prompt: "You are an assistant who speaks Sardinian."
    {"role": "system", "content": "Ses unu assistente chi chistionat in sardu."},
    # User: "Hello! How are you?"
    {"role": "user", "content": "Salude! Comente ìstas?"},
]

out = pipe(messages, max_new_tokens=200, do_sample=False)
print(out[0]["generated_text"][-1]["content"])
# Bene, gràtzias. E tue comente ìstas?  ("Fine, thanks. And how are you?")

For translation, change the system prompt:

messages = [
    # System prompt: "You are an expert translator into LSC Sardinian."
    {"role": "system", "content": "Tue ses unu tradutore espertu in limba sarda LSC."},
    {"role": "user", "content": "Translate to Sardinian: «The weather is rough today.»"},
]
out = pipe(messages, max_new_tokens=200, do_sample=False)
print(out[0]["generated_text"][-1]["content"])

Recommended inference parameters

LLiMba ships with do_sample=False (greedy decoding) as the default. This produces deterministic, high-quality output for translation, factual Q&A, and short conversations, and is what the published FLORES benchmark numbers were measured with.

For other use cases, override the defaults at call time:

| Use case | Settings |
|---|---|
| Translation, factual Q&A | do_sample=False (default) |
| Conversational chat | temperature=0.3, top_p=0.9, top_k=40, repetition_penalty=1.05 |
| Creative or long-form generation | temperature=0.7, top_p=0.9, top_k=40, repetition_penalty=1.1 |
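
For example, the conversational settings can be passed straight to the pipeline call as generation kwargs (a minimal sketch reusing the pipe and messages objects from Quick start):

out = pipe(
    messages,
    max_new_tokens=400,
    do_sample=True,             # the model's default is greedy decoding
    temperature=0.3,
    top_p=0.9,
    top_k=40,
    repetition_penalty=1.05,
)
print(out[0]["generated_text"][-1]["content"])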

Temperatures above 0.7 can produce occasional language-boundary drift (Sardinian to Italian) on long open-ended prompts and amplify the morphological hallucination described in Limitations. The model was trained with Romance replay data specifically to mitigate this, but for production deployments the safe upper bound is around 0.7.

Languages

LLiMba inherits Qwen2.5's multilingual coverage and adds Sardinian. The Qwen2.5 documentation explicitly names Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, and Arabic as supported, with broader coverage beyond that list. Continued pretraining on Sardinian was paired with roughly 2.4M tokens of Italian, Spanish, Portuguese, and Catalan replay data to limit forgetting on the Romance branch closest to Sardinian.

For Sardinian, the primary output target is LSC (Limba Sarda Comuna), the standardized written form codified by the Regione Autonoma della Sardegna in 2006. The training corpus includes Logudorese and Campidanese material, so the model accepts dialectal input gracefully but tends to produce LSC in its outputs.

We did not run multilingual benchmarks on non-Romance languages after adaptation. Users relying on the model for languages such as Japanese or Arabic should validate on their own task before deploying.

Translation results

Evaluated on 997 parallel sentences from FLORES-200 using lm-evaluation-harness 0.4.11 with greedy decoding.

| Direction | Base BLEU | LLiMba BLEU | Base chrF | LLiMba chrF |
|---|---|---|---|---|
| EN to SC | 2.75 | 28.47 | 27.41 | 56.80 |
| IT to SC | 2.16 | 21.25 | 27.52 | 52.08 |
| ES to SC | 1.99 | 18.57 | 26.39 | 49.41 |
| SC to EN | 11.73 | 41.28 | 44.55 | 64.64 |
| SC to IT | 2.90 | 17.61 | 33.38 | 47.25 |
| SC to ES | 5.67 | 18.57 | 36.98 | 46.27 |

The strong SC to EN baseline (BLEU 41.28) is itself evidence that English generation survives adaptation: the base model already knew English, continued pretraining added Sardinian comprehension, and the combination yields high-quality translation out of Sardinian without specifically training for it.
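
To score your own outputs with the same metrics, the sacrebleu package computes corpus-level BLEU and chrF directly (a minimal sketch with placeholder data; the published numbers come from lm-evaluation-harness, not this snippet):

import sacrebleu

# One system output per source sentence (placeholder strings).
hyps = ["Su tempus est malu oe."]
# One reference stream; refs[0][i] is the reference for hyps[i].
refs = [["Su tempus est malu oe."]]

bleu = sacrebleu.corpus_bleu(hyps, refs)
chrf = sacrebleu.corpus_chrf(hyps, refs)
print(f"BLEU {bleu.score:.2f}  chrF {chrf.score:.2f}")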

Qualitative behavior

On focused factual queries about Sardinian topics, LLiMba produces verifiable answers when the underlying facts are present in the training data. Asked "Chie fiat Gigi Riva?" (Who was Gigi Riva?), it correctly identifies him as the Italian footballer born in Leggiuno in 1944, who joined Cagliari in 1963, won the 1969-70 Serie A title, scored 35 goals in 42 appearances for the national team, was nicknamed "Rombo de Tronu" by Gianni Brera, and died in 2024.

On cantu a tenore, the polyphonic vocal tradition of central Sardinia, it correctly names the four voices (boghe, bassu, contra, mesu boghe), the 2005 UNESCO recognition, and the Barbagia origin.

Conversational greetings, short translations, factual recall on canonical Sardinian topics, and grammatical analysis all work well. Long open-ended generation is more variable; see Limitations.

Training procedure

Base model. Qwen2.5-3B-Instruct (3.09B parameters, transformer decoder, 36 layers, 16 query heads with 2 KV heads (GQA), RoPE positional embeddings, 32K context window).

Stage 1, continued pretraining. Full fine-tuning in bfloat16 for 2 epochs on approximately 13.9M tokens (11.5M Sardinian plus 2.4M Romance replay drawn from Italian, Spanish, Portuguese, and Catalan Wikipedias). Sequence length 4096. Effective batch 16 (1 per device with 16 gradient accumulation steps). Learning rate 5e-5 with cosine schedule and 50-step warmup. Paged AdamW 8-bit optimizer. Flash Attention 2. Gradient checkpointing enabled. Sequence packing disabled (packing leaks attention across document boundaries within a packed sequence and degraded model quality in our preliminary runs). Wall-clock time: 5.5 hours on one RTX 4090.
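
In Hugging Face Trainer terms, the Stage 1 settings map onto roughly the following (a sketch, not the exact script; output_dir is hypothetical and dataset plumbing is omitted; the real pipeline is in the GitHub repo):

from transformers import AutoModelForCausalLM, TrainingArguments

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-3B-Instruct",
    torch_dtype="bfloat16",
    attn_implementation="flash_attention_2",
)

args = TrainingArguments(
    output_dir="llimba-cpt",            # hypothetical
    num_train_epochs=2,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,     # effective batch of 16
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    warmup_steps=50,
    optim="paged_adamw_8bit",
    bf16=True,
    gradient_checkpointing=True,
)
# Sequences are truncated at 4096 tokens and NOT packed (see the note above).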

Stage 2, supervised fine-tuning. rsLoRA adapter at rank 256, alpha 256, dropout 0.05, targeting q, k, v, o, gate, up, and down projection matrices. 2 epochs on 14,404 instruction pairs (~12.8M tokens) with completion-only loss. Learning rate 2e-5 with cosine schedule and 50-step warmup. Other hyperparameters match Stage 1. The rsLoRA scaling correction (alpha/sqrt(r)) is what makes rank 256 trainable in practice; conventional LoRA scaling at this rank causes gradient collapse.
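
The adapter configuration corresponds to the following peft setup (a sketch assuming Qwen2.5's standard module names; the authoritative script is in the repo):

from peft import LoraConfig, get_peft_model

lora_cfg = LoraConfig(
    r=256,
    lora_alpha=256,
    lora_dropout=0.05,
    use_rslora=True,  # scales updates by alpha/sqrt(r) instead of alpha/r
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",   # attention projections
        "gate_proj", "up_proj", "down_proj",      # MLP projections
    ],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)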

The released weights have the rsLoRA adapter merged into the base. Full training scripts, data preparation pipeline, and evaluation harness are at github.com/lballore/LLiMba.
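
For anyone re-running SFT from the CPT checkpoint, merging a trained adapter back into the base is straightforward with peft (adapter path and output directory are hypothetical; this release already ships merged):

from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "lballore/llimba-3b-instruct-cpt", torch_dtype="bfloat16"
)
merged = PeftModel.from_pretrained(base, "path/to/adapter").merge_and_unload()
merged.save_pretrained("merged-model")  # hypothetical output directory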

Intended use

Research, education, language preservation, and personal use by speakers and learners of Sardinian. Specific use cases include conversational practice, translation between Sardinian and other Romance languages or English, language learning, text analysis, and as a starting point for further Sardinian NLP research.

Limitations

Hallucination on out-of-training facts. Like all 3B-class models, LLiMba fabricates when queried on content not present in training. The pattern is strongest for biographical specifics about partially-known figures: confident wrong dates, invented nicknames, plausible-sounding but false claims. Treat factual outputs with appropriate skepticism.

Morphological hallucination on long open-ended prompts. On extended unconstrained generation about Sardinian culture, the model occasionally produces phonotactically valid but non-attested Sardinian words (for example, cungafròngias, mojgas). The lexical resources for clean generation are present; the same model produces attested vocabulary on focused, structured queries. Mitigation: prefer structured prompts ("List the three main causes, one short sentence each") over open-ended ones ("Tell me about X") for production deployments.

Dialect skew. The model targets LSC and was reviewed by a single native speaker of the Nuorese variant. Logudorese and Campidanese input is handled, but speakers of those variants may find the model's output skews toward the standardized form rather than their local register.

Multilingual capability not benchmarked end-to-end. Continued pretraining can degrade non-target language capabilities. Romance replay data mitigates this for Italian, Spanish, Portuguese, and Catalan, and the strong SC to EN scores suggest English is well preserved, but other languages were not benchmarked after adaptation. Validate on your own task before relying on these.

Training data caveats. The pretraining corpus includes Sardinian translations of literary works whose copyright status was not exhaustively verified, so the corpus is not redistributed in raw form (the data collection pipeline and source pointers are released instead). The supervised fine-tuning data includes machine-translated Capybara entries (NLLB-200 3.3B as translator) which contain residual Italian-shaped grammatical structures rendered with Sardinian vocabulary.

Feedback and contributions

Bug reports, feature requests, native-speaker feedback on outputs, and any other discussion happens on GitHub. The Hugging Face Community tab on this repository is intentionally disabled to keep all conversation in one place.

If you're a Sardinian speaker and you spot bad output, the wrong dialect, or vocabulary the model is missing, an issue containing the prompt and the model's response is genuinely useful. Contributions of any size are welcome.

Out-of-scope use

The model is not suitable for high-stakes factual queries, medical or legal advice, or any application where hallucination would cause material harm. It should not be used as a sole authoritative source for Sardinian language standardization or pedagogy without human review.

License

Model weights are released under the Apache 2.0 license. See LICENSE for full terms.

The training and evaluation code at github.com/lballore/LLiMba is released separately, also under Apache 2.0.

Acknowledgements

Native-speaker review of the corpus and supervised fine-tuning data was contributed by the author. Source web texts come from salimbasarda.net, istorias.it, sardumatica.net, limbasardasudsardigna.it, and lacanas.it. The editors of the Sardinian Wikipedia and the wider Sardinian community of writers and translators made this project possible.

Citation

@misc{llimba2026,
  title         = {LLiMba: Sardinian on a Single GPU - Adapting a 3B Language Model to a Vanishing Romance Language},
  author        = {Luca Ballore},
  year          = {2026},
  eprint        = {2605.09015},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CL},
  url           = {https://arxiv.org/abs/2605.09015}
}

@misc{llimba-3b-instruct,
  title     = {LLiMba-3B-Instruct},
  author    = {Luca Ballore},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/lballore/llimba-3b-instruct}
}