---
title: LLiMba 3B Demo
emoji: 💬
colorFrom: yellow
colorTo: purple
sdk: gradio
sdk_version: 6.14.0
app_file: app.py
pinned: false
disable_embedding: true
license: apache-2.0
short_description: Chat with LLiMba, an open 3B LLM that speaks Sardinian
hardware: zero-a10g
models:
  - lballore/llimba-3b-instruct
tags:
  - sardinian
  - limba-sarda-comuna
  - lsc
  - low-resource
  - endangered-language
  - chat
  - demo
---

# 💬 LLiMba 3B Demo

A live demo of LLiMba-3B-Instruct, an open 3B-parameter Sardinian-speaking language model adapted from Qwen2.5-3B-Instruct on a single consumer GPU.

LLiMba speaks fluent Sardinian (LSC, the standardized written form), accepts Logudorese and Campidanese input, and retains the multilingual capabilities of its Qwen2.5 base, so it also handles English, Italian, Spanish, and other Romance languages.

## Try these prompts

If you don't speak Sardinian, here are starters that work well:

**Conversation:**

- Salude! Comente ìstas? (Hi! How are you?)

**Translation:**

- Translate to Sardinian: "The Mediterranean is rough today."
- Traduzi in italianu: «Sa Sardigna est una ìsula bella meda in su Mediterràneu.» (Translate into Italian: "Sardinia is a very beautiful island in the Mediterranean.")

**Sardinian culture and history:**

- Chie fiat Gigi Riva? (Who was Gigi Riva?)
- Ite est su «cantu a tenore» sardu? (What is Sardinian cantu a tenore?)

**Open-ended:**

- Iscrie unu paragrafu in sardu subra de sa Sardigna. (Write a paragraph in Sardinian about Sardinia.)

## Inference settings

The sidebar exposes the standard generation parameters. Defaults are tuned for natural Sardinian conversation; the model card recommends three preset profiles depending on use case:

| Use case | Temperature | Top-p | Top-k | Repetition penalty |
|---|---|---|---|---|
| Translation, factual Q&A | 0.0 (greedy) | 1.0 | 1 | 1.0 |
| Conversational chat (default) | 0.3 | 0.9 | 40 | 1.05 |
| Creative or long-form | ≥ 0.5 | 0.9 | 40 | 1.1 |

Temperatures above 0.7 may cause language-boundary drift (Sardinian to Italian) and amplify morphological hallucination on long open-ended prompts. The model was trained with Romance replay data to mitigate this, but the safe upper bound for production use is around 0.7.
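If you run the model weights yourself, the three profiles can be captured as a small preset table. This is a sketch assuming standard `transformers` `generate()` keyword arguments; the profile names (`"factual"`, `"chat"`, `"creative"`) and the 0.7 clamp helper are illustrative, not part of the official release:

```python
# Generation presets mirroring the three profiles in the table above.
# Profile names are illustrative, not official.
PRESETS = {
    "factual":  {"do_sample": False, "top_p": 1.0, "top_k": 1, "repetition_penalty": 1.0},
    "chat":     {"do_sample": True, "temperature": 0.3, "top_p": 0.9, "top_k": 40,
                 "repetition_penalty": 1.05},
    "creative": {"do_sample": True, "temperature": 0.5, "top_p": 0.9, "top_k": 40,
                 "repetition_penalty": 1.1},
}

def generation_kwargs(profile: str, max_new_tokens: int = 512) -> dict:
    """Return kwargs suitable for a transformers-style model.generate() call."""
    kwargs = dict(PRESETS[profile])
    kwargs["max_new_tokens"] = max_new_tokens
    # Clamp to the ~0.7 safe upper bound noted above to limit
    # language-boundary drift on long generations.
    if kwargs.get("temperature", 0.0) > 0.7:
        kwargs["temperature"] = 0.7
    return kwargs
```

Pass the result straight to `model.generate(**generation_kwargs("chat"))` (or the equivalent in your serving stack).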

## About the project

LLiMba is an open project to bring Sardinian, an endangered Romance language with roughly one million speakers, into modern NLP.

The model was adapted via continued pretraining on Sardinian text (with Romance replay data to prevent language drift) followed by supervised fine-tuning with rsLoRA on instruction pairs. Full methodology and benchmarks in the model card.

## API access

The Space's gradio_client API is gated behind an allowlist. Visiting this demo page in a browser works without any token; programmatic access via gradio_client or HTTP requires a token assigned by the project maintainer.

If you'd like to integrate LLiMba into a research project or non-commercial application, open a GitHub issue describing your use case. Approved integrators receive a token to pass with each call.

Once you have a token, the calling pattern depends on your language.

Python with `gradio_client`:

```python
from gradio_client import Client

client = Client("lballore/llimba-demo")

result = client.predict(
    message="Hello! Please respond in Sardinian.",
    system_message="Ses unu assistente chi chistionat in sardu.",
    max_tokens=512,
    temperature=0.3,
    top_p=0.9,
    top_k=40,
    repetition_penalty=1.05,
    api_token="<YOUR_ASSIGNED_TOKEN>",
    api_name="/chat",
)
print(result)
```

JavaScript / TypeScript with `@gradio/client` (Node.js or any other server-side runtime; never call this from the browser, where the token would be exposed):

```js
import { Client } from "@gradio/client";

const client = await Client.connect("lballore/llimba-demo");

const result = await client.predict("/chat", {
  message: "Hello! Please respond in Sardinian.",
  system_message: "Ses unu assistente chi chistionat in sardu.",
  max_tokens: 512,
  temperature: 0.3,
  top_p: 0.9,
  top_k: 40,
  repetition_penalty: 1.05,
  api_token: process.env.LLIMBA_API_TOKEN,
});

console.log(result.data);
```

Other languages (PHP, Go, Ruby, Rust, etc.) can call the underlying HTTP API directly. The pattern is two requests: POST /gradio_api/call/chat with the inputs as a data array (returns an event_id), then GET /gradio_api/call/chat/{event_id} as Server-Sent Events to receive the response. Inputs in the data array must be in the same order as the Python and JavaScript examples above.
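For runtimes without an official client, the two-request pattern can be sketched with nothing but the Python standard library. The host URL below is the conventional Space subdomain and should be verified on the Space page, and the positional order of the `data` array is an assumption that mirrors the keyword order in the client examples above:

```python
import json
import urllib.request

# Conventional Space host; confirm the exact URL on the Space page.
BASE = "https://lballore-llimba-demo.hf.space"

def chat_payload(message, system_message, max_tokens=512, temperature=0.3,
                 top_p=0.9, top_k=40, repetition_penalty=1.05, api_token=""):
    """Positional `data` array for POST /gradio_api/call/chat.
    Order mirrors the keyword order in the client examples above."""
    return {"data": [message, system_message, max_tokens, temperature,
                     top_p, top_k, repetition_penalty, api_token]}

def call_chat(payload):
    # Request 1: submit the job; the response carries an event_id.
    req = urllib.request.Request(
        f"{BASE}/gradio_api/call/chat",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        event_id = json.load(resp)["event_id"]
    # Request 2: read the result as a Server-Sent Events stream.
    with urllib.request.urlopen(f"{BASE}/gradio_api/call/chat/{event_id}") as resp:
        for raw in resp:
            line = raw.decode().strip()
            if line.startswith("data:"):
                print(line[len("data:"):].strip())
```

The same two-request shape translates directly to `curl`, Go's `net/http`, or any HTTP library.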

The translation endpoint follows the same pattern with api_name="/translate" and source/target language arguments instead of a system message; full input schemas are visible on the Space's "Use via API" tab.

Tokens are subject to revocation if usage patterns suggest abuse or significantly degrade the demo's responsiveness for casual visitors.

## Limitations

This is a 3B-parameter model and, like all small LLMs, it can produce confident wrong answers, especially on biographical or historical specifics. Treat factual outputs with skepticism.

On long open-ended generation, the model may occasionally produce phonotactically valid but non-attested Sardinian words. Structured prompts ("List the three main causes") work more reliably than open-ended ones ("Tell me about X").

The model targets LSC and was reviewed by a single native speaker (the author). Logudorese and Campidanese input is handled, but speakers of those variants may find outputs skew toward the standardized form.

## License

Both the demo code and the model weights are released under Apache 2.0.