# LAI V3
LAI V3 is a lightweight bilingual causal language model developed by the Pixxle / LAI team for local inference. The intended product target is an offline mobile assistant that can handle short French/English conversations, answer grounded factual questions from injected facts, and prefer "I don't know" / "Je ne sais pas" when relevant facts are not available.
This Hugging Face repository contains the released model artifacts only: PyTorch checkpoints and the SentencePiece tokenizer used by the LAI V3 family.
The complete product project is broader than this model release. The original target was to run LAI locally on mobile devices, especially iPhone, with a lightweight architecture adapted to local execution. The full application project, mobile integration, prompt pipeline, and product-level architecture are available on GitHub:
## Recommended checkpoint

For the default V3 behavior, start with:

- `lai_v3_final.pt`

For the most conservative grounded behavior:

- `lai_v3_rag_strict_best.pt`
## What The Model Is Designed To Do
- Short natural conversations in French and English
- Lightweight empathetic replies and assistant identity behavior
- Grounded factual answers from a provided facts block
- User-context personalization when name, mood, or preferences are injected in the prompt
- "I don't know" / "Je ne sais pas" style answers when a factual answer is requested without usable facts
## Core Product Concept: Model + External Knowledge Base
The main LAI idea is that the model should not be treated as the sole place where knowledge lives.
Instead, the intended architecture separates:
- the language model, which generates natural bilingual answers
- the knowledge base, stored separately in local JSONL files
- the retrieval layer, which searches that knowledge base before asking the model to answer
In other words:
- LAI V3 is the language and response layer
- the factual knowledge is expected to live outside the model
- the application should search the local knowledge base first, then inject the retrieved facts into the prompt
This is important because the project goal was local mobile execution. Keeping a separate knowledge base makes it easier to:
- update facts without retraining the model
- keep the model lighter for mobile devices
- control where factual answers come from
- prefer grounded answers over hallucinated ones
## Knowledge Base Format

The intended product design uses local JSONL files as a simple knowledge store.

Typical layout:

- one JSON object per line
- keywords for retrieval
- language-specific fact strings to inject into the prompt
Example:

```json
{"topic":"france","keywords":["france","paris"],"facts_fr":"La capitale de la France est Paris.","facts_en":"The capital of France is Paris."}
```
The application is expected to search those JSONL entries, retrieve the most relevant facts, and then build the prompt given to LAI.
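A minimal retrieval helper over this JSONL schema could look like the sketch below. The names `load_kb` and `search_kb` are hypothetical illustrations, not part of the released code; the real retrieval layer lives in the GitHub project.

```python
import json
import re

def load_kb(path):
    """Load one JSON object per line from a JSONL knowledge base file."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

def search_kb(entries, question, lang="en"):
    """Return fact strings whose keywords appear in the question."""
    # Tokenize on word characters so punctuation does not break matching.
    words = set(re.findall(r"\w+", question.lower()))
    field = "facts_fr" if lang == "fr" else "facts_en"
    return [entry[field] for entry in entries
            if any(kw in words for kw in entry.get("keywords", []))]
```

A smarter app could rank matches by keyword overlap instead of returning every hit, but simple keyword membership already fits the lightweight on-device goal.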
## Intended Retrieval Behavior
For factual questions, the expected workflow is:
- the user asks a question
- the app searches the external knowledge base stored in JSONL
- the app selects matching facts
- the app injects those facts into `[FACTS]`
- LAI generates a short natural answer from those facts
- if nothing relevant is found, LAI should prefer an explicit unknown-answer style response
So the intended product logic is not:
- "ask the model and hope it knows"
It is:
- "search the local knowledge base first, then ask the model to formulate the answer"
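That search-first logic can be sketched as a single function. Here `search_kb` and `generate` are stand-ins for the app's retrieval and inference code; only the prompt token format comes from this model card.

```python
# Hypothetical sketch of the "search the knowledge base first, then ask
# the model to formulate the answer" flow.
def answer(question, kb_entries, search_kb, generate, lang="en"):
    facts = search_kb(kb_entries, question, lang)
    if facts:
        # Grounded path: inject the retrieved facts into the [FACTS] block.
        prompt = f"[USER] {question} [FACTS] {' '.join(facts)} [ANSWER]"
    else:
        # No usable facts: the model is trained to prefer an explicit
        # "I don't know" / "Je ne sais pas" style answer here.
        prompt = f"[USER] {question} [ANSWER]"
    return generate(prompt)
```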
## Important Format Note
These files are raw PyTorch checkpoints for a custom LAI architecture. They are not drop-in `transformers` checkpoints.

To run them directly, you need the LAI project code that defines:

- `version/v3/src/model.py`
- `version/v3/src/tokenizer.py`
In the product project, the final checkpoint is also exported to mobile-friendly formats such as GGUF and MLX for local iPhone inference.
## Architecture
- Decoder-only causal LM
- 194,192,768 parameters
- Vocabulary size: 16,000
- Context window: 1024 tokens
- Hidden size: 896
- Layers: 14
- Attention heads: 14
- Intermediate size: 3584
- RMSNorm
- Rotary positional embeddings
- SwiGLU MLP
- SentencePiece tokenizer
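The stated parameter count is consistent with these dimensions, assuming tied input/output embeddings and bias-free linear layers (an assumption about the custom implementation, which lives in the project code):

```python
# Back-of-the-envelope parameter count for the LAI V3 dimensions above.
# Assumes tied input/output embeddings and bias-free linears.
vocab, hidden, layers, intermediate = 16_000, 896, 14, 3584

embedding = vocab * hidden          # token embeddings, tied with the LM head
attention = 4 * hidden * hidden     # Q, K, V, O projections per layer
swiglu = 3 * hidden * intermediate  # gate, up, and down projections per layer
norms = 2 * hidden                  # two RMSNorm weight vectors per layer
per_layer = attention + swiglu + norms

total = embedding + layers * per_layer + hidden  # + final RMSNorm
print(total)  # 194192768, matching the 194,192,768 figure above
```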
Special prompt tokens used by the training format: `[USER]`, `[FACTS]`, `[ANSWER]`.
## Intended Project Flow
LAI V3 is meant to be used as one part of a larger product pipeline, not as an all-knowing standalone model.
- The app receives a user message.
- The app detects language and whether the message is conversational or factual.
- The app looks up relevant information in local knowledge sources: local KB, user profile, recent conversation context, and optionally cached research.
- The app builds a structured prompt.
- The model generates a short answer in French or English.
- The app cleans the answer, persists user knowledge updates, and displays the final reply.
For product use, this means the model should usually answer from retrieved facts rather than act like a closed factual database by itself.
Core prompt contract:

`[USER] {message} [FACTS] {facts} [ANSWER]`

If no grounded facts are available, the project may send:

`[USER] {message} [ANSWER]`
The model is trained so that the intended behavior for factual questions without facts is an "I don't know" style response.
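This contract can be captured in a small helper; the name `build_prompt` is a hypothetical illustration, not part of the released code.

```python
def build_prompt(message, facts=None):
    """Wrap a user message (and optional facts) in the LAI V3 prompt tokens."""
    if facts:
        return f"[USER] {message} [FACTS] {facts} [ANSWER]"
    # Without a [FACTS] block, the intended behavior for factual
    # questions is an "I don't know" style response.
    return f"[USER] {message} [ANSWER]"
```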
## Prompting Pattern

French grounded example:

`[USER] Quelle est la capitale de la France ? [FACTS] La capitale de la France est Paris. [ANSWER]`

Expected style:

`Paris.`

English grounded example:

`[USER] What is the capital of Japan? [FACTS] The capital of Japan is Tokyo. [ANSWER]`

Expected style:

`Tokyo.`

Unknown factual example:

`[USER] What is the capital of the Moon? [ANSWER]`

Expected style:

`I don't know.`
## Repository Contents

| File | Role |
|---|---|
| `lai_v3_pretrain_best.pt` | Best checkpoint from the bilingual pretraining stage |
| `lai_v3_pretrain_last.pt` | Last saved checkpoint from the bilingual pretraining stage |
| `lai_v3_sft_best.pt` | Best checkpoint after conversational supervised fine-tuning |
| `lai_v3_sft_final.pt` | Final saved checkpoint from the conversational SFT stage |
| `lai_v3_en_best.pt` | Checkpoint after the English-balancing stage |
| `lai_v3_rag_best.pt` | Best checkpoint after grounded answering fine-tuning |
| `lai_v3_rag_final.pt` | Final saved checkpoint from the grounded answering stage |
| `lai_v3_rag_strict_best.pt` | Best checkpoint from the stricter grounded / IDK stage |
| `lai_v3_final.pt` | Final balanced project release checkpoint |
| `tokenizer_spm.model` | SentencePiece tokenizer model |
| `tokenizer_spm.json` | Tokenizer vocabulary / mapping |
## Training Stages
The V3 family follows a staged recipe inside the LAI project:
- Bilingual pretraining: FR/EN language modeling to learn the base language structure
- Conversational supervised fine-tuning: short dialogue behavior, greetings, empathy, and assistant identity
- English reinforcement: better bilingual balance and English small-talk coverage
- Grounded answering fine-tuning: use the `[FACTS]` block to answer factual questions
- Strict grounded behavior: stronger preference for grounded reformulation and explicit IDK behavior when facts are missing
The final model is meant to preserve conversation ability while staying grounded for factual questions.
## How It Is Used In The Mobile Project
In the app project, LAI V3 is paired with:
- a local KB
- persistent user knowledge
- recent conversation context
- a native inference bridge
- post-processing to keep answers short and clean
The shipped mobile path uses a quantized runtime export of the final checkpoint for on-device inference. This Hub repo keeps the original released PyTorch checkpoints.
This Hugging Face repository therefore publishes the model layer, while the GitHub repository contains the larger local-mobile project:
- chat application
- prompt builder
- JSONL knowledge base handling
- user memory
- local storage
- native mobile inference integration
GitHub project:
## Example Loading Pattern

Minimal loading pattern with the project code:

```python
import torch
from model import LaiConfig, LaiForCausalLM
from tokenizer import SimpleTokenizer

# Load the SentencePiece tokenizer shipped in this repository.
tokenizer = SimpleTokenizer("tokenizer_spm.model", "tokenizer_spm.json")

# Load the released checkpoint and restore the model weights.
checkpoint = torch.load("lai_v3_final.pt", map_location="cpu", weights_only=False)
config = LaiConfig(vocab_size=len(tokenizer.vocab))
model = LaiForCausalLM(config)
model.load_state_dict(checkpoint["model_state_dict"])
model.eval()
```
Recommended generation defaults used around the project:

- `max_tokens`: 40 to 50
- `temperature`: 0.15 to 0.7 depending on factual vs conversational mode
- `top_k`: 20 to 40
- `repetition_penalty`: around 1.2 to 1.3
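To illustrate how those knobs interact, here is a minimal top-k sampling step in plain Python. It is a sketch of the standard technique, not the project's actual decoding loop; the function name and exact penalty scheme are assumptions.

```python
import math
import random

def sample_next(logits, generated, temperature=0.7, top_k=40,
                repetition_penalty=1.2):
    """Pick the next token id from raw logits with top-k sampling."""
    scores = list(logits)
    # Penalize tokens that already appeared in the generated sequence
    # (divide positive scores, multiply negative ones).
    for tok in set(generated):
        if scores[tok] > 0:
            scores[tok] /= repetition_penalty
        else:
            scores[tok] *= repetition_penalty
    # Keep only the top_k highest-scoring token ids.
    top = sorted(range(len(scores)), key=lambda i: scores[i],
                 reverse=True)[:top_k]
    # Temperature-scaled softmax over the kept ids, then sample.
    weights = [math.exp(scores[i] / temperature) for i in top]
    r = random.uniform(0, sum(weights))
    for i, w in zip(top, weights):
        r -= w
        if r <= 0:
            return i
    return top[-1]
```

Lower temperatures (the 0.15 end of the range) sharpen the softmax toward the grounded factual answer, while higher values loosen it for small talk.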
## Limitations
- This release is a custom-code checkpoint family, not a standard Transformers package.
- The model is designed for short responses and a 1024-token context window.
- For factual questions, quality depends heavily on the facts injected into the prompt.
- The project intent is grounded behavior, but like any generative model, outputs should still be validated in sensitive use cases.
- The training datasets are not redistributed in this repository.
## Ownership
This repository publishes LAI V3 artifacts released by the Pixxle / LAI team. The public repo contains weights, tokenizer files, and documentation for the released model family.
## License
Released under the MIT License.