# LAI V3
LAI V3 is a lightweight bilingual causal language model developed by the Pixxle / LAI team for local inference. The intended product target is an offline mobile assistant that can handle short French/English conversations, answer grounded factual questions from injected facts, and prefer "I don't know" / "Je ne sais pas" when relevant facts are not available.
This Hugging Face repository contains the released model artifacts only: PyTorch checkpoints and the SentencePiece tokenizer used by the LAI V3 family.
The complete product project is broader than this model release. The original target was to run LAI locally on mobile devices, especially iPhone, with a lightweight architecture adapted to local execution. The full application project, mobile integration, prompt pipeline, and product-level architecture are available on GitHub:
## Recommended checkpoint

For the default V3 behavior, start with:

- `lai_v3_final.pt`

For the most conservative grounded behavior:

- `lai_v3_rag_strict_best.pt`
## What The Model Is Designed To Do
- Short natural conversations in French and English
- Lightweight empathetic replies and assistant identity behavior
- Grounded factual answers from a provided facts block
- User-context personalization when name, mood, or preferences are injected in the prompt
- "I don't know" / "Je ne sais pas" style answers when a factual answer is requested without usable facts
## Core Product Concept: Model + External Knowledge Base
The main LAI idea is that the model should not be treated as the sole place where knowledge lives.
Instead, the intended architecture separates:
- the language model, which generates natural bilingual answers
- the knowledge base, stored separately in local JSONL files
- the retrieval layer, which searches that knowledge base before asking the model to answer
In other words:
- LAI V3 is the language and response layer
- the factual knowledge is expected to live outside the model
- the application should search the local knowledge base first, then inject the retrieved facts into the prompt
This is important because the project goal was local mobile execution. Keeping a separate knowledge base makes it easier to:
- update facts without retraining the model
- keep the model lighter for mobile devices
- control where factual answers come from
- prefer grounded answers over hallucinated ones
## Knowledge Base Format

The intended product design uses local JSONL files as a simple knowledge store.

Typical layout:

- one JSON object per line
- keywords for retrieval
- language-specific fact strings to inject into the prompt
Example:

```json
{"topic":"france","keywords":["france","paris"],"facts_fr":"La capitale de la France est Paris.","facts_en":"The capital of France is Paris."}
```
The application is expected to search those JSONL entries, retrieve the most relevant facts, and then build the prompt given to LAI.
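A minimal retrieval helper over this JSONL schema could look like the sketch below. The names `load_kb` and `search_kb` are hypothetical illustrations, not part of the released code; the real retrieval layer lives in the GitHub project.

```python
import json
import re

def load_kb(path):
    """Load one JSON object per line from a JSONL knowledge base file."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

def search_kb(entries, question, lang="en"):
    """Return fact strings whose keywords appear in the question."""
    # Tokenize on word characters so punctuation does not break matching.
    words = set(re.findall(r"\w+", question.lower()))
    field = "facts_fr" if lang == "fr" else "facts_en"
    return [entry[field] for entry in entries
            if any(kw in words for kw in entry.get("keywords", []))]
```

A smarter app could rank matches by keyword overlap instead of returning every hit, but simple keyword membership already fits the lightweight on-device goal.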
## Intended Retrieval Behavior
For factual questions, the expected workflow is:
- the user asks a question
- the app searches the external knowledge base stored in JSONL
- the app selects matching facts
- the app injects those facts into `[FACTS]`
- LAI generates a short natural answer from those facts
- if nothing relevant is found, LAI should prefer an explicit unknown-answer style response
So the intended product logic is not:
- "ask the model and hope it knows"
It is:
- "search the local knowledge base first, then ask the model to formulate the answer"
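That search-first logic can be sketched as a single function. Here `search_kb` and `generate` are stand-ins for the app's retrieval and inference code; only the prompt token format comes from this model card.

```python
# Hypothetical sketch of the "search the knowledge base first, then ask
# the model to formulate the answer" flow.
def answer(question, kb_entries, search_kb, generate, lang="en"):
    facts = search_kb(kb_entries, question, lang)
    if facts:
        # Grounded path: inject the retrieved facts into the [FACTS] block.
        prompt = f"[USER] {question} [FACTS] {' '.join(facts)} [ANSWER]"
    else:
        # No usable facts: the model is trained to prefer an explicit
        # "I don't know" / "Je ne sais pas" style answer here.
        prompt = f"[USER] {question} [ANSWER]"
    return generate(prompt)
```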
## Important Format Note
These files are raw PyTorch checkpoints for a custom LAI architecture. They are not drop-in `transformers` checkpoints.

To run them directly, you need the LAI project code that defines:

- `version/v3/src/model.py`
- `version/v3/src/tokenizer.py`
In the product project, the final checkpoint is also exported to mobile-friendly formats such as GGUF and MLX for local iPhone inference.
## Architecture
- Decoder-only causal LM
- 194,192,768 parameters
- Vocabulary size: 16,000
- Context window: 1024 tokens
- Hidden size: 896
- Layers: 14
- Attention heads: 14
- Intermediate size: 3584
- RMSNorm
- Rotary positional embeddings
- SwiGLU MLP
- SentencePiece tokenizer
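The stated parameter count is consistent with these dimensions, assuming tied input/output embeddings and bias-free linear layers (an assumption about the custom implementation, which lives in the project code):

```python
# Back-of-the-envelope parameter count for the LAI V3 dimensions above.
# Assumes tied input/output embeddings and bias-free linears.
vocab, hidden, layers, intermediate = 16_000, 896, 14, 3584

embedding = vocab * hidden          # token embeddings, tied with the LM head
attention = 4 * hidden * hidden     # Q, K, V, O projections per layer
swiglu = 3 * hidden * intermediate  # gate, up, and down projections per layer
norms = 2 * hidden                  # two RMSNorm weight vectors per layer
per_layer = attention + swiglu + norms

total = embedding + layers * per_layer + hidden  # + final RMSNorm
print(total)  # 194192768, matching the 194,192,768 figure above
```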
Special prompt tokens used by the training format: `[USER]`, `[FACTS]`, `[ANSWER]`.
## Intended Project Flow
LAI V3 is meant to be used as one part of a larger product pipeline, not as an all-knowing standalone model.
- The app receives a user message.
- The app detects language and whether the message is conversational or factual.
- The app looks up relevant information in local knowledge sources: local KB, user profile, recent conversation context, and optionally cached research.
- The app builds a structured prompt.
- The model generates a short answer in French or English.
- The app cleans the answer, persists user knowledge updates, and displays the final reply.
For product use, this means the model should usually answer from retrieved facts rather than act like a closed factual database by itself.
Core prompt contract:

`[USER] {message} [FACTS] {facts} [ANSWER]`

If no grounded facts are available, the project may send:

`[USER] {message} [ANSWER]`
The model is trained so that the intended behavior for factual questions without facts is an "I don't know" style response.
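This contract can be captured in a small helper; the name `build_prompt` is a hypothetical illustration, not part of the released code.

```python
def build_prompt(message, facts=None):
    """Wrap a user message (and optional facts) in the LAI V3 prompt tokens."""
    if facts:
        return f"[USER] {message} [FACTS] {facts} [ANSWER]"
    # Without a [FACTS] block, the intended behavior for factual
    # questions is an "I don't know" style response.
    return f"[USER] {message} [ANSWER]"
```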
## Prompting Pattern

French grounded example:

`[USER] Quelle est la capitale de la France ? [FACTS] La capitale de la France est Paris. [ANSWER]`

Expected style:

`Paris.`

English grounded example:

`[USER] What is the capital of Japan? [FACTS] The capital of Japan is Tokyo. [ANSWER]`

Expected style:

`Tokyo.`

Unknown factual example:

`[USER] What is the capital of the Moon? [ANSWER]`

Expected style:

`I don't know.`
## Repository Contents

| File | Role |
|---|---|
| `lai_v3_pretrain_best.pt` | Best checkpoint from the bilingual pretraining stage |
| `lai_v3_pretrain_last.pt` | Last saved checkpoint from the bilingual pretraining stage |
| `lai_v3_sft_best.pt` | Best checkpoint after conversational supervised fine-tuning |
| `lai_v3_sft_final.pt` | Final saved checkpoint from the conversational SFT stage |
| `lai_v3_en_best.pt` | Checkpoint after the English-balancing stage |
| `lai_v3_rag_best.pt` | Best checkpoint after grounded answering fine-tuning |
| `lai_v3_rag_final.pt` | Final saved checkpoint from the grounded answering stage |
| `lai_v3_rag_strict_best.pt` | Best checkpoint from the stricter grounded / IDK stage |
| `lai_v3_final.pt` | Final balanced project release checkpoint |
| `tokenizer_spm.model` | SentencePiece tokenizer model |
| `tokenizer_spm.json` | Tokenizer vocabulary / mapping |
## Training Stages
The V3 family follows a staged recipe inside the LAI project:
- Bilingual pretraining: FR/EN language modeling to learn the base language structure
- Conversational supervised fine-tuning: short dialogue behavior, greetings, empathy, and assistant identity
- English reinforcement: better bilingual balance and English small-talk coverage
- Grounded answering fine-tuning: use the `[FACTS]` block to answer factual questions
- Strict grounded behavior: stronger preference for grounded reformulation and explicit IDK behavior when facts are missing
The final model is meant to preserve conversation ability while staying grounded for factual questions.
## How It Is Used In The Mobile Project
In the app project, LAI V3 is paired with:
- a local KB
- persistent user knowledge
- recent conversation context
- a native inference bridge
- post-processing to keep answers short and clean
The shipped mobile path uses a quantized runtime export of the final checkpoint for on-device inference. This Hub repo keeps the original released PyTorch checkpoints.
This Hugging Face repository therefore publishes the model layer, while the GitHub repository contains the larger local-mobile project:
- chat application
- prompt builder
- JSONL knowledge base handling
- user memory
- local storage
- native mobile inference integration
GitHub project:
## Example Loading Pattern

Minimal loading pattern with the project code:

```python
import torch
from model import LaiConfig, LaiForCausalLM
from tokenizer import SimpleTokenizer

# Load the SentencePiece tokenizer shipped in this repository.
tokenizer = SimpleTokenizer("tokenizer_spm.model", "tokenizer_spm.json")

# Load the released checkpoint and restore the model weights.
checkpoint = torch.load("lai_v3_final.pt", map_location="cpu", weights_only=False)
config = LaiConfig(vocab_size=len(tokenizer.vocab))
model = LaiForCausalLM(config)
model.load_state_dict(checkpoint["model_state_dict"])
model.eval()
```
Recommended generation defaults used around the project:

- `max_tokens`: 40 to 50
- `temperature`: 0.15 to 0.7 depending on factual vs conversational mode
- `top_k`: 20 to 40
- `repetition_penalty`: around 1.2 to 1.3
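To illustrate how those knobs interact, here is a minimal top-k sampling step in plain Python. It is a sketch of the standard technique, not the project's actual decoding loop; the function name and exact penalty scheme are assumptions.

```python
import math
import random

def sample_next(logits, generated, temperature=0.7, top_k=40,
                repetition_penalty=1.2):
    """Pick the next token id from raw logits with top-k sampling."""
    scores = list(logits)
    # Penalize tokens that already appeared in the generated sequence
    # (divide positive scores, multiply negative ones).
    for tok in set(generated):
        if scores[tok] > 0:
            scores[tok] /= repetition_penalty
        else:
            scores[tok] *= repetition_penalty
    # Keep only the top_k highest-scoring token ids.
    top = sorted(range(len(scores)), key=lambda i: scores[i],
                 reverse=True)[:top_k]
    # Temperature-scaled softmax over the kept ids, then sample.
    weights = [math.exp(scores[i] / temperature) for i in top]
    r = random.uniform(0, sum(weights))
    for i, w in zip(top, weights):
        r -= w
        if r <= 0:
            return i
    return top[-1]
```

Lower temperatures (the 0.15 end of the range) sharpen the softmax toward the grounded factual answer, while higher values loosen it for small talk.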
## Limitations
- This release is a custom-code checkpoint family, not a standard Transformers package.
- The model is designed for short responses and a 1024-token context window.
- For factual questions, quality depends heavily on the facts injected into the prompt.
- The project intent is grounded behavior, but like any generative model, outputs should still be validated in sensitive use cases.
- The training datasets are not redistributed in this repository.
## Ownership
This repository publishes LAI V3 artifacts released by the Pixxle / LAI team. The public repo contains weights, tokenizer files, and documentation for the released model family.
## License
Released under the MIT License.