LAI V3

LAI V3 is a lightweight bilingual causal language model developed by the Pixxle / LAI team for local inference. The intended product target is an offline mobile assistant that can handle short French/English conversations, answer grounded factual questions from injected facts, and prefer "I don't know" / "Je ne sais pas" when relevant facts are not available.

This Hugging Face repository contains the released model artifacts only: PyTorch checkpoints and the SentencePiece tokenizer used by the LAI V3 family.

The complete product project is broader than this model release. The original target was to run LAI locally on mobile devices, especially iPhone, with a lightweight architecture adapted to local execution. The full application project, mobile integration, prompt pipeline, and product-level architecture are available on GitHub:

Recommended checkpoint

For the default V3 behavior, start with:

  • lai_v3_final.pt

For the most conservative grounded behavior:

  • lai_v3_rag_strict_best.pt

What The Model Is Designed To Do

  • Short natural conversations in French and English
  • Lightweight empathetic replies and assistant identity behavior
  • Grounded factual answers from a provided facts block
  • User-context personalization when name, mood, or preferences are injected in the prompt
  • "I don't know" / "Je ne sais pas" style answers when a factual answer is requested without usable facts

Core Product Concept: Model + External Knowledge Base

The main LAI idea is that the model should not be treated as the sole place where knowledge lives.

Instead, the intended architecture separates:

  • the language model, which generates natural bilingual answers
  • the knowledge base, stored separately in local JSONL files
  • the retrieval layer, which searches that knowledge base before asking the model to answer

In other words:

  • LAI V3 is the language and response layer
  • the factual knowledge is expected to live outside the model
  • the application should search the local knowledge base first, then inject the retrieved facts into the prompt

This is important because the project goal was local mobile execution. Keeping a separate knowledge base makes it easier to:

  • update facts without retraining the model
  • keep the model lighter for mobile devices
  • control where factual answers come from
  • prefer grounded answers over hallucinated ones

Knowledge Base Format

The intended product design uses local JSONL files as a simple knowledge store.

Typical entry structure:

  • one JSON object per line
  • keywords for retrieval
  • language-specific fact strings to inject into the prompt

Example:

{"topic":"france","keywords":["france","paris"],"facts_fr":"La capitale de la France est Paris.","facts_en":"The capital of France is Paris."}

The application is expected to search those JSONL entries, retrieve the most relevant facts, and then build the prompt given to LAI.

Intended Retrieval Behavior

For factual questions, the expected workflow is:

  1. the user asks a question
  2. the app searches the external knowledge base stored in JSONL
  3. the app selects matching facts
  4. the app injects those facts into [FACTS]
  5. LAI generates a short natural answer from those facts
  6. if nothing relevant is found, LAI should prefer an explicit unknown-answer style response

So the intended product logic is not "ask the model and hope it knows". It is "search the local knowledge base first, then ask the model to formulate the answer".

Important Format Note

These files are raw PyTorch checkpoints for a custom LAI architecture. They are not drop-in transformers checkpoints.

To run them directly, you need the LAI project code that defines:

  • version/v3/src/model.py
  • version/v3/src/tokenizer.py

In the product project, the final checkpoint is also exported to mobile-friendly formats such as GGUF and MLX for local iPhone inference.

Architecture

  • Decoder-only causal LM
  • 194,192,768 parameters
  • Vocabulary size: 16,000
  • Context window: 1024 tokens
  • Hidden size: 896
  • Layers: 14
  • Attention heads: 14
  • Intermediate size: 3584
  • RMSNorm
  • Rotary positional embeddings
  • SwiGLU MLP
  • SentencePiece tokenizer

Special prompt tokens used by the training format:

  • [USER]
  • [FACTS]
  • [ANSWER]
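The hyperparameters above can be summarized as a config object. The field names below are illustrative guesses, not necessarily the actual LaiConfig attributes defined in version/v3/src/model.py:

```python
from dataclasses import dataclass

@dataclass
class LaiV3Hyperparams:
    # Illustrative summary of the published V3 architecture;
    # field names are assumptions, not the project's real LaiConfig.
    vocab_size: int = 16_000
    context_window: int = 1024
    hidden_size: int = 896
    num_layers: int = 14
    num_heads: int = 14
    intermediate_size: int = 3584
    norm: str = "rmsnorm"       # RMSNorm
    positional: str = "rotary"  # rotary positional embeddings
    mlp: str = "swiglu"         # SwiGLU MLP
```

Note that the numbers are internally consistent: 896 hidden units over 14 heads gives a head dimension of 64, and the intermediate size is the common 4x hidden-size expansion.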

Intended Project Flow

LAI V3 is meant to be used as one part of a larger product pipeline, not as an all-knowing standalone model.

  1. The app receives a user message.
  2. The app detects language and whether the message is conversational or factual.
  3. The app looks up relevant information in local knowledge sources: local KB, user profile, recent conversation context, and optionally cached research.
  4. The app builds a structured prompt.
  5. The model generates a short answer in French or English.
  6. The app cleans the answer, persists user knowledge updates, and displays the final reply.

For product use, this means the model should usually answer from retrieved facts rather than act like a closed factual database by itself.

Core prompt contract:

[USER] {message} [FACTS] {facts} [ANSWER]

If no grounded facts are available, the project may send:

[USER] {message} [ANSWER]

The model is trained so that the intended behavior for factual questions without facts is an "I don't know" style response.
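A minimal sketch of this prompt contract as a helper function; the function name and the space-joined fact formatting are assumptions, not the project's actual prompt builder:

```python
def build_prompt(message, facts=None):
    """Build the LAI V3 prompt; emit [FACTS] only when facts are available."""
    if facts:
        return f"[USER] {message} [FACTS] {' '.join(facts)} [ANSWER]"
    # No grounded facts: the model is expected to answer "I don't know"
    # style for factual questions sent in this form.
    return f"[USER] {message} [ANSWER]"
```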

Prompting Pattern

French grounded example:

[USER] Quelle est la capitale de la France ? [FACTS] La capitale de la France est Paris. [ANSWER]

Expected style:

Paris.

English grounded example:

[USER] What is the capital of Japan? [FACTS] The capital of Japan is Tokyo. [ANSWER]

Expected style:

Tokyo.

Unknown factual example:

[USER] What is the capital of the Moon? [ANSWER]

Expected style:

I don't know.

Repository Contents

File                          Role
lai_v3_pretrain_best.pt       Best checkpoint from the bilingual pretraining stage
lai_v3_pretrain_last.pt       Last saved checkpoint from the bilingual pretraining stage
lai_v3_sft_best.pt            Best checkpoint after conversational supervised fine-tuning
lai_v3_sft_final.pt           Final saved checkpoint from the conversational SFT stage
lai_v3_en_best.pt             Checkpoint after the English-balancing stage
lai_v3_rag_best.pt            Best checkpoint after grounded answering fine-tuning
lai_v3_rag_final.pt           Final saved checkpoint from the grounded answering stage
lai_v3_rag_strict_best.pt     Best checkpoint from the stricter grounded / IDK stage
lai_v3_final.pt               Final balanced project release checkpoint
tokenizer_spm.model           SentencePiece tokenizer model
tokenizer_spm.json            Tokenizer vocabulary / mapping

Training Stages

The V3 family follows a staged recipe inside the LAI project:

  1. Bilingual pretraining: FR/EN language modeling to learn the base language structure
  2. Conversational supervised fine-tuning: short dialogue behavior, greetings, empathy, and assistant identity
  3. English reinforcement: better bilingual balance and English small-talk coverage
  4. Grounded answering fine-tuning: using the [FACTS] block to answer factual questions
  5. Strict grounded behavior: stronger preference for grounded reformulation and explicit IDK behavior when facts are missing

The final model is meant to preserve conversation ability while staying grounded for factual questions.

How It Is Used In The Mobile Project

In the app project, LAI V3 is paired with:

  • a local KB
  • persistent user knowledge
  • recent conversation context
  • a native inference bridge
  • post-processing to keep answers short and clean

The shipped mobile path uses a quantized runtime export of the final checkpoint for on-device inference. This Hub repo keeps the original released PyTorch checkpoints.

This Hugging Face repository therefore publishes the model layer, while the GitHub repository contains the larger local-mobile project:

  • chat application
  • prompt builder
  • JSONL knowledge base handling
  • user memory
  • local storage
  • native mobile inference integration

GitHub project:

Example Loading Pattern

Minimal loading pattern with the project code:

import torch

from model import LaiConfig, LaiForCausalLM
from tokenizer import SimpleTokenizer

# Load the SentencePiece tokenizer files shipped in this repository.
tokenizer = SimpleTokenizer("tokenizer_spm.model", "tokenizer_spm.json")

# Load the full pickled checkpoint dict on CPU.
checkpoint = torch.load("lai_v3_final.pt", map_location="cpu", weights_only=False)

# Build the model with a config matching the tokenizer vocabulary,
# restore the trained weights, and switch to inference mode.
config = LaiConfig(vocab_size=len(tokenizer.vocab))
model = LaiForCausalLM(config)
model.load_state_dict(checkpoint["model_state_dict"])
model.eval()

Recommended generation defaults used across the project:

  • max_tokens: 40 to 50
  • temperature: 0.15 to 0.7 depending on factual vs conversational mode
  • top_k: 20 to 40
  • repetition_penalty: around 1.2 to 1.3
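These defaults correspond to a standard top-k sampling loop. Below is a minimal, dependency-free sketch of one decoding step; the function and its exact penalty formulation are illustrative, not the project's actual sampler:

```python
import math
import random

def sample_next(logits, prev_tokens, temperature=0.7, top_k=40,
                repetition_penalty=1.2):
    """Pick the next token id from raw logits using the defaults above."""
    logits = list(logits)
    # Penalize already-generated tokens to discourage repetition loops.
    for t in set(prev_tokens):
        logits[t] = logits[t] / repetition_penalty if logits[t] > 0 \
            else logits[t] * repetition_penalty
    # Keep only the top_k highest-scoring token ids.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Temperature-scaled, numerically stable softmax over the survivors.
    scaled = [logits[i] / temperature for i in top]
    m = max(scaled)
    probs = [math.exp(s - m) for s in scaled]
    total = sum(probs)
    # Sample proportionally to the remaining probability mass.
    r, acc = random.random() * total, 0.0
    for tok, p in zip(top, probs):
        acc += p
        if acc >= r:
            return tok
    return top[-1]
```

Lower temperatures (around 0.15) make the factual mode nearly greedy, while higher values (up to 0.7) leave room for conversational variety.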

Limitations

  • This release is a custom-code checkpoint family, not a standard Transformers package.
  • The model is designed for short responses and a 1024-token context window.
  • For factual questions, quality depends heavily on the facts injected into the prompt.
  • The project intent is grounded behavior, but like any generative model, outputs should still be validated in sensitive use cases.
  • The training datasets are not redistributed in this repository.

Ownership

This repository publishes LAI V3 artifacts released by the Pixxle / LAI team. The public repo contains weights, tokenizer files, and documentation for the released model family.

License

Released under the MIT License.
