Lite LLM

Lite LLM is a deterministic, tiered-parameter, hierarchical sparse expert (HSER) language model runtime designed to scale from 1B → 1T parameters and beyond (up to quadrillion-scale parameter counts) while keeping active compute bounded per token.

This GitHub organization hosts the specification corpus, reference implementations, and operational tooling for building and deploying Lite LLM as an enterprise-grade reference system.

Model optimization for the LiteCore Coherent Silicon Photonic Complex Multiply-Accumulate (CSP-cMAC) Unit Cell hardware focuses on maximizing inference efficiency under tight memory and power constraints by combining compression, quantization, and memory-aware execution. LiteCore is a fundamental photonic compute primitive purpose-built for large language model (LLM) inference at quadrillion-parameter scales. It leverages silicon-on-insulator (SOI) photonics to perform complex-valued multiply-accumulate operations at <1 fJ energy and 1–10 ps latency, representing 500–2,000× energy and 1,000–10,000× latency improvements over state-of-the-art electronic GPUs.
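
The core operation a CSP-cMAC unit cell performs in the optical domain is a complex-valued multiply-accumulate, acc ← acc + a · b. A minimal software model of that math (purely illustrative; the type and function names are assumptions, and the energy/latency figures above describe the hardware, not this sketch):

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
struct Complex {
    re: f64,
    im: f64,
}

/// One complex multiply-accumulate step: acc + a * b.
/// (a.re + a.im·i)(b.re + b.im·i) = (a.re·b.re − a.im·b.im) + (a.re·b.im + a.im·b.re)·i
fn cmac(acc: Complex, a: Complex, b: Complex) -> Complex {
    Complex {
        re: acc.re + a.re * b.re - a.im * b.im,
        im: acc.im + a.re * b.im + a.im * b.re,
    }
}

fn main() {
    // (1 + 2i) · (3 + 4i) = -5 + 10i, accumulated onto zero.
    let z = cmac(
        Complex { re: 0.0, im: 0.0 },
        Complex { re: 1.0, im: 2.0 },
        Complex { re: 3.0, im: 4.0 },
    );
    assert_eq!(z, Complex { re: -5.0, im: 10.0 });
    println!("{:?}", z);
}
```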


What makes Lite LLM different

Deterministic by design

Lite LLM treats determinism as a first-class requirement:

  • Stable top‑k routing with seeded tie‑breaking
  • Deterministic collectives and reproducible distributed execution
  • Deterministic audit logs and replayable training runs
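
Stable top-k routing with seeded tie-breaking can be sketched as follows. This is a hypothetical illustration, not the Lite LLM API: the function names and the splitmix64-style seed mixing are assumptions; the point is that exact score ties are resolved by a seeded key, so every replica selects the same expert set.

```rust
/// Mix a seed with an expert index into a stable tie-break key
/// (splitmix64-style finalizer; any fixed hash would do).
fn tie_key(seed: u64, expert: usize) -> u64 {
    let mut z = seed.wrapping_add(expert as u64).wrapping_mul(0x9E37_79B9_7F4A_7C15);
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z ^ (z >> 31)
}

/// Select k expert indices: highest score first; exact score ties are
/// broken by the seeded key, making the selection fully deterministic.
fn select_top_k(scores: &[f32], k: usize, seed: u64) -> Vec<usize> {
    let mut idx: Vec<usize> = (0..scores.len()).collect();
    idx.sort_by(|&a, &b| {
        scores[b]
            .partial_cmp(&scores[a])
            .unwrap()
            .then_with(|| tie_key(seed, a).cmp(&tie_key(seed, b)))
    });
    idx.truncate(k);
    idx
}

fn main() {
    let scores = [0.9, 0.5, 0.9, 0.1]; // experts 0 and 2 tie exactly
    let a = select_top_k(&scores, 2, 42);
    let b = select_top_k(&scores, 2, 42);
    assert_eq!(a, b); // same seed → identical routing on every node
    println!("{:?}", a);
}
```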

Tiered Parameter Architecture (TPA)

Parameters are partitioned across storage tiers:

  • Hot (HBM / GPU)
  • Warm (DRAM)
  • Cold (NVMe)
  • Archive (Object Store)

Only experts in the TierSet resolved for a request are eligible for routing; every other parameter has zero activation probability.
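
One way to realize "zero activation probability outside the TierSet" is to mask router logits to negative infinity before selection, so the softmax weight of any out-of-set expert is exactly zero. A minimal sketch (the `Tier` variants follow the list above; the masking API itself is an assumption):

```rust
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Tier {
    Hot,     // HBM / GPU
    Warm,    // DRAM
    Cold,    // NVMe
    Archive, // object store
}

/// Mask router logits so experts outside the request's TierSet can never win:
/// exp(-inf) = 0, so their post-softmax activation probability is exactly zero.
fn mask_by_tierset(logits: &mut [f32], expert_tier: &[Tier], tierset: &[Tier]) {
    for (logit, tier) in logits.iter_mut().zip(expert_tier) {
        if !tierset.contains(tier) {
            *logit = f32::NEG_INFINITY;
        }
    }
}

fn main() {
    let mut logits = [2.0, 5.0, 1.0, 3.0];
    let tiers = [Tier::Hot, Tier::Cold, Tier::Hot, Tier::Warm];
    // This request may only touch Hot and Warm parameters.
    mask_by_tierset(&mut logits, &tiers, &[Tier::Hot, Tier::Warm]);
    assert_eq!(logits[1], f32::NEG_INFINITY); // Cold expert is ineligible
    println!("{:?}", logits);
}
```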

Hierarchical Sparse Expert Routing (HSER)

Routing is hierarchical (Tier → Group → Expert) with bounded activation: at most k_tier × k_group × k_expert experts activate per token per layer.

This enables extreme parameter scaling while keeping per-token compute predictable.
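
A worked instance of the bound, with illustrative numbers (not from the specs): the product k_tier × k_group × k_expert caps active experts per token per layer regardless of how many experts exist in total.

```rust
/// Upper bound on experts activated per token per layer under HSER.
fn active_experts(k_tier: u64, k_group: u64, k_expert: u64) -> u64 {
    k_tier * k_group * k_expert
}

fn main() {
    // Example: route into 1 tier, 2 groups per tier, 2 experts per group.
    // The bound is 4 active experts, whether the model holds 1B or 1T
    // total parameters — total count never enters the formula.
    let bound = active_experts(1, 2, 2);
    assert_eq!(bound, 4);
    println!("active experts per token per layer: {bound}");
}
```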

Enterprise runtime focus

Lite LLM is not only a model architecture—it is a runtime system:

  • Distributed execution protocols
  • Storage hierarchy and prefetching
  • Secure loading and integrity verification
  • Multi-tenant isolation, quotas, and compliance readiness

Repositories

Specifications (authoritative)

  • lite-llm-specs — Enterprise Runtime Engineering Specification Corpus (SPEC‑001…SPEC‑060)
  • lite-llm-schemas — JSON/YAML schemas for manifests, telemetry, policies
  • lite-llm-rfcs — Design proposals and evolution process (RFCs)

Reference implementations

  • lite-llm-runtime — Rust runtime (routing, caches, dispatch, TierSet engine)
  • lite-llm-train — Training orchestration, checkpointing, determinism harness
  • lite-llm-kernels — Device kernels + safe wrappers (CUDA/HIP/Metal/CPU)
  • lite-llm-comm — Transport abstraction (RDMA / NCCL / QUIC), collectives
  • lite-llm-storage — Shards, manifests, tier placement, streaming + prefetch

Tooling

  • lite-llm-cli — Operator CLI (inspect checkpoints, tier policies, telemetry)
  • lite-llm-observability — Metrics exporters, dashboards, tracing
  • lite-llm-deploy — Helm charts, Terraform modules, bare‑metal playbooks

The organization may not yet contain all repositories listed above; this is the intended long-term structure.


Getting started

1) Read the specs

Start with:

  • SPEC‑001 Runtime Architecture Overview
  • SPEC‑003 Deterministic Routing Engine
  • SPEC‑004 Tiered Parameter Architecture (TPA)
  • SPEC‑005 Hierarchical Sparse Expert Routing (HSER)
  • SPEC‑006 Active Compute Bounding Model
  • SPEC‑021…030 Storage hierarchy (hot/warm/cold/archive)
  • SPEC‑041…050 Inference runtime (TierSet selection, dispatch, KV cache)

2) Implement the contracts

The specs are written to be directly implementable:

  • Deterministic routing + stable sorting
  • Tier placement policies and shard formats
  • All‑to‑all dispatch and imbalance handling
  • Audit logging and integrity verification

3) Validate determinism

Before performance optimization:

  • Ensure cross-node routing reproducibility
  • Validate deterministic collectives
  • Use the replay engine during training
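
Cross-node routing reproducibility is typically checked by fingerprinting each node's routing trace and comparing the fingerprints out-of-band. A hypothetical sketch (the trace shape and function are assumptions; `DefaultHasher` is deterministic within a process but not pinned across Rust versions, so a real harness would fix a stable hash):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Fingerprint a routing trace of (layer, token, chosen expert ids) records.
/// Two deterministic replicas must produce identical fingerprints.
fn routing_fingerprint(trace: &[(u32, u32, Vec<u32>)]) -> u64 {
    let mut h = DefaultHasher::new();
    trace.hash(&mut h);
    h.finish()
}

fn main() {
    let node_a = vec![(0, 0, vec![3, 7]), (0, 1, vec![2, 7])];
    let node_b = node_a.clone(); // stand-in for a second node's replayed run
    assert_eq!(routing_fingerprint(&node_a), routing_fingerprint(&node_b));
    println!("routing fingerprints match");
}
```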

Contribution

We welcome contributions in:

  • Spec clarifications and testable invariants
  • Rust runtime modules (memory model, routing, dispatch, caching)
  • Deterministic training harness and replay tooling
  • Storage tier orchestration and prefetch algorithms
  • Security hardening and audit improvements

Please read:

  • CONTRIBUTING.md for workflow and standards
  • CODE_OF_CONDUCT.md for community expectations
  • SECURITY.md for vulnerability reporting

Security

Lite LLM emphasizes:

  • Memory-safe runtime design in Rust
  • Secure checkpoint loading and integrity verification
  • Encryption at rest for tier storage
  • Key management and auditability
  • Sandboxing and capability isolation for extensions

See SECURITY.md to report vulnerabilities responsibly.


Governance

The specification corpus is the normative authority.
Changes to the corpus should go through the RFC process:

  1. Open an RFC in lite-llm-rfcs
  2. Discuss and iterate
  3. Land a spec patch with tests, invariants, and migration notes

License

Lite LLM is distributed under the Dust Open Source License (license name: dosl-iie-1.0).

License text: https://github.com/lite-llm/lite-llm/raw/refs/heads/main/LICENSE


Contact

  • Security: see SECURITY.md
  • General: open an issue in the relevant repository
