AIOS: A CPU-Native Inference Architecture for Large Language Models

This is not a model. This is the framework paper and specification for AIOS, a memory residency controller for CPU-native LLM inference.

Paper

Title: AIOS: A CPU-Native Inference Architecture for Large Language Models
Author: Anand Casavaraju
Published: March 2026
SSRN: https://ssrn.com/abstract=6467298
GitHub: https://github.com/acasavaraju/AIOS

What AIOS Is

AIOS is a memory residency controller that sits between inference engines (llama.cpp, Ollama, vLLM) and hardware, managing how weight data moves from DRAM to CPU. It addresses four resource dimensions:

  • Weight reads: aliasing + sparsity maps
  • KV cache reads: MQA/GQA + tiered residency
  • Activation spill: chunked prefill
  • Attention compute: sparsity map
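Since the AIOS runtime is not yet implemented (see Current State below), the mechanics of a residency controller can only be sketched. The toy Python class below illustrates the general idea behind the first bullet under stated assumptions: weight tiles are cached in a fixed-size "hot" pool with LRU eviction, and a sparsity map lets all-zero tiles be skipped without any data movement. All names (`ResidencyController`, `read`, the tile keys) are hypothetical and do not come from the AIOS specification.

```python
from collections import OrderedDict

class ResidencyController:
    """Toy residency controller (hypothetical API, not the AIOS runtime).

    Tiles are keyed by (layer, tile_id). At most `budget_tiles` tiles stay
    resident; the least-recently-used tile is evicted when the budget is
    exceeded. Keys listed in `sparsity_map` are known-zero tiles whose
    reads are skipped entirely.
    """

    def __init__(self, budget_tiles, sparsity_map=None):
        self.budget = budget_tiles
        self.sparsity = sparsity_map or set()  # keys of known-zero tiles
        self.hot = OrderedDict()               # key -> tile data, LRU order
        self.evictions = 0
        self.skipped = 0

    def read(self, key, load_fn):
        # Sparse tile: no read, no data movement.
        if key in self.sparsity:
            self.skipped += 1
            return None
        # Hot hit: refresh LRU position.
        if key in self.hot:
            self.hot.move_to_end(key)
            return self.hot[key]
        # Miss: load from backing store, evict LRU tile if over budget.
        tile = load_fn(key)
        self.hot[key] = tile
        if len(self.hot) > self.budget:
            self.hot.popitem(last=False)
            self.evictions += 1
        return tile

# Minimal demo with a budget of two tiles.
ctrl = ResidencyController(budget_tiles=2, sparsity_map={("l0", 3)})
load = lambda key: key  # stand-in for a real DRAM read
ctrl.read(("l0", 0), load)
ctrl.read(("l0", 1), load)
ctrl.read(("l0", 0), load)  # hit: refreshes ("l0", 0)
ctrl.read(("l0", 2), load)  # miss: evicts LRU tile ("l0", 1)
ctrl.read(("l0", 3), load)  # skipped via sparsity map
```

The same pattern extends naturally to the KV-cache bullet: tiered residency amounts to evicting into a colder pool rather than dropping the tile.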

Current State

Framework and specification published. Runtime not yet implemented. All performance projections are analytical. Empirical validation tracked at github.com/acasavaraju/AIOS/issues.

Citation

@misc{casavaraju2026aios,
  title  = {AIOS: A CPU-Native Inference Architecture for Large Language Models},
  author = {Casavaraju, Anand},
  year   = {2026},
  url    = {https://ssrn.com/abstract=6467298}
}