aios-framework/aios-paper
LLM on CPU
A memory residency controller and Model Contract for deploying large language models efficiently on CPU hardware.
AIOS addresses the memory bandwidth bottleneck in CPU inference through weight aliasing, sparsity maps, KV cache tiering, and activation chunking. It targets models of 7B parameters and larger on hardware that organizations already own.
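To make the bandwidth bottleneck concrete, here is a back-of-envelope sketch in Python. It assumes single-stream decoding is memory-bound, so every weight byte is streamed from DRAM once per generated token. The 50 GB/s bandwidth figure, the 4-bit quantization, and the Llama-2-7B-like shape are illustrative assumptions, not AIOS measurements. The function names are hypothetical and are not part of the AIOS specification.

```python
# Back-of-envelope estimates for memory-bound CPU decoding.
# Assumptions (illustrative, not AIOS measurements): every weight byte is
# streamed from DRAM once per generated token, and the KV cache stores
# keys and values for every layer of a standard transformer.

def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Bytes needed to hold the model weights at a given quantization."""
    return n_params * bits_per_weight / 8

def max_decode_tokens_per_s(n_params: float, bits_per_weight: float,
                            bandwidth_gb_s: float) -> float:
    """Upper bound on single-stream decode speed when DRAM bandwidth is the limit.

    Ignores KV cache reads and compute limits, so real throughput is lower.
    """
    return bandwidth_gb_s * 1e9 / weight_bytes(n_params, bits_per_weight)

def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_len: int, bytes_per_elem: int = 2) -> int:
    """Keys plus values (factor 2) across all layers for one sequence."""
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem

if __name__ == "__main__":
    # Hypothetical example: a 7B model at 4 bits per weight on a machine
    # with 50 GB/s of usable memory bandwidth.
    print(f"weights: {weight_bytes(7e9, 4) / 1e9:.1f} GB")
    print(f"decode ceiling: {max_decode_tokens_per_s(7e9, 4, 50):.1f} tokens/s")
    # Llama-2-7B-like shape (32 layers, 32 KV heads, head_dim 128), fp16, 4096 tokens.
    print(f"KV cache: {kv_cache_bytes(32, 32, 128, 4096) / 2**30:.1f} GiB")
```

Under these assumptions, a 7B model at 4 bits needs about 3.5 GB of weights and caps decode near 14 tokens/s at 50 GB/s. Its fp16 KV cache at 4,096 tokens adds about 2 GiB more. This is the sort of memory pressure that the residency techniques listed above are aimed at.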
The framework and specification are published. A runtime implementation is the primary contribution opportunity.
Clone the repo, run validation/compliance.py on any GGUF model, and post the results to the relevant GitHub issue.