
Zixi "Oz" Li PRO

OzTianlu

https://github.com/lizixi-0x2F

lizixi-0x2F

AI & ML interests

My research focuses on deep reasoning with small language models, Transformer architecture innovation, and knowledge distillation for efficient alignment and transfer.

Recent Activity

liked a dataset 8 days ago

TAAC2026/data_sample_1000

liked a model 12 days ago

google/gemma-4-26B-A4B-it

reacted to their post with 🤗 18 days ago

https://github.com/lizixi-0x2F/March

I just released March, an open-source high-performance KV cache sharing library for LLM inference that uses Trie-based prefix deduplication.

When you run LLM services, you often see thousands of requests sharing the same system prompt and conversation history. But traditional KV cache systems store each sequence separately, duplicating the exact same data over and over again. Pure waste.

March uses a Trie structure to automatically detect and reuse identical token prefixes. Instead of storing [system_prompt + history] 1000 times, it's stored once. Everyone shares it.

- 80-97% memory reduction in prefix-heavy workloads (tested on SmolLM2-135M with 500 multi-turn conversations)
- Zero-copy queries: returns direct pointers into the memory pool, no expensive memcpy on the hot path
- Predictable memory usage: fixed-size page pool with O(L) complexity
- Trade-off: slightly slower than dict O(1) lookup, but the memory savings are worth it in production
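The prefix-sharing idea described in the post can be illustrated with a small toy trie. This is only a sketch under stated assumptions, not March's actual API or memory layout: the names `TrieNode`, `PrefixTrie`, `insert`, and `match_prefix` are hypothetical, the "pool" is a plain Python list standing in for per-token KV entries, and the workload (1000 requests sharing a 100-token system prompt) is synthetic rather than the SmolLM2-135M benchmark mentioned above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence


@dataclass
class TrieNode:
    # Index of this token's entry in the shared pool (hypothetical layout).
    slot: int
    children: Dict[int, "TrieNode"] = field(default_factory=dict)


class PrefixTrie:
    """Toy token-prefix trie: one pool slot per distinct prefix position."""

    def __init__(self) -> None:
        self.root = TrieNode(slot=-1)  # sentinel node, holds no KV data
        self.pool: List[int] = []      # stand-in for per-token KV entries

    def insert(self, tokens: Sequence[int]) -> List[int]:
        """Insert a sequence and return the pool slot used for each token."""
        node = self.root
        slots: List[int] = []
        for tok in tokens:
            child = node.children.get(tok)
            if child is None:
                # Unseen prefix: allocate a new slot for this token.
                child = TrieNode(slot=len(self.pool))
                self.pool.append(tok)
                node.children[tok] = child
            slots.append(child.slot)
            node = child
        return slots

    def match_prefix(self, tokens: Sequence[int]) -> List[int]:
        """Return slots of the longest already-stored prefix of `tokens`."""
        node = self.root
        slots: List[int] = []
        for tok in tokens:
            nxt = node.children.get(tok)
            if nxt is None:
                break
            slots.append(nxt.slot)
            node = nxt
        return slots


# Synthetic workload: a shared 100-token system prompt plus 2 unique tokens.
system_prompt = list(range(100))
requests = [system_prompt + [1000 + i, 2000 + i] for i in range(1000)]

trie = PrefixTrie()
for req in requests:
    trie.insert(req)

naive_slots = sum(len(req) for req in requests)  # every sequence stored separately
shared_slots = len(trie.pool)                    # shared prefix stored once
savings = 1 - shared_slots / naive_slots
print(f"naive={naive_slots} shared={shared_slots} savings={savings:.1%}")
```

Both `insert` and `match_prefix` walk one node per token, so each costs time proportional to the sequence length, which is consistent with the O(L) versus dict O(1) trade-off the post mentions. The savings figure printed here depends entirely on the synthetic workload and should not be read as a measurement of March itself.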

Organizations

upvoted 2 articles about 1 month ago

Article

Arcade-3B: SLM Optimization via Orthogonal Decoupling of Latent State Spaces

Mar 15

Article

Arcade-3B: SLM Optimization Based on Orthogonal Decoupling of Hidden-Layer State Spaces (Chinese-language version)

Mar 15

upvoted 2 papers about 1 month ago

CodeScaler: Scaling Code LLM Training and Test-Time Inference via Execution-Free Reward Models

Paper • 2602.17684 • Published Feb 4 • 22

Efficient RLVR Training via Weighted Mutual Information Data Selection

Paper • 2603.01907 • Published Mar 2 • 14

upvoted a paper about 2 months ago

Seed1.5-Thinking: Advancing Superb Reasoning Models with Reinforcement Learning

Paper • 2504.13914 • Published Apr 10, 2025 • 5

upvoted a collection about 2 months ago

Seed Flagship Model Released

Collection

contributed • 8 items • Updated 4 days ago • 3

upvoted a paper about 2 months ago

DualPath: Breaking the Storage Bandwidth Bottleneck in Agentic LLM Inference

Paper • 2602.21548 • Published Feb 25 • 50

upvoted 2 articles about 2 months ago

Article

Exploring New Frontiers of LLMs: Adaptive Dual-Search Distillation (ADS) and the 30B Model Open Beta

Mar 1

Article

GGML and llama.cpp join HF to ensure the long-term progress of Local AI

Feb 20 • 503

upvoted a collection about 2 months ago

Kai Models Series

Collection

Kai Models Distilled via Adaptive Dual Search Distillation • 3 items • Updated Mar 2 • 2

upvoted a paper about 2 months ago

Nacrith: Neural Lossless Compression via Ensemble Context Modeling and High-Precision CDF Coding

Paper • 2602.19626 • Published Feb 23 • 3

upvoted an article about 2 months ago

Article

Shattering the Memory Wall: O(1) Inference and Causal Monoid State Compression in Spartacus-1B

Feb 25

upvoted a collection about 2 months ago

Spartacus Monoid Reasoning Models

Collection

O(1) Reasoning Models • 1 item • Updated Feb 25 • 2

upvoted an article 2 months ago

Article

The Optimal Architecture for Small Language Models

Dec 26, 2025 • 120

upvoted a collection 3 months ago

Geilim Smol Language Models

Collection

Geilim Smol Language Models • 2 items • Updated Mar 3 • 1

upvoted 2 papers 3 months ago

Trainable Dynamic Mask Sparse Attention

Paper • 2508.02124 • Published Aug 4, 2025 • 19

Reasoning: From Reflection to Solution

Paper • 2511.11712 • Published Nov 12, 2025 • 2

upvoted a paper 4 months ago

Weight-sparse transformers have interpretable circuits

Paper • 2511.13653 • Published Nov 17, 2025 • 2

upvoted a collection 5 months ago

Reasoning at the Edge (HF Preprints)

Collection

This collection traces the mathematical and empirical limits of machine reasoning. • 12 items • Updated Feb 28 • 1

upvoted a collection 6 months ago

🐕Small-Doges

Collection

Doge family of small language models! • 18 items • Updated Apr 21, 2025 • 11
