CHESS: Context-aware Hierarchical Efficient Semantic Selection for Long-Context LLM Inference
Abstract
Long-context LLMs demand accurate inference at low latency, yet decoding becomes increasingly constrained by the KV cache as context grows. Prior pruning methods are largely context-agnostic: their token selection ignores step-wise relevance and local semantics, which undermines output quality. Moreover, their irregular memory accesses and selection overheads yield only limited wall-clock speedups. To address this, we propose CHESS, an algorithm-system co-designed KV-cache management system. Algorithmically, CHESS introduces a context-aware, hierarchical selection policy that dynamically reconstructs a coherent context for the current decoding step. On the systems side, coarse-grained selection eliminates expensive data movement, turning theoretical sparsity into practical acceleration. Extensive evaluations demonstrate that CHESS surpasses full-KV quality using only 1% of the KV cache, delivers stable low-latency inference with up to 4.56× higher throughput, and consistently outperforms other strong baselines. Code is available at https://anonymous.4open.science/r/CHESS-9958/.
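The abstract does not spell out the selection mechanics, but the idea of coarse-grained KV selection can be sketched as follows. This is a minimal illustration, not the paper's actual algorithm: the function name `select_kv_blocks`, the block size, and the max-dot-product scoring heuristic are all our assumptions. It shows why block granularity avoids irregular access: the kept entries are whole contiguous spans rather than scattered tokens.

```python
import numpy as np

def select_kv_blocks(keys, query, block_size=4, keep_blocks=2):
    """Illustrative coarse-grained KV selection (not CHESS's exact policy).

    Partition the cached keys into contiguous blocks, score each block by
    the maximum query-key dot product inside it, and keep the top-scoring
    blocks as whole spans. Because selection happens at block granularity,
    the retained cache is a few contiguous regions, so gathering it needs
    only regular, cheap memory accesses.
    """
    n, d = keys.shape
    n_blocks = n // block_size
    # Per-token relevance of each cached key to the current decoding query.
    scores = keys @ query
    # Aggregate token scores into one score per contiguous block.
    block_scores = scores[: n_blocks * block_size].reshape(n_blocks, block_size).max(axis=1)
    # Keep the top-scoring blocks, restoring their original positional order.
    top = np.sort(np.argsort(block_scores)[-keep_blocks:])
    idx = np.concatenate(
        [np.arange(b * block_size, (b + 1) * block_size) for b in top]
    )
    return idx, keys[idx]
```

A token-level selector would instead return an arbitrary scatter of indices, which is why prior fine-grained pruning pays for its theoretical sparsity with irregular data movement.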