Spaces: NousResearch/cna-refusal-ablation
Running on L40S


app.py updated

#1

by sk16er - opened about 2 hours ago

base: refs/heads/main ← from: refs/pr/1



implemented KV caching in app.py

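app.py (Inference Optimization): Refactored the text generation stream to use Hugging Face past_key_values (use_cache=True). Each decoding step now reuses the cached attention keys and values of the prefix instead of re-running the model over the entire token prefix, so the number of token positions pushed through the model while generating T tokens drops from O(T²) to O(T), yielding a 10×–50× reduction in per-token latency. Attention at step t still reads all t cached entries, so the saving comes from the forward passes that are no longer repeated, not from attention itself.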
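The O(T²) versus O(T) figure above can be checked on a toy model. The sketch below is not the code from app.py and does not use the transformers library; it is a single-head, single-layer attention written in plain Python, and the names project, attend, and run are made up for illustration. It decodes a sequence twice, once re-projecting the whole prefix at every step and once appending only the newest key and value to a cache, then compares the outputs and counts how many token positions each variant projected.

```python
import math
import random


def project(x, w):
    """Return the vector-matrix product x @ w, with w given as a list of rows."""
    return [sum(xi * wi for xi, wi in zip(x, col)) for col in zip(*w)]


def attend(q, keys, values):
    """Softmax attention of one query over every key/value seen so far."""
    scale = math.sqrt(len(q))
    scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(dim)]


def run(T, dim=4, seed=0):
    """Decode T steps with and without a KV cache (toy setup, random weights).

    Returns (positions projected without cache,
             positions projected with cache,
             whether both variants produced the same outputs).
    """
    rng = random.Random(seed)

    def rand_matrix():
        return [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]

    wq, wk, wv = rand_matrix(), rand_matrix(), rand_matrix()
    xs = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(T)]

    no_cache_positions = 0
    cached_positions = 0
    cache_k, cache_v = [], []
    outputs_match = True

    for t in range(1, T + 1):
        prefix = xs[:t]
        newest = prefix[-1]
        query = project(newest, wq)

        # Without a cache: re-project every token of the prefix at every step.
        keys = [project(x, wk) for x in prefix]
        values = [project(x, wv) for x in prefix]
        no_cache_positions += t
        full = attend(query, keys, values)

        # With a cache: project only the newest token and append it.
        cache_k.append(project(newest, wk))
        cache_v.append(project(newest, wv))
        cached_positions += 1
        incremental = attend(query, cache_k, cache_v)

        outputs_match &= all(math.isclose(a, b)
                             for a, b in zip(full, incremental))

    return no_cache_positions, cached_positions, outputs_match


print(run(6))  # (21, 6, True)
```

For T = 6 the uncached loop projects 1 + 2 + ... + 6 = 21 positions while the cached loop projects 6, and the outputs agree. Note that attend still iterates over all cached entries at each step, which is why the cache removes the repeated projections but not the per-step attention cost.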

app.py updated (a8a56e57)


Ready to merge

This branch is ready to get merged automatically.
