Improve model card metadata and documentation

This PR improves the model card for the OSCAR RotationZoo. Key changes include:
- Adding the `text-generation` pipeline tag to the metadata for better discoverability.
- Adding the paper authors for better attribution.
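- Pointing the paper link at the arXiv abstract page (arXiv:2605.17757) instead of the PDF, next to the existing project page and code repository links.
- Fixing a missing comma after the `"objective"` entry in the file-layout example.
- Unwrapping hard-wrapped paragraphs into single lines while keeping the detailed usage instructions and precomputed rotation tables intact.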

Files changed (1)

README.md +15 -26

@@ -7,6 +7,7 @@ tags:
 - quantization
 - rotation
 - sglang
 ---
 <p align="center">
@@ -17,18 +18,17 @@ tags:
 Precomputed K/V rotation matrices for **OSCAR INT2 KV-cache quantization**.
-- 📄 **Paper** — [*OSCAR: Offline Spectral Covariance-Aware Rotation for 2-bit KV Cache Quantization*](https://arxiv.org/pdf/2605.17757)
 - 🌐 **Website** — https://oscar-quantize.github.io/
 - 💻 **Code** — https://github.com/FutureMLS-Lab/OSCAR
-OSCAR captures Q/K/V activations on a small calibration set, estimates
-attention-aware K/V covariance offline, and derives per-layer orthogonal
-rotations that align INT2 quantization with the directions attention actually
-consumes. The result is ~7× compression of the KV-cache memory footprint with
-single-digit pp accuracy drop on GPQA for dense reasoning models.
-This repo packages the rotations as drop-in `.pt` files so you don't need to
-re-run the Q/K/V dump and eigendecomposition yourself.
 ## Available rotations
@@ -40,25 +40,20 @@ re-run the Q/K/V dump and eigendecomposition yourself.
 | `Qwen/Qwen3-32B`              | `seq16000_prompt69_group128` | 58.49 | 60.40 |
 | `zai-org/GLM-4.7-FP8`         | `seq10000_prompt43_group128` | 73.23 | 73.57 |
-`seq<T>_prompt<N>_group<G>` notation: `T` = total calibration tokens,
-`N` = calibration prompt count, `G` = INT2 quant group size along head_dim.
 ## File format
 Each rotation directory contains:
-- `k_rotation_qqt_r_h_pbr.pt` — K-side rotation `R_K = R · H · P_br` where
-  `R = eigvec(Σ_Q)` is fit on Q's attention-aware covariance, `H` is a
-  head-dim Hadamard, and `P_br` is the eigenvalue-sorted bit-reversal
-  permutation
-- `v_rotation_sst_r_h_pbr.pt` — V-side rotation built on the score-weighted
-  V covariance `Σ_V = V^T diag(K^T (Q^T Q) K) V`
 File layout (PyTorch state-dict):
 ```python
 {
   "format_version": 1,
-  "objective":      "qqt_r_h_pbr"      # or "sst_r_h_pbr" for V
   "source_grouping": "layer",
   "layers": {
     0:  {"layer_id": 0,  "rotation": tensor(head_dim, head_dim)},
@@ -88,8 +83,7 @@ snapshot_download(
 ### 2. Serve with sglang-research using the rotation
-Clone https://github.com/FutureMLS-Lab/OSCAR and set up the single `oscar`
-conda env, then point the eval driver at your downloaded rotation:
 ```bash
 ROT_DIR=./oscar_rotations/Qwen3-8B/seq20000_prompt83_group128 \
@@ -121,14 +115,9 @@ python -m sglang.launch_server \
   --trust-remote-code
 ```
-Sink (`PREFIX_TOKENS=64`) and recent window (`RECENT_TOKENS=256`) tokens stay
-in BF16; the bulk of the KV cache is INT2-quantized into 128-element groups
-along head_dim using these rotations.
 ## Reproducing from scratch
-If you want to fit your own rotation on a different calibration set, the
-OSCAR pipeline is end-to-end reproducible:
 ```bash
 git clone https://github.com/FutureMLS-Lab/OSCAR.git
@@ -146,4 +135,4 @@ bash rotation/qwen3-8B/compute_rotation.sh   # phase 2 — fit R = eigvec(Σ_Q)
   year   = {2026},
   note   = {Together AI; University of Sydney; UIUC},
 }
-```

 - quantization
 - rotation
 - sglang
+pipeline_tag: text-generation
 ---
 <p align="center">
 Precomputed K/V rotation matrices for **OSCAR INT2 KV-cache quantization**.
+This repository contains the artifacts for the paper:
+**OSCAR: Offline Spectral Covariance-Aware Rotation for 2-bit KV Cache Quantization**
+*Zhongzhu Zhou, Donglin Zhuang, Jisen Li, Ziyan Chen, Shuaiwen Leon Song, Ben Athiwaratkun, Xiaoxia Wu*
+- 📄 **Paper** — [arXiv:2605.17757](https://arxiv.org/abs/2605.17757)
 - 🌐 **Website** — https://oscar-quantize.github.io/
 - 💻 **Code** — https://github.com/FutureMLS-Lab/OSCAR
+OSCAR captures Q/K/V activations on a small calibration set, estimates attention-aware K/V covariance offline, and derives per-layer orthogonal rotations that align INT2 quantization with the directions attention actually consumes. The result is ~7× compression of the KV-cache memory footprint with single-digit pp accuracy drop on GPQA for dense reasoning models.
+This repo packages the rotations as drop-in `.pt` files so you don't need to re-run the Q/K/V dump and eigendecomposition yourself.
 ## Available rotations
 | `Qwen/Qwen3-32B`              | `seq16000_prompt69_group128` | 58.49 | 60.40 |
 | `zai-org/GLM-4.7-FP8`         | `seq10000_prompt43_group128` | 73.23 | 73.57 |
+`seq<T>_prompt<N>_group<G>` notation: `T` = total calibration tokens, `N` = calibration prompt count, `G` = INT2 quant group size along head_dim.
 ## File format
 Each rotation directory contains:
+- `k_rotation_qqt_r_h_pbr.pt` — K-side rotation `R_K = R · H · P_br` where `R = eigvec(Σ_Q)` is fit on Q's attention-aware covariance, `H` is a head-dim Hadamard, and `P_br` is the eigenvalue-sorted bit-reversal permutation
+- `v_rotation_sst_r_h_pbr.pt` — V-side rotation built on the score-weighted V covariance `Σ_V = V^T diag(K^T (Q^T Q) K) V`
 File layout (PyTorch state-dict):
 ```python
 {
   "format_version": 1,
+  "objective":      "qqt_r_h_pbr",      # or "sst_r_h_pbr" for V
   "source_grouping": "layer",
   "layers": {
     0:  {"layer_id": 0,  "rotation": tensor(head_dim, head_dim)},
 ### 2. Serve with sglang-research using the rotation
+Clone https://github.com/FutureMLS-Lab/OSCAR and set up the single `oscar` conda env, then point the eval driver at your downloaded rotation:
 ```bash
 ROT_DIR=./oscar_rotations/Qwen3-8B/seq20000_prompt83_group128 \
   --trust-remote-code
 ```
 ## Reproducing from scratch
+If you want to fit your own rotation on a different calibration set, the OSCAR pipeline is end-to-end reproducible:
 ```bash
 git clone https://github.com/FutureMLS-Lab/OSCAR.git
   year   = {2026},
   note   = {Together AI; University of Sydney; UIUC},
 }
+```
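The `seq<T>_prompt<N>_group<G>` directory-name convention documented in the card can be decoded mechanically. The following is a minimal sketch that assumes every rotation directory name follows that pattern exactly. `parse_rotation_tag` and `RotationTag` are hypothetical names and are not part of the OSCAR codebase.

```python
import re
from pathlib import PurePosixPath
from typing import NamedTuple

# Hypothetical pattern for directory names like "seq20000_prompt83_group128".
_TAG_RE = re.compile(r"seq(\d+)_prompt(\d+)_group(\d+)")


class RotationTag(NamedTuple):
    total_tokens: int  # T: total calibration tokens
    num_prompts: int   # N: calibration prompt count
    group_size: int    # G: INT2 quant group size along head_dim


def parse_rotation_tag(name: str) -> RotationTag:
    """Split a seq<T>_prompt<N>_group<G> name into its three integer fields."""
    m = _TAG_RE.fullmatch(name)
    if m is None:
        raise ValueError(f"not a seq<T>_prompt<N>_group<G> name: {name!r}")
    t, n, g = (int(x) for x in m.groups())
    return RotationTag(total_tokens=t, num_prompts=n, group_size=g)


# Usage: decode the ROT_DIR used in the serving example.
rot_dir = PurePosixPath("./oscar_rotations/Qwen3-8B/seq20000_prompt83_group128")
print(parse_rotation_tag(rot_dir.name))
# RotationTag(total_tokens=20000, num_prompts=83, group_size=128)
```

`fullmatch` rejects names that only partly match the pattern, so a typo in a downloaded directory name raises an error instead of being parsed silently.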