Update README: separate Paper/Website/Code lines, add prompt85 entry
README.md
CHANGED
|
@@ -13,8 +13,9 @@ tags:
|
|
| 13 |
|
| 14 |
Precomputed K/V rotation matrices for **OSCAR INT2 KV-cache quantization**.
|
| 15 |
|
| 16 |
-
Paper
|
| 17 |
-
|
|
|
|
| 18 |
|
| 19 |
OSCAR captures Q/K/V activations on a small calibration set, estimates
|
| 20 |
attention-aware K/V covariance offline, and derives per-layer orthogonal
|
|
@@ -30,6 +31,7 @@ re-run the Q/K/V dump and eigendecomposition yourself.
|
|
| 30 |
| Model | Calibration | GPQA (BF16) | GPQA (OSCAR INT2) |
|
| 31 |
|---|---|---:|---:|
|
| 32 |
| `Qwen/Qwen3-4B-Thinking-2507` | `seq20000_prompt83_group128` | 67.27 | 67.17 |
|
|
|
|
| 33 |
| `Qwen/Qwen3-8B` | `seq20000_prompt83_group128` | 56.67 | 55.56 |
|
| 34 |
| `Qwen/Qwen3-32B` | `seq16000_prompt69_group128` | 58.49 | 60.40 |
|
| 35 |
| `zai-org/GLM-4.7-FP8` | `seq10000_prompt43_group128` | 73.23 | 73.57 |
|
|
|
|
| 13 |
|
| 14 |
Precomputed K/V rotation matrices for **OSCAR INT2 KV-cache quantization**.
|
| 15 |
|
| 16 |
+
- **Paper**: *OSCAR: Offline Spectral Covariance-Aware Rotation for 2-bit KV Cache Quantization*
|
| 17 |
+
- **Website**: https://oscar-quantize.github.io/
|
| 18 |
+
- **Code**: https://github.com/FutureMLS-Lab/OSCAR
|
| 19 |
|
| 20 |
OSCAR captures Q/K/V activations on a small calibration set, estimates
|
| 21 |
attention-aware K/V covariance offline, and derives per-layer orthogonal
|
|
|
|
| 31 |
| Model | Calibration | GPQA (BF16) | GPQA (OSCAR INT2) |
|
| 32 |
|---|---|---:|---:|
|
| 33 |
| `Qwen/Qwen3-4B-Thinking-2507` | `seq20000_prompt83_group128` | 67.27 | 67.17 |
|
| 34 |
+
| `Qwen/Qwen3-4B-Thinking-2507` | `seq20000_prompt85_group128` (fresh re-dump) | 67.27 | - |
|
| 35 |
| `Qwen/Qwen3-8B` | `seq20000_prompt83_group128` | 56.67 | 55.56 |
|
| 36 |
| `Qwen/Qwen3-32B` | `seq16000_prompt69_group128` | 58.49 | 60.40 |
|
| 37 |
| `zai-org/GLM-4.7-FP8` | `seq10000_prompt43_group128` | 73.23 | 73.57 |
|
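Reviewer note: the README context lines in this diff mention estimating K/V covariance offline, an eigendecomposition, and per-layer orthogonal matrices. As a rough illustration of that general idea only, the sketch below builds an orthogonal rotation from the eigenvectors of a plain sample covariance. It is a generic PCA-style construction, not OSCAR's actual procedure: the attention-aware weighting is not modeled, and the function name `rotation_from_covariance`, the array shapes, and the random stand-in data are all assumptions made for this example.

```python
import numpy as np


def rotation_from_covariance(acts: np.ndarray) -> np.ndarray:
    """Return an orthogonal matrix whose columns are covariance eigenvectors.

    `acts` is a hypothetical (num_tokens, head_dim) array of dumped activations.
    """
    centered = acts - acts.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / (acts.shape[0] - 1)
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]  # largest-variance direction first
    return eigvecs[:, order]


# Random stand-in for calibration activations (not real model data).
rng = np.random.default_rng(0)
keys = rng.standard_normal((1024, 8)) * np.linspace(0.5, 4.0, 8)

rot = rotation_from_covariance(keys)
rotated = keys @ rot  # activations expressed in the eigenbasis
```

Because `rot` is orthogonal, applying it preserves vector norms and inner products, which is why a rotation of this kind can be precomputed once and reused.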