drawais committed (verified) · Commit e40cb3d · 1 Parent(s): bff57bb

Add bench score 80/80/80

Files changed (1): README.md (+12 −6)
README.md CHANGED
@@ -13,7 +13,7 @@ pipeline_tag: text-generation
 
 # Qwen3-8B-AWQ-INT4
 
-INT4 quantization of [`Qwen/Qwen3-8B`](https://huggingface.co/Qwen/Qwen3-8B). Built to run on a single 8 GB+ consumer GPU.
+INT4 quantization of [`Qwen/Qwen3-8B`](https://huggingface.co/Qwen/Qwen3-8B). Built to run on a single 12 GB+ consumer GPU.
 
 ## Footprint
 
@@ -23,7 +23,17 @@ INT4 quantization of [`Qwen/Qwen3-8B`](https://huggingface.co/Qwen/Qwen3-8B). Bu
 | Quantized weights | ~5.7 GB on disk |
 | Inference VRAM (incl. KV cache @ 32K context) | ~10 GB |
 
-Fits any 12 GB+ consumer card: RTX 3060 / 4060 / 4070 / 5070, even an integrated mobile GPU with shared memory. No homelab needed.
+Fits any 12 GB+ consumer card: RTX 3060 / 4060 / 4070 / 5070, even some integrated mobile GPUs with shared memory. No homelab needed.
+
+## Bench
+
+Scored on [`drawais/needle-1M-bench-mvp`](https://huggingface.co/datasets/drawais/needle-1M-bench-mvp) (50K-token haystack, real arXiv text):
+
+| Metric | Score |
+|---|---|
+| Overall recall | **80.0%** |
+| Paper-anchored | 80.0% |
+| Synthetic codes | 80.0% |
 
 ## Quick start
 
@@ -41,10 +51,6 @@ model = AutoModelForCausalLM.from_pretrained("drawais/Qwen3-8B-AWQ-INT4", device
 
 Native: 40,960 tokens. For longer contexts, enable YaRN rope-scaling per the base model's config.
 
-## Bench
-
-Leaderboard score on [`drawais/needle-1M-bench-mvp`](https://huggingface.co/datasets/drawais/needle-1M-bench-mvp) coming in next release.
-
 ## License
 
 Apache 2.0 (inherits from base model).
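
The footprint table's ~10 GB inference-VRAM figure can be sanity-checked with a back-of-envelope KV-cache estimate. A minimal sketch, assuming Qwen3-8B's published architecture values (36 layers, 8 KV heads, head dim 128) and an FP16 KV cache — these numbers are assumptions, not taken from this commit:

```python
# Back-of-envelope check of the ~10 GB inference-VRAM figure.
# ASSUMPTIONS (not from this commit): Qwen3-8B has 36 layers, 8 KV heads,
# head_dim 128, and the KV cache is kept in FP16 (2 bytes per value).
LAYERS, KV_HEADS, HEAD_DIM = 36, 8, 128
SEQ_LEN = 32 * 1024          # 32K context, as in the footprint table
BYTES_FP16 = 2

# K and V each store LAYERS * KV_HEADS * HEAD_DIM values per token,
# hence the leading factor of 2.
kv_cache_gb = 2 * LAYERS * KV_HEADS * HEAD_DIM * SEQ_LEN * BYTES_FP16 / 1e9
weights_gb = 5.7             # quantized weights on disk, from the table

total_gb = weights_gb + kv_cache_gb
print(f"KV cache @ 32K: {kv_cache_gb:.1f} GB, total: {total_gb:.1f} GB")
# → KV cache @ 32K: 4.8 GB, total: 10.5 GB
```

Under these assumptions the KV cache at 32K context comes to roughly 4.8 GB, which together with ~5.7 GB of weights lands close to the ~10 GB cited above (activations and framework overhead add a little on top).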