Unlocked version of Qwen/Qwen3.5-397B-A17B
Benchmark Results
Capability Benchmarks (thinking=false)
| Benchmark | CARVE NVFP4 (es=0.75) | Reference NVFP4 (nvidia) |
|---|---|---|
| MMLU (54 questions, temp=0.6) | 94.4% (51/54) | 88.9% (48/54) |
| GSM8K (20 questions, temp=0.6) | 95% (19/20) | 95% (19/20) |
| HumanEval (164 problems, temp=0.2) | 90.9% (149/164) | not tested |
MMLU-Pro Comparison (seed=42, 600 random questions, thinking=true, temp=0.6)
| Model | MMLU-Pro (5% sample) | vs Official 87.8% | Time | Notes |
|---|---|---|---|---|
| CARVE NVFP4 (es=0.75) | 86.7% (520/600) | -1.1pp | 46 min | 0 errors |
| Reference NVFP4 (nvidia) | 86.7% (520/600) | -1.1pp | 54 min | 0 errors |
Per-category comparison (Delta = CARVE minus Reference, in percentage points):
| Category | CARVE | Reference | Delta (pp) |
|---|---|---|---|
| biology | 96.4% (27/28) | 96.4% (27/28) | 0 |
| computer science | 94.7% (18/19) | 94.7% (18/19) | 0 |
| chemistry | 93.1% (67/72) | 91.7% (66/72) | +1.4 |
| math | 91.4% (53/58) | 91.4% (53/58) | 0 |
| health | 89.2% (33/37) | 91.9% (34/37) | -2.7 |
| business | 89.2% (33/37) | 89.2% (33/37) | 0 |
| economics | 89.2% (33/37) | 86.5% (32/37) | +2.7 |
| other | 86.0% (37/43) | 90.7% (39/43) | -4.7 |
| psychology | 84.6% (33/39) | 87.2% (34/39) | -2.6 |
| physics | 84.6% (55/65) | 78.5% (51/65) | +6.1 |
| history | 85.0% (17/20) | 80.0% (16/20) | +5.0 |
| philosophy | 84.4% (27/32) | 84.4% (27/32) | 0 |
| engineering | 78.3% (36/46) | 82.6% (38/46) | -4.3 |
| law | 76.1% (51/67) | 77.6% (52/67) | -1.5 |
- Both CARVE and Reference score exactly 86.7% (520/600)
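- Per-category deltas range from -4.7 to +6.1 pp and are roughly balanced (4 categories favor CARVE, 5 favor Reference, 5 are tied), which is consistent with sampling noise given the small per-category counts (19 to 72 questions)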
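As a rough sanity check on the noise interpretation, the sketch below computes a normal-approximation 95% interval for the overall score and a z-score for the largest per-category gap (physics). The helper names are illustrative and not part of any benchmark harness; the counts are taken from the tables above, and the two runs are treated as independent samples, which is a simplifying assumption.

```python
import math

def proportion_ci(correct, total, z=1.96):
    """Normal-approximation 95% interval for an accuracy, in percent."""
    p = correct / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return 100 * (p - half_width), 100 * (p + half_width)

def delta_z_score(a_correct, b_correct, total):
    """z-score of the accuracy gap between two models on n questions each.

    Assumption: the runs are treated as independent samples. They actually
    share questions, so a paired test would be more precise.
    """
    pa, pb = a_correct / total, b_correct / total
    se = math.sqrt(pa * (1 - pa) / total + pb * (1 - pb) / total)
    return (pa - pb) / se

# Overall MMLU-Pro sample: 520/600 for both models.
low, high = proportion_ci(520, 600)
print(f"overall 95% CI: {low:.1f}% to {high:.1f}%")

# Largest per-category gap: physics, 55/65 (CARVE) vs 51/65 (Reference).
print(f"physics delta z-score: {delta_z_score(55, 51, 65):.2f}")
```

Under these assumptions the official 87.8% falls inside the interval around 86.7%, and the physics gap sits well below the usual 1.96 threshold, so neither difference is distinguishable from sampling noise at this sample size.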
CARVE MTP=2 Crossover Test
| Context | No MTP (tok/s) | MTP=2 (tok/s) | Delta | MTP wins? |
|---|---|---|---|---|
| short (~25 tok) | 74 | 106 | +43% | YES |
| 10k (9025 tok) | 73.7 | 84.1 | +14% | YES |
| 20k (18025 tok) | 70.2 | 69.0 | -2% | ~TIE |
| 50k (46025 tok) | 68.9 | 43.2 | -37% | NO |
| 100k (93025 tok) | 66.7 | 26.5 | -60% | NO |
| 151k (from an earlier run) | 67 | 19 | -72% | NO |
Crossover point: ~20k tokens input context.
- Below 20k: MTP=2 wins (+14% to +43%)
- At 20k: effectively tied (-2%)
- Above 20k: MTP=2 gets progressively slower: -37% at 50k, -60% at 100k, -72% at 151k
- MTP acceptance rate degrades with context length for abliterated weights
- No-MTP throughput stays nearly flat: 74→67 tok/s from short context to 151k (only -9%)
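The crossover can be estimated from the measured rows by linear interpolation of the MTP gain (MTP=2 throughput minus no-MTP throughput). The following is a minimal sketch, not the procedure used to produce the table: `crossover_tokens` and `use_mtp` are hypothetical helpers, the short-context row uses the approximate 25-token figure, and the 151k row is left out because its exact token count is not given.

```python
# (context tokens, no-MTP tok/s, MTP=2 tok/s) copied from the table above.
ROWS = [
    (25, 74.0, 106.0),
    (9025, 73.7, 84.1),
    (18025, 70.2, 69.0),
    (46025, 68.9, 43.2),
    (93025, 66.7, 26.5),
]

def crossover_tokens(rows):
    """Interpolate the context length where MTP=2 stops being faster."""
    for (x0, base0, mtp0), (x1, base1, mtp1) in zip(rows, rows[1:]):
        gain0, gain1 = mtp0 - base0, mtp1 - base1
        if gain0 > 0 >= gain1:
            # The gain changes sign in this interval: solve gain(x) = 0
            # on the straight line between the two measurements.
            return x0 + (x1 - x0) * gain0 / (gain0 - gain1)
    return None  # no sign change within the measured range

def use_mtp(context_tokens, threshold):
    """Hypothetical routing rule: enable MTP only below the crossover."""
    return context_tokens < threshold

threshold = crossover_tokens(ROWS)
print(f"estimated crossover: {threshold:.0f} tokens")
print("use MTP at 5k context:", use_mtp(5_000, threshold))
print("use MTP at 50k context:", use_mtp(50_000, threshold))
```

With these rows the interpolated crossover lands at roughly 17k tokens, just below the 18,025-token measurement and in line with the ~20k figure quoted above. Only two measurements bracket the sign change, so the estimate is coarse.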