Gemma-4-E4B-DECKARD-HERETIC-NVFP4

NVFP4-quantized EAGLE-style speculative-decoding drafter for the standard (non-abliterated) Gemma 4 31B DECKARD HERETIC. Pair it with the matching target model to accelerate single-stream decoding on Blackwell-class GPUs (DGX Spark, RTX PRO 6000, RTX 5090, B100, B200).
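For readers new to the format: NVFP4 stores weights as 4-bit E2M1 floating-point values in blocks of 16 elements. Each block shares an FP8 (E4M3) scale factor, and a per-tensor scale sits on top of that. The pure-Python sketch below fake-quantizes a list of floats in that block-wise way to show where the rounding error comes from. It is a simplified illustration, not ModelOpt's or vLLM's kernel. It makes three assumptions: the block scale is kept in full precision instead of FP8, the per-tensor scale is omitted, and ties round toward the smaller magnitude. `quantize_block` and `fake_quantize` are hypothetical helper names.

```python
# Simplified NVFP4-style fake quantization (illustrative sketch, not ModelOpt's code).
# Assumptions: 16-element blocks, FP4 E2M1 element grid, one scale per block.
# Simplifications: the block scale stays a Python float (real NVFP4 stores it as
# FP8 E4M3) and the second-level per-tensor FP32 scale is left out.

E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]  # non-negative E2M1 magnitudes
E2M1_MAX = 6.0
BLOCK_SIZE = 16


def quantize_block(block):
    """Map one block to signed E2M1 codes plus the scale that restores them."""
    amax = max(abs(x) for x in block)
    scale = amax / E2M1_MAX if amax > 0 else 1.0  # largest value lands on 6.0
    codes = []
    for x in block:
        mag = abs(x) / scale
        # Nearest grid point; min() keeps the first (smaller) one on a tie.
        q = min(E2M1_GRID, key=lambda g: abs(g - mag))
        codes.append(q if x >= 0 else -q)
    return codes, scale


def dequantize_block(codes, scale):
    """Rescale E2M1 codes back to real values."""
    return [c * scale for c in codes]


def fake_quantize(values):
    """Quantize then dequantize a flat list, BLOCK_SIZE elements at a time."""
    out = []
    for i in range(0, len(values), BLOCK_SIZE):
        codes, scale = quantize_block(values[i:i + BLOCK_SIZE])
        out.extend(dequantize_block(codes, scale))
    return out
```

Because every block gets its own scale, one outlier only coarsens the 15 values that share its block rather than the whole tensor. Within a block, the worst-case error is half the widest grid gap (between 4 and 6) times that block's scale.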

For the abliterated/uncensored variant of this drafter, see AEON-7/Gemma-4-E4B-DECKARD-HERETIC-Uncensored-NVFP4. For the target it accelerates, see the gemma-4-31B-it-speculator.eagle3-NVFP4 collection on this profile.

Files

  • model.safetensors — NVFP4-quantized weights
  • hf_quant_config.json — modelopt quant config
  • chat_template.jinja — Gemma 4 chat template
  • config.json / generation_config.json / tokenizer.* / processor_config.json
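To confirm which quantization recipe a downloaded copy carries, you can read `hf_quant_config.json` directly. The sketch below assumes the layout ModelOpt commonly writes: a `producer` object plus a `quantization` object holding `quant_algo`, `group_size` and `exclude_modules`. This card does not confirm those key names, so the helper returns `None` for missing keys rather than raising. `summarize_quant_config` is a hypothetical helper name.

```python
import json


def summarize_quant_config(text):
    """Extract commonly present fields from a ModelOpt-style hf_quant_config.json.

    Assumed layout (not confirmed by this card): top-level "producer" and
    "quantization" objects. Absent or null sections are treated as empty.
    """
    cfg = json.loads(text)
    producer = cfg.get("producer") or {}
    quant = cfg.get("quantization") or {}
    return {
        "producer": producer.get("name"),
        "quant_algo": quant.get("quant_algo"),
        "group_size": quant.get("group_size"),
        "excluded": quant.get("exclude_modules") or [],
    }
```

For example, after downloading the repository, run `summarize_quant_config(open("hf_quant_config.json", encoding="utf-8").read())`. If the assumed layout holds, `quant_algo` should read `NVFP4`.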

Quick start (vLLM, as drafter)

vllm serve <target-model-id> \
  --speculative-config '{"method":"eagle3","model":"AEON-7/Gemma-4-E4B-DECKARD-HERETIC-NVFP4","num_speculative_tokens":3}' \
  --trust-remote-code
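To reason about what `num_speculative_tokens` buys, you can use the standard back-of-the-envelope model from the speculative-decoding literature. It assumes each drafted token is accepted independently with probability α. With k drafted tokens, the expected number of tokens emitted per target verification pass is then (1 − α^(k+1)) / (1 − α). The sketch below computes that value and cross-checks it with a small simulation. The i.i.d. acceptance assumption and any α you plug in are illustrative only, not measurements of this drafter. The figure also ignores the drafter's own cost, so it describes tokens per target pass rather than end-to-end wall-clock speedup. Both function names are hypothetical.

```python
import random


def expected_tokens_per_step(alpha, k):
    """Expected tokens emitted per target verification pass.

    Assumes each of the k drafted tokens is accepted independently with
    probability alpha. Counts the one token the target always contributes
    (a correction after a rejection, or a bonus token if all k are accepted).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if alpha == 1.0:
        return k + 1.0
    return (1.0 - alpha ** (k + 1)) / (1.0 - alpha)


def simulate(alpha, k, trials=100_000, seed=0):
    """Monte Carlo check of the same model: accept drafts until the first rejection."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        accepted = 0
        while accepted < k and rng.random() < alpha:
            accepted += 1
        total += accepted + 1  # plus the target's own token
    return total / trials
```

With k = 3, as in the command above, and an assumed α = 0.7, the formula gives about 2.53 tokens per target pass. Larger k pays off less as α falls, because a drafted token only counts if every token before it was accepted.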

License

Inherits the Gemma Terms of Use. Use of this model is subject to those terms.


☕ Support the work

If this release has been useful, tips are deeply appreciated — they go directly toward more compute, more models, and more open releases.

₿ Bitcoin (BTC)
bc1q09xmzn00q4z3c5raene0f3pzn9d9pvawfm0py4
Ξ Ethereum (ETH)
0x1512667F6D61454ad531d2E45C0a5d1fd82D0500
◎ Solana (SOL)
DgQsjHdAnT5PNLQTNpJdpLS3tYGpVcsHQCkpoiAKsw8t
ⓜ Monero (XMR)
836XrSKw4R76vNi3QPJ5Fa9ugcyvE2cWmKSPv3AhpTNNKvqP8v5ba9JRL4Vh7UnFNjDz3E2GXZDVVenu3rkZaNdUFhjAvgd

Ethereum L2s (Base, Arbitrum, Optimism, Polygon, etc.) and EVM-compatible tokens can be sent to the same Ethereum address.

Model size: 6B params (Safetensors). Tensor types: BF16, F8_E4M3, U8.