Paper: [EAGLE-2: Faster Inference of Language Models with Dynamic Draft Trees](https://arxiv.org/abs/2406.16858)
This is a conversion of lmsys/Qwen3-235B-A22B-EAGLE3 to the vLLM speculators format, for use with vLLM's Eagle3 speculative decoding implementation:
```python
from vllm import LLM

llm = LLM(
    model="Qwen/Qwen3-235B-A22B-Instruct-2507-FP8",
    speculative_config={
        "method": "eagle3",
        "model": "nm-testing/Qwen3-235B-A22B-EAGLE3-converted-speculators-lmsys",
        "num_speculative_tokens": 3,
    },
    tensor_parallel_size=2,
)
```
Or via command line:
```shell
python examples/offline_inference/spec_decode.py \
    --method "eagle3" \
    --tp 2 \
    --model-dir "Qwen/Qwen3-235B-A22B-Instruct-2507-FP8" \
    --eagle-dir "nm-testing/Qwen3-235B-A22B-EAGLE3-converted-speculators-lmsys" \
    --num-spec-tokens 3
```
The original Eagle3 config format has been converted to the vLLM speculators format with the following changes:
- Renamed the architecture from `LlamaForCausalLMEagle3` to `Eagle3Speculator`
- Restructured the configuration into `transformer_layer_config` and `speculators_config` sections
- Moved `eagle_config.eagle_aux_hidden_state_layer_ids` to the top-level `eagle_aux_hidden_state_layer_ids`

The repository contains:

- `config.json`: model configuration in speculators format
- `model.safetensors`: model weights (unchanged from the original)

If you use this model, please cite the original Eagle3 paper and the LMSYS team:
```bibtex
@article{li2024eagle,
  title={EAGLE-2: Faster Inference of Language Models with Dynamic Draft Trees},
  author={Li, Yuhui and Wei, Fangyun and Zhang, Chao and Zhang, Hongyang},
  journal={arXiv preprint arXiv:2406.16858},
  year={2024}
}
```
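As an illustration of the config changes described above, here is a minimal Python sketch of the rewrite (this is not the official conversion script; the example values and the exact set of fields moved into `transformer_layer_config` and `speculators_config` are assumptions):

```python
# Illustrative sketch of the Eagle3 -> speculators config rewrite described
# above. Field names inside transformer_layer_config/speculators_config and
# the example values are assumptions, not the exact converted config.json.

def convert_eagle3_config(original: dict) -> dict:
    """Rewrite an original Eagle3 config dict into the speculators layout."""
    converted = dict(original)

    # Architecture renamed from LlamaForCausalLMEagle3 to Eagle3Speculator.
    converted["architectures"] = ["Eagle3Speculator"]

    # eagle_config.eagle_aux_hidden_state_layer_ids is hoisted to top level.
    eagle_cfg = converted.pop("eagle_config", {})
    if "eagle_aux_hidden_state_layer_ids" in eagle_cfg:
        converted["eagle_aux_hidden_state_layer_ids"] = eagle_cfg[
            "eagle_aux_hidden_state_layer_ids"
        ]

    # Transformer hyperparameters are nested under transformer_layer_config;
    # speculator-level settings go under speculators_config.
    converted["transformer_layer_config"] = {
        k: converted.pop(k)
        for k in ("hidden_size", "num_attention_heads", "num_hidden_layers")
        if k in converted
    }
    converted["speculators_config"] = {"speculators_model_type": "eagle3"}
    return converted


# Hypothetical original config for illustration only.
original = {
    "architectures": ["LlamaForCausalLMEagle3"],
    "hidden_size": 4096,
    "num_attention_heads": 64,
    "num_hidden_layers": 1,
    "eagle_config": {"eagle_aux_hidden_state_layer_ids": [1, 23, 44]},
}
print(convert_eagle3_config(original)["architectures"])  # ['Eagle3Speculator']
```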
License: same as the original model, lmsys/Qwen3-235B-A22B-EAGLE3.