Tags: GGUF · reasoning · chain-of-thought · q4_k_m · qwen3.5 · conversational

Qwen3.5-35B-A3B-Opus-Reasoning-Distilled-v2-GGUF

Model Description

This is the Q4_K_M GGUF quantized version of ponytang3/Qwen3.5-35B-A3B-Opus-Reasoning-Distilled-v2.

Original Model

  • Base Model: unsloth/Qwen3.5-35B-A3B
  • Fine-tuned Model: ponytang3/Qwen3.5-35B-A3B-Opus-Reasoning-Distilled-v2
  • Quantization: Q4_K_M (4-bit quantization)
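For planning VRAM or disk usage, the quantized file size can be estimated from the parameter count and the average bits per weight. As a rough sketch (Q4_K_M averages about 4.85 bits per weight; that figure is an approximation, not taken from this model card):

```python
params = 35e9           # 35B parameters
bits_per_weight = 4.85  # approximate average for Q4_K_M quantization
size_gb = params * bits_per_weight / 8 / 1e9  # bits -> bytes -> gigabytes
print(f"~{size_gb:.1f} GB")  # roughly 21 GB on disk
```

Actual file size varies slightly with the tensor layout, since different tensors in a Q4_K_M file use different quantization types.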

Training Details

  • Method: bf16 LoRA + response-only (train_on_responses_only)
  • LoRA Rank: 16
  • Epochs: 2
  • Max Sequence Length: 4096
  • Framework: Unsloth + TRL

Datasets

  • nohurry/Opus-4.6-Reasoning-3000x-filtered
  • Jackrong/Qwen3.5-reasoning-700x
  • Roman1111111/claude-opus-4.6-10000x

Usage with llama.cpp

./llama-cli -m model-q4_k_m.gguf -p "Your prompt here" -n 512

Format

The model wraps its chain-of-thought reasoning in <think>...</think> tags, followed by the final answer.
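Because the reasoning is delimited this way, downstream code typically separates the chain-of-thought from the final answer before displaying or logging it. A minimal sketch in Python (the helper name and regex are illustrative, not part of the model card):

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Split model output into (reasoning, answer) using <think>...</think> tags."""
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        # No reasoning block emitted; treat everything as the answer.
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer

output = "<think>2+2 is 4.</think>The answer is 4."
reasoning, answer = split_reasoning(output)
print(reasoning)  # 2+2 is 4.
print(answer)     # The answer is 4.
```

Note that with streaming output the closing </think> tag may not have arrived yet, so streaming clients usually switch display modes on the tags as they appear rather than regex-matching the full string.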

Model Details

  • Format: GGUF
  • Model size: 35B params
  • Architecture: qwen35moe