Tags: GGUF · reasoning · chain-of-thought · q4_k_m · qwen3.5 · conversational

Qwen3.5-35B-A3B-Opus-Reasoning-Distilled-v2-GGUF

Model Description

This is the Q4_K_M GGUF quantized version of ponytang3/Qwen3.5-35B-A3B-Opus-Reasoning-Distilled-v2.

Original Model

  • Base Model: unsloth/Qwen3.5-35B-A3B
  • Fine-tuned Model: ponytang3/Qwen3.5-35B-A3B-Opus-Reasoning-Distilled-v2
  • Quantization: Q4_K_M (4-bit quantization)
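For planning VRAM or disk usage, the quantized file size can be estimated from the parameter count and the average bits per weight. As a rough sketch (Q4_K_M averages about 4.85 bits per weight; that figure is an approximation, not taken from this model card):

```python
params = 35e9           # 35B parameters
bits_per_weight = 4.85  # approximate average for Q4_K_M quantization
size_gb = params * bits_per_weight / 8 / 1e9  # bits -> bytes -> gigabytes
print(f"~{size_gb:.1f} GB")  # roughly 21 GB on disk
```

Actual file size varies slightly with the tensor layout, since different tensors in a Q4_K_M file use different quantization types.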

Training Details

  • Method: bf16 LoRA + response-only (train_on_responses_only)
  • LoRA Rank: 16
  • Epochs: 2
  • Max Sequence Length: 4096
  • Framework: Unsloth + TRL

Datasets

  • nohurry/Opus-4.6-Reasoning-3000x-filtered
  • Jackrong/Qwen3.5-reasoning-700x
  • Roman1111111/claude-opus-4.6-10000x

Usage with llama.cpp

./llama-cli -m model-q4_k_m.gguf -p "Your prompt here" -n 512

Format

The model wraps its chain-of-thought reasoning in <think>...</think> tags, followed by the final answer.
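Because the reasoning is delimited this way, downstream code typically separates the chain-of-thought from the final answer before displaying or logging it. A minimal sketch in Python (the helper name and regex are illustrative, not part of the model card):

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Split model output into (reasoning, answer) using <think>...</think> tags."""
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        # No reasoning block emitted; treat everything as the answer.
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer

output = "<think>2+2 is 4.</think>The answer is 4."
reasoning, answer = split_reasoning(output)
print(reasoning)  # 2+2 is 4.
print(answer)     # The answer is 4.
```

Note that with streaming output the closing </think> tag may not have arrived yet, so streaming clients usually switch display modes on the tags as they appear rather than regex-matching the full string.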

Model Details

  • Format: GGUF
  • Model size: 35B params
  • Architecture: qwen35moe