GLM-4.7-Flash - Opus Reasoning Finetune

Best checkpoint from training run - Eval Loss: 0.1504

Model Description

This is a fine-tuned version of GLM-4.7-Flash (30.4B parameters, Mixture of Experts) trained on a curated dataset focused on:

  • Agent/Tool-use workflows (93.3%)
  • Opus reasoning traces (5.2%)
  • Qwen reasoning data (1.4%)

Model Details

  • Base Model: unsloth/GLM-4.7-Flash (30.4B params, 64 experts)
  • Method: QLoRA (4-bit base, LoRA rank r=16)
  • Trainable Parameters: 1.39%
  • Precision: BF16
  • Context Length: 8192 tokens
  • Best Eval Loss: 0.1504 (checkpoint 2500)
  • Training Steps: 2500/4998
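The adapter setup above can be sketched with PEFT. Only `r=16` is stated on this card; the alpha, dropout, and target modules below are common defaults, not confirmed from the actual run:

```python
from peft import LoraConfig

# Sketch of the QLoRA adapter configuration. r=16 matches the card;
# lora_alpha, lora_dropout, and target_modules are assumed typical
# values, not confirmed from the training run.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,          # assumed; commonly set to 2*r
    lora_dropout=0.0,       # assumed
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
)
```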

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "austindixson/glm-4.7-flash-Opus-Reasoning",
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained("austindixson/glm-4.7-flash-Opus-Reasoning")

# Generate
prompt = "Write a function to merge two sorted lists:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Chat Format

This model uses the GLM-4 chat template with thinking mode:

messages = [
    {"role": "user", "content": "What is 2+2?"}
]

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,  # needed so **inputs unpacks into generate()
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)

Dataset Attribution

This model was fine-tuned on datasets licensed under Apache 2.0:

Primary Sources:

  • Opus-4.6-Reasoning by nohurry | Apache 2.0
  • Qwen3.5-reasoning by Jackrong | Apache 2.0

Training

  • Hardware: NVIDIA RTX PRO 6000 Blackwell (96GB VRAM)
  • Framework: Unsloth + Transformers + PEFT
  • Optimizer: AdamW 8-bit
  • Learning Rate: 2e-4 with cosine decay
  • Batch Size: 2 per device, gradient accumulation 8
  • Training Time: ~14 hours (partial run)
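The hyperparameters above map onto a transformers `TrainingArguments` roughly as follows. Values are taken from this card; `output_dir` and anything not listed above are placeholders, not details of the actual run:

```python
from transformers import TrainingArguments

# Sketch reconstructed from the card's stated hyperparameters.
# output_dir is a placeholder; warmup/logging settings are omitted
# because they are not documented here.
args = TrainingArguments(
    output_dir="glm-4.7-flash-opus-reasoning",  # placeholder
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,   # effective batch size of 16
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    optim="adamw_bnb_8bit",          # 8-bit AdamW via bitsandbytes
    bf16=True,
    max_steps=4998,
)
```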

Performance

  • Eval Loss: 0.1504 (checkpoint 2500)
  • Training Loss: decreased smoothly from 2.0 to 0.09
  • Context: Handles up to 8192 token sequences

Use Cases

Excellent for:

  • Tool-use and agent workflows
  • Mathematical reasoning
  • Code generation and debugging
  • Multi-step reasoning tasks
  • Problem-solving
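For the tool-use workflows listed above, the host side reduces to parsing a structured call emitted by the model and dispatching it to a handler. A framework-agnostic sketch; the `{"name": ..., "arguments": ...}` JSON schema here is illustrative, not GLM's exact wire format:

```python
import json

def dispatch_tool_call(raw: str, tools: dict):
    """Parse a JSON tool call emitted by the model and invoke the
    matching handler. The schema is illustrative only; adapt it to
    the actual format produced by your chat template."""
    call = json.loads(raw)
    handler = tools[call["name"]]
    return handler(**call["arguments"])

tools = {"add": lambda a, b: a + b}
print(dispatch_tool_call('{"name": "add", "arguments": {"a": 2, "b": 3}}', tools))  # → 5
```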

License

Apache 2.0

Model trained by austindixson using Unsloth QLoRA
