GLM-4.7-Flash - Opus Reasoning Finetune

Best checkpoint from training run - Eval Loss: 0.1504

Model Description

This is a fine-tuned version of GLM-4.7-Flash (30.4B parameters, Mixture of Experts) trained on a curated dataset focused on:

  • Agent/Tool-use workflows (93.3%)
  • Opus reasoning traces (5.2%)
  • Qwen reasoning data (1.4%)

Model Details

  • Base Model: unsloth/GLM-4.7-Flash (30.4B params, 64 experts)
  • Method: QLoRA (4-bit base, LoRA rank r=16)
  • Trainable Parameters: 1.39%
  • Precision: BF16
  • Context Length: 8192 tokens
  • Best Eval Loss: 0.1504 (checkpoint 2500)
  • Training Steps: 2500/4998
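The adapter setup above can be sketched with PEFT. Only `r=16` is stated on this card; the alpha, dropout, and target modules below are common defaults, not confirmed from the actual run:

```python
from peft import LoraConfig

# Sketch of the QLoRA adapter configuration. r=16 matches the card;
# lora_alpha, lora_dropout, and target_modules are assumed typical
# values, not confirmed from the training run.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,          # assumed; commonly set to 2*r
    lora_dropout=0.0,       # assumed
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
)
```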

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "austindixson/glm-4.7-flash-Opus-Reasoning",
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained("austindixson/glm-4.7-flash-Opus-Reasoning")

# Generate
prompt = "Write a function to merge two sorted lists:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Chat Format

This model uses the GLM-4 chat template with thinking mode:

messages = [
    {"role": "user", "content": "What is 2+2?"}
]

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,  # needed so **inputs unpacks into generate()
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)

Dataset Attribution

This model was fine-tuned on datasets licensed under Apache 2.0:

Primary Sources:

  • Opus-4.6-Reasoning by nohurry | Apache 2.0
  • Qwen3.5-reasoning by Jackrong | Apache 2.0

Training

  • Hardware: NVIDIA RTX PRO 6000 Blackwell (96GB VRAM)
  • Framework: Unsloth + Transformers + PEFT
  • Optimizer: AdamW 8-bit
  • Learning Rate: 2e-4 with cosine decay
  • Batch Size: 2 per device, gradient accumulation 8
  • Training Time: ~14 hours (partial run)
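The hyperparameters above map onto a transformers `TrainingArguments` roughly as follows. Values are taken from this card; `output_dir` and anything not listed above are placeholders, not details of the actual run:

```python
from transformers import TrainingArguments

# Sketch reconstructed from the card's stated hyperparameters.
# output_dir is a placeholder; warmup/logging settings are omitted
# because they are not documented here.
args = TrainingArguments(
    output_dir="glm-4.7-flash-opus-reasoning",  # placeholder
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,   # effective batch size of 16
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    optim="adamw_bnb_8bit",          # 8-bit AdamW via bitsandbytes
    bf16=True,
    max_steps=4998,
)
```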

Performance

  • Eval Loss: 0.1504 (checkpoint 2500)
  • Training Loss: decreased smoothly from 2.0 to 0.09
  • Context: Handles up to 8192 token sequences

Use Cases

Excellent for:

  • Tool-use and agent workflows
  • Mathematical reasoning
  • Code generation and debugging
  • Multi-step reasoning tasks
  • Problem-solving
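For the tool-use workflows listed above, the host side reduces to parsing a structured call emitted by the model and dispatching it to a handler. A framework-agnostic sketch; the `{"name": ..., "arguments": ...}` JSON schema here is illustrative, not GLM's exact wire format:

```python
import json

def dispatch_tool_call(raw: str, tools: dict):
    """Parse a JSON tool call emitted by the model and invoke the
    matching handler. The schema is illustrative only; adapt it to
    the actual format produced by your chat template."""
    call = json.loads(raw)
    handler = tools[call["name"]]
    return handler(**call["arguments"])

tools = {"add": lambda a, b: a + b}
print(dispatch_tool_call('{"name": "add", "arguments": {"a": 2, "b": 3}}', tools))  # → 5
```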

License

Apache 2.0

Model trained by austindixson using Unsloth QLoRA
