GLM-4.7-Flash Fine-tuned - Checkpoint 1100 ⭐ Best So Far

Model Description

Best checkpoint of fine-tuned GLM-4.7-Flash, optimized for:

  • Reasoning: Mathematical and scientific reasoning tasks
  • Coding: Code generation and debugging
  • Tool Calling: Function calling and agent workflows

Training Status

  • Checkpoint: Step 1100/4,998 (22% complete)
  • Training Loss: ~0.29 (latest)
  • Eval Loss: 0.3025 ⭐ Best checkpoint
  • Epoch: 0.44 / 2.0
  • Status: Training continues in background

Why This Checkpoint?

This is the best performing checkpoint so far based on evaluation loss:

  • Step 800: eval_loss 0.3797
  • Step 900: eval_loss 0.3542
  • Step 1000: eval_loss 0.3255
  • Step 1100: eval_loss 0.3025 ⭐

Quick Start

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "austindixson/glm-4.7-flash-checkpoint-1100",
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained("austindixson/glm-4.7-flash-checkpoint-1100")

# Example usage
messages = [{"role": "user", "content": "Write a function to calculate fibonacci numbers"}]
input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
response = tokenizer.decode(outputs[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True)
print(response)

Training Details

  • Base Model: unsloth/GLM-4.7-Flash (MoE, 64 experts, 30.4B parameters)
  • Method: QLoRA (4-bit quantization)
  • Trainable Parameters: 423.9M (1.40% of total)
  • Sequence Length: 8192 tokens
  • Batch Size: 16 effective
  • Learning Rate: 2e-4 with warmup
  • Precision: BF16

Datasets

Trained on ~45K examples from:

  • agent-dataset-hybrid (22K)
  • Opus-4.6-Reasoning-3000x (~2.3K)
  • Qwen3.5-reasoning-700x (~700)

Reasoning Format

The model uses explicit <thinking> tags for structured reasoning:

<thinking>
Let me work through this step by step...
</thinking>

Final answer here

Hardware Requirements

Recommended:

  • GPU Memory: 12GB+ for inference
  • For 8192 context: 16GB+ VRAM recommended

Compatible GPUs:

  • RTX 3060 12GB (use Q4_K_M quantization)
  • RTX 3090 24GB
  • Mac M4 16GB (GGUF format)

Note

Training continues from this checkpoint. The final model will be available after training completes.

License

Inherits license from base GLM-4.7-Flash model.

Downloads last month

-

Downloads are not tracked for this model. How to track
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support