Earlier this month, Apple introduced Simple Self-Distillation: a fine-tuning method that improves models on coding tasks just by sampling from the model and training on its own outputs with plain cross-entropy
And… it's already supported in TRL, built by Kashif Rasul. you can really feel the pace of development in the team 🐎
Paper by Ruixiang ZHANG, He Bai, Huangjie Zheng, Navdeep Jaitly, Ronan Collobert, Yizhe Zhang at Apple 🍎
How it works: the model generates completions at a training-time temperature (T_train) with top_k/top_p truncation, then fine-tunes on them with plain cross-entropy. no labels or verifier needed
One neat insight from the paper: T_train and T_eval compose into an effective T_eff = T_train × T_eval, so a broad band of configs works well. even very noisy samples still help
I think we have it: our open source Claude Code = GLM-5.1 + Pi (https://pi.dev/) - Built a Three.js racing game to eval and it's extremely impressive. Thoughts:
- One-shot car physics with real drift mechanics (this is hard)
- My fav part: Awesome at self iterating (with no vision!) created 20+ Bun.WebView debugging tools to drive the car programmatically and read game state. Proved a winding bug with vector math without ever seeing the screen
- 531-line racing AI in a single write: 4 personalities, curvature map, racing lines, tactical drifting. Built telemetry tools to compare player vs AI speed curves and data-tuned parameters
- All assets from scratch: 3D models, procedural textures, sky shader, engine sounds, spatial AI audio!
- Can do hard math: proved road normals pointed DOWN via vector cross products, computed track curvature normalized by arc length to tune AI cornering speed
You are going to hear about this model a lot in the next months - open source let's go - and thanks z-ai🚀🚀
> support for multimodal tool responses for environments (OpenEnv) > an example to train it in CARLA for autonomous driving with image-based tool calls