update soon: the CLM_R discriminator warmup data has been significantly overhauled, and the model is being retrained
Qwen3.5-0.8b-"base" finetuned for my personal RL escapades on ~0.12B tokens (mixture sketched below):
- 40% from my pretraining set: The Pile, textfiles.com, StackExchange, Bluesky user-response-modelling data, AO3, The Stack, and random cybernetic control loops with attractor "goals" identified and stated before the controller's actions begin
- 40% from FineWeb
- 20% warmup data for CLM_R, a reasoning generator and reasoning discriminator for general text completion
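Purely as an illustration, here is a minimal sketch of sampling from a 40/40/20 mixture like the one above. The source names and the sampler itself are my placeholders, not the actual data pipeline:

```python
import random

# Hypothetical mixture weights matching the stated 40/40/20 split;
# the source names here are stand-ins, not real dataset handles.
MIX = [
    ("personal_pretraining_set", 0.40),  # pile, textfiles.com, stackexchange, ...
    ("fineweb", 0.40),
    ("clm_r_warmup", 0.20),
]

def sample_source(rng: random.Random) -> str:
    """Pick a data source according to the mixture weights."""
    r = rng.random()
    cum = 0.0
    for name, weight in MIX:
        cum += weight
        if r < cum:
            return name
    return MIX[-1][0]  # guard against floating-point rounding

rng = random.Random(0)
counts = {name: 0 for name, _ in MIX}
for _ in range(10_000):
    counts[sample_source(rng)] += 1
print(counts)  # roughly 4000 / 4000 / 2000
```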
| optimizer | training steps | batch size | LR schedule | peak LR | weight decay | grad clip (max_norm) | PEFT? | did i sweep for these? |
|---|---|---|---|---|---|---|---|---|
| bnb.optim.AdamW 32-bit | 493 | ~242k tokens (≈T^(2/3), T = total tokens) | 10% linear warmup, then linear decay to 0 | 2e-5 | 0.1 | 1.0 | no | no |
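For reference, a hedged sketch of this config in plain PyTorch + bitsandbytes. Only the optimizer class, LR/weight-decay/clipping values, and schedule shape come from the table; the model, loss, and loop are stand-ins, not the actual training script. Note how the step count and ~242k-token batch both fall out of the T^(2/3) rule:

```python
import bitsandbytes as bnb
import torch
from torch.optim.lr_scheduler import LambdaLR

TOTAL_TOKENS = 120_000_000                      # ~0.12B tokens
BATCH_TOKENS = round(TOTAL_TOKENS ** (2 / 3))   # ~242k tokens per step
TOTAL_STEPS = TOTAL_TOKENS // BATCH_TOKENS      # ~493 steps
WARMUP_STEPS = TOTAL_STEPS // 10                # 10% linear warmup

model = torch.nn.Linear(8, 8)  # stand-in for the finetuned Qwen model

optimizer = bnb.optim.AdamW32bit(model.parameters(), lr=2e-5, weight_decay=0.1)

def lr_lambda(step: int) -> float:
    # 10% of steps linear warmup, then linear decay to 0.
    if step < WARMUP_STEPS:
        return step / max(1, WARMUP_STEPS)
    return max(0.0, (TOTAL_STEPS - step) / max(1, TOTAL_STEPS - WARMUP_STEPS))

scheduler = LambdaLR(optimizer, lr_lambda)

for step in range(TOTAL_STEPS):
    loss = model(torch.randn(4, 8)).pow(2).mean()  # placeholder loss
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()
```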
Base model: Qwen/Qwen3.5-0.8B-Base