update soon: the CLM_R discriminator warmup data has been significantly overhauled, and the model is being retrained

Qwen3.5-0.8b-"base" finetuned for my personal RL escapades on ~0.12B tokens:

  • 40% from my pretraining set: the Pile, textfiles.com, StackExchange, Bluesky user-response-modelling data, AO3, the Stack, and random cybernetic control loops with attractor "goals" identified and stated before the controller's actions start
  • 40% from FineWeb
  • 20% warmup data for CLM_R, a reasoning generator and reasoning discriminator for general text completion (see the mixing sketch after this list)
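
For reference, a minimal sketch of sampling at that 40/40/20 ratio with the 🤗 `datasets` library; the JSONL paths for the private pretraining mix and the CLM_R warmup set are placeholders, and only the FineWeb loader points at a real public dataset:

```python
from datasets import load_dataset, interleave_datasets

# Placeholder loaders: the private pretraining mix and the CLM_R warmup
# set aren't published; FineWeb is the public HF dataset.
pretrain = load_dataset("json", data_files="pretraining_mix.jsonl",
                        split="train", streaming=True)
fineweb = load_dataset("HuggingFaceFW/fineweb", split="train", streaming=True)
clm_r = load_dataset("json", data_files="clm_r_warmup.jsonl",
                     split="train", streaming=True)

# Draw documents at the stated 40/40/20 ratio; the ~0.12B-token
# budget is enforced downstream by whatever consumes the stream.
mixed = interleave_datasets(
    [pretrain, fineweb, clm_r],
    probabilities=[0.4, 0.4, 0.2],
    seed=0,
)
```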
| optimizer | training steps | batch size | schedule | lr | wd | max_norm | peft? | did i sweep for these? |
|---|---|---|---|---|---|---|---|---|
| `bnb.optim.adamw` (32-bit) | 493 | ~242k tokens (T^(2/3)) | 10% step warmup, linear decay to 0 | 2e-5 | 0.1 | 1.0 | no | no |
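
Reading the table: with T the ~0.12B total-token budget, a per-step batch of T^(2/3) ≈ 242k tokens pins the step count at T^(1/3) ≈ 493, which matches both cells. Below is a minimal sketch of the implied setup, assuming bitsandbytes' 32-bit AdamW and transformers' linear-warmup schedule; the tiny linear model and dummy loss are placeholders for the actual Qwen finetune, and bitsandbytes optimizers need a CUDA GPU:

```python
import bitsandbytes as bnb
import torch
from transformers import get_linear_schedule_with_warmup

TOTAL_TOKENS = 120_000_000                     # ~0.12B-token budget
BATCH_TOKENS = round(TOTAL_TOKENS ** (2 / 3))  # T^(2/3) ≈ 242k tokens/step
STEPS = round(TOTAL_TOKENS / BATCH_TOKENS)     # T^(1/3) ≈ 493 steps

model = torch.nn.Linear(8, 8).cuda()  # stand-in for the 0.8B Qwen base model

optimizer = bnb.optim.AdamW32bit(model.parameters(), lr=2e-5, weight_decay=0.1)
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=int(0.10 * STEPS),  # 10% step warmup
    num_training_steps=STEPS,            # then linear decay to 0
)

for step in range(STEPS):
    # dummy loss standing in for next-token CE over ~242k tokens
    loss = model(torch.randn(4, 8, device="cuda")).pow(2).mean()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()
```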