update soon: the CLM_R discriminator warmup data has been significantly overhauled, and the model is being retrained

Qwen3.5-0.8b-"base" finetuned for my personal RL escapades on ~0.12B tokens:

  • 40% from my pretraining set: the Pile, textfiles.com, StackExchange, Bluesky user-response-modelling data, AO3, the Stack, and random cybernetic control loops with attractor "goals" identified and stated before the controller's actions start
  • 40% from FineWeb
  • 20% warmup data for CLM_R, a reasoning generator and reasoning discriminator for general text completion (see the mixing sketch after this list)
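
For reference, a minimal sketch of sampling at that 40/40/20 ratio with the 🤗 `datasets` library; the JSONL paths for the private pretraining mix and the CLM_R warmup set are placeholders, and only the FineWeb loader points at a real public dataset:

```python
from datasets import load_dataset, interleave_datasets

# Placeholder loaders: the private pretraining mix and the CLM_R warmup
# set aren't published; FineWeb is the public HF dataset.
pretrain = load_dataset("json", data_files="pretraining_mix.jsonl",
                        split="train", streaming=True)
fineweb = load_dataset("HuggingFaceFW/fineweb", split="train", streaming=True)
clm_r = load_dataset("json", data_files="clm_r_warmup.jsonl",
                     split="train", streaming=True)

# Draw documents at the stated 40/40/20 ratio; the ~0.12B-token
# budget is enforced downstream by whatever consumes the stream.
mixed = interleave_datasets(
    [pretrain, fineweb, clm_r],
    probabilities=[0.4, 0.4, 0.2],
    seed=0,
)
```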
| optimizer | training steps | batch size | schedule | lr | wd | max_norm | peft? | did i sweep for these? |
|---|---|---|---|---|---|---|---|---|
| `bnb.optim.adamw` (32-bit) | 493 | ~242k tokens (T^(2/3)) | 10% step warmup, linear decay to 0 | 2e-5 | 0.1 | 1.0 | no | no |
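
Reading the table: with T the ~0.12B total-token budget, a per-step batch of T^(2/3) ≈ 242k tokens pins the step count at T^(1/3) ≈ 493, which matches both cells. Below is a minimal sketch of the implied setup, assuming bitsandbytes' 32-bit AdamW and transformers' linear-warmup schedule; the tiny linear model and dummy loss are placeholders for the actual Qwen finetune, and bitsandbytes optimizers need a CUDA GPU:

```python
import bitsandbytes as bnb
import torch
from transformers import get_linear_schedule_with_warmup

TOTAL_TOKENS = 120_000_000                     # ~0.12B-token budget
BATCH_TOKENS = round(TOTAL_TOKENS ** (2 / 3))  # T^(2/3) ≈ 242k tokens/step
STEPS = round(TOTAL_TOKENS / BATCH_TOKENS)     # T^(1/3) ≈ 493 steps

model = torch.nn.Linear(8, 8).cuda()  # stand-in for the 0.8B Qwen base model

optimizer = bnb.optim.AdamW32bit(model.parameters(), lr=2e-5, weight_decay=0.1)
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=int(0.10 * STEPS),  # 10% step warmup
    num_training_steps=STEPS,            # then linear decay to 0
)

for step in range(STEPS):
    # dummy loss standing in for next-token CE over ~242k tokens
    loss = model(torch.randn(4, 8, device="cuda")).pow(2).mean()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()
```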