CoLAR Qwen3-4B Flawed Fictions RL

This repository stores CoLAR exports in a Hugging Face-compatible layout. The files at the repo root load with standard Transformers APIs, and extra_state.pt preserves the latent head needed for latent decoding.

Current Revision

  • Current tag: exact-epoch16-step9856-val_reward=0.6875
  • Stage: reinforcement-learning exact export
  • Task: Flawed Fictions continuity error detection
  • Compare slug: qwen3_colar_rl_exact_epoch16_step9856

Tagged Checkpoints

Tag                                        Local reference        Status
exact-epoch16-step9856-val_reward=0.6875   exact epoch16 export   current commit

Previous Tags

  • best-epoch16-step9856-val-reward-0.6562
  • best-epoch24-val_reward=0.6719
  • last-epoch28-val_reward=0.5781
  • second-epoch08-val_reward=0.6406

Files

  • HF model files at repo root for standard decoding
  • extra_state.pt for CoLAR latent decoding
  • export_meta.json from the local export
  • latent_metadata.json with archival provenance

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

# Pin the tagged revision so the exact epoch-16 export is loaded.
model = AutoModelForCausalLM.from_pretrained(
    'agurung/colar-qwen3-4b-ff-rl',
    revision='exact-epoch16-step9856-val_reward=0.6875',
    torch_dtype='auto',
    device_map='auto',
)
tokenizer = AutoTokenizer.from_pretrained(
    'agurung/colar-qwen3-4b-ff-rl',
    revision='exact-epoch16-step9856-val_reward=0.6875',
)

For latent decoding, download the same revision and use extra_state.pt together with the repo root model files.
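A minimal sketch of the extra_state.pt loading step. This uses a locally created stand-in file; in practice the real file would be fetched from this repo at the same tagged revision (for example with huggingface_hub's hf_hub_download). The key name 'latent_head.weight' is an illustrative assumption, not documented by the export.

```python
import os
import tempfile

import torch

# Stand-in for a downloaded extra_state.pt (assumption: it is a
# torch-serialized state dict; the key below is hypothetical).
fake_state = {'latent_head.weight': torch.zeros(4, 4)}
path = os.path.join(tempfile.mkdtemp(), 'extra_state.pt')
torch.save(fake_state, path)

# Load on CPU; weights_only=True avoids executing arbitrary pickled code.
extra_state = torch.load(path, map_location='cpu', weights_only=True)
print(sorted(extra_state.keys()))
```

The loaded tensors would then be attached to the latent head on top of the repo-root model weights, per whatever CoLAR's latent-decoding code expects.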

Notes

  • This exact export records a monitor value of 0.6875 in export_meta.json.
  • It is distinct from the older exact-style tag (monitor value 0.6562) that was already on the Hub.