qwen3-8b-dapo-fulltokens-creativity-step8

Qwen3-8B DAPO model trained on full tokens (Creativity dataset). Checkpoint: step 8 (val ACC=0.794, Reward=0.7629)

Model Details

This model is fine-tuned using DAPO (Direct Alignment from Preference Optimization) on the Creativity dataset.

Training Details

  • Base Model: Qwen/Qwen3-8B-Instruct
  • Training Method: DAPO
  • Dataset: Creativity (train/val split)

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("Echoandland/qwen3-8b-dapo-fulltokens-creativity-step8")
tokenizer = AutoTokenizer.from_pretrained("Echoandland/qwen3-8b-dapo-fulltokens-creativity-step8")

# Your code here
Downloads last month
1
Safetensors
Model size
8B params
Tensor type
F32
·
Video Preview
loading