qwen3-8b-dapo-fulltokens-creativity-step11
Qwen3-8B DAPO model trained on full tokens (Creativity dataset). Best checkpoint: step 11 (val ACC=0.796, Reward=0.7654)
Model Details
This model is fine-tuned using DAPO (Direct Alignment from Preference Optimization) on the Creativity dataset.
Training Details
- Base Model: Qwen/Qwen3-8B-Instruct
- Training Method: DAPO
- Dataset: Creativity (train/val split)
Usage
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("Echoandland/qwen3-8b-dapo-fulltokens-creativity-step11")
tokenizer = AutoTokenizer.from_pretrained("Echoandland/qwen3-8b-dapo-fulltokens-creativity-step11")
# Your code here
- Downloads last month
- 1