Data

This folder contains four JSONL files that form a minimal example for merged training and evaluation across two tasks: math reasoning and instruction following.
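
The snippet below is one way the two training files might be combined into a single merged training set. It reads each JSONL file and shuffles the records together so that math and instruction-following items are interleaved. It is a minimal sketch: the filenames in the usage comment are hypothetical, because this card does not list the actual file names.

```python
import json
import random


def read_jsonl(path):
    """Yield one parsed JSON object per non-empty line of a JSONL file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


def merge_jsonl(paths, seed=0):
    """Concatenate several JSONL files, then shuffle so tasks are interleaved."""
    records = []
    for path in paths:
        records.extend(read_jsonl(path))
    # A fixed seed keeps the merged order reproducible across runs.
    random.Random(seed).shuffle(records)
    return records


# Usage (hypothetical filenames; substitute the real ones):
# train = merge_jsonl(["dapo-math-17k.jsonl", "verinstruct.jsonl"])
```

Shuffling after concatenation is a simple choice. A curriculum or per-task sampling ratio would replace the single shuffle.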

Training Data

DAPO-Math-17k


Lines	17,398
Source	`zhuzilin/dapo-math-17k`
Reference	DAPO paper (ByteDance Seed)

Math reasoning training data, originally released alongside the DAPO paper. Each line includes a `data_source` field set to `dapo-math-17k`.
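
Because each line carries a `data_source` field, a training loop can use it to choose a reward function per item. The sketch below shows that routing idea only; it is not the pipeline this example was built for. The `answer` field, the `Answer: <value>` final-line format, and the exact-match check are assumptions made for illustration. An instruction-following scorer would be registered the same way under whatever `data_source` value the VerInstruct lines use.

```python
from typing import Callable, Dict

# A scorer takes (record, model_response) and returns a scalar reward.
Scorer = Callable[[dict, str], float]

# Maps a record's data_source value to its scorer.
REWARD_FNS: Dict[str, Scorer] = {}


def register(data_source: str):
    """Decorator that registers a scorer for one data_source value."""
    def wrap(fn: Scorer) -> Scorer:
        REWARD_FNS[data_source] = fn
        return fn
    return wrap


@register("dapo-math-17k")
def math_reward(record: dict, response: str) -> float:
    # Assumption: the reference lives in a hypothetical "answer" field and the
    # response ends with a line of the form "Answer: <value>".
    lines = response.strip().splitlines()
    last = lines[-1] if lines else ""
    if "Answer:" not in last:
        return 0.0
    predicted = last.split("Answer:", 1)[1].strip()
    return 1.0 if predicted == str(record["answer"]).strip() else 0.0


def compute_reward(record: dict, response: str) -> float:
    """Route a (record, response) pair to the scorer for its data_source."""
    source = record.get("data_source")
    if source not in REWARD_FNS:
        raise ValueError(f"no scorer registered for data_source={source!r}")
    return REWARD_FNS[source](record, response)
```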

VerInstruct


Lines	19,756
Source	`THU-KEG/VerInstruct`
Reference	VerInstruct paper

Instruction-following training data. The original dataset provides both hard (function-verifiable) and soft (LLM-judge rubric-based) reward signals. For simplicity, only items with hard constraints are included here; soft constraints have been removed.
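
The hard-constraint filtering described above could look roughly like the sketch below. The schema is hypothetical (a `constraints` list whose entries carry a `type` of `"hard"` or `"soft"`); the actual VerInstruct field names may differ.

```python
def keep_hard_constraints(records):
    """Drop soft constraints and keep only items that still have a hard one.

    Assumes a hypothetical schema: each record has a "constraints" list whose
    entries carry a "type" of either "hard" or "soft".
    """
    kept = []
    for rec in records:
        hard = [c for c in rec.get("constraints", []) if c.get("type") == "hard"]
        if hard:
            # Build a new dict so the input records are left unmodified.
            kept.append({**rec, "constraints": hard})
    return kept
```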

Evaluation Data

AIME 2024


Lines	32
Source	`zhuzilin/aime-2024`
Task	Math evaluation

IFBench


Lines	300
Source	`zyzshishui0627/IFBench`
Task	Instruction-following evaluation
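
To sanity-check a download against the line counts in the tables above, a short script can count the records in each file. The filenames in `EXPECTED_LINES` are hypothetical placeholders; only the counts come from this card.

```python
import os

# Line counts listed in this card; the filenames are hypothetical placeholders.
EXPECTED_LINES = {
    "dapo-math-17k.jsonl": 17_398,
    "verinstruct.jsonl": 19_756,
    "aime-2024.jsonl": 32,
    "ifbench.jsonl": 300,
}


def count_lines(path):
    """Count non-empty lines (one JSON record each) in a JSONL file."""
    with open(path, encoding="utf-8") as f:
        return sum(1 for line in f if line.strip())


def check_counts(directory, expected=EXPECTED_LINES):
    """Return {filename: (expected, actual)} for every file whose count differs."""
    mismatches = {}
    for name, want in expected.items():
        got = count_lines(os.path.join(directory, name))
        if got != want:
            mismatches[name] = (want, got)
    return mismatches
```

An empty dictionary from `check_counts` means every file matches; a missing file raises `FileNotFoundError`.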

Papers for zhangzx369/curriculum-learning-minimal-example

VerIF: Verification Engineering for Reinforcement Learning in Instruction Following
arXiv 2506.09942, published Jun 11, 2025

DAPO: An Open-Source LLM Reinforcement Learning System at Scale
arXiv 2503.14476, published Mar 18, 2025