# qwen3-4b-h100-v5-hard-ep3
A top-ranker strategy model, trained on an H100 GPU on a blend of three datasets (approx. 14k rows). The assistant outputs were heavily preprocessed with a custom clean_assistant_output_v2 function (CoT stripping, markdown removal, TOML comment removal).
## Training Configuration
- Base model: Qwen/Qwen3-4B-Instruct-2507
- Max sequence length: 4096
- Epochs: 3
- Learning rate: 2e-5
- Effective batch size: 32 (per-device batch size 8 × gradient accumulation 4)
- LoRA R: 128
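The hyperparameters above can be collected into a config sketch. The dictionary keys below follow the naming of `transformers.TrainingArguments` and `peft.LoraConfig` as an assumption; the author's actual training script is not shown on the card:

```python
# Hypothetical mapping of the card's hyperparameters to a
# TrainingArguments/LoraConfig-style configuration (plain dicts,
# so this sketch runs without any ML libraries installed).
training_args = {
    "model_name": "Qwen/Qwen3-4B-Instruct-2507",
    "max_seq_length": 4096,
    "num_train_epochs": 3,
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 8,
    "gradient_accumulation_steps": 4,
}
lora_config = {"r": 128}

# Effective batch size = per-device batch × gradient accumulation steps
effective_batch = (
    training_args["per_device_train_batch_size"]
    * training_args["gradient_accumulation_steps"]
)
print(effective_batch)  # → 32
```

Gradient accumulation here trades memory for throughput: each optimizer step sees 32 examples while only 8 reside on the GPU at once.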