gupta-tanish/Ultrafeedback-gemma2-9b-it-top1vsbottom3-selection • Updated Aug 24, 2025 • 9.48k • 4
gupta-tanish/Ultrafeedback-llama3-8b-instruct-v0.2-on-policy-clean-8-binned-data • Updated Jul 25, 2025 • 60.8k • 8
gupta-tanish/Ultrafeedback-llama3-8b-instruct-v0.2-on-policy-clean-4-binned-data • Updated Jul 25, 2025 • 60.8k • 8
gupta-tanish/Ultrafeedback-llama3-8b-instruct-v0.2-on-policy-clean-2-binned-data • Updated Jul 25, 2025 • 60.8k • 8
gupta-tanish/QwQ-Long-CoT-30k-subset-Llama3.1-8B-dynamic-perturbation-regex-generation-max-margin-logp-10 • Updated Jul 19, 2025 • 59k • 9
gupta-tanish/QwQ-Long-CoT-30k-subset-Llama3.1-8B-dynamic-perturbation-regex-generation-max-margin • Updated Jul 19, 2025 • 107k • 8
gupta-tanish/QwQ-Long-CoT-10k-subset-llama3.1-8b-Inst-GPT4-Step-Perturbation-8-rejects • Updated Jul 18, 2025 • 42.3k • 8
gupta-tanish/QwQ-Long-CoT-20k-subset-Llama3.1-8B-Instruct-on-policy-step-wise-correct-trajectory Updated Jul 18, 2025 • 7
gupta-tanish/QwQ-Long-CoT-30k-subset-Llama3.1-8B-Instruct-on-policy-step-wise-correct-trajectory Updated Jul 18, 2025 • 7
gupta-tanish/QwQ-Long-CoT-10k-subset-Llama3.1-8B-dynamic-perturbation-regex-generation-max-margin-10 • Updated Jul 13, 2025 • 19.2k • 7
gupta-tanish/QwQ-Long-CoT-10k-subset-Llama3.1-8B-iter2-dynamic-perturbation-regex-generation-max-margin • Updated Jul 12, 2025 • 19k • 8
gupta-tanish/QwQ-Long-CoT-10k-subset-Llama3.1-8B-Instruct-on-policy-step-wise-correct-trajectory-iter2 • Updated Jul 12, 2025 • 23.5k • 9
gupta-tanish/QwQ-Long-CoT-10k-subset-Llama3.1-8B-dynamic-perturbation-regex-generation-max-margin-complete • Updated Jul 8, 2025 • 37.3k • 4
gupta-tanish/QwQ-Long-CoT-10k-subset-Llama3.1-8B-Instruct-on-policy-step-wise-correct-trajectory • Updated Jul 7, 2025 • 44.5k • 4
gupta-tanish/QwQ-Long-CoT-10k-subset-Llama3.1-8B-dynamic-perturbation-augmented-3-regex-generation-max-margin • Updated Jul 7, 2025 • 37.2k • 4
gupta-tanish/QwQ-Long-CoT-10k-subset-Llama3.1-8B-dynamic-perturbation-augmented-regex-generation-max-margin • Updated Jul 7, 2025 • 37.2k • 4
gupta-tanish/QwQ-Long-CoT-10k-subset-Llama3.1-8B-dynamic-perturbation-regex-generation-max-margin • Updated Jul 7, 2025 • 36.9k • 4
gupta-tanish/QwQ-Long-CoT-10k-subset-Llama3.1-8B-dynamic-perturbation-regex-generation-max-margin-top-k-2 • Updated Jul 6, 2025 • 66.4k • 4
gupta-tanish/Filtered-QwQ-Long-CoT-10k-subset-Llama3.1-8B-Instruct-model-pertubation-generation-logps-10 • Updated Jul 6, 2025 • 27.7k • 4
gupta-tanish/QwQ-Long-CoT-15k-subset-Llama3.1-8B-single-position-regex-perturbations-logps-12 • Updated Jul 4, 2025 • 53.8k • 4
gupta-tanish/QwQ-Long-CoT-15k-subset-Llama3.1-8B-single-position-regex-perturbations-logps-15 • Updated Jul 4, 2025 • 117k • 3
gupta-tanish/QwQ-Long-CoT-10k-subset-Llama3.1-8B-single-position-regex-perturbations-logps-10 • Updated Jul 4, 2025 • 27.7k • 3