Active filters: grpo
mradermacher/FinSenti-Qwen3.5-4B-GGUF
4B • Updated • 704
• 1
Ayansk11/FinSenti-Qwen3.5-9B
Text Generation
• 10B • Updated • 24
• 1
mradermacher/MINT-empathy-Qwen3-4B-GGUF
Reinforcement Learning
• 4B • Updated • 606
• 1
gradients-io-tournaments/tournament-tourn_da8e132b7783f8ac_20260413-fca0f4de-07af-4310-a315-7d3ba0e41473-5DhaE3Mu
Text Generation
• Updated • 34
• 1
migub/lagrpo-self-only-v2
Chun121/Qwen3-4B-RPG-Roleplay-V2
Text Generation
• 4B • Updated • 14k
• 51
Text Generation
• 0.1B • Updated • 7
8B • Updated • 2
sergiopaniego/Qwen2-0.5B-GRPO-test
Updated
Novaciano/ESP-NSFW-GRPO-1B-Sin_Censura-GGUF
1B • Updated • 222
• 4
nbd22/Llama-3.1-8B-Instruct-GRPO-gsm8k-ft-lora
Updated
sergiopaniego/Qwen2-0.5B-GRPO
Updated
philschmid/qwen-2.5-3b-r1-countdown
Text Generation
• 3B • Updated • 72
• 8
spinech/qwen-2.5-3b-r1-countdown
Text Generation
• 3B • Updated • 3
Dongwei/Qwen2.5-1.5B-Open-R1-GRPO
Text Generation
• 2B • Updated • 4
• 1
spinech/qwen2.5-3b-r1-rearc-stage1
Text Generation
• 3B • Updated • 4
Dongwei/DeepSeek-R1-Distill-Qwen-7B-GRPO
Text Generation
• 8B • Updated • 22
• 1
MasterControlAIML/DeepSeek-R1-Strategy-Qwen-2.5-1.5b-Unstructured-To-Structured
Text Generation
• 2B • Updated • 10
• 5
mradermacher/DeepSeek-R1-Strategy-Qwen-2.5-1.5b-Unstructured-To-Structured-GGUF
2B • Updated • 200
• 2
hyunw3/qwen-2.5-0.5b-r1-countdown
Text Generation
• 0.5B • Updated • 1
hyunw3/qwen-2.5-0.5b-r1-countdown_lr1.0e-6
Text Generation
• 0.5B • Updated • 9
mgaimm/qwen-2.5-3b-r1-countdown
Text Generation
• 3B • Updated • 3
MasterControlAIML/DeepSeek-R1-Qwen-2.5-1.5b-Latest-Unstructured-To-Structured
Text Generation
• Updated • 18
• 5
tuyentx/qwen-2.5-3b-r1-countdown
Text Generation
• 3B • Updated • 2
pablo-chocobar/qwen-2.5-3b-r1-countdown
Text Generation
• 3B • Updated • 2
mradermacher/Qwen2.5-1.5B-Open-R1-GRPO-GGUF
2B • Updated • 50
Julian-Sheeper/Qwen2.5-1.5B-Open-R1-GRPO
Text Generation
• 0.1B • Updated • 1