Active filters: GRPO
TharunSivamani/SmolGRPO-135M • Text Generation • 0.1B • Updated • 2
Text Generation • 0.1B • Updated • 5
bhaveshgoel07/SmolGRPO-135M • Updated
Text Generation • 0.1B • Updated • 2
hiroyuki0823/SakanaAI-TinySwallow-1.5B-Instruct-GRPO-lora
ykarout/Phi4-ThinkMode-fp16 • Text Generation • 15B • Updated • 5
mradermacher/Phi4-ThinkMode-fp16-GGUF • 15B • Updated • 59
Text Generation • 0.1B • Updated • 2
mradermacher/Nuke_X_Gemma3_1B_Reasoner_Testing-GGUF • 1.0B • Updated • 63 • 1
mradermacher/Nuke_X_Gemma3_1B_Reasoner_Testing-i1-GGUF • 1.0B • Updated • 85 • 1
Text Generation • 0.1B • Updated • 4
alonsosilva/SmolGRPO-135M • Text Generation • 0.1B • Updated • 8
VaidikML0508/Shark-Tank-Offer-Evaluator-llama3.2-3B-Instruct-GRPO-16bits-V1 • Text Generation • 3B • Updated • 8 • 1
mradermacher/Shark-Tank-Offer-Evaluator-llama3.2-3B-Instruct-GRPO-16bits-V1-GGUF • 3B • Updated • 67
alfredcs/gemma-3-12b-grpo-firstaid • Updated
Text Generation • 0.1B • Updated • 5
Thabet/SmolGRPO-135M-learning • Text Generation • 0.1B • Updated • 2
Text Generation • 0.1B • Updated • 4
Text Generation • 0.1B • Updated • 7
yigitkucuk/tint-interact-sft-grpo • Text Generation • 0.4B • Updated • 4
koochikoo25/SmolGRPO-135M • Text Generation • 0.1B • Updated • 3
Text Generation • 0.1B • Updated • 2
TianheWu/VisualQuality-R1-7B • Reinforcement Learning • 8B • Updated • 1.13k • 11
pedrocurvo/llama2-grpo-lora • Text Generation • 7B • Updated • 2
mradermacher/VisualQuality-R1-7B-GGUF • 8B • Updated • 961
Text Generation • 0.1B • Updated • 6 • 1
Ceenen2302/Llama-3.2-1B-Instruct-GRPO-SmartLed • Feature Extraction • 1B • Updated • 1
alfredcs/torchrun-gemma-3-12b-grpo-icd10pcs-merged • Text Generation • 8B • Updated • 3
Ceenen2302/Llama-3.2-1B-Instruct-GRPO • Text Generation • 1B • Updated • 2
alperenyildiz/Mistral-7B-Instruct-v0.3_q8_0_GRPO • Text Generation • 7B • Updated • 2