Models

21,378

Active filters: grpo

mradermacher/FinSenti-Qwen3.5-4B-GGUF

4B • Updated about 17 hours ago • 704 • 1

Ayansk11/FinSenti-Qwen3.5-9B

Text Generation • 10B • Updated about 22 hours ago • 24 • 1

mradermacher/MINT-empathy-Qwen3-4B-GGUF

Reinforcement Learning • 4B • Updated 9 days ago • 606 • 1

gradients-io-tournaments/tournament-tourn_da8e132b7783f8ac_20260413-fca0f4de-07af-4310-a315-7d3ba0e41473-5DhaE3Mu

Text Generation • Updated 7 days ago • 34 • 1

migub/lagrpo-self-only-v2

Updated 7 days ago • 1

migub/lagrpo-fair-only

Updated 6 days ago • 1

Chun121/Qwen3-4B-RPG-Roleplay-V2

Text Generation • 4B • Updated Aug 24, 2025 • 14k • 51

onuryozcu/llama

Text Generation • 0.1B • Updated Mar 10, 2025 • 7

amiguel/promptTuning

8B • Updated Feb 16, 2025 • 2

sergiopaniego/Qwen2-0.5B-GRPO-test

Updated Oct 3, 2025

Novaciano/ESP-NSFW-GRPO-1B-Sin_Censura-GGUF

1B • Updated Jan 28, 2025 • 222 • 4

nbd22/Llama-3.1-8B-Instruct-GRPO-gsm8k-ft-lora

Updated Jan 28, 2025

sergiopaniego/Qwen2-0.5B-GRPO

Updated Jan 31, 2025

philschmid/qwen-2.5-3b-r1-countdown

Text Generation • 3B • Updated Jan 30, 2025 • 72 • 8

spinech/qwen-2.5-3b-r1-countdown

Text Generation • 3B • Updated Apr 28, 2025 • 3

Dongwei/Qwen2.5-1.5B-Open-R1-GRPO

Text Generation • 2B • Updated Feb 2, 2025 • 4 • 1

yooneo/qwen-0.5b-r1-aha

Updated Jan 31, 2025

yooneo/qwen-1.5b-r1-aha

Updated Jan 31, 2025

spinech/qwen2.5-3b-r1-rearc-stage1

Text Generation • 3B • Updated Apr 28, 2025 • 4

Dongwei/DeepSeek-R1-Distill-Qwen-7B-GRPO

Text Generation • 8B • Updated Feb 3, 2025 • 22 • 1

MasterControlAIML/DeepSeek-R1-Strategy-Qwen-2.5-1.5b-Unstructured-To-Structured

Text Generation • 2B • Updated Feb 3, 2025 • 10 • 5

mradermacher/DeepSeek-R1-Strategy-Qwen-2.5-1.5b-Unstructured-To-Structured-GGUF

2B • Updated Feb 3, 2025 • 200 • 2

hyunw3/qwen-2.5-0.5b-r1-countdown

Text Generation • 0.5B • Updated Apr 30, 2025 • 1

hyunw3/qwen-2.5-0.5b-r1-countdown_lr1.0e-6

Text Generation • 0.5B • Updated Jun 3, 2025 • 9

mgaimm/qwen-2.5-3b-r1-countdown

Text Generation • 3B • Updated Feb 1, 2025 • 3

MasterControlAIML/DeepSeek-R1-Qwen-2.5-1.5b-Latest-Unstructured-To-Structured

Text Generation • Updated Feb 3, 2025 • 18 • 5

tuyentx/qwen-2.5-3b-r1-countdown

Text Generation • 3B • Updated Feb 2, 2025 • 2

pablo-chocobar/qwen-2.5-3b-r1-countdown

Text Generation • 3B • Updated Jul 21, 2025 • 2

mradermacher/Qwen2.5-1.5B-Open-R1-GRPO-GGUF

2B • Updated Feb 2, 2025 • 50

Julian-Sheeper/Qwen2.5-1.5B-Open-R1-GRPO

Text Generation • 0.1B • Updated Feb 2, 2025 • 1