Models: 196

Active filters: GRPO

TharunSivamani/SmolGRPO-135M

Text Generation • 0.1B • Updated Mar 16, 2025 • 2

frascuchon/SmolGRPO-135M

Text Generation • 0.1B • Updated Mar 17, 2025 • 5

bhaveshgoel07/SmolGRPO-135M

Updated Mar 18, 2025

Arushhh/SmolGRPO-135M

Text Generation • 0.1B • Updated Mar 24, 2025 • 2

hiroyuki0823/SakanaAI-TinySwallow-1.5B-Instruct-GRPO-lora

Updated Mar 24, 2025 • 4

ykarout/Phi4-ThinkMode-fp16

Text Generation • 15B • Updated Mar 27, 2025 • 5

mradermacher/Phi4-ThinkMode-fp16-GGUF

15B • Updated Jul 11, 2025 • 59

czuo03/SmolGRPO-135M

Text Generation • 0.1B • Updated Mar 28, 2025 • 2

mradermacher/Nuke_X_Gemma3_1B_Reasoner_Testing-GGUF

1.0B • Updated Jul 11, 2025 • 63 • 1

mradermacher/Nuke_X_Gemma3_1B_Reasoner_Testing-i1-GGUF

1.0B • Updated Jul 11, 2025 • 85 • 1

opria123/SmolGRPO-135M

Text Generation • 0.1B • Updated Apr 6, 2025 • 4

alonsosilva/SmolGRPO-135M

Text Generation • 0.1B • Updated Apr 8, 2025 • 8

VaidikML0508/Shark-Tank-Offer-Evaluator-llama3.2-3B-Instruct-GRPO-16bits-V1

Text Generation • 3B • Updated Apr 22, 2025 • 8 • 1

mradermacher/Shark-Tank-Offer-Evaluator-llama3.2-3B-Instruct-GRPO-16bits-V1-GGUF

3B • Updated Jul 11, 2025 • 67

alfredcs/gemma-3-12b-grpo-firstaid

Updated Apr 24, 2025

garethpaul/SmolGRPO-135M

Text Generation • 0.1B • Updated May 8, 2025 • 5

Thabet/SmolGRPO-135M-learning

Text Generation • 0.1B • Updated May 10, 2025 • 2

jcollado/SmolGRPO-135M

Text Generation • 0.1B • Updated May 14, 2025 • 4

Brianpuz/SmolGRPO-135M

Text Generation • 0.1B • Updated May 19, 2025 • 7

yigitkucuk/tint-interact-sft-grpo

Text Generation • 0.4B • Updated May 19, 2025 • 4

koochikoo25/SmolGRPO-135M

Text Generation • 0.1B • Updated May 20, 2025 • 3

jackle33/SmolGRPO-135M

Text Generation • 0.1B • Updated May 22, 2025 • 2

TianheWu/VisualQuality-R1-7B

Reinforcement Learning • 8B • Updated Sep 19, 2025 • 1.13k • 11

pedrocurvo/llama2-grpo-lora

Text Generation • 7B • Updated May 26, 2025 • 2

mradermacher/VisualQuality-R1-7B-GGUF

8B • Updated Jul 31, 2025 • 961

HuangXinBa/GRPO

Text Generation • 0.1B • Updated May 28, 2025 • 6 • 1

Ceenen2302/Llama-3.2-1B-Instruct-GRPO-SmartLed

Feature Extraction • 1B • Updated Jun 3, 2025 • 1

alfredcs/torchrun-gemma-3-12b-grpo-icd10pcs-merged

Text Generation • 8B • Updated Jun 4, 2025 • 3

Ceenen2302/Llama-3.2-1B-Instruct-GRPO

Text Generation • 1B • Updated Jun 5, 2025 • 2

alperenyildiz/Mistral-7B-Instruct-v0.3_q8_0_GRPO

Text Generation • 7B • Updated Jun 6, 2025 • 2