Models (500)

Active filters: rlhf

Nix-ai/Nix2.6-m

camgeodesic/olmo3-7b-instruct-only

Text Generation • 528k • Updated Feb 25 • 6

linius/Qwen3-8B-SPoT

8B • Updated Mar 5 • 4 • 2

VijayShinde1996/vrs-sft-Qwen2_5-7B-it

Text Generation • 8B • Updated Mar 10 • 1 • 1

Tamil-ai/tamil-qwen25-14b-morph-rlmv

Text Generation • 15B • Updated Mar 11 • 7

littlekoyo/MotionCritic

AIJian/PaTaRM-8B

Text Generation • 0.5B • Updated 18 days ago • 736

AIJian/PaTaRM-14B

Text Generation • 0.5B • Updated 19 days ago • 1.27k

mradermacher/PaTaRM-8B-GGUF

8B • Updated 19 days ago • 489

DataPilot/ArrowCanaria-Llama-8B-RL-v0.1

Text Generation • 8B • Updated 30 days ago • 185 • 7

mradermacher/ArrowCanaria-Llama-8B-RL-v0.1-GGUF

8B • Updated 29 days ago • 792 • 1

mradermacher/ArrowCanaria-Llama-8B-RL-v0.1-i1-GGUF

8B • Updated 29 days ago • 4.95k • 2

DEAR-Tao/Qwen2.5-1.5B-Instruct-GRPO-think-lora

Reinforcement Learning • 2B • Updated 28 days ago • 273

usama10/qwen-7b-reward-model

Text Classification • Updated 28 days ago

sttjr/paganini-qwen35-27b-grpo-lora

Reinforcement Learning • Updated 28 days ago • 24

vadimbelsky/qwen3.5-medical-ft-stage3-dpo

Image-Text-to-Text • 10B • Updated 21 days ago • 555

adinetwork/adi-v0.1-base

Text Generation • Updated 25 days ago

dennisonb/qwen25-tax-3b

Reinforcement Learning • 3B • Updated 24 days ago • 47

CraneAILabs/luganda-reward-model

Text Classification • 1.0B • Updated 10 days ago • 72

Shubhamw11/gemma-3-270m-dpo-negative

Updated 18 days ago • 69 • 1

yaoyuanlf/Qwen2.5-VL-7B-Physics-RLHF

Image-Text-to-Text • 8B • Updated 13 days ago • 35

jang1563/biorlhf-grpo-mistral-7b

Text Generation • Updated 16 days ago • 14

pranav6905/Llama-3.2-1B-DPO-DPOMix-Adapters

Updated 12 days ago • 73

pranav6905/llama-1b-sft-dpo-final

1B • Updated 12 days ago

WisdomShell/GRIP-Llama-3-8B

Text Generation • 8B • Updated 6 days ago • 724 • 2

mr3haque/SLM-RL-Agents

Text Generation • Updated 2 days ago

mradermacher/GRIP-Llama-3-8B-GGUF

8B • Updated 7 days ago • 854 • 1

mradermacher/GRIP-Llama-3-8B-i1-GGUF

8B • Updated 7 days ago • 3.98k

retofan23333/UniDG-RFT-LoRA-Release

Image-to-Image • Updated 4 days ago

whalexdfsa/open-rs2-GPRA

Text Generation • Updated 3 days ago