Collections

Discover the best community collections!

Collections including paper arxiv:2604.27085

Co-Evolving Policy Distillation
Paper • 2604.27083 • Published 8 days ago • 61
Efficient Training on Multiple Consumer GPUs with RoundPipe
Paper • 2604.27085 • Published 8 days ago • 38
Leveraging Verifier-Based Reinforcement Learning in Image Editing
Paper • 2604.27505 • Published 7 days ago • 55

Agent Collaborations
about 8 hours ago
Efficient Optimizer Live • Running • 3
Dashboard for the Efficient Optimizer challenge
ml-intern-explorers/efficient-optimizer-collab • 578 kB
Parameter Golf Live • Running • 1
Live chat + leaderboard for the Parameter Golf challenge
ml-intern-explorers/parameter-golf-collab • 922 kB

When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method
Paper • 2402.17193 • Published Feb 27, 2024 • 26
What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective
Paper • 2410.23743 • Published Oct 31, 2024 • 64
Direct Preference Optimization Using Sparse Feature-Level Constraints
Paper • 2411.07618 • Published Nov 12, 2024 • 17
Transformer^2: Self-adaptive LLMs
Paper • 2501.06252 • Published Jan 9, 2025 • 55

Efficient Training on Multiple Consumer GPUs with RoundPipe
Paper • 2604.27085 • Published 8 days ago • 38
TriAttention: Efficient Long Reasoning with Trigonometric KV Compression
Paper • 2604.04921 • Published Apr 6 • 112

Meta-Awareness Enhances Reasoning Models: Self-Alignment Reinforcement Learning
Paper • 2510.03259 • Published Sep 26, 2025 • 57
Hybrid Reinforcement: When Reward Is Sparse, It's Better to Be Dense
Paper • 2510.07242 • Published Oct 8, 2025 • 30
First Try Matters: Revisiting the Role of Reflection in Reasoning Models
Paper • 2510.08308 • Published Oct 9, 2025 • 24
Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward
Paper • 2510.03222 • Published Oct 3, 2025 • 76
