Collections

Discover the best community collections!

Collections including paper arxiv:2503.14476

DAPO: An Open-Source LLM Reinforcement Learning System at Scale

Paper • 2503.14476 • Published Mar 18, 2025 • 146

DAPO: An Open-Source LLM Reinforcement Learning System at Scale

Paper • 2503.14476 • Published Mar 18, 2025 • 146
Group Sequence Policy Optimization

Paper • 2507.18071 • Published Jul 24, 2025 • 320

DAPO: An Open-Source LLM Reinforcement Learning System at Scale

Paper • 2503.14476 • Published Mar 18, 2025 • 146
Training language models to follow instructions with human feedback

Paper • 2203.02155 • Published Mar 4, 2022 • 24
Llama 2: Open Foundation and Fine-Tuned Chat Models

Paper • 2307.09288 • Published Jul 18, 2023 • 251
The Llama 3 Herd of Models

Paper • 2407.21783 • Published Jul 31, 2024 • 118

ByteDance Papers

ByteDance papers collection

about 2 hours ago

Contrastive Learning for Many-to-many Multilingual Neural Machine Translation

Paper • 2105.09501 • Published May 20, 2021 • 1
Cross-modal Contrastive Learning for Speech Translation

Paper • 2205.02444 • Published May 5, 2022
ByteTransformer: A High-Performance Transformer Boosted for Variable-Length Inputs

Paper • 2210.03052 • Published Oct 6, 2022
Diffusion Glancing Transformer for Parallel Sequence to Sequence Learning

Paper • 2212.10240 • Published Dec 20, 2022 • 1

Papers + RL/Reasoning

DAPO: An Open-Source LLM Reinforcement Learning System at Scale

Paper • 2503.14476 • Published Mar 18, 2025 • 146
VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks

Paper • 2504.05118 • Published Apr 7, 2025 • 26
SQL-R1: Training Natural Language to SQL Reasoning Model By Reinforcement Learning

Paper • 2504.08600 • Published Apr 11, 2025 • 33
A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce

Paper • 2504.11343 • Published Apr 15, 2025 • 20

Papers reimplemented

List of research papers, architectures, and techniques reimplemented in LLM-quest or Hugging Face's TRL. Missing: Qwen3.5, Qwen3-Next, GPT-2

VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training

Paper • 2602.10693 • Published Feb 11 • 220
Reinforced Attention Learning

Paper • 2602.04884 • Published Feb 4 • 29
Learning to Reason in 13 Parameters

Paper • 2602.04118 • Published Feb 4 • 6
LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters

Paper • 2405.17604 • Published May 27, 2024 • 3

Parallel-R1: Towards Parallel Thinking via Reinforcement Learning

Paper • 2509.07980 • Published Sep 9, 2025 • 105
Robot Learning from a Physical World Model

Paper • 2511.07416 • Published Nov 10, 2025 • 32
MathSE: Improving Multimodal Mathematical Reasoning via Self-Evolving Iterative Reflection and Reward-Guided Fine-Tuning

Paper • 2511.06805 • Published Nov 10, 2025 • 13
GigaEvo: An Open Source Optimization Framework Powered By LLMs And Evolution Algorithms

Paper • 2511.17592 • Published Nov 17, 2025 • 121

Less is More: Recursive Reasoning with Tiny Networks

Paper • 2510.04871 • Published Oct 6, 2025 • 513
When Thoughts Meet Facts: Reusable Reasoning for Long-Context LMs

Paper • 2510.07499 • Published Oct 8, 2025 • 49
Improving Context Fidelity via Native Retrieval-Augmented Reasoning

Paper • 2509.13683 • Published Sep 17, 2025 • 8
Multimodal Iterative RAG for Knowledge-Intensive Visual Question Answering

Paper • 2509.00798 • Published Aug 31, 2025 • 1

DAPO: An Open-Source LLM Reinforcement Learning System at Scale

Paper • 2503.14476 • Published Mar 18, 2025 • 146

large language model

DAPO: An Open-Source LLM Reinforcement Learning System at Scale

Paper • 2503.14476 • Published Mar 18, 2025 • 146
AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs

Paper • 2508.16153 • Published Aug 22, 2025 • 162
OpenDFM/ChemDFM-v1.5-8B

Text Generation • 8B • Updated Oct 26, 2025 • 6.97k • 8

DAPO: An Open-Source LLM Reinforcement Learning System at Scale

Paper • 2503.14476 • Published Mar 18, 2025 • 146
