Collections

Discover the best community collections!

Collections including paper arxiv:2503.10622

Papers reimplemented

List of research papers, architectures, and techniques reimplemented in LLM-quest or Hugging Face's TRL. Missing: Qwen3.5, Qwen3-Next, GPT-2

VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training

Paper • 2602.10693 • Published Feb 11 • 220
Reinforced Attention Learning

Paper • 2602.04884 • Published Feb 4 • 29
Learning to Reason in 13 Parameters

Paper • 2602.04118 • Published Feb 4 • 6
LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters

Paper • 2405.17604 • Published May 27, 2024 • 3

High Low Media's AI/ML bookshelf

Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification

Paper • 1502.01852 • Published Feb 6, 2015 • 1
Deep Residual Learning for Image Recognition

Paper • 1512.03385 • Published Dec 10, 2015 • 16
Focal Loss for Dense Object Detection

Paper • 1708.02002 • Published Aug 7, 2017
Scaling Proprioceptive-Visual Learning with Heterogeneous Pre-trained Transformers

Paper • 2409.20537 • Published Sep 30, 2024 • 13

You Do Not Fully Utilize Transformer's Representation Capacity

Paper • 2502.09245 • Published Feb 13, 2025 • 37
LLM-Microscope: Uncovering the Hidden Role of Punctuation in Context Memory of Transformers

Paper • 2502.15007 • Published Feb 20, 2025 • 175
Transformers without Normalization

Paper • 2503.10622 • Published Mar 13, 2025 • 172
Forgetting Transformer: Softmax Attention with a Forget Gate

Paper • 2503.02130 • Published Mar 3, 2025 • 32

Feature-Level Insights into Artificial Text Detection with Sparse Autoencoders

Paper • 2503.03601 • Published Mar 5, 2025 • 233
Transformers without Normalization

Paper • 2503.10622 • Published Mar 13, 2025 • 172
RWKV-7 "Goose" with Expressive Dynamic State Evolution

Paper • 2503.14456 • Published Mar 18, 2025 • 154
ReCamMaster: Camera-Controlled Generative Rendering from A Single Video

Paper • 2503.11647 • Published Mar 14, 2025 • 148

DAPO: An Open-Source LLM Reinforcement Learning System at Scale

Paper • 2503.14476 • Published Mar 18, 2025 • 146
Transformers without Normalization

Paper • 2503.10622 • Published Mar 13, 2025 • 172

NN Arch Components

A Unified View of Attention and Residual Sinks: Outlier-Driven Rescaling is Essential for Transformer Training

Paper • 2601.22966 • Published Jan 30
STEM: Scaling Transformers with Embedding Modules

Paper • 2601.10639 • Published Jan 15 • 2
Deep Delta Learning

Paper • 2601.00417 • Published Jan 1 • 34
mHC: Manifold-Constrained Hyper-Connections

Paper • 2512.24880 • Published Dec 31, 2025 • 322

Transformers without Normalization

Paper • 2503.10622 • Published Mar 13, 2025 • 172

Fun journal papers I've read

Feature-Level Insights into Artificial Text Detection with Sparse Autoencoders

Paper • 2503.03601 • Published Mar 5, 2025 • 233
Transformers without Normalization

Paper • 2503.10622 • Published Mar 13, 2025 • 172
Survey on Evaluation of LLM-based Agents

Paper • 2503.16416 • Published Mar 20, 2025 • 96

Transformers without Normalization

Paper • 2503.10622 • Published Mar 13, 2025 • 172

Transformers without Normalization

Paper • 2503.10622 • Published Mar 13, 2025 • 172
