Models
Datasets
Spaces
Buckets new
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2603.13875

GradMem: Learning to Write Context into Memory with Test-Time Gradient Descent

Paper • 2603.13875 • Published Mar 14 • 35

dLLM: Simple Diffusion Language Modeling

Paper • 2602.22661 • Published Feb 26 • 152
OpenSeeker: Democratizing Frontier Search Agents by Fully Open-Sourcing Training Data

Paper • 2603.15594 • Published Mar 16 • 149
Qianfan-OCR: A Unified End-to-End Model for Document Intelligence

Paper • 2603.13398 • Published Mar 11 • 153
Penguin-VL: Exploring the Efficiency Limits of VLM with LLM-based Vision Encoders

Paper • 2603.06569 • Published Mar 6 • 119

Don’t forget me

about 1 month ago

StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization

Paper • 2410.08815 • Published Oct 11, 2024 • 47
A Unified Generative Retriever for Knowledge-Intensive Language Tasks via Prompt Learning

Paper • 2304.14856 • Published Apr 28, 2023 • 1
xRAG: Extreme Context Compression for Retrieval-augmented Generation with One Token

Paper • 2405.13792 • Published May 22, 2024 • 1
Q-PEFT: Query-dependent Parameter Efficient Fine-tuning for Text Reranking with Large Language Models

Paper • 2404.04522 • Published Apr 6, 2024

Is Multilingual LLM Watermarking Truly Multilingual? A Simple Back-Translation Solution

Paper • 2510.18019 • Published Oct 20, 2025 • 18
PORTool: Tool-Use LLM Training with Rewarded Tree

Paper • 2510.26020 • Published Oct 29, 2025 • 5
POWSM: A Phonetic Open Whisper-Style Speech Foundation Model

Paper • 2510.24992 • Published Oct 28, 2025 • 4
Ming-Flash-Omni: A Sparse, Unified Architecture for Multimodal Perception and Generation

Paper • 2510.24821 • Published Oct 28, 2025 • 41

XGen-7B Technical Report

Paper • 2309.03450 • Published Sep 7, 2023 • 8
FLM-101B: An Open LLM and How to Train It with $100K Budget

Paper • 2309.03852 • Published Sep 7, 2023 • 45
Robotic Table Tennis: A Case Study into a High Speed Learning System

Paper • 2309.03315 • Published Sep 6, 2023 • 7
Improving Text Embeddings with Large Language Models

Paper • 2401.00368 • Published Dec 31, 2023 • 82

FinToolBench: Evaluating LLM Agents for Real-World Financial Tool Use

Paper • 2603.08262 • Published Mar 9 • 42
On-Policy Context Distillation for Language Models

Paper • 2602.12275 • Published Feb 12 • 3
Online Experiential Learning for Language Models

Paper • 2603.16856 • Published Mar 17 • 58
Mixture-of-Depths Attention

Paper • 2603.15619 • Published Mar 16 • 80

Mem_ContLearn_ToCheck

GradMem: Learning to Write Context into Memory with Test-Time Gradient Descent

Paper • 2603.13875 • Published Mar 14 • 35

Papers that exist

about 1 month ago

Latent Zoning Network: A Unified Principle for Generative Modeling, Representation Learning, and Classification

Paper • 2509.15591 • Published Sep 19, 2025 • 45
A Survey on Latent Reasoning

Paper • 2507.06203 • Published Jul 8, 2025 • 94
Quantized Evolution Strategies: High-precision Fine-tuning of Quantized LLMs at Low-precision Cost

Paper • 2602.03120 • Published Feb 3 • 1
TADA! Tuning Audio Diffusion Models through Activation Steering

Paper • 2602.11910 • Published Feb 12 • 2

BitNet: Scaling 1-bit Transformers for Large Language Models

Paper • 2310.11453 • Published Oct 17, 2023 • 107
Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection

Paper • 2310.11511 • Published Oct 17, 2023 • 80
In-Context Learning Creates Task Vectors

Paper • 2310.15916 • Published Oct 24, 2023 • 43
Matryoshka Diffusion Models

Paper • 2310.15111 • Published Oct 23, 2023 • 45

GradMem: Learning to Write Context into Memory with Test-Time Gradient Descent

Paper • 2603.13875 • Published Mar 14 • 35

FinToolBench: Evaluating LLM Agents for Real-World Financial Tool Use

Paper • 2603.08262 • Published Mar 9 • 42
On-Policy Context Distillation for Language Models

Paper • 2602.12275 • Published Feb 12 • 3
Online Experiential Learning for Language Models

Paper • 2603.16856 • Published Mar 17 • 58
Mixture-of-Depths Attention

Paper • 2603.15619 • Published Mar 16 • 80

dLLM: Simple Diffusion Language Modeling

Paper • 2602.22661 • Published Feb 26 • 152
OpenSeeker: Democratizing Frontier Search Agents by Fully Open-Sourcing Training Data

Paper • 2603.15594 • Published Mar 16 • 149
Qianfan-OCR: A Unified End-to-End Model for Document Intelligence

Paper • 2603.13398 • Published Mar 11 • 153
Penguin-VL: Exploring the Efficiency Limits of VLM with LLM-based Vision Encoders

Paper • 2603.06569 • Published Mar 6 • 119

Mem_ContLearn_ToCheck

GradMem: Learning to Write Context into Memory with Test-Time Gradient Descent

Paper • 2603.13875 • Published Mar 14 • 35

Don’t forget me

about 1 month ago

StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization

Paper • 2410.08815 • Published Oct 11, 2024 • 47
A Unified Generative Retriever for Knowledge-Intensive Language Tasks via Prompt Learning

Paper • 2304.14856 • Published Apr 28, 2023 • 1
xRAG: Extreme Context Compression for Retrieval-augmented Generation with One Token

Paper • 2405.13792 • Published May 22, 2024 • 1
Q-PEFT: Query-dependent Parameter Efficient Fine-tuning for Text Reranking with Large Language Models

Paper • 2404.04522 • Published Apr 6, 2024

Papers that exist

about 1 month ago

Latent Zoning Network: A Unified Principle for Generative Modeling, Representation Learning, and Classification

Paper • 2509.15591 • Published Sep 19, 2025 • 45
A Survey on Latent Reasoning

Paper • 2507.06203 • Published Jul 8, 2025 • 94
Quantized Evolution Strategies: High-precision Fine-tuning of Quantized LLMs at Low-precision Cost

Paper • 2602.03120 • Published Feb 3 • 1
TADA! Tuning Audio Diffusion Models through Activation Steering

Paper • 2602.11910 • Published Feb 12 • 2

Is Multilingual LLM Watermarking Truly Multilingual? A Simple Back-Translation Solution

Paper • 2510.18019 • Published Oct 20, 2025 • 18
PORTool: Tool-Use LLM Training with Rewarded Tree

Paper • 2510.26020 • Published Oct 29, 2025 • 5
POWSM: A Phonetic Open Whisper-Style Speech Foundation Model

Paper • 2510.24992 • Published Oct 28, 2025 • 4
Ming-Flash-Omni: A Sparse, Unified Architecture for Multimodal Perception and Generation

Paper • 2510.24821 • Published Oct 28, 2025 • 41

BitNet: Scaling 1-bit Transformers for Large Language Models

Paper • 2310.11453 • Published Oct 17, 2023 • 107
Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection

Paper • 2310.11511 • Published Oct 17, 2023 • 80
In-Context Learning Creates Task Vectors

Paper • 2310.15916 • Published Oct 24, 2023 • 43
Matryoshka Diffusion Models

Paper • 2310.15111 • Published Oct 23, 2023 • 45

XGen-7B Technical Report

Paper • 2309.03450 • Published Sep 7, 2023 • 8
FLM-101B: An Open LLM and How to Train It with $100K Budget

Paper • 2309.03852 • Published Sep 7, 2023 • 45
Robotic Table Tennis: A Case Study into a High Speed Learning System

Paper • 2309.03315 • Published Sep 6, 2023 • 7
Improving Text Embeddings with Large Language Models

Paper • 2401.00368 • Published Dec 31, 2023 • 82

Company

TOS Privacy About Careers

Website

Models Datasets Spaces Pricing Docs