Collections
Collections including paper arxiv:2507.08794
-
Robust Multimodal Large Language Models Against Modality Conflict
Paper • 2507.07151 • Published • 6
One Token to Fool LLM-as-a-Judge
Paper • 2507.08794 • Published • 32
Test-Time Scaling with Reflective Generative Model
Paper • 2507.01951 • Published • 108
KV Cache Steering for Inducing Reasoning in Small Language Models
Paper • 2507.08799 • Published • 40
-
microsoft/bitnet-b1.58-2B-4T
Text Generation • 0.8B • Updated • 15.7k • 1.43k
M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models
Paper • 2504.10449 • Published • 15
nvidia/Llama-3.1-Nemotron-8B-UltraLong-2M-Instruct
Text Generation • 8B • Updated • 99 • 17
ReTool: Reinforcement Learning for Strategic Tool Use in LLMs
Paper • 2504.11536 • Published • 63
-
FLAME: Factuality-Aware Alignment for Large Language Models
Paper • 2405.01525 • Published • 30
DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data
Paper • 2405.14333 • Published • 45
Transformers Can Do Arithmetic with the Right Embeddings
Paper • 2405.17399 • Published • 54
EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture
Paper • 2405.18991 • Published • 12
-
One Token to Fool LLM-as-a-Judge
Paper • 2507.08794 • Published • 32
Self-Improving VLM Judges Without Human Annotations
Paper • 2512.05145 • Published • 20
RubricBench: Aligning Model-Generated Rubrics with Human Standards
Paper • 2603.01562 • Published • 63
Xpertbench: Expert Level Tasks with Rubrics-Based Evaluation
Paper • 2604.02368 • Published • 12
-
MultiFinBen: A Multilingual, Multimodal, and Difficulty-Aware Benchmark for Financial LLM Evaluation
Paper • 2506.14028 • Published • 94
One Token to Fool LLM-as-a-Judge
Paper • 2507.08794 • Published • 32
A Survey of Context Engineering for Large Language Models
Paper • 2507.13334 • Published • 263
-
Can Large Language Models Understand Context?
Paper • 2402.00858 • Published • 24
OLMo: Accelerating the Science of Language Models
Paper • 2402.00838 • Published • 85
Self-Rewarding Language Models
Paper • 2401.10020 • Published • 153
SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity
Paper • 2401.17072 • Published • 25