innovation64's Collections: paper selection
• Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping (arXiv:2402.14083) • 47
• Linear Transformers are Versatile In-Context Learners (arXiv:2402.14180) • 7
• Training-Free Long-Context Scaling of Large Language Models (arXiv:2402.17463) • 24
• The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits (arXiv:2402.17764) • 628
• Evaluating Very Long-Term Conversational Memory of LLM Agents (arXiv:2402.17753) • 19
• Resonance RoPE: Improving Context Length Generalization of Large Language Models (arXiv:2403.00071) • 24
• ShortGPT: Layers in Large Language Models are More Redundant Than You Expect (arXiv:2403.03853) • 66
• GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection (arXiv:2403.03507) • 190
• Design2Code: How Far Are We From Automating Front-End Engineering? (arXiv:2403.03163) • 98
• Sorted LLaMA: Unlocking the Potential of Intermediate Layers of Large Language Models for Dynamic Inference Using Sorted Fine-Tuning (SoFT) (arXiv:2309.08968) • 24
• MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training (arXiv:2403.09611) • 129
• Evaluating Frontier Models for Dangerous Capabilities (arXiv:2403.13793) • 7
• The Unreasonable Ineffectiveness of the Deeper Layers (arXiv:2403.17887) • 82
• Clover: Regressive Lightweight Speculative Decoding with Sequential Knowledge (arXiv:2405.00263) • 16
• Is Bigger Edit Batch Size Always Better? -- An Empirical Study on Model Editing with Llama-3 (arXiv:2405.00664) • 20
• Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models (arXiv:2405.01535) • 124
• Xmodel-VLM: A Simple Baseline for Multimodal Vision Language Model (arXiv:2405.09215) • 22
• ALPINE: Unveiling the Planning Capability of Autoregressive Learning in Language Models (arXiv:2405.09220) • 27
• LoRA Learns Less and Forgets Less (arXiv:2405.09673) • 91
• Layer-Condensed KV Cache for Efficient Inference of Large Language Models (arXiv:2405.10637) • 22
• MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning (arXiv:2405.12130) • 50
• 2BP: 2-Stage Backpropagation (arXiv:2405.18047) • 26
• Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models (arXiv:2405.20541) • 24
• 4-bit Shampoo for Memory-Efficient Network Training (arXiv:2405.18144) • 12
• Transformers meet Neural Algorithmic Reasoners (arXiv:2406.09308) • 44
• Test of Time: A Benchmark for Evaluating LLMs on Temporal Reasoning (arXiv:2406.09170) • 27