-
LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression
Paper • 2403.12968 • Published • 25 -
PERL: Parameter Efficient Reinforcement Learning from Human Feedback
Paper • 2403.10704 • Published • 60 -
Alignment Studio: Aligning Large Language Models to Particular Contextual Regulations
Paper • 2403.09704 • Published • 32 -
RAFT: Adapting Language Model to Domain Specific RAG
Paper • 2403.10131 • Published • 72
Collections
Collections including paper arxiv:2403.19270
-
On the Societal Impact of Open Foundation Models
Paper • 2403.07918 • Published • 17 -
sDPO: Don't Use Your Data All at Once
Paper • 2403.19270 • Published • 41 -
Hallucinations or Attention Misdirection? The Path to Strategic Value Extraction in Business Using Large Language Models
Paper • 2402.14002 • Published -
Evaluating the Social Impact of Generative AI Systems in Systems and Society
Paper • 2306.05949 • Published • 10
-
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Paper • 2305.18290 • Published • 64 -
ICDPO: Effectively Borrowing Alignment Capability of Others via In-context Direct Preference Optimization
Paper • 2402.09320 • Published • 6 -
sDPO: Don't Use Your Data All at Once
Paper • 2403.19270 • Published • 41 -
Dueling RL: Reinforcement Learning with Trajectory Preferences
Paper • 2111.04850 • Published • 2
-
Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning
Paper • 2310.20587 • Published • 18 -
SELF: Language-Driven Self-Evolution for Large Language Model
Paper • 2310.00533 • Published • 2 -
QLoRA: Efficient Finetuning of Quantized LLMs
Paper • 2305.14314 • Published • 61 -
QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models
Paper • 2309.14717 • Published • 46
-
sDPO: Don't Use Your Data All at Once
Paper • 2403.19270 • Published • 41 -
Advancing LLM Reasoning Generalists with Preference Trees
Paper • 2404.02078 • Published • 46 -
Learn Your Reference Model for Real Good Alignment
Paper • 2404.09656 • Published • 90 -
mDPO: Conditional Preference Optimization for Multimodal Large Language Models
Paper • 2406.11839 • Published • 40
-
InternLM2 Technical Report
Paper • 2403.17297 • Published • 34 -
sDPO: Don't Use Your Data All at Once
Paper • 2403.19270 • Published • 41 -
Learn Your Reference Model for Real Good Alignment
Paper • 2404.09656 • Published • 90 -
OpenBezoar: Small, Cost-Effective and Open Models Trained on Mixes of Instruction Data
Paper • 2404.12195 • Published • 12
-
ORPO: Monolithic Preference Optimization without Reference Model
Paper • 2403.07691 • Published • 72 -
sDPO: Don't Use Your Data All at Once
Paper • 2403.19270 • Published • 41 -
Teaching Large Language Models to Reason with Reinforcement Learning
Paper • 2403.04642 • Published • 48 -
Best Practices and Lessons Learned on Synthetic Data for Language Models
Paper • 2404.07503 • Published • 31
-
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
Paper • 2403.03507 • Published • 190 -
RAFT: Adapting Language Model to Domain Specific RAG
Paper • 2403.10131 • Published • 72 -
LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models
Paper • 2403.13372 • Published • 183 -
InternLM2 Technical Report
Paper • 2403.17297 • Published • 34