Mervyn1937's Collections: My Papers of Interest
Self-Alignment with Instruction Backtranslation
Paper
• 2308.06259
• Published • 43
ReCLIP: Refine Contrastive Language Image Pre-Training with Source Free Domain Adaptation
Paper
• 2308.03793
• Published • 12
From Sparse to Soft Mixtures of Experts
Paper
• 2308.00951
• Published • 22
Revisiting DETR Pre-training for Object Detection
Paper
• 2308.01300
• Published • 10
Unified Model for Image, Video, Audio and Language Tasks
Paper
• 2307.16184
• Published • 16
Scaling TransNormer to 175 Billion Parameters
Paper
• 2307.14995
• Published • 23
NeRF-Det: Learning Geometry-Aware Volumetric Representation for Multi-View 3D Object Detection
Paper
• 2307.14620
• Published • 15
Less is More: Focus Attention for Efficient DETR
Paper
• 2307.12612
• Published • 7
Replacing softmax with ReLU in Vision Transformers
Paper
• 2309.08586
• Published • 19
A Distributed Data-Parallel PyTorch Implementation of the Distributed Shampoo Optimizer for Training Neural Networks At-Scale
Paper
• 2309.06497
• Published • 7
Multimodal Foundation Models: From Specialists to General-Purpose Assistants
Paper
• 2309.10020
• Published • 41
FoleyGen: Visually-Guided Audio Generation
Paper
• 2309.10537
• Published • 8
LMDX: Language Model-based Document Information Extraction and Localization
Paper
• 2309.10952
• Published • 67
RMT: Retentive Networks Meet Vision Transformers
Paper
• 2309.11523
• Published • 34
Aligning Large Multimodal Models with Factually Augmented RLHF
Paper
• 2309.14525
• Published • 32
Attention Satisfies: A Constraint-Satisfaction Lens on Factual Errors of Language Models
Paper
• 2309.15098
• Published • 7
Vision Transformers Need Registers
Paper
• 2309.16588
• Published • 86
AutoCLIP: Auto-tuning Zero-Shot Classifiers for Vision-Language Models
Paper
• 2309.16414
• Published • 18
Enable Language Models to Implicitly Learn Self-Improvement From Data
Paper
• 2310.00898
• Published • 24
Lemur: Harmonizing Natural Language and Code for Language Agents
Paper
• 2310.06830
• Published • 33
PaLI-3 Vision Language Models: Smaller, Faster, Stronger
Paper
• 2310.09199
• Published • 28
MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning
Paper
• 2310.09478
• Published • 21
Context-Aware Meta-Learning
Paper
• 2310.10971
• Published • 17
An Early Evaluation of GPT-4V(ision)
Paper
• 2310.16534
• Published • 22
Segment and Caption Anything
Paper
• 2312.00869
• Published • 20
OneLLM: One Framework to Align All Modalities with Language
Paper
• 2312.03700
• Published • 24
Scaling Reasoning, Losing Control: Evaluating Instruction Following in Large Reasoning Models
Paper
• 2505.14810
• Published • 62
Pixel Reasoner: Incentivizing Pixel-Space Reasoning with Curiosity-Driven Reinforcement Learning
Paper
• 2505.15966
• Published • 53