My Papers of Interest - a Mervyn1937 Collection

Mervyn1937 's Collections

Famous Authors/Institute

General Knowledge

My Papers of Interest

My Papers of Interest

updated May 26, 2025

Self-Alignment with Instruction Backtranslation

Paper • 2308.06259 • Published Aug 11, 2023 • 43
ReCLIP: Refine Contrastive Language Image Pre-Training with Source Free Domain Adaptation

Paper • 2308.03793 • Published Aug 4, 2023 • 12
From Sparse to Soft Mixtures of Experts

Paper • 2308.00951 • Published Aug 2, 2023 • 22
Revisiting DETR Pre-training for Object Detection

Paper • 2308.01300 • Published Aug 2, 2023 • 10
Unified Model for Image, Video, Audio and Language Tasks

Paper • 2307.16184 • Published Jul 30, 2023 • 16
Scaling TransNormer to 175 Billion Parameters

Paper • 2307.14995 • Published Jul 27, 2023 • 23
NeRF-Det: Learning Geometry-Aware Volumetric Representation for Multi-View 3D Object Detection

Paper • 2307.14620 • Published Jul 27, 2023 • 15
Less is More: Focus Attention for Efficient DETR

Paper • 2307.12612 • Published Jul 24, 2023 • 7
Replacing softmax with ReLU in Vision Transformers

Paper • 2309.08586 • Published Sep 15, 2023 • 19
A Distributed Data-Parallel PyTorch Implementation of the Distributed Shampoo Optimizer for Training Neural Networks At-Scale

Paper • 2309.06497 • Published Sep 12, 2023 • 7
Multimodal Foundation Models: From Specialists to General-Purpose Assistants

Paper • 2309.10020 • Published Sep 18, 2023 • 41
FoleyGen: Visually-Guided Audio Generation

Paper • 2309.10537 • Published Sep 19, 2023 • 8
LMDX: Language Model-based Document Information Extraction and Localization

Paper • 2309.10952 • Published Sep 19, 2023 • 67
RMT: Retentive Networks Meet Vision Transformers

Paper • 2309.11523 • Published Sep 20, 2023 • 34
Aligning Large Multimodal Models with Factually Augmented RLHF

Paper • 2309.14525 • Published Sep 25, 2023 • 32
Attention Satisfies: A Constraint-Satisfaction Lens on Factual Errors of Language Models

Paper • 2309.15098 • Published Sep 26, 2023 • 7
Vision Transformers Need Registers

Paper • 2309.16588 • Published Sep 28, 2023 • 86
AutoCLIP: Auto-tuning Zero-Shot Classifiers for Vision-Language Models

Paper • 2309.16414 • Published Sep 28, 2023 • 18
Enable Language Models to Implicitly Learn Self-Improvement From Data

Paper • 2310.00898 • Published Oct 2, 2023 • 24
Lemur: Harmonizing Natural Language and Code for Language Agents

Paper • 2310.06830 • Published Oct 10, 2023 • 33
PaLI-3 Vision Language Models: Smaller, Faster, Stronger

Paper • 2310.09199 • Published Oct 13, 2023 • 28
MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning

Paper • 2310.09478 • Published Oct 14, 2023 • 21
Context-Aware Meta-Learning

Paper • 2310.10971 • Published Oct 17, 2023 • 17
An Early Evaluation of GPT-4V(ision)

Paper • 2310.16534 • Published Oct 25, 2023 • 22
Segment and Caption Anything

Paper • 2312.00869 • Published Dec 1, 2023 • 20
OneLLM: One Framework to Align All Modalities with Language

Paper • 2312.03700 • Published Dec 6, 2023 • 24
Scaling Reasoning, Losing Control: Evaluating Instruction Following in Large Reasoning Models

Paper • 2505.14810 • Published May 20, 2025 • 62
Pixel Reasoner: Incentivizing Pixel-Space Reasoning with Curiosity-Driven Reinforcement Learning

Paper • 2505.15966 • Published May 21, 2025 • 53