- mmBERT: A Modern Multilingual Encoder with Annealed Language Learning — arXiv:2509.06888 (Sep 8, 2025)
- Seq vs Seq: An Open Suite of Paired Encoders and Decoders — arXiv:2507.11412 (Jul 15, 2025)
- Certified Mitigation of Worst-Case LLM Copyright Infringement — arXiv:2504.16046 (Apr 22, 2025)
- Verifiable by Design: Aligning Language Models to Quote from Pre-Training Data — arXiv:2404.03862 (Apr 5, 2024)
- AdapterSwap: Continuous Training of LLMs with Data Removal and Access-Control Guarantees — arXiv:2404.08417 (Apr 12, 2024)
- Dated Data: Tracing Knowledge Cutoffs in Large Language Models — arXiv:2403.12958 (Mar 19, 2024)
- Fast Benchmarking of Accuracy vs. Training Time with Cyclic Learning Rates — arXiv:2206.00832 (Jun 2, 2022)
- Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models — arXiv:2405.20541 (May 30, 2024)
- LIMIT: Less Is More for Instruction Tuning Across Evaluation Paradigms — arXiv:2311.13133 (Nov 22, 2023)
- MosaicBERT: A Bidirectional Encoder Optimized for Fast Pretraining — arXiv:2312.17482 (Dec 29, 2023)
- OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset — arXiv:2402.10176 (Feb 15, 2024)
- Efficient and Interpretable Neural Models for Entity Tracking — arXiv:2208.14252 (Aug 30, 2022)