MixtureVitae

non-profit

Recent Activity

cabbage972 authored a paper about 7 hours ago

GitChameleon: Evaluating AI Code Generation Against Python Library Version Incompatibilities

cabbage972 authored a paper about 7 hours ago

MixtureVitae: Open Web-Scale Pretraining Dataset With High Quality Instruction and Reasoning Data Built from Permissive-First Text Sources

cabbage972 authored a paper about 7 hours ago

Agents Learn Their Runtime: Interpreter Persistence as Training-Time Semantics

authored 4 papers about 7 hours ago

GitChameleon: Evaluating AI Code Generation Against Python Library Version Incompatibilities

Paper • 2507.12367 • Published Jul 16, 2025 • 7

MixtureVitae: Open Web-Scale Pretraining Dataset With High Quality Instruction and Reasoning Data Built from Permissive-First Text Sources

Paper • 2509.25531 • Published Sep 29, 2025 • 10

Agents Learn Their Runtime: Interpreter Persistence as Training-Time Semantics

Paper • 2603.01209 • Published Mar 1

FreshBrew: A Benchmark for Evaluating AI Agents on Java Code Migration

Paper • 2510.04852 • Published Oct 13, 2025

updated a dataset 3 months ago

mixture-vitae/long_sft-decontaminated

Viewer • Updated Jan 13 • 51.7k • 11

published a dataset 3 months ago

mixture-vitae/long_sft-decontaminated

Viewer • Updated Jan 13 • 51.7k • 11

updated a dataset 3 months ago

mixture-vitae/llama-3.1-tulu-3-8b-preference-mixture-decontaminated

Viewer • Updated Jan 13 • 273k • 112

published a dataset 3 months ago

mixture-vitae/llama-3.1-tulu-3-8b-preference-mixture-decontaminated

Viewer • Updated Jan 13 • 273k • 112

updated a dataset 3 months ago

mixture-vitae/tulu-3-sft-mixture-decontaminated

Viewer • Updated Jan 13 • 937k • 31

published a dataset 3 months ago

mixture-vitae/tulu-3-sft-mixture-decontaminated

Viewer • Updated Jan 13 • 937k • 31

TieuDaoChanNhan updated a dataset 4 months ago

mixture-vitae/MixtureVitae-Omni

Updated Dec 6, 2025 • 13

updated a dataset 5 months ago

mixture-vitae-backup/MixtureVitae-2TT

Viewer • Updated 7 days ago • 418k • 51 • 3

updated a dataset 5 months ago

pythonformer/function-calling-preview

Viewer • Updated Nov 6, 2025 • 47k • 1

published a dataset 5 months ago

pythonformer/function-calling-preview

Viewer • Updated Nov 6, 2025 • 47k • 1

TieuDaoChanNhan updated a dataset 5 months ago

mixture-vitae-backup/MixtureVitae-2TT

Viewer • Updated 7 days ago • 418k • 51 • 3

updated a dataset 5 months ago

pythonformer/funcall-combined

Viewer • Updated Nov 6, 2025 • 3.94M • 3

published a dataset 6 months ago

pythonformer/funcall-combined

Viewer • Updated Nov 6, 2025 • 3.94M • 3

updated a dataset 6 months ago

pythonformer/self-oss-with-think-error-recovery

Viewer • Updated Oct 29, 2025 • 309k • 2

published a dataset 6 months ago

pythonformer/self-oss-with-think-error-recovery

Viewer • Updated Oct 29, 2025 • 309k • 2

TieuDaoChanNhan authored a paper 6 months ago

MixtureVitae: Open Web-Scale Pretraining Dataset With High Quality Instruction and Reasoning Data Built from Permissive-First Text Sources

Paper • 2509.25531 • Published Sep 29, 2025 • 10