
GGUF Wave Benchmark Automation (Mac mini)

This repository contains the exact scripts and datasets used to run sequential GGUF and MLX wave benchmarking on a Mac mini with llama.cpp and mlx-lm.

Contents

  • scripts/gguf_autopilot.py: end-to-end pipeline (download -> perf -> quality -> upload -> cleanup)
  • scripts/mlx_autopilot-mlx.py: end-to-end MLX pipeline (mlx_lm.server + perf + quality + upload artifacts)
  • scripts/bench_mlx_once-mlx.py: OpenAI-compatible streaming benchmark helper for MLX server
  • scripts/start_mlx_autopilot-mlx.sh: launcher for MLX autopilot
  • scripts/wave_files.sh: wave model lists
  • scripts/download_wave_batches.sh: manual batch downloader
  • scripts/adaptive_wave_download.py: adaptive downloader by free disk
  • scripts/bench_gguf_once.py: OpenAI-compatible streaming benchmark helper
  • scripts/context_stress_gguf.sh: context stress helper
  • datasets/gsm8k_main_test.jsonl: GSM8K subset input
  • datasets/mmlu_subset10_test_combined.jsonl: MMLU subset input
  • CLAUDE.md: operational runbook and Claude instructions
  • all_models_benchmark-mlx.csv: consolidated MLX run results (with status)
  • all_models_benchmark_compat-mlx.csv: MLX results in compatibility schema
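The benchmark helpers (bench_gguf_once.py, bench_mlx_once-mlx.py) drive an OpenAI-compatible streaming endpoint and time the token stream. A minimal sketch of that kind of measurement, with the SSE parsing and throughput math factored into standalone helpers (the function names are illustrative, not the scripts' actual API):

```python
import json


def parse_sse_chunks(lines):
    """Collect content deltas from OpenAI-style SSE "data:" lines.

    Illustrative helper: mirrors the parsing a streaming
    /v1/chat/completions client performs, not the scripts' exact code.
    """
    pieces = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        delta = json.loads(payload)["choices"][0].get("delta", {})
        if "content" in delta:
            pieces.append(delta["content"])
    return pieces


def tokens_per_second(n_tokens, start, end):
    """Throughput over a wall-clock interval; guards against zero elapsed time."""
    elapsed = max(end - start, 1e-9)
    return n_tokens / elapsed
```

Timing the interval from the first to the last received chunk (rather than from request submission) separates decode throughput from prompt-processing latency.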

Quick Start

  1. Install base dependencies:
python3 -m pip install --user -r requirements.txt
  2. Ensure the llama.cpp server binary is installed at /opt/homebrew/bin/llama-server.

  3. Copy the dataset files to the benchmark path expected by the autopilot:

mkdir -p ~/benchmark-datasets
cp datasets/gsm8k_main_test.jsonl ~/benchmark-datasets/
cp datasets/mmlu_subset10_test_combined.jsonl ~/benchmark-datasets/
  4. Place the scripts in ~/ on the benchmark host, or run them from this repo path and adjust script paths as needed.

  5. Start the autopilot:

export HF_TOKEN=YOUR_HF_TOKEN
bash scripts/start_gguf_autopilot.sh
  6. Monitor:
tail -f ~/.gguf-autopilot/autopilot.log
python3 scripts/gguf_autopilot_monitor.py
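Before (or during) a run it can help to confirm the server is actually reachable. A small probe sketch, assuming llama-server exposes llama.cpp's /health endpoint; the port here is an assumption, so match base_url to whatever the autopilot launches:

```python
import json
import urllib.request


def server_is_healthy(base_url="http://127.0.0.1:8080", timeout=2.0):
    """Best-effort health probe for a llama-server instance.

    Assumes llama.cpp's /health endpoint; base_url is a guess and
    should match the port the autopilot actually uses.
    """
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return json.load(resp).get("status") == "ok"
    except OSError:
        return False
```

Returning False instead of raising keeps the probe safe to call in a loop while the server is still loading a model.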

MLX Autopilot

  1. Install and activate the MLX runtime on the host (example path used in this run):
/opt/mlx-env/bin/python -m pip install -U mlx-lm
  2. Ensure the benchmark datasets exist at:
~/benchmark-datasets/gsm8k_main_test.jsonl
~/benchmark-datasets/mmlu_subset10_test_combined.jsonl
  3. Start the MLX autopilot:
export HF_TOKEN=YOUR_HF_TOKEN
export HF_OWNER=your_hf_username
bash scripts/start_mlx_autopilot-mlx.sh
  4. Monitor progress/state:
tail -f ~/.mlx-autopilot/autopilot.log
cat ~/.mlx-autopilot/state.json
  5. Outputs generated during the run:
  • perf runs in ~/auto-bench-mlx/wave*-phase*-*/perf.tsv
  • quality runs in ~/auto-bench-mlx/*quality-*/summary.tsv
  • consolidated CSV in this repo: all_models_benchmark-mlx.csv
  • compatibility CSV in this repo: all_models_benchmark_compat-mlx.csv
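The per-run perf.tsv files are what get folded into the consolidated CSV. A hedged sketch of reading one file's rows and pulling a summary figure out of it; the column names (model, tok_per_sec) are assumptions for illustration, so check the header the autopilot actually writes:

```python
import csv
import io


def tsv_rows(text):
    """Parse a perf.tsv's contents into a list of dicts keyed by header.

    Column names in the docstring examples (model, tok_per_sec) are
    illustrative; verify them against a real perf.tsv before relying on them.
    """
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def best_throughput(rows, column="tok_per_sec"):
    """Return the highest throughput value seen across one file's rows."""
    return max(float(r[column]) for r in rows)
```

Taking the text rather than a path keeps the helper trivial to test; in practice you would glob ~/auto-bench-mlx/ for the perf.tsv files and feed each file's contents through it.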

Notes from this run:

  • Final result: 37/43 models completed; 6/43 failed permanently.
  • The LFM2.5-VL-1.6B MLX repos failed under mlx_lm.server (lfm2_vl model type); they require mlx-vlm rather than plain mlx-lm.