# GGUF Wave Benchmark Automation (Mac mini)
This repository contains the exact scripts and datasets used to run sequential GGUF wave benchmarking on a Mac mini with llama.cpp.
## Contents
- `scripts/gguf_autopilot.py`: end-to-end pipeline (download -> perf -> quality -> upload -> cleanup)
- `scripts/mlx_autopilot-mlx.py`: end-to-end MLX pipeline (`mlx_lm.server` + perf + quality + upload artifacts)
- `scripts/bench_mlx_once-mlx.py`: OpenAI-compatible streaming benchmark helper for the MLX server
- `scripts/start_mlx_autopilot-mlx.sh`: launcher for the MLX autopilot
- `scripts/wave_files.sh`: wave model lists
- `scripts/download_wave_batches.sh`: manual batch downloader
- `scripts/adaptive_wave_download.py`: adaptive downloader driven by free disk space
- `scripts/bench_gguf_once.py`: OpenAI-compatible streaming benchmark helper
- `scripts/context_stress_gguf.sh`: context stress helper
- `datasets/gsm8k_main_test.jsonl`: GSM8K subset input
- `datasets/mmlu_subset10_test_combined.jsonl`: MMLU subset input
- `CLAUDE.md`: operational runbook and Claude instructions
- `all_models_benchmark-mlx.csv`: consolidated MLX run results (with status)
- `all_models_benchmark_compat-mlx.csv`: MLX results in compatibility schema
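The streaming benchmark helpers derive their performance numbers from token arrival times. A minimal sketch of that calculation (the function name and metric choices here are illustrative, not the actual `bench_gguf_once.py` / `bench_mlx_once-mlx.py` code):

```python
def throughput_from_timestamps(arrival_times: list[float]) -> dict:
    """Compute TTFT and decode tokens/sec from per-token arrival timestamps.

    arrival_times[0] is when the request was sent; each later entry is
    when a streamed token (SSE chunk) arrived.
    """
    start, first_token = arrival_times[0], arrival_times[1]
    n_decode_tokens = len(arrival_times) - 2  # tokens after the first one
    decode_span = arrival_times[-1] - first_token
    return {
        "ttft_s": first_token - start,
        "decode_tok_per_s": n_decode_tokens / decode_span if decode_span > 0 else 0.0,
    }

# Synthetic example: request sent at t=0, first token at 0.5 s, then four
# more tokens at 0.1 s intervals -> TTFT ~0.5 s, decode rate ~10 tok/s.
stats = throughput_from_timestamps([0.0, 0.5, 0.6, 0.7, 0.8, 0.9])
print(stats)
```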
## Quick Start
- Install base dependencies:

  ```bash
  python3 -m pip install --user -r requirements.txt
  ```

- Ensure the llama.cpp server binary is installed at `/opt/homebrew/bin/llama-server`.
- Copy the dataset files to the benchmark path expected by the autopilot:

  ```bash
  mkdir -p ~/benchmark-datasets
  cp datasets/gsm8k_main_test.jsonl ~/benchmark-datasets/
  cp datasets/mmlu_subset10_test_combined.jsonl ~/benchmark-datasets/
  ```

- Place the scripts in `~/` on the benchmark host, or run from this repo path and adjust script paths as needed.
- Start the autopilot:

  ```bash
  export HF_TOKEN=YOUR_HF_TOKEN
  bash scripts/start_gguf_autopilot.sh
  ```
- Monitor:

  ```bash
  tail -f ~/.gguf-autopilot/autopilot.log
  python3 scripts/gguf_autopilot_monitor.py
  ```
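The dataset files copied above are JSON Lines (one JSON object per line). A quick way to load and sanity-check such a file, demonstrated on a tiny synthetic sample (the field names `question`/`answer` are an assumption for illustration; check the actual subset files for their schema):

```python
import json
import os
import tempfile

def load_jsonl(path: str) -> list[dict]:
    """Load a .jsonl file: one JSON object per line, blank lines skipped."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

# Demo on a synthetic file; the real GSM8K/MMLU subsets live in
# ~/benchmark-datasets/ after the copy step above.
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "sample.jsonl")
    with open(path, "w", encoding="utf-8") as f:
        f.write('{"question": "2+2?", "answer": "4"}\n')
        f.write('{"question": "3+3?", "answer": "6"}\n')
    rows = load_jsonl(path)
    print(len(rows), sorted(rows[0].keys()))
```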
## MLX Autopilot
- Install and activate the MLX runtime on the host (example path used in this run):

  ```bash
  /opt/mlx-env/bin/python -m pip install -U mlx-lm
  ```

- Ensure the benchmark datasets exist at:

  ```
  ~/benchmark-datasets/gsm8k_main_test.jsonl
  ~/benchmark-datasets/mmlu_subset10_test_combined.jsonl
  ```

- Start the MLX autopilot:

  ```bash
  export HF_TOKEN=YOUR_HF_TOKEN
  export HF_OWNER=your_hf_username
  bash scripts/start_mlx_autopilot-mlx.sh
  ```

- Monitor progress/state:

  ```bash
  tail -f ~/.mlx-autopilot/autopilot.log
  cat ~/.mlx-autopilot/state.json
  ```
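If you want a one-line progress summary instead of eyeballing raw JSON, something like the following works. The key names (`done`, `failed`, `pending`) are purely hypothetical here; the real schema is whatever `mlx_autopilot-mlx.py` writes to `state.json`, so adjust to match:

```python
import json

# Hypothetical state.json contents -- substitute the real file:
#   state = json.load(open(os.path.expanduser("~/.mlx-autopilot/state.json")))
state_text = '{"done": ["model-a", "model-b"], "failed": ["model-c"], "pending": ["model-d"]}'
state = json.loads(state_text)

total = sum(len(v) for v in state.values())
summary = (f"progress: {len(state['done'])}/{total} done, "
           f"{len(state['failed'])} failed, {len(state['pending'])} pending")
print(summary)
```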
- Outputs generated during the run:
  - perf runs in `~/auto-bench-mlx/wave*-phase*-*/perf.tsv`
  - quality runs in `~/auto-bench-mlx/*quality-*/summary.tsv`
  - consolidated CSV in this repo: `all_models_benchmark-mlx.csv`
  - compatibility CSV in this repo: `all_models_benchmark_compat-mlx.csv`
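Consolidating the per-run `perf.tsv` files into a single CSV is a straightforward TSV-to-CSV pass with the standard `csv` module. A sketch under assumed column names (`model`, `tok_per_s` are illustrative; the real headers come from the benchmark helpers):

```python
import csv
import io

# Stand-in for one perf.tsv file; in practice you would glob
# ~/auto-bench-mlx/wave*-phase*-*/perf.tsv and concatenate the rows.
perf_tsv = "model\ttok_per_s\nQwen2.5-0.5B-4bit\t95.2\n"

rows = list(csv.DictReader(io.StringIO(perf_tsv), delimiter="\t"))

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["model", "tok_per_s"])
writer.writeheader()
writer.writerows(rows)
print(out.getvalue())
```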
## Notes from this run

- Final result: 37/43 done, 6/43 failed permanently. The LFM2.5-VL-1.6B MLX repos failed under `mlx_lm.server` (`lfm2_vl` model type); they require `mlx-vlm`, not plain `mlx-lm`.