
GGUF Wave Benchmark Automation (Mac mini)

This repository contains the exact scripts and datasets used to run sequential GGUF and MLX wave benchmarking on a Mac mini with llama.cpp and mlx-lm.

Contents

  • scripts/gguf_autopilot.py: end-to-end pipeline (download -> perf -> quality -> upload -> cleanup)
  • scripts/mlx_autopilot-mlx.py: end-to-end MLX pipeline (mlx_lm.server + perf + quality + upload artifacts)
  • scripts/bench_mlx_once-mlx.py: OpenAI-compatible streaming benchmark helper for MLX server
  • scripts/start_mlx_autopilot-mlx.sh: launcher for MLX autopilot
  • scripts/wave_files.sh: wave model lists
  • scripts/download_wave_batches.sh: manual batch downloader
  • scripts/adaptive_wave_download.py: adaptive downloader by free disk
  • scripts/bench_gguf_once.py: OpenAI-compatible streaming benchmark helper
  • scripts/context_stress_gguf.sh: context stress helper
  • datasets/gsm8k_main_test.jsonl: GSM8K subset input
  • datasets/mmlu_subset10_test_combined.jsonl: MMLU subset input
  • CLAUDE.md: operational runbook and Claude instructions
  • all_models_benchmark-mlx.csv: consolidated MLX run results (with status)
  • all_models_benchmark_compat-mlx.csv: MLX results in compatibility schema
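The benchmark helpers (bench_gguf_once.py, bench_mlx_once-mlx.py) drive an OpenAI-compatible streaming endpoint and time the token stream. A minimal sketch of that kind of measurement, with the SSE parsing and throughput math factored into standalone helpers (the function names are illustrative, not the scripts' actual API):

```python
import json


def parse_sse_chunks(lines):
    """Collect content deltas from OpenAI-style SSE "data:" lines.

    Illustrative helper: mirrors the parsing a streaming
    /v1/chat/completions client performs, not the scripts' exact code.
    """
    pieces = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        delta = json.loads(payload)["choices"][0].get("delta", {})
        if "content" in delta:
            pieces.append(delta["content"])
    return pieces


def tokens_per_second(n_tokens, start, end):
    """Throughput over a wall-clock interval; guards against zero elapsed time."""
    elapsed = max(end - start, 1e-9)
    return n_tokens / elapsed
```

Timing the interval from the first to the last received chunk (rather than from request submission) separates decode throughput from prompt-processing latency.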

Quick Start

  1. Install base dependencies:
python3 -m pip install --user -r requirements.txt
  2. Ensure the llama.cpp server binary is installed at /opt/homebrew/bin/llama-server.

  3. Copy the dataset files to the benchmark path expected by the autopilot:

mkdir -p ~/benchmark-datasets
cp datasets/gsm8k_main_test.jsonl ~/benchmark-datasets/
cp datasets/mmlu_subset10_test_combined.jsonl ~/benchmark-datasets/
  4. Place the scripts in ~/ on the benchmark host, or run them from this repo path and adjust script paths as needed.

  5. Start the autopilot:

export HF_TOKEN=YOUR_HF_TOKEN
bash scripts/start_gguf_autopilot.sh
  6. Monitor:
tail -f ~/.gguf-autopilot/autopilot.log
python3 scripts/gguf_autopilot_monitor.py
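Before (or during) a run it can help to confirm the server is actually reachable. A small probe sketch, assuming llama-server exposes llama.cpp's /health endpoint; the port here is an assumption, so match base_url to whatever the autopilot launches:

```python
import json
import urllib.request


def server_is_healthy(base_url="http://127.0.0.1:8080", timeout=2.0):
    """Best-effort health probe for a llama-server instance.

    Assumes llama.cpp's /health endpoint; base_url is a guess and
    should match the port the autopilot actually uses.
    """
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return json.load(resp).get("status") == "ok"
    except OSError:
        return False
```

Returning False instead of raising keeps the probe safe to call in a loop while the server is still loading a model.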

MLX Autopilot

  1. Install and activate the MLX runtime on the host (example path used in this run):
/opt/mlx-env/bin/python -m pip install -U mlx-lm
  2. Ensure the benchmark datasets exist at:
~/benchmark-datasets/gsm8k_main_test.jsonl
~/benchmark-datasets/mmlu_subset10_test_combined.jsonl
  3. Start the MLX autopilot:
export HF_TOKEN=YOUR_HF_TOKEN
export HF_OWNER=your_hf_username
bash scripts/start_mlx_autopilot-mlx.sh
  4. Monitor progress/state:
tail -f ~/.mlx-autopilot/autopilot.log
cat ~/.mlx-autopilot/state.json
  5. Outputs generated during the run:
  • perf runs in ~/auto-bench-mlx/wave*-phase*-*/perf.tsv
  • quality runs in ~/auto-bench-mlx/*quality-*/summary.tsv
  • consolidated CSV in this repo: all_models_benchmark-mlx.csv
  • compatibility CSV in this repo: all_models_benchmark_compat-mlx.csv
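The per-run perf.tsv files are what get folded into the consolidated CSV. A hedged sketch of reading one file's rows and pulling a summary figure out of it; the column names (model, tok_per_sec) are assumptions for illustration, so check the header the autopilot actually writes:

```python
import csv
import io


def tsv_rows(text):
    """Parse a perf.tsv's contents into a list of dicts keyed by header.

    Column names in the docstring examples (model, tok_per_sec) are
    illustrative; verify them against a real perf.tsv before relying on them.
    """
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def best_throughput(rows, column="tok_per_sec"):
    """Return the highest throughput value seen across one file's rows."""
    return max(float(r[column]) for r in rows)
```

Taking the text rather than a path keeps the helper trivial to test; in practice you would glob ~/auto-bench-mlx/ for the perf.tsv files and feed each file's contents through it.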

Notes from this run:

  • Final result: 37/43 models completed; 6/43 failed permanently.
  • The LFM2.5-VL-1.6B MLX repos failed under mlx_lm.server (lfm2_vl model type); they require mlx-vlm rather than plain mlx-lm.