tritesh/dflash-mlx-universal

@@ -25,6 +25,7 @@ license: mit
 [![Python](https://img.shields.io/badge/python-3.9%2B-blue)](https://python.org)
 [![MLX](https://img.shields.io/badge/MLX-latest-red)](https://github.com/ml-explore/mlx)
 [![License](https://img.shields.io/badge/license-MIT-green)](LICENSE)
 ---
@@ -58,11 +59,52 @@ This is a **major rewrite** that fixes the critical gaps in earlier community po
 | **Model conversion** | PyTorch→MLX weight converter | ✅ **Updated for all z-lab drafters** |
 | **Training** | Basic trainer | ✅ **Architecture-aware training** with adapter compatibility |
 | **Benchmarking** | None | ✅ **Built-in benchmark** vs mlx_lm baseline |
 ---
 ## 📦 Installation
 ```bash
 pip install mlx-lm dflash-mlx-universal
 ```
@@ -151,6 +193,11 @@ python -m dflash_mlx.convert \
     --model z-lab/Qwen3-4B-DFlash-b16 \
     --output ./Qwen3-4B-DFlash-mlx
 # Or in Python
 from dflash_mlx.convert import convert_dflash_to_mlx
@@ -256,6 +303,13 @@ python -m dflash_mlx.serve \
     --block-size 16 \
     --port 8000
 # Query with curl
 curl http://localhost:8000/v1/chat/completions \
   -H "Content-Type: application/json" \
@@ -351,9 +405,12 @@ dflash-mlx-universal/
 │   └── test_adapters.py         # Adapter tests (NEW)
 ├── benchmark_m2.py              # Apple Silicon benchmark
 ├── setup_m2.sh                  # Automated setup script
 ├── M2_PRO_MAX_GUIDE.md          # Detailed M2 Pro Max guide
 ├── README.md                    # This file
-└── pyproject.toml               # Package configuration
 ```
 ---
@@ -361,15 +418,16 @@ dflash-mlx-universal/
 ## 🧪 Testing
 ```bash
-# Run all tests
-pytest tests/
-# Run specific test modules
 pytest tests/test_adapters.py -v
 pytest tests/test_model.py -v
-# Run with coverage
-pytest --cov=dflash_mlx tests/
 ```
 ---
@@ -456,3 +514,4 @@ MIT License — same as the original DFlash project.
 **Get 6× faster LLM inference on Apple Silicon today!** 🚀
 > *Tested on M2/M3/M4 Pro/Max/Ultra with mlx-lm 0.24+.*

 [![Python](https://img.shields.io/badge/python-3.9%2B-blue)](https://python.org)
 [![MLX](https://img.shields.io/badge/MLX-latest-red)](https://github.com/ml-explore/mlx)
 [![License](https://img.shields.io/badge/license-MIT-green)](LICENSE)
+[![uv](https://img.shields.io/badge/uv-astral-purple)](https://github.com/astral-sh/uv)
 ---
 | **Model conversion** | PyTorch→MLX weight converter | ✅ **Updated for all z-lab drafters** |
 | **Training** | Basic trainer | ✅ **Architecture-aware training** with adapter compatibility |
 | **Benchmarking** | None | ✅ **Built-in benchmark** vs mlx_lm baseline |
+| **uv support** | pip only | ✅ **`uv` + `uv run` workflow** with lock files |
 ---
 ## 📦 Installation
+### Option 1: `uv` (Recommended — ultra-fast, reproducible)
+[`uv`](https://github.com/astral-sh/uv) is an extremely fast Python package manager written in Rust. It is the **recommended** way to install this package on macOS.
+```bash
+# 1. Install uv (one-time)
+brew install uv
+# or: curl -LsSf https://astral.sh/uv/install.sh | sh
+# 2. Clone and set up
+git clone https://huggingface.co/tritesh/dflash-mlx-universal.git
+cd dflash-mlx-universal
+# 3. One-command setup (creates venv, installs deps, locks)
+chmod +x setup_uv.sh
+./setup_uv.sh
+# Or manually:
+uv venv
+uv pip install -e ".[dev,server]"
+uv lock
+```
+**Why `uv`?**
+- Typically 10-100× faster than `pip`, according to the uv project's own benchmarks (written in Rust)
+- Automatic virtual environment management
+- Lock file (`uv.lock`) for reproducible installs
+- `uv run` runs any script without manually activating the virtual environment
+```bash
+# Examples of uv workflow
+uv run python examples/qwen3_4b_demo.py
+uv run pytest tests/ -v
+uv run python -m dflash_mlx.serve --target ... --draft ... --port 8000
+uv run black dflash_mlx/
+uv run ruff check dflash_mlx/
+```
+### Option 2: pip (Classic)
 ```bash
 pip install mlx-lm dflash-mlx-universal
 ```
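After either install path, a quick import check can confirm that the environment is usable. The sketch below is illustrative only: it assumes the importable module names are `mlx_lm` and `dflash_mlx` (the names used in the commands in this README), and `missing_modules` is a hypothetical helper, not part of the package.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names that cannot be located."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    # Assumed module names, taken from the commands shown in this README.
    missing = missing_modules(["mlx_lm", "dflash_mlx"])
    if missing:
        print("Not importable:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

Saved under any file name (for example a hypothetical `check_install.py`), it can be run inside the project environment with `uv run python check_install.py`, or with plain `python` after a pip install.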
     --model z-lab/Qwen3-4B-DFlash-b16 \
     --output ./Qwen3-4B-DFlash-mlx
+# Or with uv (recommended)
+uv run python -m dflash_mlx.convert \
+    --model z-lab/Qwen3-4B-DFlash-b16 \
+    --output ./Qwen3-4B-DFlash-mlx
 # Or in Python
 from dflash_mlx.convert import convert_dflash_to_mlx
     --block-size 16 \
     --port 8000
+# With uv (recommended)
+uv run python -m dflash_mlx.serve \
+    --target mlx-community/Qwen3.5-9B-4bit \
+    --draft ./Qwen3.5-9B-DFlash-mlx \
+    --block-size 16 \
+    --port 8000
 # Query with curl
 curl http://localhost:8000/v1/chat/completions \
   -H "Content-Type: application/json" \
 │   └── test_adapters.py         # Adapter tests (NEW)
 ├── benchmark_m2.py              # Apple Silicon benchmark
 ├── setup_m2.sh                  # Automated setup script
+├── setup_uv.sh                  # ✅ uv setup script (NEW v0.2.0)
+├── .python-version              # Python version pin for uv
+├── USAGE_GUIDE.md               # Detailed usage guide
 ├── M2_PRO_MAX_GUIDE.md          # Detailed M2 Pro Max guide
 ├── README.md                    # This file
+└── pyproject.toml               # Package configuration (with uv support)
 ```
 ---
 ## 🧪 Testing
 ```bash
+# With uv (recommended)
+uv run pytest tests/
+uv run pytest tests/test_adapters.py -v
+uv run pytest tests/test_model.py -v
+uv run pytest --cov=dflash_mlx tests/
+# Classic pip
+pytest tests/
 pytest tests/test_adapters.py -v
 pytest tests/test_model.py -v
 ```
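The uv and pip variants above differ only in the `uv run` prefix. As a rough sketch, a small wrapper could choose the prefix automatically depending on whether `uv` is on the `PATH`. Both `build_test_command` and `run_tests` are hypothetical helpers written for illustration; nothing like them is confirmed to ship in this repository.

```python
import shutil
import subprocess
import sys


def build_test_command(pytest_args, uv_path=None):
    """Build the pytest command, prefixed with `uv run` when uv is available."""
    if uv_path:
        return [uv_path, "run", "pytest", *pytest_args]
    # Fall back to pytest from the interpreter that is currently running.
    return [sys.executable, "-m", "pytest", *pytest_args]


def run_tests(pytest_args):
    """Run pytest through uv when it is on PATH and return the exit code."""
    cmd = build_test_command(pytest_args, uv_path=shutil.which("uv"))
    return subprocess.call(cmd)
```

For example, `run_tests(["tests/test_adapters.py", "-v"])` would run the adapter tests through `uv run` when uv is installed, and through the active interpreter otherwise.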
 ---
 **Get 6× faster LLM inference on Apple Silicon today!** 🚀
 > *Tested on M2/M3/M4 Pro/Max/Ultra with mlx-lm 0.24+.*
+```