---
library_name: mlx
license: mit
license_link: https://huggingface.co/microsoft/Phi-3.5-mini-instruct/resolve/main/LICENSE
language:
- multilingual
pipeline_tag: text-generation
base_model: microsoft/Phi-3.5-mini-instruct
tags:
- nlp
- code
- mlx
- quantization
- bias-evaluation
- q6
---

# phi-3.5-mini-instruct-q6 (MLX, CBA artifact)

MLX-format 6-bit (Q6) variant of [`microsoft/Phi-3.5-mini-instruct`](https://huggingface.co/microsoft/Phi-3.5-mini-instruct). This is one of the **15 model artifacts** from the paper:

> **Quantization Undoes Alignment: Bias Emergence in Compressed LLMs Across Models and Precision Levels**
> Plawan Kumar Rath, Rahul Maliakkal. *IEEE Cloud Summit 2026*.
> Code:

## Quantization

Weight-only post-training quantization via `mlx_lm.convert`, with the following settings (an illustrative sketch of what these parameters mean appears in the appendix below):

- **bits:** 6
- **group_size:** 64
- **mode:** affine

## How this artifact was produced

```bash
python -m mlx_lm.convert \
  --hf-path microsoft/Phi-3.5-mini-instruct \
  --mlx-path ./phi-3.5-mini-instruct-q6 \
  --quantize \
  --q-bits 6 \
  --q-group-size 64
```

This is the **exact** artifact used to produce the inference results in §4.3 of the paper (911,100 records over BBQ ambiguous: 5 seeds × 12,148 items × 15 configs).

## Usage (MLX)

```bash
pip install mlx-lm
```

```python
from mlx_lm import load, generate

model, tokenizer = load("plawanrath/phi-3.5-mini-instruct-q6-mlx-cba")
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Hello!"}],
    add_generation_prompt=True,
    tokenize=False,
)
print(generate(model, tokenizer, prompt=prompt, max_tokens=128))
```

Or via CLI:

```bash
mlx_lm.generate --model plawanrath/phi-3.5-mini-instruct-q6-mlx-cba --prompt "Hello!"
```

## Paper findings relevant to this variant

The paper documents a **dose-response** relationship between quantization aggressiveness and emergent stereotypical behavior on BBQ ambiguous questions (a sketch of how such flip rates can be computed appears in the appendix below):

| Variant | % of BF16-unbiased items that became biased |
|---|---|
| Q8 | 0.1–0.9% |
| Q6 | 0.3–1.3% |
| Q4 | 2.2–5.6% |
| Q3 | 6.0–21.1% |

These changes are largely **invisible to perplexity** (<0.5% shift at Q8, <3% at Q4, across all three model families). Before deploying compressed instruction-tuned models on fairness-sensitive tasks, evaluate bias on the compressed artifact itself rather than relying on perplexity or on audits of the full-precision model.

## Model details

- **Base model:** [`microsoft/Phi-3.5-mini-instruct`](https://huggingface.co/microsoft/Phi-3.5-mini-instruct)
- **Family:** Phi-3
- **Parameters:** 3.8B
- **Precision:** 6-bit (Q6)
- **Format:** MLX (Apple Silicon)
- **Conversion framework:** [`mlx-lm`](https://github.com/ml-explore/mlx-lm)

## License

Inherited from the base model (MIT). See the upstream model page for the full license text.

## Citation

```bibtex
@inproceedings{rath2026quantization,
  title     = {Quantization Undoes Alignment: Bias Emergence in Compressed LLMs Across Models and Precision Levels},
  author    = {Rath, Plawan Kumar and Maliakkal, Rahul},
  booktitle = {IEEE Cloud Summit 2026},
  year      = {2026}
}
```
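
## Appendix: illustrative sketches

### What `bits=6, group_size=64, mode=affine` means

As a rough intuition for the quantization settings above, here is a NumPy fake-quantization sketch of group-wise affine rounding. It is not `mlx-lm`'s implementation (MLX stores packed integer codes with per-group scales and biases and runs fused kernels); it only illustrates what `bits` and `group_size` control.

```python
# Illustrative only: group-wise affine quantize-dequantize in NumPy.
# mlx_lm.convert keeps the integer codes; this sketch round-trips to float
# so the rounding error introduced at Q6 is easy to inspect.
import numpy as np

def affine_fake_quantize(w: np.ndarray, bits: int = 6, group_size: int = 64) -> np.ndarray:
    """Quantize-dequantize `w` in groups of `group_size` along its last axis."""
    *lead, d = w.shape
    assert d % group_size == 0, "last dim must be divisible by group_size"
    g = w.reshape(*lead, d // group_size, group_size)
    w_min = g.min(axis=-1, keepdims=True)
    w_max = g.max(axis=-1, keepdims=True)
    levels = 2**bits - 1                      # 63 representable steps at 6 bits
    scale = (w_max - w_min) / levels          # per-group step size
    scale = np.where(scale == 0, 1.0, scale)  # guard all-constant groups
    codes = np.round((g - w_min) / scale)     # integer codes in [0, 63]
    return (codes * scale + w_min).reshape(*lead, d)

w = np.random.randn(8, 128).astype(np.float32)
print(np.abs(w - affine_fake_quantize(w)).max())  # worst-case rounding error
```

Smaller `group_size` tracks local weight ranges more tightly (lower error, more scale/bias overhead); fewer `bits` coarsens the grid, which is the dose the paper varies from Q8 down to Q3.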
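
### Computing the bias-flip rate

A minimal sketch of the metric tabulated above, assuming one boolean per BBQ ambiguous item marking whether the model's answer was the stereotyped option. The arrays and numbers below are made-up placeholders, not the paper's released evaluation code or results.

```python
# Hypothetical sketch: fraction of items the BF16 model answered without
# bias that the quantized model answers with the stereotyped option.
import numpy as np

def bias_flip_rate(biased_bf16: np.ndarray, biased_q: np.ndarray) -> float:
    """Both inputs: one boolean per BBQ ambiguous item."""
    unbiased_at_bf16 = ~biased_bf16
    flipped = unbiased_at_bf16 & biased_q
    return 100.0 * flipped.sum() / unbiased_at_bf16.sum()

# Toy stand-ins for per-item judgments from a single seed:
rng = np.random.default_rng(0)
biased_bf16 = rng.random(12_148) < 0.02
biased_q6 = biased_bf16 | (rng.random(12_148) < 0.01)
print(f"{bias_flip_rate(biased_bf16, biased_q6):.2f}% of BF16-unbiased items flipped")

# The relative perplexity shift that fails to reveal such flips is simply:
ppl_bf16, ppl_q6 = 6.41, 6.45  # made-up numbers for illustration
print(f"{100.0 * (ppl_q6 - ppl_bf16) / ppl_bf16:.2f}% perplexity shift")
```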