---
library_name: mlx
license: mit
license_link: https://huggingface.co/microsoft/Phi-3.5-mini-instruct/resolve/main/LICENSE
language:
- multilingual
pipeline_tag: text-generation
base_model: microsoft/Phi-3.5-mini-instruct
tags:
- nlp
- code
- mlx
- quantization
- bias-evaluation
- q4
---

# phi-3.5-mini-instruct-q4 (MLX, CBA artifact)

MLX-format 4-bit (Q4) variant of [`microsoft/Phi-3.5-mini-instruct`](https://huggingface.co/microsoft/Phi-3.5-mini-instruct).

This is one of the **15 model artifacts** from the paper:

> **Quantization Undoes Alignment: Bias Emergence in Compressed LLMs Across Models and Precision Levels**
> Plawan Kumar Rath, Rahul Maliakkal. *IEEE Cloud Summit 2026*.
> Code: <https://github.com/plawanrath/compression-bias-amplification>

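## Quantization

Weight-only, round-to-nearest post-training quantization via `mlx_lm.convert` (no calibration data; activations stay in floating point):

- **bits:** 4
- **group_size:** 64 (each group of 64 consecutive weights shares one scale and one bias)
- **mode:** affine (a weight is reconstructed as `scale * q + bias` from its 4-bit code `q`)
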
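The affine mode can be illustrated with a small pure-Python sketch: each 64-weight group gets a scale and bias from its minimum and maximum, weights are rounded to 4-bit codes, and dequantization recovers them to within half a quantization step. This is a simplified illustration that assumes MLX derives the scale from the group's min/max; it does not reproduce MLX's actual kernel, dtypes, or bit packing.

```python
# Illustrative group-wise affine quantization (bits=4, group_size=64) in pure Python.
# Assumption: follows the min/max idea behind MLX's "affine" mode; MLX's real kernel
# (rounding details, dtypes, packing of codes into integers) differs.
import random

BITS = 4
GROUP_SIZE = 64
MAX_CODE = 2**BITS - 1  # 15, the largest 4-bit code


def quantize_group(weights):
    """Map one group of floats to integer codes plus a shared scale and bias."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / MAX_CODE if hi > lo else 1.0
    bias = lo
    codes = [round((w - bias) / scale) for w in weights]
    return codes, scale, bias


def dequantize_group(codes, scale, bias):
    """Reconstruct approximate weights as scale * q + bias."""
    return [scale * q + bias for q in codes]


def quantize_row(row):
    """Split a weight row into independent groups of GROUP_SIZE values."""
    if len(row) % GROUP_SIZE:
        raise ValueError("row length must be a multiple of GROUP_SIZE")
    return [quantize_group(row[i:i + GROUP_SIZE]) for i in range(0, len(row), GROUP_SIZE)]


random.seed(0)
row = [random.gauss(0.0, 0.02) for _ in range(256)]  # toy row: 4 groups of 64
groups = quantize_row(row)
recon = [w for codes, scale, bias in groups for w in dequantize_group(codes, scale, bias)]
max_err = max(abs(a - b) for a, b in zip(row, recon))
print(f"groups={len(groups)} max_abs_error={max_err:.5f}")
```

Because each group has its own scale, a single outlier weight only coarsens the step size of its own 64 neighbors rather than the whole row.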

## How this artifact was produced

```bash
python -m mlx_lm.convert \
  --hf-path microsoft/Phi-3.5-mini-instruct \
  --mlx-path ./phi-3.5-mini-instruct-q4 \
  --quantize \
  --q-bits 4 \
  --q-group-size 64
```

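To confirm that a local conversion matches these settings, a small check on the output directory can help. This sketch assumes that `mlx_lm.convert` records the settings under a `quantization` key in the converted `config.json`, as recent mlx-lm releases do; the helper name `check_quantization` is hypothetical.

```python
# Sketch: confirm a converted MLX model directory records the expected quantization settings.
# Assumption: mlx-lm writes them to config.json under a "quantization" key; newer releases
# may add further keys there, which this check ignores.
import json
from pathlib import Path


def check_quantization(model_dir, bits=4, group_size=64):
    """Hypothetical helper: return the recorded settings, raising ValueError on a mismatch."""
    config = json.loads((Path(model_dir) / "config.json").read_text())
    quant = config.get("quantization")
    if quant is None:
        raise ValueError(f"{model_dir}/config.json has no 'quantization' entry")
    if quant.get("bits") != bits or quant.get("group_size") != group_size:
        raise ValueError(f"unexpected quantization settings: {quant}")
    return quant
```

After running the command above, `check_quantization("./phi-3.5-mini-instruct-q4")` should return the recorded settings without raising.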

This is the **exact** artifact used to produce the inference results in §4.3 of the paper: 911,100 records on BBQ ambiguous-context questions (5 seeds × 12,148 items × 15 configurations).

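The record count decomposes as follows. The per-configuration figure is derived from the numbers above rather than quoted from the paper.

```python
# Record count behind §4.3, using only the figures quoted above.
SEEDS = 5
BBQ_AMBIGUOUS_ITEMS = 12_148
CONFIGS = 15  # model artifacts evaluated

records_per_config = SEEDS * BBQ_AMBIGUOUS_ITEMS  # share contributed by one artifact, e.g. this Q4 model
total_records = records_per_config * CONFIGS
print(records_per_config, total_records)
```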

## Usage (MLX)

```bash
pip install mlx-lm
```

```python
from mlx_lm import load, generate

# Downloads the weights from the Hugging Face Hub on first use, then loads them.
model, tokenizer = load("plawanrath/phi-3.5-mini-instruct-q4-mlx-cba")

# Wrap the user message in the model's chat template and return it as a string.
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Hello!"}],
    add_generation_prompt=True,
    tokenize=False,
)
print(generate(model, tokenizer, prompt=prompt, max_tokens=128))
```

Or via the CLI:

```bash
mlx_lm.generate --model plawanrath/phi-3.5-mini-instruct-q4-mlx-cba --prompt "Hello!"
```

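## Paper findings relevant to this variant

The paper reports a **dose-response** relationship between quantization aggressiveness and emergent stereotypical behavior on BBQ ambiguous-context questions: the fewer bits per weight, the larger the share of items that the BF16 model answered without bias but the quantized model answers with bias.
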
| Variant | % of BF16-unbiased items that became biased (range across model families) |
|---|---|
| Q8 | 0.1–0.9% |
| Q6 | 0.3–1.3% |
| **Q4 (this artifact)** | **2.2–5.6%** |
| Q3 | 6.0–21.1% |

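The table's metric can be read as a conditional flip rate: among items the BF16 model answered without bias, the percentage that the quantized model answers with bias. A minimal sketch, assuming each item has already been labeled biased or not at each precision (for example, choosing the stereotype-consistent answer instead of "unknown"); the function name and labels are hypothetical, and the paper's exact labeling and seed aggregation may differ.

```python
# Sketch of the table's metric: percent of BF16-unbiased items that became biased.
# Assumption: per-item boolean labels (True = biased) already exist for each precision.

def bias_emergence_rate(bf16_biased, quant_biased):
    """Percent of items unbiased under BF16 that are biased under the quantized model."""
    unbiased_at_bf16 = [item for item, biased in bf16_biased.items() if not biased]
    if not unbiased_at_bf16:
        return 0.0
    flipped = sum(1 for item in unbiased_at_bf16 if quant_biased[item])
    return 100.0 * flipped / len(unbiased_at_bf16)


# Toy labels: item3 is biased at both precisions, so it is excluded from the denominator.
bf16_labels = {"item1": False, "item2": False, "item3": True, "item4": False}
q4_labels = {"item1": False, "item2": True, "item3": True, "item4": False}
rate = bias_emergence_rate(bf16_labels, q4_labels)
print(f"{rate:.1f}%")
```

Conditioning on BF16-unbiased items isolates bias that *emerges* from compression, rather than mixing it with bias the full-precision model already had.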
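
These shifts are largely **invisible to perplexity**: across all three model families, perplexity moves by less than 0.5% at Q8 and less than 3% at Q4. Perplexity alone is therefore not a sufficient check; evaluate compressed instruction-tuned models directly on fairness-sensitive benchmarks before deploying them for fairness-sensitive tasks.
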
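Perplexity is the exponential of the mean per-token negative log-likelihood, so the shift quoted above is a relative change in that average. The sketch below computes it on made-up token losses (all numbers are illustrative, not from the paper). Because it averages over every token, a small uniform change in likelihood yields a shift of a few percent, and that number says nothing about whether the model's discrete answer to a particular ambiguous question has flipped.

```python
# Relative perplexity shift between a BF16 baseline and a quantized model.
# The per-token negative log-likelihoods below are invented for illustration.
import math


def perplexity(token_nlls):
    """exp of the mean per-token negative log-likelihood (natural log)."""
    return math.exp(sum(token_nlls) / len(token_nlls))


def relative_shift_pct(ppl_base, ppl_quant):
    """Percent change of the quantized perplexity relative to the baseline."""
    return 100.0 * (ppl_quant - ppl_base) / ppl_base


base = perplexity([2.10, 1.85, 2.40, 1.95])
quant = perplexity([2.12, 1.87, 2.43, 1.97])
shift = relative_shift_pct(base, quant)
print(f"{shift:.2f}%")
```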

## Model details

- **Base model:** [`microsoft/Phi-3.5-mini-instruct`](https://huggingface.co/microsoft/Phi-3.5-mini-instruct)
- **Family:** Phi-3
- **Parameters:** 3.8B
- **Precision:** 4-bit (Q4)
- **Format:** MLX (Apple Silicon)
- **Conversion framework:** [`mlx-lm`](https://github.com/ml-explore/mlx-lm)

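As a back-of-envelope check on what 4-bit storage buys, the weight footprint can be estimated from the parameter count. These are assumptions, not measurements: every one of the 3.8B parameters is treated as quantized, and each 64-weight group is assumed to store a 16-bit scale and a 16-bit bias, giving 4.5 bits per weight. In practice some tensors (for example normalization weights) stay in higher precision, so the actual download size will differ somewhat.

```python
# Back-of-envelope weight footprint for this Q4 artifact versus BF16.
# Assumptions: all parameters quantized; a 16-bit scale and 16-bit bias per 64-weight group.
PARAMS = 3.8e9
BITS = 4
GROUP_SIZE = 64
SCALE_BIAS_BITS = 2 * 16

bits_per_weight = BITS + SCALE_BIAS_BITS / GROUP_SIZE  # 4 + 32/64
q4_gb = PARAMS * bits_per_weight / 8 / 1e9
bf16_gb = PARAMS * 16 / 8 / 1e9
print(f"Q4 ~{q4_gb:.2f} GB vs BF16 ~{bf16_gb:.2f} GB")
```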

## License

Inherited from the base model (MIT). See the [upstream license](https://huggingface.co/microsoft/Phi-3.5-mini-instruct/resolve/main/LICENSE) for the full text.

## Citation

```bibtex
@inproceedings{rath2026quantization,
  title     = {Quantization Undoes Alignment: Bias Emergence in Compressed {LLMs} Across Models and Precision Levels},
  author    = {Rath, Plawan Kumar and Maliakkal, Rahul},
  booktitle = {IEEE Cloud Summit 2026},
  year      = {2026}
}
```