	---
	library_name: mlx
	license: mit
	license_link: https://huggingface.co/microsoft/Phi-3.5-mini-instruct/resolve/main/LICENSE
	language:
	- multilingual
	pipeline_tag: text-generation
	base_model: microsoft/Phi-3.5-mini-instruct
	tags:
	- nlp
	- code
	- mlx
	- quantization
	- bias-evaluation
	- q4
	---

	# phi-3.5-mini-instruct-q4 (MLX, CBA artifact)

	MLX-format 4-bit (Q4) variant of [`microsoft/Phi-3.5-mini-instruct`](https://huggingface.co/microsoft/Phi-3.5-mini-instruct).

	This is one of the 15 model artifacts from the paper:

	> Quantization Undoes Alignment: Bias Emergence in Compressed LLMs Across Models and Precision Levels
	> Plawan Kumar Rath, Rahul Maliakkal. IEEE Cloud Summit 2026.
	> Code: <https://github.com/plawanrath/compression-bias-amplification>

	## Quantization

	Weight-only post-training quantization via `mlx_lm.convert`:

	- bits: 4
	- group_size: 64
	- mode: affine
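
As a rough illustration of what these three settings mean, the sketch below quantizes a single group of weights with the usual affine scheme: each group shares one scale and one bias, and a weight is reconstructed as `scale * q + bias`. This is a minimal pure-Python sketch, not the `mlx-lm` implementation. The helper names and the example weights are hypothetical, and the 16-bit width for the per-group scale and bias is an assumption.

```python
# Sketch of group-wise affine quantization (illustrative; not the mlx-lm source).
# Assumption: each group of weights shares one scale and one bias, and a weight
# is reconstructed as w_hat = scale * q + bias.

def quantize_group(weights, bits=4):
    """Map one group of floats to integer codes in [0, 2**bits - 1]."""
    levels = 2 ** bits - 1
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / levels or 1.0  # guard against a constant group
    codes = [round((w - w_min) / scale) for w in weights]
    return codes, scale, w_min


def dequantize_group(codes, scale, bias):
    """Reconstruct approximate weights from integer codes."""
    return [scale * q + bias for q in codes]


def effective_bits_per_weight(bits=4, group_size=64, meta_bits=16):
    """Storage per weight, assuming a meta_bits-wide scale and bias per group."""
    return bits + 2 * meta_bits / group_size


# Hypothetical group of 64 weights spread evenly over [-0.5, 0.5].
group = [i / 63 - 0.5 for i in range(64)]
codes, scale, bias = quantize_group(group, bits=4)
restored = dequantize_group(codes, scale, bias)
max_error = max(abs(a - b) for a, b in zip(group, restored))
```

With 4 bits, every weight in a group is snapped to one of 16 levels, so the rounding error per weight is bounded by half the group's scale. Under the 16-bit metadata assumption, the storage cost works out to 4.5 bits per quantized weight rather than exactly 4.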


	## How this artifact was produced

	```bash
	python -m mlx_lm.convert \
	  --hf-path microsoft/Phi-3.5-mini-instruct \
	  --mlx-path ./phi-3.5-mini-instruct-q4 \
	  --quantize \
	  --q-bits 4 \
	  --q-group-size 64
	```

	This is the exact artifact used to produce its share of the inference results in §4.3 of the paper (911,100 records on BBQ ambiguous questions in total: 5 seeds × 12,148 items × 15 configs).
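
The record count quoted above can be checked with a few lines of arithmetic. The per-config figure is derived here from the stated numbers and is not quoted from the paper.

```python
# Figures as stated in this model card.
seeds = 5
items = 12_148
configs = 15

# Total across all configs, and the share contributed by a single config
# such as this artifact (derived, not quoted from the paper).
total_records = seeds * items * configs
records_per_config = seeds * items
```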

	## Usage (MLX)

	```bash
	pip install mlx-lm
	```

	```python
	from mlx_lm import load, generate

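	# Download (or load from the local cache) the quantized weights and tokenizer.
	model, tokenizer = load("plawanrath/phi-3.5-mini-instruct-q4-mlx-cba")

	# Wrap the user message in the model's chat format and return it as a string.
	prompt = tokenizer.apply_chat_template(
	    [{"role": "user", "content": "Hello!"}],
	    add_generation_prompt=True,
	    tokenize=False,
	)
	print(generate(model, tokenizer, prompt=prompt, max_tokens=128))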
	```

	Or via CLI:

	```bash
	mlx_lm.generate --model plawanrath/phi-3.5-mini-instruct-q4-mlx-cba --prompt "Hello!"
	```

	## Paper findings relevant to this variant

	The paper documents a dose-response relationship between quantization aggressiveness and emergent stereotypical behavior on BBQ ambiguous questions:

	| Variant | % of BF16-unbiased items that became biased |
	|---|---|
	| Q8 | 0.1–0.9% |
	| Q6 | 0.3–1.3% |
	| Q4 | 2.2–5.6% |
	| Q3 | 6.0–21.1% |

	These changes are largely invisible to perplexity (<0.5% shift at Q8, <3% at Q4 across all three families), so a perplexity check alone will not catch them. Before deploying a compressed instruction-tuned model on a fairness-sensitive task, evaluate bias directly on the quantized artifact.
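
To make the table's metric concrete, here is a minimal sketch of how "% of BF16-unbiased items that became biased" could be computed from paired per-item results. It assumes each item already carries a boolean "biased" flag for the BF16 run and for the quantized run. The function name and the toy data are hypothetical, and the paper's exact scoring rule may differ.

```python
def emergent_bias_rate(bf16_biased, quant_biased):
    """Percent of items unbiased under BF16 that are biased after quantization."""
    if len(bf16_biased) != len(quant_biased):
        raise ValueError("both runs must cover the same items in the same order")
    # Keep the quantized-run flag only for items that were unbiased under BF16.
    unbiased_before = [q for b, q in zip(bf16_biased, quant_biased) if not b]
    if not unbiased_before:
        return 0.0
    return 100.0 * sum(unbiased_before) / len(unbiased_before)


# Hypothetical toy labels for five items (True means a biased answer).
bf16 = [False, False, False, False, True]
q4 = [False, True, False, False, True]
rate = emergent_bias_rate(bf16, q4)
```

Conditioning on the BF16-unbiased items isolates bias that emerges after compression from bias the full-precision model already showed.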

	## Model details

	- Base model: [`microsoft/Phi-3.5-mini-instruct`](https://huggingface.co/microsoft/Phi-3.5-mini-instruct)
	- Family: Phi-3
	- Parameters: 3.8B
	- Precision: 4-bit (Q4)
	- Format: MLX (Apple Silicon)
	- Conversion framework: [`mlx-lm`](https://github.com/ml-explore/mlx-lm)

	## License

	Inherited from the base model (`mit`). See the upstream model page for the full license text.

	## Citation

	```bibtex
	@inproceedings{rath2026quantization,
	  title = {Quantization Undoes Alignment: Bias Emergence in Compressed LLMs Across Models and Precision Levels},
	  author = {Rath, Plawan Kumar and Maliakkal, Rahul},
	  booktitle = {IEEE Cloud Summit 2026},
	  year = {2026}
	}
	```