# gemma-4-31B-uncensored-heretic · MLX

8-bit MLX conversion of llmfan46/gemma-4-31B-it-uncensored-heretic, a fine-tune of Google's Gemma 4 31B Instruct. Quantized to ~8.6 bits per weight using mlx-vlm v0.4.3 on Apple Silicon.

## Performance on Apple M4 Max · 128 GB

- Peak memory: ~34 GB
- Prompt throughput: ~20.6 tok/s
- Generation speed: ~14.5 tok/s
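The peak memory figure is consistent with simple bits-per-weight arithmetic. The sketch below is illustrative only (the function name and decimal-GB convention are mine, not part of mlx-vlm), assuming ~31e9 parameters and the ~8.6 effective bits/weight stated above:

```python
# Back-of-the-envelope weight memory for a quantized model:
# n_params * bits_per_weight / 8 bytes. The runtime peak adds
# activations and KV cache on top of the weights themselves.

def quantized_weight_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight memory in decimal gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9

print(round(quantized_weight_gb(31e9, 8.6), 1))  # ~33.3 GB of weights
```

The remaining ~1 GB of the observed ~34 GB peak is plausibly activations and KV cache.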

## Requirements

```bash
pip install -U mlx-vlm
```

Gemma 4 support requires mlx-vlm >= 0.4.3. Standard mlx-lm does not yet support the gemma4 architecture.
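A quick way to enforce that version floor at startup is a numeric comparison of the dotted version string. This is a minimal sketch (helper names are my own; it ignores pre-release suffixes such as `0.4.3rc1`, for which `packaging.version` is the robust choice):

```python
# Minimal version gate for the mlx-vlm >= 0.4.3 requirement.
# Naive: splits on "." and compares numerically.

def version_tuple(v: str) -> tuple:
    return tuple(int(part) for part in v.split("."))

def meets_minimum(installed: str, minimum: str = "0.4.3") -> bool:
    return version_tuple(installed) >= version_tuple(minimum)

print(meets_minimum("0.4.3"))  # True
print(meets_minimum("0.3.9"))  # False
```

In practice you would feed `importlib.metadata.version("mlx-vlm")` into `meets_minimum` before loading the model.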

## Usage

### Text only

```bash
python -m mlx_vlm generate \
  --model TxemAI/gemma-4-31B-uncensored-heretic-mlx-8bit \
  --prompt "Your prompt here" \
  --max-tokens 512
```

### With image

```bash
python -m mlx_vlm generate \
  --model TxemAI/gemma-4-31B-uncensored-heretic-mlx-8bit \
  --prompt "Describe this image." \
  --image path/to/image.jpg \
  --max-tokens 512
```

### Python API

```python
from mlx_vlm import load, generate

model, processor = load("TxemAI/gemma-4-31B-uncensored-heretic-mlx-8bit")

response = generate(
    model,
    processor,
    prompt="Your prompt here",
    max_tokens=512,
    temperature=0.7,
)
print(response)
```

## Memory requirements

| Precision | VRAM |
| --- | --- |
| BF16 (full) | 62 GB |
| Q8 (this model) | 34 GB |
| Q4 | ~18 GB |

## Notes

- The model activates Gemma 4's thinking channel (`<|channel>thought`) on reasoning-heavy prompts; this is expected behaviour.
- The mel filter warning on load is harmless; it relates to the audio encoder and does not affect text or vision inference.
- This is an unofficial community conversion. For the original fine-tune, see llmfan46/gemma-4-31B-it-uncensored-heretic.

## Conversion

```bash
python -m mlx_vlm convert \
  --hf-path llmfan46/gemma-4-31B-it-uncensored-heretic \
  --mlx-path ./gemma-4-31B-uncensored-heretic-mlx-8bit \
  --quantize --q-bits 8
```

## Credits

- Google DeepMind — Gemma 4 base model
- llmfan46 — uncensored-heretic fine-tune
- ml-explore — MLX framework
- Blaizzy — mlx-vlm library
