Image-Text-to-Text
MLX
Safetensors
English
Polish
multilingual
gemma4
apple-silicon
gemma
gemma-4
abliterated
uncensored
Mixture of Experts
multimodal
vision
vmlx
nvfp4
4bit
quantized
huihui
conversational
4-bit precision
Instructions to use LibraxisAI/Huihui4-48B-A4B-vmlx-nvfp4 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- MLX
How to use LibraxisAI/Huihui4-48B-A4B-vmlx-nvfp4 with MLX:
    # Make sure mlx-vlm is installed
    # pip install --upgrade mlx-vlm
    from mlx_vlm import load, generate
    from mlx_vlm.prompt_utils import apply_chat_template
    from mlx_vlm.utils import load_config

    # Load the model
    model, processor = load("LibraxisAI/Huihui4-48B-A4B-vmlx-nvfp4")
    config = load_config("LibraxisAI/Huihui4-48B-A4B-vmlx-nvfp4")

    # Prepare input
    image = ["http://images.cocodataset.org/val2017/000000039769.jpg"]
    prompt = "Describe this image."

    # Apply chat template
    formatted_prompt = apply_chat_template(
        processor, config, prompt, num_images=1
    )

    # Generate output
    output = generate(model, processor, formatted_prompt, image)
    print(output)
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- LM Studio
- Pi
How to use LibraxisAI/Huihui4-48B-A4B-vmlx-nvfp4 with Pi:
Start the MLX server
    # Install MLX LM:
    uv tool install mlx-lm
    # Start a local OpenAI-compatible server:
    mlx_lm.server --model "LibraxisAI/Huihui4-48B-A4B-vmlx-nvfp4"
Configure the model in Pi
    # Install Pi:
    npm install -g @mariozechner/pi-coding-agent
    # Add to ~/.pi/agent/models.json:
    {
      "providers": {
        "mlx-lm": {
          "baseUrl": "http://localhost:8080/v1",
          "api": "openai-completions",
          "apiKey": "none",
          "models": [
            { "id": "LibraxisAI/Huihui4-48B-A4B-vmlx-nvfp4" }
          ]
        }
      }
    }
Run Pi
# Start Pi in your project directory: pi
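The `models.json` step above can also be scripted. The following is a minimal sketch, assuming the file layout shown in the snippet (a top-level `providers` object keyed by provider name); `add_mlx_provider` is a hypothetical helper name and is not part of Pi.

```python
import json
from pathlib import Path

MODEL_ID = "LibraxisAI/Huihui4-48B-A4B-vmlx-nvfp4"


def add_mlx_provider(config_path: Path, base_url: str = "http://localhost:8080/v1") -> dict:
    """Insert or update the "mlx-lm" provider entry, keeping any other providers."""
    config = {}
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))

    # Field names mirror the snippet above; they are assumed, not verified against Pi.
    providers = config.setdefault("providers", {})
    providers["mlx-lm"] = {
        "baseUrl": base_url,
        "api": "openai-completions",
        "apiKey": "none",
        "models": [{"id": MODEL_ID}],
    }

    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


# Usage: add_mlx_provider(Path.home() / ".pi" / "agent" / "models.json")
```

Merging into the existing file rather than overwriting it keeps any providers that were configured earlier.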
- Hermes Agent
How to use LibraxisAI/Huihui4-48B-A4B-vmlx-nvfp4 with Hermes Agent:
Start the MLX server
    # Install MLX LM:
    uv tool install mlx-lm
    # Start a local OpenAI-compatible server:
    mlx_lm.server --model "LibraxisAI/Huihui4-48B-A4B-vmlx-nvfp4"
Configure Hermes
    # Install Hermes:
    curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
    hermes setup
    # Point Hermes at the local server:
    hermes config set model.provider custom
    hermes config set model.base_url http://127.0.0.1:8080/v1
    hermes config set model.default LibraxisAI/Huihui4-48B-A4B-vmlx-nvfp4
Run Hermes
hermes
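Both agent setups above depend on the local server answering OpenAI-style requests, so it can help to check that directly before wiring up an agent. The sketch below uses only the Python standard library. It assumes the server listens on port 8080, exposes `/v1/chat/completions`, and returns the usual `choices[0].message.content` shape; `build_chat_request` and `ask` are illustrative names, and the prompt is text-only.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8080/v1"  # same address as in the agent configs above
MODEL_ID = "LibraxisAI/Huihui4-48B-A4B-vmlx-nvfp4"


def build_chat_request(prompt: str, max_tokens: int = 128) -> urllib.request.Request:
    """Build a POST request for the (assumed) chat completions endpoint."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str) -> str:
    """Send the prompt to the local server and return the first reply's text."""
    with urllib.request.urlopen(build_chat_request(prompt), timeout=120) as resp:
        body = json.load(resp)
    # Response shape follows the OpenAI chat convention (an assumption here).
    return body["choices"][0]["message"]["content"]


# Usage (requires the server from the steps above to be running):
# print(ask("Say hello in one sentence."))
```

Keeping request construction separate from the network call makes it possible to inspect the payload without a running server.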
card: full rewrite from canonical template
README.md CHANGED
@@ -1,11 +1,11 @@
 ---
 license: apache-2.0
-license_link: https://huggingface.co/huihui-ai/Huihui-gemma-4-26B-A4B-it-abliterated/blob/main/LICENSE
-base_model: huihui-ai/Huihui4-48B-A4B-abliterated
 language:
 - en
 - pl
 - multilingual
 library_name: mlx
 pipeline_tag: image-text-to-text
 tags:
@@ -24,150 +24,118 @@ tags:
 - nvfp4
 - 4bit
 - quantized
 ---

-# Huihui4-48B-A4B

-

-

-

-##

-
-
-
-
-| Total parameters | ~48 B |
-| Activated parameters| ~4 B per token |
-| Quantization | NVFP4 (NVIDIA FP4 layout, per-tensor scale + per-block exponent) |
-| Bits / weight | ~4.4 |
-| Size on disk | **27 GB** |
-| Cold load (M3 Ultra)| **~31 s** |
-| TTFT (text) | ~0.3 s |
-| Modalities | text in / text out, image in (JPEG/PNG), audio-aware tokenizer |

-##

-

-
-- **`NVFP4`** (NVIDIA FP4): per-tensor scale combined with per-block exponent, retains more dynamic range on dense matmuls.
-
-In identical end-to-end probes against the [`fp16` parity baseline](https://huggingface.co/LibraxisAI/Huihui4-48B-A4B-vmlx-fp16), `nvfp4` matched output quality more closely on vision (1043 vs. 980 chars on JPEG probe) at near-identical load time. For Apple Silicon serving where you want the smallest practical 4-bit checkpoint without giving up image-grounded fidelity, this is the variant to evaluate first.
-
-## Model details
-
-| Property | Value |
-|-----------------------|--------------------------------------------------------|
-| Format | MLX, sharded safetensors |
-| Quantization config | NVFP4 (FP4 + per-tensor scale + per-block exponent) |
-| Tokenizer | Inherited from base, `chat_template.jinja` included |
-| Special tokens | `<|video|>` (32 frames default), `<image>`, audio markers |
-| Image processor | Dual-resolution Gemma 4 (low + hi-res patches) |
-| Audio extractor | Multi-bank mel filter (128 mel × 257 freq) |
-| License | Apache 2.0 (inherited from `huihui-ai/Huihui-gemma-4-26B-A4B-it-abliterated`) |
-
-## Runtime compatibility
-
-This quantized MLX build includes the Gemma 4 vision projection compatibility tensor `embed_vision.embedding_projection.biases`, so current MLX loaders that require the quantized projection bias can load the checkpoint cleanly. The MXFP8 variant was smoke-tested in LM Studio, and MXFP4/MXFP8/NVFP4 were patched with the same compatibility pattern.
-
-## Other variants
-
-| Variant | Bits/weight | Size on disk | Cold load | When to use |
-|-----------------------------------------------------------------|-------------|--------------|-----------|-------------|
-| [`Huihui4-48B-A4B-vmlx-fp16`](https://huggingface.co/LibraxisAI/Huihui4-48B-A4B-vmlx-fp16) | 16 | 91 GB | ~99 s | parity baseline, golden eval |
-| [`Huihui4-48B-A4B-vmlx-mxfp8`](https://huggingface.co/LibraxisAI/Huihui4-48B-A4B-vmlx-mxfp8) | ~8.5 | 47 GB | ~55 s | balanced production target |
-| [`Huihui4-48B-A4B-vmlx-mxfp4`](https://huggingface.co/LibraxisAI/Huihui4-48B-A4B-vmlx-mxfp4) | ~4.4 | 25 GB | ~29 s | mainstream, 32 GB Macs |
-| **`Huihui4-48B-A4B-vmlx-nvfp4`** (this) | ~4.4 | 27 GB | ~31 s | **NVIDIA-style FP4 sleeper, higher quality at same footprint as mxfp4** |

 ## Usage

-###

 ```bash
 pip install mlx-vlm

 python -m mlx_vlm.generate \
   --model LibraxisAI/Huihui4-48B-A4B-vmlx-nvfp4 \
-  --image
-  --prompt "
-  --max-tokens
 ```

-###

 ```python
-from mlx_vlm import
-from mlx_vlm.prompt_utils import apply_chat_template

 model, processor = load("LibraxisAI/Huihui4-48B-A4B-vmlx-nvfp4")
-
-
-
-prompt
-
-
 ```

-##

-
-curl -X POST http://127.0.0.1:10240/v1/models/load \
-  -H "Content-Type: application/json" \
-  -d '{"model": "LibraxisAI/Huihui4-48B-A4B-vmlx-nvfp4", "task": "llm"}'
-
-curl -N -X POST http://127.0.0.1:10240/v1/responses \
-  -H "Content-Type: application/json" \
-  -d '{
-    "model": "LibraxisAI/Huihui4-48B-A4B-vmlx-nvfp4",
-    "stream": true,
-    "input": [{"role": "user", "content": [{"type": "input_text", "text": "Hello!"}]}]
-  }'
-```

-

-

-##

-
-
-
-
-| Text: simple greeting (PL) | 0.7 s | 1728 | concise, focused |
-| Text: canonical (PL, literary) | 0.3 s | 1665 | tighter than mxfp4, more on-topic |
-| Vision: JPEG (Monument Valley) | 4.6 s | **1043** | richest vision output among 4-bit variants |
-
-Channel parsing: `has_reasoning=False` on every probe: Huihui4 family emits content exclusively on `output` channel, matching OpenAI Responses API expectations cleanly.
-
-## Limitations and safety
-
-> **Abliteration disclosure.** This model derives from `huihui-ai/Huihui4-48B-A4B-abliterated`, which has had its safety alignment layers (refusal mechanisms and attention routing) removed. The underlying knowledge from pretraining is intact, but the model **will not refuse** queries it would normally decline. Do not deploy without an external safety layer if your context requires content moderation. The base model card's [disclosures](https://huggingface.co/huihui-ai/Huihui4-48B-A4B-abliterated) apply here.
-
-- Multimodal: tested on still images (JPEG/PNG). Video is supported by the upstream Gemma 4 processor (`Gemma4VideoProcessor`, 32-frame uniform sampling) but not yet covered in our published validation matrix.
-- Audio: tokenizer-side audio markers are present, but no audio-input validation has been published yet.
-- Like all 4-bit quantized MoE models on Apple Silicon, expect occasional cosmetic artifacts (trailing special tokens) on very long generations.

 ## License

-

-##

-
--
-
-
-

 ## Inference tested on

 [`LibraxisAI/mlx-batch-server`](https://github.com/LibraxisAI/mlx-batch-server)

 ---

-
@@ -1,11 +1,11 @@
 ---
 license: apache-2.0
 language:
 - en
 - pl
 - multilingual
+base_model:
+- huihui-ai/Huihui4-48B-A4B-abliterated
 library_name: mlx
 pipeline_tag: image-text-to-text
 tags:
@@ -24,150 +24,118 @@ tags:
 - nvfp4
 - 4bit
 - quantized
+- huihui
+inference: false
 ---

+# Huihui4-48B-A4B-vmlx-nvfp4

+`Huihui4-48B-A4B-vmlx-nvfp4` is an MLX vision-language checkpoint derived from `huihui-ai/Huihui4-48B-A4B-abliterated`, packaged for local multimodal prompting on Apple Silicon.

+## Intended use

+- Local image-and-text reasoning on Apple Silicon
+- Document, screenshot, chart, and visual question answering experiments
+- Operator-controlled multimodal prototyping where hosted inference is not desired

+## Out of scope

+- Safety-critical decisions without domain expert review
+- Claims of benchmark superiority not backed by published evaluation data
+- Non-MLX runtime guarantees; this card documents the shipped HF checkpoint, not every possible serving stack
+- High-stakes visual interpretation without human review

+## Training and conversion metadata

+| Parameter | Value |
+|---|---|
+| Repository | `LibraxisAI/Huihui4-48B-A4B-vmlx-nvfp4` |
+| Base model | `huihui-ai/Huihui4-48B-A4B-abliterated` |
+| Task | `image-text-to-text` |
+| Library | `mlx` |
+| Format | MLX / Apple Silicon checkpoint |
+| Quantization | NVFP4 |
+| Architecture | Gemma4ForConditionalGeneration |
+| Model files | 6 |
+| Config model_type | `gemma4` |

+This card only reports metadata present in the Hugging Face repository, existing card frontmatter, or public config files. Missing benchmark, dataset, or training-run details are left explicit rather than reconstructed.

 ## Usage

+### CLI

 ```bash
 pip install mlx-vlm

 python -m mlx_vlm.generate \
   --model LibraxisAI/Huihui4-48B-A4B-vmlx-nvfp4 \
+  --image image.jpg \
+  --prompt "Summarize the key signals in this document and list the next action items." \
+  --max-tokens 256
 ```

+### Python

 ```python
+from mlx_vlm import generate, load

 model, processor = load("LibraxisAI/Huihui4-48B-A4B-vmlx-nvfp4")
+response = generate(
+    model,
+    processor,
+    prompt="Summarize the key signals in this document and list the next action items.",
+    image="image.jpg",
+    max_tokens=256,
+)
+print(response)
 ```

+## Example output

+No public sample output is currently declared for this checkpoint. Run the usage example above against your own prompt or audio/image input to inspect behavior.

+## Quantization notes

+| Aspect | Original/base checkpoint | This checkpoint |
+|---|---|---|
+| Lineage | `huihui-ai/Huihui4-48B-A4B-abliterated` | `LibraxisAI/Huihui4-48B-A4B-vmlx-nvfp4` |
+| Runtime target | Upstream runtime format | MLX on Apple Silicon |
+| Quantization | Base precision or upstream-declared format | NVFP4 |
+| Published quality delta | Not declared in public metadata | Not declared in public metadata |

+## Limitations

+- No public benchmarks for this checkpoint are declared in the model metadata.
+- No public benchmark claims are made by this card unless listed in the frontmatter.
+- Validate outputs on your own domain data before relying on this checkpoint.
+- Memory use and speed depend heavily on the exact Apple Silicon generation, unified-memory size, and prompt length.

 ## License

+`apache-2.0`. Check the upstream/base model license as well when a base model is declared.

+## Citation

+```bibtex
+@misc{libraxisai-huihui4-48b-a4b-vmlx-nvfp4,
+  title = {Huihui4-48B-A4B-vmlx-nvfp4},
+  author = {LibraxisAI},
+  year = {2026},
+  howpublished = {\url{https://huggingface.co/LibraxisAI/Huihui4-48B-A4B-vmlx-nvfp4}},
+  note = {MLX checkpoint published by LibraxisAI}
+}
+```

 ## Inference tested on

 [`LibraxisAI/mlx-batch-server`](https://github.com/LibraxisAI/mlx-batch-server)

+## Related
+
+- Base model: [`huihui-ai/Huihui4-48B-A4B-abliterated`](https://huggingface.co/huihui-ai/Huihui4-48B-A4B-abliterated)
+
 ---

+Vibecrafted. with AI Agents by VetCoders (c)2024-2026 LibraxisAI
+