div0-space committed on
Commit
9c3d56e
·
verified ·
1 Parent(s): 9852827

card: full rewrite from canonical template

Files changed (1)
  1. README.md +73 -105
README.md CHANGED
@@ -1,11 +1,11 @@
1
  ---
2
  license: apache-2.0
3
- license_link: https://huggingface.co/huihui-ai/Huihui-gemma-4-26B-A4B-it-abliterated/blob/main/LICENSE
4
- base_model: huihui-ai/Huihui4-48B-A4B-abliterated
5
  language:
6
  - en
7
  - pl
8
  - multilingual
 
 
9
  library_name: mlx
10
  pipeline_tag: image-text-to-text
11
  tags:
@@ -24,150 +24,118 @@ tags:
24
  - nvfp4
25
  - 4bit
26
  - quantized
 
 
27
  ---
28
 
29
- # Huihui4-48B-A4B vMLX NVFP4
30
 
31
- MLX **NVFP4** (NVIDIA-style FP4) build of [`huihui-ai/Huihui4-48B-A4B-abliterated`](https://huggingface.co/huihui-ai/Huihui4-48B-A4B-abliterated) for Apple Silicon: Gemma 4 architecture, abliterated, 48B-parameter MoE with ~4B active per token, full multimodal (image + text).
32
 
33
- **Sleeper variant.** Same bits-per-weight footprint as `mxfp4`, but uses an NVIDIA-style FP4 layout (per-tensor scale + per-block exponent) that retains noticeably more numerical headroom on dense attention matmuls. In our matrix, `nvfp4` produces longer and slightly higher-quality outputs than `mxfp4` at almost identical disk footprint and load time. Few publishers ship this format; it is a deliberate part of the LibraxisAI release.
34
 
35
- Built end-to-end with our `mlx-vlm` editable fork (LibraxisAI delta on top of upstream `Blaizzy/mlx-vlm`): Fix 1 (progress visibility during lazy weight materialization), Fix 3 (per-shard eval + cache release during save), and a parity-aligned converter for Gemma 4 multi-bank audio + dual-image processor.
 
 
36
 
37
- ## TL;DR
38
 
39
- | Property | Value |
40
- |---------------------|--------------------------------------------------------------|
41
- | Base model | `huihui-ai/Huihui4-48B-A4B-abliterated` (Gemma 4, abliterated) |
42
- | Architecture | `Gemma4ForConditionalGeneration` MoE, 256 experts/layer, 8 active per token |
43
- | Total parameters | ~48 B |
44
- | Activated parameters| ~4 B per token |
45
- | Quantization | NVFP4 (NVIDIA FP4 layout, per-tensor scale + per-block exponent) |
46
- | Bits / weight | ~4.4 |
47
- | Size on disk | **27 GB** |
48
- | Cold load (M3 Ultra)| **~31 s** |
49
- | TTFT (text) | ~0.3 s |
50
- | Modalities | text in / text out, image in (JPEG/PNG), audio-aware tokenizer |
51
 
52
- ## Why this build
53
 
54
- `NVFP4` and `MXFP4` are competing 4-bit formats with different rounding strategies:
 
 
 
 
 
 
 
 
 
 
55
 
56
- - **`MXFP4`** (Microscaling FP4): block scale per 32 weights, conservative on attention paths, slightly smaller files.
57
- - **`NVFP4`** (NVIDIA FP4): per-tensor scale combined with per-block exponent, retains more dynamic range on dense matmuls.
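
To make the "block scale per 32 weights" idea concrete, here is a minimal pure-Python sketch of MXFP4-style block quantization. It is an illustration only, not the MLX kernel or the converter used for this checkpoint: the function names are hypothetical, the element grid is the standard FP4 E2M1 value set, and the shared scale is assumed to be a power of two chosen from the block's largest magnitude. NVFP4's scale encoding is not modeled here.

```python
import math

# Magnitudes representable by one FP4 (E2M1) element; the sign is stored separately.
E2M1_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
BLOCK_SIZE = 32  # one shared scale per 32 weights, as described for MXFP4


def quantize_block(block):
    """Return (scale, codes): a shared power-of-two scale and one E2M1 value per weight."""
    max_abs = max(abs(w) for w in block)
    if max_abs == 0.0:
        return 1.0, [0.0] * len(block)
    # E2M1's largest power of two is 4.0 = 2**2, so shift the block's
    # top exponent down by 2 to line it up with the top of the grid.
    scale = 2.0 ** (math.floor(math.log2(max_abs)) - 2)
    codes = []
    for w in block:
        target = min(abs(w) / scale, E2M1_GRID[-1])  # clamp to the largest code
        nearest = min(E2M1_GRID, key=lambda g: abs(g - target))
        codes.append(math.copysign(nearest, w))
    return scale, codes


def dequantize_block(scale, codes):
    """Reconstruct approximate weights from the shared scale and the FP4 codes."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    weights = [0.1 * i - 1.6 for i in range(BLOCK_SIZE)]
    scale, codes = quantize_block(weights)
    restored = dequantize_block(scale, codes)
    worst = max(abs(a - b) for a, b in zip(weights, restored))
    print(f"scale={scale}, worst absolute error={worst:.3f}")
```

Because every weight in a block shares one scale, a single outlier coarsens the grid for its 31 neighbours; that trade-off is what the choice of block size and scale format is tuning.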
58
-
59
- In identical end-to-end probes against the [`fp16` parity baseline](https://huggingface.co/LibraxisAI/Huihui4-48B-A4B-vmlx-fp16), `nvfp4` matched output quality more closely on vision (1043 vs. 980 chars on JPEG probe) at near-identical load time. For Apple Silicon serving where you want the smallest practical 4-bit checkpoint without giving up image-grounded fidelity, this is the variant to evaluate first.
60
-
61
- ## Model details
62
-
63
- | Property | Value |
64
- |-----------------------|--------------------------------------------------------|
65
- | Format | MLX, sharded safetensors |
66
- | Quantization config | NVFP4 (FP4 + per-tensor scale + per-block exponent) |
67
- | Tokenizer | Inherited from base, `chat_template.jinja` included |
68
- | Special tokens | `<|video|>` (32 frames default), `<image>`, audio markers |
69
- | Image processor | Dual-resolution Gemma 4 (low + hi-res patches) |
70
- | Audio extractor | Multi-bank mel filter (128 mel × 257 freq) |
71
- | License | Apache 2.0 (inherited from `huihui-ai/Huihui-gemma-4-26B-A4B-it-abliterated`) |
72
-
73
- ## Runtime compatibility
74
-
75
- This quantized MLX build includes the Gemma 4 vision projection compatibility tensor `embed_vision.embedding_projection.biases`, so current MLX loaders that require the quantized projection bias can load the checkpoint cleanly. The MXFP8 variant was smoke-tested in LM Studio, and MXFP4/MXFP8/NVFP4 were patched with the same compatibility pattern.
76
-
77
- ## Other variants
78
-
79
- | Variant | Bits/weight | Size on disk | Cold load | When to use |
80
- |-----------------------------------------------------------------|-------------|--------------|-----------|-------------|
81
- | [`Huihui4-48B-A4B-vmlx-fp16`](https://huggingface.co/LibraxisAI/Huihui4-48B-A4B-vmlx-fp16) | 16 | 91 GB | ~99 s | parity baseline, golden eval |
82
- | [`Huihui4-48B-A4B-vmlx-mxfp8`](https://huggingface.co/LibraxisAI/Huihui4-48B-A4B-vmlx-mxfp8) | ~8.5 | 47 GB | ~55 s | balanced production target |
83
- | [`Huihui4-48B-A4B-vmlx-mxfp4`](https://huggingface.co/LibraxisAI/Huihui4-48B-A4B-vmlx-mxfp4) | ~4.4 | 25 GB | ~29 s | mainstream, 32 GB Macs |
84
- | **`Huihui4-48B-A4B-vmlx-nvfp4`** (this) | ~4.4 | 27 GB | ~31 s | **NVIDIA-style FP4 sleeper, higher quality at same footprint as mxfp4** |
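
The sizes in the table above can be sanity-checked with back-of-envelope arithmetic: parameters times bits per weight, divided by 8 bits per byte. The sketch below assumes the nominal ~48 B parameter count and the approximate bits-per-weight figures from the table; `estimated_size_gib` is a hypothetical helper, and the estimate ignores metadata and any tensors kept at higher precision.

```python
def estimated_size_gib(total_params, bits_per_weight):
    """Approximate checkpoint size in GiB from parameter count and average bits per weight."""
    total_bytes = total_params * bits_per_weight / 8
    return total_bytes / 2**30


if __name__ == "__main__":
    total_params = 48e9  # nominal "~48 B" from the card, not an exact count
    # Approximate bits per weight as listed in the variants table.
    variants = {"fp16": 16, "mxfp8": 8.5, "mxfp4 / nvfp4": 4.4}
    for name, bits in variants.items():
        print(f"{name}: ~{estimated_size_gib(total_params, bits):.1f} GiB")
```

This gives roughly 89, 47.5, and 24.6 GiB, which is in the same neighbourhood as the listed sizes rather than an exact reproduction, as expected given that both inputs are rounded.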
85
 
86
  ## Usage
87
 
88
- ### `mlx-vlm` CLI
89
 
90
  ```bash
91
  pip install mlx-vlm
92
 
93
  python -m mlx_vlm.generate \
94
  --model LibraxisAI/Huihui4-48B-A4B-vmlx-nvfp4 \
95
- --image path/to/image.jpg \
96
- --prompt "Describe what you see in detail." \
97
- --max-tokens 1024
98
  ```
99
 
100
- ### `mlx-vlm` Python
101
 
102
  ```python
103
- from mlx_vlm import load, generate
104
- from mlx_vlm.prompt_utils import apply_chat_template
105
 
106
  model, processor = load("LibraxisAI/Huihui4-48B-A4B-vmlx-nvfp4")
107
- config = model.config
108
-
109
- messages = [{"role": "user", "content": "Hello! Tell me about yourself."}]
110
- prompt = apply_chat_template(processor, config, messages)
111
- output = generate(model, processor, prompt, max_tokens=512)
112
- print(output)
 
 
113
  ```
114
 
115
- ### `mlx-batch-runner` (Responses API, streaming)
116
 
117
- ```bash
118
- curl -X POST http://127.0.0.1:10240/v1/models/load \
119
- -H "Content-Type: application/json" \
120
- -d '{"model": "LibraxisAI/Huihui4-48B-A4B-vmlx-nvfp4", "task": "llm"}'
121
-
122
- curl -N -X POST http://127.0.0.1:10240/v1/responses \
123
- -H "Content-Type: application/json" \
124
- -d '{
125
- "model": "LibraxisAI/Huihui4-48B-A4B-vmlx-nvfp4",
126
- "stream": true,
127
- "input": [{"role": "user", "content": [{"type": "input_text", "text": "Hello!"}]}]
128
- }'
129
- ```
130
 
131
- Our local validation/runtime path for this upload workflow was `../mlx-batch-runner`, which ships full `/v1/responses` SSE with model cache TTL (`MODEL_CACHE_TTL=600` default) and pin support (`PINNED_MODELS=...`).
132
 
133
- For the public LibraxisAI server project, see [mlx-batch-server](https://github.com/LibraxisAI/mlx-batch-server): an Apple Silicon MLX inference server with batch processing, OpenAI-compatible `/v1/responses`, dynamic model load/unload, streaming, and VLM support.
 
 
 
 
 
134
 
135
- ## Validation
136
 
137
- End-to-end pipeline test 2026-04-22 (load → text simple → text canonical → vision JPEG → unload):
138
-
139
- | Probe | TTFT | Output chars | Notes |
140
- |---------------------------------|-------|--------------|---------------------------------------------|
141
- | Text: simple greeting (PL) | 0.7 s | 1728 | concise, focused |
142
- | Text: canonical (PL, literary) | 0.3 s | 1665 | tighter than mxfp4, more on-topic |
143
- | Vision: JPEG (Monument Valley) | 4.6 s | **1043** | richest vision output among 4-bit variants |
144
-
145
- Channel parsing: `has_reasoning=False` on every probe. The Huihui4 family emits content exclusively on the `output` channel, matching OpenAI Responses API expectations cleanly.
146
-
147
- ## Limitations and safety
148
-
149
- > **Abliteration disclosure.** This model derives from `huihui-ai/Huihui4-48B-A4B-abliterated`, which has had its safety alignment layers (refusal mechanisms and attention routing) removed. The underlying knowledge from pretraining is intact, but the model **will not refuse** queries it would normally decline. Do not deploy without an external safety layer if your context requires content moderation. The base model card's [disclosures](https://huggingface.co/huihui-ai/Huihui4-48B-A4B-abliterated) apply here.
150
-
151
- - Multimodal: tested on still images (JPEG/PNG). Video is supported by the upstream Gemma 4 processor (`Gemma4VideoProcessor`, 32-frame uniform sampling) but not yet covered in our published validation matrix.
152
- - Audio: tokenizer-side audio markers are present, but no audio-input validation has been published yet.
153
- - Like all 4-bit quantized MoE models on Apple Silicon, expect occasional cosmetic artifacts (trailing special tokens) on very long generations.
154
 
155
  ## License
156
 
157
- Apache 2.0, inherited via `huihui-ai/Huihui-gemma-4-26B-A4B-it-abliterated`. See [LICENSE link](https://huggingface.co/huihui-ai/Huihui-gemma-4-26B-A4B-it-abliterated/blob/main/LICENSE) and the underlying [Google Gemma](https://ai.google.dev/gemma/terms) terms.
158
 
159
- ## Acknowledgements
160
 
161
- - **huihui-ai**: abliteration of the Gemma-4-26B-A4B-it base, MoE expansion to 48B-A4B, original distillation.
162
- - **TeichAI**: `gemma-4-26B-A4B-it-Claude-Opus-Distill` co-base.
163
- - **Google DeepMind**: Gemma 4 architecture and pretraining.
164
- - **Apple MLX team**: MLX framework, quantization primitives.
165
- - **`Blaizzy/mlx-vlm`**: upstream multimodal MLX runtime; this build uses our editable LibraxisAI delta, which we are upstreaming as separate PRs.
 
 
 
 
166
 
167
  ## Inference tested on
168
 
169
  [`LibraxisAI/mlx-batch-server`](https://github.com/LibraxisAI/mlx-batch-server)
170
 
 
 
 
 
171
  ---
172
 
173
- `𝚅𝚒𝚋𝚎𝚌𝚛𝚊𝚏𝚝𝚎𝚍. with AI Agents by VetCoders (c)2024-2026 The LibraxisAI Team`
 
 
1
  ---
2
  license: apache-2.0
 
 
3
  language:
4
  - en
5
  - pl
6
  - multilingual
7
+ base_model:
8
+ - huihui-ai/Huihui4-48B-A4B-abliterated
9
  library_name: mlx
10
  pipeline_tag: image-text-to-text
11
  tags:
 
24
  - nvfp4
25
  - 4bit
26
  - quantized
27
+ - huihui
28
+ inference: false
29
  ---
30
 
31
+ # Huihui4-48B-A4B-vmlx-nvfp4
32
 
33
+ `Huihui4-48B-A4B-vmlx-nvfp4` is an MLX vision-language checkpoint derived from `huihui-ai/Huihui4-48B-A4B-abliterated`, packaged for local multimodal prompting on Apple Silicon.
34
 
35
+ ## Intended use
36
 
37
+ - Local image-and-text reasoning on Apple Silicon
38
+ - Document, screenshot, chart, and visual question answering experiments
39
+ - Operator-controlled multimodal prototyping where hosted inference is not desired
40
 
41
+ ## Out of scope
42
 
43
+ - Safety-critical decisions without domain expert review
44
+ - Claims of benchmark superiority not backed by published evaluation data
45
+ - Non-MLX runtime guarantees; this card documents the shipped HF checkpoint, not every possible serving stack
46
+ - High-stakes visual interpretation without human review
 
 
 
 
 
 
 
 
47
 
48
+ ## Training and conversion metadata
49
 
50
+ | Parameter | Value |
51
+ |---|---|
52
+ | Repository | `LibraxisAI/Huihui4-48B-A4B-vmlx-nvfp4` |
53
+ | Base model | `huihui-ai/Huihui4-48B-A4B-abliterated` |
54
+ | Task | `image-text-to-text` |
55
+ | Library | `mlx` |
56
+ | Format | MLX / Apple Silicon checkpoint |
57
+ | Quantization | NVFP4 |
58
+ | Architecture | Gemma4ForConditionalGeneration |
59
+ | Model files | 6 |
60
+ | Config model_type | `gemma4` |
61
 
62
+ This card only reports metadata present in the Hugging Face repository, existing card frontmatter, or public config files. Missing benchmark, dataset, or training-run details are left explicit rather than reconstructed.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
63
 
64
  ## Usage
65
 
66
+ ### CLI
67
 
68
  ```bash
69
  pip install mlx-vlm
70
 
71
  python -m mlx_vlm.generate \
72
  --model LibraxisAI/Huihui4-48B-A4B-vmlx-nvfp4 \
73
+ --image image.jpg \
74
+ --prompt "Summarize the key signals in this document and list the next action items." \
75
+ --max-tokens 256
76
  ```
77
 
78
+ ### Python
79
 
80
  ```python
81
+ from mlx_vlm import generate, load
 
82
 
83
  model, processor = load("LibraxisAI/Huihui4-48B-A4B-vmlx-nvfp4")
84
+ response = generate(
85
+ model,
86
+ processor,
87
+ prompt="Summarize the key signals in this document and list the next action items.",
88
+ image="image.jpg",
89
+ max_tokens=256,
90
+ )
91
+ print(response)
92
  ```
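
Since loading this checkpoint is expensive, it can help to validate inputs before calling `load`. The sketch below wraps the Python example above in a hypothetical helper, `describe_image`, that checks the image path first. The `load` and `generate` calls copy the shape shown in this card; depending on the installed `mlx-vlm` version, the prompt may also need to go through a chat template, as the earlier revision of this card did with `apply_chat_template`.

```python
from pathlib import Path

MODEL_ID = "LibraxisAI/Huihui4-48B-A4B-vmlx-nvfp4"


def describe_image(image_path, prompt, max_tokens=256):
    """Validate the image path, then run one image + text generation."""
    path = Path(image_path)
    if not path.is_file():
        # Fail fast before the multi-gigabyte checkpoint is loaded.
        raise FileNotFoundError(f"image not found: {path}")

    # Imported lazily so the path check also works where mlx-vlm is not installed.
    from mlx_vlm import generate, load

    model, processor = load(MODEL_ID)
    # Call shape copied from the card's Python example (assumed to match your mlx-vlm version).
    return generate(
        model,
        processor,
        prompt=prompt,
        image=str(path),
        max_tokens=max_tokens,
    )
```

A call such as `describe_image("image.jpg", "Summarize the key signals in this document.")` then either raises immediately on a bad path or proceeds to load the model.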
93
 
94
+ ## Example output
95
 
96
+ No public sample output is currently declared for this checkpoint. Run the usage example above against your own prompt and image input to inspect behavior.
 
 
 
 
 
 
 
 
 
 
 
 
97
 
98
+ ## Quantization notes
99
 
100
+ | Aspect | Original/base checkpoint | This checkpoint |
101
+ |---|---|---|
102
+ | Lineage | `huihui-ai/Huihui4-48B-A4B-abliterated` | `LibraxisAI/Huihui4-48B-A4B-vmlx-nvfp4` |
103
+ | Runtime target | Upstream runtime format | MLX on Apple Silicon |
104
+ | Quantization | Base precision or upstream-declared format | NVFP4 |
105
+ | Published quality delta | Not declared in public metadata | Not declared in public metadata |
106
 
107
+ ## Limitations
108
 
109
+ - No public benchmarks for this checkpoint are declared in the model metadata.
110
+ - No public benchmark claims are made by this card unless listed in the frontmatter.
111
+ - Validate outputs on your own domain data before relying on this checkpoint.
112
+ - Memory use and speed depend heavily on the exact Apple Silicon generation, unified-memory size, and prompt length.
 
 
 
 
 
 
 
 
 
 
 
 
 
113
 
114
  ## License
115
 
116
+ `apache-2.0`. Check the upstream/base model license as well when a base model is declared.
117
 
118
+ ## Citation
119
 
120
+ ```bibtex
121
+ @misc{libraxisai-huihui4-48b-a4b-vmlx-nvfp4,
122
+ title = {Huihui4-48B-A4B-vmlx-nvfp4},
123
+ author = {LibraxisAI},
124
+ year = {2026},
125
+ howpublished = {\url{https://huggingface.co/LibraxisAI/Huihui4-48B-A4B-vmlx-nvfp4}},
126
+ note = {MLX checkpoint published by LibraxisAI}
127
+ }
128
+ ```
129
 
130
  ## Inference tested on
131
 
132
  [`LibraxisAI/mlx-batch-server`](https://github.com/LibraxisAI/mlx-batch-server)
133
 
134
+ ## Related
135
+
136
+ - Base model: [`huihui-ai/Huihui4-48B-A4B-abliterated`](https://huggingface.co/huihui-ai/Huihui4-48B-A4B-abliterated)
137
+
138
  ---
139
 
140
+ 𝚅𝚒𝚋𝚎𝚌𝚛𝚊𝚏𝚝𝚎𝚍. with AI Agents by VetCoders (c)2024-2026 LibraxisAI
141
+