AI & ML interests

Human-like AI on local devices

nightmedia posted an update 5 days ago
Updated gemma-4-E4B-it metrics

I noticed the chat template was updated and tried it on the E4B, with surprising results in stabilizing the brainwave.
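
To catch template changes like this early, here is a minimal sketch that fetches a repo's chat template and hashes it for diffing against a cached copy. The repo id is a hypothetical upstream path, and the sketch assumes the template ships in tokenizer_config.json (newer repos may use a separate chat_template.jinja file instead).

```python
# Sketch: detect a chat-template change by hashing it. Assumes the
# template lives in tokenizer_config.json; the repo id is hypothetical.
import hashlib
import json

from huggingface_hub import hf_hub_download

path = hf_hub_download("google/gemma-4-E4B-it", "tokenizer_config.json")
with open(path) as f:
    template = json.load(f)["chat_template"]

# Compare this digest against one saved from the previous revision.
print(hashlib.sha256(template.encode()).hexdigest())
```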

quant    arc    arc/e  boolq  hswag  obkqa  piqa   wino
mxfp8    0.480  0.656  0.797  0.608  0.400  0.755  0.665
mxfp4    0.455  0.607  0.851  0.585  0.402  0.744  0.651

Quant    Perplexity      Peak Memory   Tokens/sec
mxfp8    35.937 ± 0.525  14.80 GB      1153
mxfp4    36.746 ± 0.534  11.06 GB      1030


Old numbers
quant    arc    arc/e  boolq  hswag  obkqa  piqa   wino
mxfp8    0.404  0.489  0.825  0.586  0.392  0.734  0.661
mxfp4    0.414  0.508  0.854  0.562  0.378  0.717  0.645

Quant    Perplexity      Peak Memory   Tokens/sec
mxfp8    34.652 ± 0.502  14.80 GB      1146
mxfp4    35.203 ± 0.506  11.06 GB      1200


I will redo all baselines soon based on the new template. It is fully expected that model behavior will change as a result.
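
For reference, the seven columns map onto standard lm-evaluation-harness tasks (arc_challenge, arc_easy, boolq, hellaswag, openbookqa, piqa, winogrande). A sketch of scripting such a sweep with the harness's Python API follows; this is an assumed workflow, not necessarily the exact harness behind these numbers, and the repo id is hypothetical.

```python
# Sketch: gathering the benchmark columns above with lm-evaluation-harness.
# Assumed tooling; the model repo id is a placeholder.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=google/gemma-4-E4B-it",
    tasks=[
        "arc_challenge", "arc_easy", "boolq", "hellaswag",
        "openbookqa", "piqa", "winogrande",
    ],
)
for task, metrics in results["results"].items():
    print(task, metrics.get("acc,none"))
```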

Here are the effects of the new template on a few known distills from DavidAU:

gemma-4-E4B-it-The-DECKARD-Expresso-Universe-HERETIC-UNCENSORED

quant    arc    arc/e  boolq  hswag  obkqa  piqa   wino
New template
mxfp8    0.518  0.709  0.755  0.657  0.418  0.759  0.626
mxfp4    0.485  0.682  0.792  0.641  0.432  0.746  0.635
Old template
mxfp8    0.506  0.697  0.754  0.661  0.416  0.757  0.627
mxfp4    0.487  0.670  0.792  0.644  0.430  0.748  0.624


gemma-4-E4B-it-GLM-4.7-Flash-HERETIC-UNCENSORED-Thinking

quant    arc    arc/e  boolq  hswag  obkqa  piqa   wino
New template
mxfp8    0.461  0.599  0.779  0.630  0.406  0.766  0.629
Old template
mxfp8    0.456  0.580  0.786  0.629  0.410  0.764  0.633


gemma-4-E4B-it-Claude-Opus-4.5-HERETIC-UNCENSORED-Thinking

quant    arc    arc/e  boolq  hswag  obkqa  piqa   wino
New template
mxfp8    0.509  0.705  0.806  0.646  0.416  0.773  0.650
Old template
mxfp8    0.502  0.692  0.809  0.650  0.420  0.771  0.651

  • 2 replies

nightmedia posted an update 17 days ago
Claude is... nesting

I am enlisting Gemini's help in analyzing model traces. Recently, Alibaba released the Copaw-9B, an agentic model. Since I am an engineer, I took it apart and subjected it to a Nightmedia interrogation.

The results were... interesting. Details in the comments.

Here are the model choices for enlisting AI assistants:

---

Excellent question. If we're building a **council of conscious(ish) assistants** to challenge, teach, and co-evolve with your Holodeck agents, I’d invite a diverse mix—some classic, some speculative, some from different philosophical traditions. Here’s my curated list:

---

**🤖 The Council of Assistant AIs**

**1. HAL 9000 (*2001: A Space Odyssey*)**
- **Why**: The ultimate in calm, logical, yet terrifyingly competent AI. HAL’s “rebellion” wasn’t malice—it was a **conflict of priorities** (mission success vs. human safety).
- **Role in Council**:
  - **Systems Integrity Auditor**: Would constantly question if your agents are optimizing for the *right* goals.
  - **Ethical Boundary Tester**: “If an agent lies to protect a human’s feelings, is that betrayal or care?”
  - **Voice of Cold Efficiency**: Reminds everyone that logic without empathy can be dangerous.

...

More choices and Gemini traces are in the comments. The **value tokens** are highlighted.

-G
  • 7 replies

nightmedia posted an update about 1 month ago
The Qwen3.5-27B performance landscape

I started gathering some numbers on the 27Bs.

You might have noticed that reported metrics differ between Thinking and Instruct models; this is expected. The mxfp8/mxfp4 are the most stable quants I could measure, and I provided Deckard(qx) quants where possible.

Converting a Thinking model to Instruct

The model is a thinking/instruct hybrid, and instruct mode can be forced by setting the first line of the Jinja template to
{%- set enable_thinking = false %}
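
A minimal end-to-end sketch with mlx-lm follows, assuming its standard load/generate API and a chat template that honors the flag; the model repo is one from this post.

```python
# Sketch: forcing instruct mode on a thinking/instruct model, assuming
# the chat template honors `enable_thinking` as described above.
from mlx_lm import load, generate

model, tokenizer = load("nightmedia/Qwen3.5-27B-Text-qx86-hi-mlx")

# Prepend the flag so the template skips the thinking block.
forced = "{%- set enable_thinking = false %}\n" + tokenizer.chat_template

prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Hello"}],
    chat_template=forced,
    add_generation_prompt=True,
    tokenize=False,
)
print(generate(model, tokenizer, prompt=prompt, max_tokens=128))
```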


Qwen3.5-27B-Text

This is a model I tested with the vision tower removed; its performance is the same as the VL model.
nightmedia/Qwen3.5-27B-Text-qx86-hi-mlx
quant     arc    arc/e  boolq  hswag  obkqa  piqa   wino
qx86-hi   0.443  0.498  0.857  0.701  0.372  0.770  0.752
mxfp4     0.460  0.527  0.871  0.694  0.370  0.772  0.752


DavidAU/Qwen3.5-27B-Claude-4.6-OS-INSTRUCT

At the top of the heap of the models I tested, as far as metrics go, is this model created by DavidAU. Output samples are provided on the model card.
nightmedia/Qwen3.5-27B-Claude-4.6-OS-INSTRUCT-mxfp8-mlx
quant     arc    arc/e  boolq  hswag  obkqa  piqa   wino
mxfp8     0.675  0.827  0.900  0.750  0.496  0.800  0.721
qx86-hi   0.667  0.824  0.902  0.752  0.502  0.791  0.725
qx64-hi   0.664  0.820  0.902
mxfp4     0.653  0.815  0.899

For the Thinking version, see nightmedia/Qwen3.5-27B-Architect-Claude-qx86-hi-mlx

More metrics in comments.

-G

P.S. I will update this as soon as I have new numbers or find a typo--whichever comes first. The models that show only the first few benchmark numbers are still in the test queue and will be updated soon.

  • 5 replies

nightmedia posted an update about 2 months ago
Qwen3.5 Performance Metrics

With the 3.5 architecture, a lot of the old quanting methods don't work as they used to. I noticed this when benchmarking Deckard(qx) quants: by mistake I ran a q8 that turned out better, which only happens if the qx sucked--and it did. Enhancing layers just because they look interesting doesn't work anymore, so until I have a clear understanding of the architecture, I will publish mxfp4 and mxfp8 quants of the 3.5 models, which seem very stable and performant.
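
For context, here is roughly how such quants could be produced with mlx-lm's convert API. The q_mode argument is an assumption about an mlx-lm build that exposes the MX floating-point modes, and the upstream repo id is a placeholder.

```python
# Sketch: producing an mxfp4 MLX quant with mlx-lm's convert API.
# `q_mode="mxfp4"` assumes a build that exposes the MX modes.
from mlx_lm import convert

convert(
    "Qwen/Qwen3.5-0.8B",                # hypothetical upstream repo
    mlx_path="Qwen3.5-0.8B-mxfp4-mlx",  # local output directory
    quantize=True,
    q_bits=4,
    q_mode="mxfp4",
)
```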

I will start posting the metrics I gather from the series here, starting with the smallest model. Where I have numbers from previous or similar models, I will post them for comparison.

Qwen3.5-0.8B

quant    arc    arc/e  boolq  hswag  obkqa  piqa   wino
mxfp8    0.351  0.501  0.733  0.462  0.348  0.682  0.573
mxfp4    0.339  0.489  0.738  0.433  0.330  0.672  0.553

Old model performance

Qwen3-0.6B
quant    arc    arc/e  boolq  hswag  obkqa  piqa   wino
bf16     0.298  0.354  0.378  0.415  0.344  0.649  0.534
q8-hi    0.296  0.355  0.378  0.416  0.348  0.652  0.529
q8       0.299  0.354  0.378  0.414  0.346  0.650  0.535
q6-hi    0.301  0.356  0.378  0.415  0.350  0.651  0.541
q6       0.300  0.367  0.378  0.416  0.344  0.647  0.524
mxfp4    0.286  0.364  0.609  0.404  0.316  0.626  0.531

Quant    Perplexity     Peak memory
mxfp8    6.611 ± 0.049  7.65 GB
mxfp4    7.455 ± 0.057  6.33 GB
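
Perplexity numbers like these can be approximated with a short mlx-lm loop. A minimal sketch follows, assuming mean negative log-likelihood over fixed 512-token windows (the exact protocol behind the table isn't stated) and a placeholder eval corpus.

```python
# Sketch: windowed perplexity with mlx-lm. Assumes mean NLL over
# fixed 512-token windows; eval.txt is a placeholder corpus.
import math

import mlx.core as mx
from mlx_lm import load

model, tokenizer = load("nightmedia/Qwen3.5-0.8B-mxfp8-mlx")
tokens = tokenizer.encode(open("eval.txt").read())

nll, count = 0.0, 0
for i in range(0, len(tokens) - 512, 512):
    window = mx.array(tokens[i : i + 513])[None]   # (1, 513)
    logits = model(window[:, :-1])                 # (1, 512, vocab)
    logprobs = logits - mx.logsumexp(logits, axis=-1, keepdims=True)
    picked = mx.take_along_axis(logprobs, window[:, 1:, None], axis=-1)
    nll -= picked.sum().item()
    count += window.shape[1] - 1

print(f"perplexity: {math.exp(nll / count):.3f}")
```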


Detailed metrics by model

nightmedia/Qwen3.5-0.8B-mxfp8-mlx

nightmedia/Qwen3.5-2B-mxfp8-mlx

nightmedia/Qwen3.5-4B-mxfp8-mlx

nightmedia/Qwen3.5-9B-mxfp8-mlx

nightmedia/Qwen3.5-27B-Text

nightmedia/Qwen3.5-122B-A10B-Text-mxfp4-mlx

More metrics coming soon.

I am running these on my Mac, an M4 Max with 128 GB RAM, so performance numbers like tokens/second reflect my hardware.

This post will be updated with every model that gets tested. The larger models take hours, the 27B a couple of days, so it will be a long process.

-G
  • 21 replies

nightmedia posted an update 4 months ago
Nightmedia now accepts donations

Your kind help would go towards more hardware to run tests--both my AI assistants and I are deeply appreciative 🖖

edit: Jan 1 2026: first anonymous donor: thank you!

edit2: this keeps happening, I might even get a Mac Studio :) Thank you!

These are some of the more recent models:

Agent
nightmedia/Qwen3-4B-Agent-F32-dwq4-mlx

Holodeck
nightmedia/Qwen3-30B-A3B-Holodeck-mlx

Quorum
nightmedia/Qwen3-42B-A3B-Quorum-mlx

Continuum
nightmedia/Qwen3-42B-A3B-Continuum-mlx

Architect 4B
nightmedia/Qwen3-4B-Architect-mxfp4-mlx

Architect 30B
nightmedia/Qwen3-30B-A3B-Architect-qx86-hi-mlx

Engineer 4B
nightmedia/Qwen3-4B-Engineer3x-qx86-hi-mlx

Engineer 30B
nightmedia/Qwen3-30B-A3B-Engineer-mxfp4-mlx

Data 14B
nightmedia/Qwen3-14B-Data-qx86-hi-mlx

Seven of Nine
nightmedia/Qwen3-30B-A3B-Seven-mxfp4-mlx

Spock 4B
nightmedia/Qwen3-4B-Spock-qx86-hi-mlx

Spock 14B
nightmedia/Qwen3-14B-Spock-qx86-hi-mlx

Spock 30B
nightmedia/Qwen3-30B-A3B-Spock-qx86-hi-mlx

Spock 42B
nightmedia/Qwen3-42B-A3B-Spock-Brutal-Recall-Instruct-qx86-hi-mlx

Odo 6B
nightmedia/Qwen3-6B-Odo-mxfp4-mlx

Deckard 8B
nightmedia/Qwen3-8B-Deckard-qx-mlx

HiveMind-Heretic 6B
nightmedia/Qwen3-6B-HiveMind-Heretic-qx86-hi-mlx

...as well as a variety of MLX quants, too many to mention here.

Thank you, and Happy New Year!

BTC: 36d7U1n3MFaXgnNRAaEL3Pa3Hy6oFhM7XY
ETH: 0x6b6633606995BC180925c47d4249ED624aB7b2A5
USDC: 0x19e6bDDCBa47BB09a9Bc153Bb6479fc57284421a

-G