# nebula-8lang-14b-gguf
Q4_K_M GGUF build of electrocampbell/nebula-8lang-14b, the 14B Nebula→target translator. This is the drop-in backend for the Nebula MCP's `translate_code` and `write_code` tools when paired with a remote host agent (Sonnet / Opus / GPT / etc.) driving Claude Code or another MCP client.
- **Base:** Qwen-2.5-14B (Apache 2.0)
- **Training:** QLoRA SFT on electrocampbell/nebula-8lang-203k (203K Nebula↔target pairs across Python, JavaScript, TypeScript, Go, Swift, Kotlin, Rust, C)
- **Benchmark:** 89.0% raw Pass@1 on HumanEval Nebula→Python (evaluated at full precision)
- **Size on disk:** ~8.4 GB (Q4_K_M)
- **VRAM in use:** ~13–15 GB total (weights + KV cache at 4K context); fits on a 24 GB GPU alone
## Why Q4_K_M
The F16 safetensors release is ~29 GB, which does not fit on a 24 GB GPU. Q4_K_M brings the model within reach of an RTX 3090 / 4090 / A5000 without a measurable quality drop on translation tasks. Verified on gdsp's convolution.go: the Q4_K_M build produces the same semantically correct output as the full-precision checkpoint (2× multiplier preserved, function decomposition preserved, correct slice bounds, helper calls like `v_mul_ec` preserved), fixing all five failure modes of the 7B translator.
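As a sanity check on the sizes above, a rough back-of-the-envelope calculation (assuming ~14.7B parameters for Qwen2.5-14B; the exact count is an assumption):

```python
params = 14.7e9  # approximate Qwen2.5-14B parameter count (assumption)

# F16 stores 2 bytes per weight
f16_gb = params * 2 / 1e9

# Effective bits per weight implied by the ~8.4 GB Q4_K_M file
q4_file_gb = 8.4
bits_per_weight = q4_file_gb * 1e9 * 8 / params

print(f"F16 weights: ~{f16_gb:.0f} GB")           # ~29 GB, matching the safetensors release
print(f"Q4_K_M: ~{bits_per_weight:.1f} bits/weight")
```

The mixed-precision Q4_K_M scheme lands around 4.5–4.6 effective bits per weight here, which is why the file comes out well under half the F16 size.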
## How to use (Ollama)
```shell
huggingface-cli download electrocampbell/nebula-8lang-14b-gguf \
  nebula-8lang-14b.Q4_K_M.gguf Modelfile \
  --local-dir ~/models/nebula-8lang-14b

cd ~/models/nebula-8lang-14b
ollama create nebula-8lang-14b-q4 -f Modelfile
```
Then point the Nebula MCP at it via environment variables:

```shell
export NEBULA_OLLAMA_URL=http://localhost:11434  # or your remote Ollama host
# NEBULA_OLLAMA_MODEL defaults to nebula-8lang-14b-q4 in Nebula ≥ 0.2.x
claude  # Claude Code picks up the MCP config automatically
```
## The two Nebula translator tiers
| Model | Size | HumanEval Pass@1 | VRAM | Use when |
|---|---|---|---|---|
| `nebula-8lang-7b` | 5 GB Q4_K_M | 67.7% raw | ~8 GB | Co-locating with `nebula-host-30b` (research config) |
| `nebula-8lang-14b-q4` | 8.4 GB Q4_K_M | 89.0% raw | ~15 GB | Production coding with Sonnet/Opus as host agent |
The 14B variant is the production default. The 7B is retained as a fallback for users running the full self-hosted stack (host-30b + translator both on one 24 GB GPU), where VRAM is too tight to fit the 14B alongside the agent.
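The tier choice above can be sketched as a small helper; the VRAM figures come from the table, while the function name and signature are illustrative, not part of the Nebula MCP:

```python
def pick_translator(free_vram_gb: float, colocated_host: bool = False) -> str:
    """Pick a Nebula translator tier from available VRAM.

    Figures from the tier table: the 14B needs ~15 GB in use,
    the 7B ~8 GB. Co-locating nebula-host-30b on the same GPU
    forces the 7B fallback.
    """
    if colocated_host or free_vram_gb < 15:
        return "nebula-8lang-7b"
    return "nebula-8lang-14b-q4"
```

On a dedicated 24 GB GPU this returns the 14B production default; with the host model sharing the card, it falls back to the 7B.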
## License
Apache 2.0 (base model and all training data).
## Citation

```bibtex
@misc{nebula2026,
  author = {Campbell, Colin},
  title  = {Nebula: A Universal Code Intermediate Language for Token-Efficient LLM Code Generation},
  year   = {2026},
  url    = {https://github.com/electrocampbell/nebula}
}
```