nebula-8lang-14b-gguf

Q4_K_M GGUF build of electrocampbell/nebula-8lang-14b, the 14B Nebula→target translator. This is the drop-in backend for the Nebula MCP's translate_code and write_code tools when paired with a remote host agent (Sonnet / Opus / GPT / etc.) driving Claude Code or another MCP client.

Base: Qwen-2.5-14B (Apache 2.0)
Training: QLoRA SFT on electrocampbell/nebula-8lang-203k — 203K Nebula↔target pairs across Python, JavaScript, TypeScript, Go, Swift, Kotlin, Rust, and C.
Benchmark: 89.0% raw Pass@1 on HumanEval Nebula→Python (full precision).
Size on disk: ~8.4 GB (Q4_K_M)
VRAM in use: ~13-15 GB total (weights + KV cache at 4K context). Fits alone on a 24 GB GPU.
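As a sanity check on the disk-size figure, a GGUF file's size is roughly the parameter count times the average bits per weight. A back-of-envelope sketch, assuming ~14.7B parameters for the Qwen2.5-14B family and the commonly quoted ~4.85 bits/weight average for llama.cpp's Q4_K_M (neither figure is stated on this card):

```python
# Rough GGUF size estimate: params × bits-per-weight / 8.
# 4.85 bits/weight is the commonly quoted Q4_K_M average, including
# quantization metadata — an assumption, not a figure from this card.
params = 14.7e9          # ~14.7B parameters (Qwen2.5-14B family)
bits_per_weight = 4.85   # Q4_K_M average
size_gb = params * bits_per_weight / 8 / 1e9
print(f"{size_gb:.1f} GB")  # → 8.9 GB, the same ballpark as the ~8.4 GB file
```

The exact file size also depends on per-tensor quant choices (embeddings and output layers are often kept at higher precision), so the estimate will not match the download byte-for-byte.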

Why Q4_K_M

The safetensors release at F16 is ~29 GB, which exceeds the memory of a 24 GB GPU. Q4_K_M brings it within reach of an RTX 3090 / 4090 / A5000 with no measurable quality drop on translation tasks. Verified on gdsp's convolution.go: the Q4_K_M build produces the same semantically correct output as the full-precision checkpoint (2× multiplier preserved, function decomposition preserved, correct slice bounds, helper calls like v_mul_ec preserved — all five failure modes of the 7B translator fixed).

How to use (Ollama)

huggingface-cli download electrocampbell/nebula-8lang-14b-gguf \
    nebula-8lang-14b.Q4_K_M.gguf Modelfile \
    --local-dir ~/models/nebula-8lang-14b

cd ~/models/nebula-8lang-14b
ollama create nebula-8lang-14b-q4 -f Modelfile
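Once the model is created, you can smoke-test it outside the MCP with a direct call to Ollama's /api/generate endpoint. A minimal sketch that only assembles the request body and prints the equivalent curl command (the prompt is an arbitrary placeholder, and it assumes Ollama is listening on its default port):

```python
import json

# Build the JSON body for a direct call to Ollama's /api/generate
# endpoint. The model tag matches the `ollama create` step above; the
# prompt is a placeholder. "stream": False asks Ollama for a single
# JSON response object instead of a token stream.
payload = {
    "model": "nebula-8lang-14b-q4",
    "prompt": "Translate this Nebula snippet to Python: ...",
    "stream": False,
}
body = json.dumps(payload)

# Equivalent one-liner against a local Ollama server:
print(f"curl http://localhost:11434/api/generate -d '{body}'")
```

If the create step succeeded, the response JSON's "response" field should contain the generated translation.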

Then point the Nebula MCP at it via environment:

export NEBULA_OLLAMA_URL=http://localhost:11434          # or your remote Ollama host
# NEBULA_OLLAMA_MODEL defaults to nebula-8lang-14b-q4 in Nebula ≥ 0.2.x
claude  # Claude Code picks up the MCP config automatically
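The default-resolution behavior described in the comments above amounts to a standard environment lookup with a fallback. An illustrative sketch of the pattern (not the Nebula MCP's actual source):

```python
import os

# Resolve the Ollama endpoint and model the way the comments above
# describe: an explicit env var wins, otherwise fall back to the
# documented defaults. Illustrative only — not the MCP's real code.
url = os.environ.get("NEBULA_OLLAMA_URL", "http://localhost:11434")
model = os.environ.get("NEBULA_OLLAMA_MODEL", "nebula-8lang-14b-q4")
print(url, model)
```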

The two Nebula translator tiers

Model                 Size            HumanEval Pass@1   VRAM     Use when
nebula-8lang-7b       5 GB Q4_K_M     67.7% raw          ~8 GB    Co-locating with nebula-host-30b (research config)
nebula-8lang-14b-q4   8.4 GB Q4_K_M   89.0% raw          ~15 GB   Production coding with Sonnet/Opus as host agent

The 14B variant is the production default. The 7B is retained as a fallback for users running the full self-hosted stack (host-30b + translator both on one 24 GB GPU), where VRAM is too tight to fit the 14B alongside the agent.
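The tier choice reduces to a VRAM budget check on a single 24 GB card, using the approximate translator footprints from the table above:

```python
# VRAM budget on a single 24 GB GPU, using this card's approximate
# translator footprints. Whatever is left over is what a co-located
# nebula-host-30b would have to fit into.
GPU_GB = 24
for model, vram_gb in {"nebula-8lang-7b": 8, "nebula-8lang-14b-q4": 15}.items():
    print(f"{model}: {GPU_GB - vram_gb} GB left for the host agent")
# The 7B leaves 16 GB; the 14B leaves only 9 GB — too tight for a 30B
# host, which is why the self-hosted stack falls back to the 7B.
```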

License

Apache 2.0 (base model and all training data).

Citation

@misc{nebula2026,
  author = {Campbell, Colin},
  title  = {Nebula: A Universal Code Intermediate Language for Token-Efficient LLM Code Generation},
  year   = {2026},
  url    = {https://github.com/electrocampbell/nebula}
}