nebula-8lang-14b-gguf

Q4_K_M GGUF build of electrocampbell/nebula-8lang-14b, the 14B Nebula→target translator. This is the drop-in backend for the Nebula MCP's translate_code and write_code tools when paired with a remote host agent (Sonnet / Opus / GPT / etc.) driving Claude Code or another MCP client.

Base: Qwen-2.5-14B (Apache 2.0)
Training: QLoRA SFT on electrocampbell/nebula-8lang-203k — 203K Nebula↔target pairs across Python, JavaScript, TypeScript, Go, Swift, Kotlin, Rust, and C.
Benchmark: 89.0% raw Pass@1 on HumanEval Nebula→Python (full precision).
Size on disk: ~8.4 GB (Q4_K_M)
VRAM in use: ~13-15 GB total (weights + KV cache at 4K context). Fits alone on a 24 GB GPU.
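As a sanity check on the disk-size figure, a GGUF file's size is roughly the parameter count times the average bits per weight. A back-of-envelope sketch, assuming ~14.7B parameters for the Qwen2.5-14B family and the commonly quoted ~4.85 bits/weight average for llama.cpp's Q4_K_M (neither figure is stated on this card):

```python
# Rough GGUF size estimate: params × bits-per-weight / 8.
# 4.85 bits/weight is the commonly quoted Q4_K_M average, including
# quantization metadata — an assumption, not a figure from this card.
params = 14.7e9          # ~14.7B parameters (Qwen2.5-14B family)
bits_per_weight = 4.85   # Q4_K_M average
size_gb = params * bits_per_weight / 8 / 1e9
print(f"{size_gb:.1f} GB")  # → 8.9 GB, the same ballpark as the ~8.4 GB file
```

The exact file size also depends on per-tensor quant choices (embeddings and output layers are often kept at higher precision), so the estimate will not match the download byte-for-byte.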

Why Q4_K_M

The safetensors release at F16 is ~29 GB, which exceeds the memory of a 24 GB GPU. Q4_K_M brings it within reach of an RTX 3090 / 4090 / A5000 with no measurable quality drop on translation tasks. Verified on gdsp's convolution.go: the Q4_K_M build produces the same semantically correct output as the full-precision checkpoint (2× multiplier preserved, function decomposition preserved, correct slice bounds, helper calls like v_mul_ec preserved — all five failure modes of the 7B translator fixed).

How to use (Ollama)

huggingface-cli download electrocampbell/nebula-8lang-14b-gguf \
    nebula-8lang-14b.Q4_K_M.gguf Modelfile \
    --local-dir ~/models/nebula-8lang-14b

cd ~/models/nebula-8lang-14b
ollama create nebula-8lang-14b-q4 -f Modelfile
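Once the model is created, you can smoke-test it outside the MCP with a direct call to Ollama's /api/generate endpoint. A minimal sketch that only assembles the request body and prints the equivalent curl command (the prompt is an arbitrary placeholder, and it assumes Ollama is listening on its default port):

```python
import json

# Build the JSON body for a direct call to Ollama's /api/generate
# endpoint. The model tag matches the `ollama create` step above; the
# prompt is a placeholder. "stream": False asks Ollama for a single
# JSON response object instead of a token stream.
payload = {
    "model": "nebula-8lang-14b-q4",
    "prompt": "Translate this Nebula snippet to Python: ...",
    "stream": False,
}
body = json.dumps(payload)

# Equivalent one-liner against a local Ollama server:
print(f"curl http://localhost:11434/api/generate -d '{body}'")
```

If the create step succeeded, the response JSON's "response" field should contain the generated translation.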

Then point the Nebula MCP at it via environment:

export NEBULA_OLLAMA_URL=http://localhost:11434          # or your remote Ollama host
# NEBULA_OLLAMA_MODEL defaults to nebula-8lang-14b-q4 in Nebula ≥ 0.2.x
claude  # Claude Code picks up the MCP config automatically
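The default-resolution behavior described in the comments above amounts to a standard environment lookup with a fallback. An illustrative sketch of the pattern (not the Nebula MCP's actual source):

```python
import os

# Resolve the Ollama endpoint and model the way the comments above
# describe: an explicit env var wins, otherwise fall back to the
# documented defaults. Illustrative only — not the MCP's real code.
url = os.environ.get("NEBULA_OLLAMA_URL", "http://localhost:11434")
model = os.environ.get("NEBULA_OLLAMA_MODEL", "nebula-8lang-14b-q4")
print(url, model)
```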

The two Nebula translator tiers

Model                 Size            HumanEval Pass@1   VRAM     Use when
nebula-8lang-7b       5 GB Q4_K_M     67.7% raw          ~8 GB    Co-locating with nebula-host-30b (research config)
nebula-8lang-14b-q4   8.4 GB Q4_K_M   89.0% raw          ~15 GB   Production coding with Sonnet/Opus as host agent

The 14B variant is the production default. The 7B is retained as a fallback for users running the full self-hosted stack (host-30b + translator both on one 24 GB GPU), where VRAM is too tight to fit the 14B alongside the agent.
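The tier choice reduces to a VRAM budget check on a single 24 GB card, using the approximate translator footprints from the table above:

```python
# VRAM budget on a single 24 GB GPU, using this card's approximate
# translator footprints. Whatever is left over is what a co-located
# nebula-host-30b would have to fit into.
GPU_GB = 24
for model, vram_gb in {"nebula-8lang-7b": 8, "nebula-8lang-14b-q4": 15}.items():
    print(f"{model}: {GPU_GB - vram_gb} GB left for the host agent")
# The 7B leaves 16 GB; the 14B leaves only 9 GB — too tight for a 30B
# host, which is why the self-hosted stack falls back to the 7B.
```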

License

Apache 2.0 (base model and all training data).

Citation

@misc{nebula2026,
  author = {Campbell, Colin},
  title  = {Nebula: A Universal Code Intermediate Language for Token-Efficient LLM Code Generation},
  year   = {2026},
  url    = {https://github.com/electrocampbell/nebula}
}