# nebula-8lang-14b
Fine-tuned Qwen/Qwen2.5-14B for translating Nebula — a universal code intermediate language — into 8 target programming languages: Python, JavaScript, TypeScript, Go, Swift, Kotlin, Rust, and C.
Part of the Nebula 1.0 release. Nebula is a token-efficient canonical form that is, on average, 16% smaller than the equivalent source code across the 8 target languages, while round-tripping cleanly back to any of them.
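As a rough illustration, compare the Nebula snippet from the usage example below with a plausible Python equivalent (character counts are only a crude proxy for tokens, and the Python rendering here is an assumption, not official output):

```python
# Nebula snippet (from the usage example below) vs. an assumed
# Python equivalent. Character count is a crude stand-in for tokens.
nebula = "fn add(a, b): rt a + b"
python = "def add(a, b):\n    return a + b"

print(len(nebula), len(python))  # the Nebula form is shorter
```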
## Training
| Setting | Value |
|---|---|
| Base model | Qwen/Qwen2.5-14B |
| Method | LoRA (SFT) |
| LoRA rank / alpha | 16 / 16 |
| LoRA dropout | 0.05 |
| LoRA modules | all-linear |
| Epochs | 3 |
| Learning rate | 1e-5 |
| Batch size | 8 |
| Training data | electrocampbell/nebula-8lang-203k (203K pairs) |
| Trained on | Together AI |
## Evaluation
HumanEval (164 problems, Nebula→Python, Pass@1):
| Model | Raw | With Error Correction |
|---|---|---|
| nebula-8lang-1.5b | 45.1% | 79.3% |
| nebula-8lang-7b | 67.7% | 88.4% |
| nebula-8lang-14b (this model) | 57.9% (3090) / 89.0% (H100) | 88.4% |
MBPP (500 problems, Nebula→Python, Pass@1): 48.8%
Note: the 14B raw score on a 24 GB GPU is limited by the CPU/GPU memory split. On a 2x H100 80 GB hosted endpoint, raw Pass@1 jumps to 89.0%, beating the 7B by 21 percentage points.
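Back-of-the-envelope math shows why a 24 GB card forces a memory split (a sketch assuming 14B parameters; activations and KV cache are ignored):

```python
# Approximate weight memory for a 14B-parameter model.
# Assumption: exactly 14e9 parameters; activations/KV cache not counted.
PARAMS = 14e9
GiB = 1024**3

fp16_gib = PARAMS * 2 / GiB    # 2 bytes per weight -> ~26 GiB
int4_gib = PARAMS * 0.5 / GiB  # 0.5 bytes per weight -> ~6.5 GiB

print(f"fp16: {fp16_gib:.1f} GiB, int4: {int4_gib:.1f} GiB")
```

At fp16 the weights alone exceed 24 GiB, so some layers spill to CPU, which is what degrades the raw score on a 3090-class card.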
## Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("electrocampbell/nebula-8lang-14b")
model = AutoModelForCausalLM.from_pretrained(
    "electrocampbell/nebula-8lang-14b",
    torch_dtype="auto",
    device_map="auto",
)

system = (
    "You are a code translator. Given code in Nebula (a universal "
    "intermediate language), produce the equivalent idiomatic Python code. "
    "Output only the Python code, no explanations."
)
nebula_code = '''fn add(a, b): rt a + b'''

messages = [
    {"role": "system", "content": system},
    {"role": "user", "content": nebula_code},
]
inputs = tokenizer.apply_chat_template(
    messages, return_tensors="pt", add_generation_prompt=True
).to(model.device)

out = model.generate(inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(out[0][inputs.shape[1]:], skip_special_tokens=True))
```
To target a different language, replace "Python" in the system prompt with any of: JavaScript, TypeScript, Go, Swift, Kotlin, Rust, or C.
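Since the system prompt is the only thing that changes per target, a small helper can build it. This helper is hypothetical (not part of the release); it just templates the prompt string shown above:

```python
# Hypothetical helper: build the system prompt for any of the 8 targets.
TARGETS = ["Python", "JavaScript", "TypeScript", "Go",
           "Swift", "Kotlin", "Rust", "C"]

def make_system_prompt(target: str) -> str:
    if target not in TARGETS:
        raise ValueError(f"unsupported target language: {target}")
    return (
        "You are a code translator. Given code in Nebula (a universal "
        f"intermediate language), produce the equivalent idiomatic {target} "
        f"code. Output only the {target} code, no explanations."
    )
```

For example, `make_system_prompt("Rust")` yields the same instruction with Rust as the output language; everything else in the usage snippet stays unchanged.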
## Citation
If you use this model, please cite the Nebula project: https://github.com/colinc86/nebula
## License
Apache 2.0, inherited from the Qwen 2.5 base model.