# nebula-8lang-14b
Fine-tuned Qwen/Qwen2.5-14B for translating Nebula — a universal code intermediate language — into 8 target programming languages: Python, JavaScript, TypeScript, Go, Swift, Kotlin, Rust, and C.
Part of the Nebula 1.0 release. Nebula is a token-efficient canonical form that is, on average, 16% smaller than the equivalent source code across the 8 target languages, while round-tripping cleanly back to any of them.
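As a rough illustration, compare the Nebula snippet from the usage example below with a plausible Python equivalent (character counts are only a crude proxy for tokens, and the Python rendering here is an assumption, not official output):

```python
# Nebula snippet (from the usage example below) vs. an assumed
# Python equivalent. Character count is a crude stand-in for tokens.
nebula = "fn add(a, b): rt a + b"
python = "def add(a, b):\n    return a + b"

print(len(nebula), len(python))  # the Nebula form is shorter
```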
## Training
| Setting | Value |
|---|---|
| Base model | Qwen/Qwen2.5-14B |
| Method | LoRA (SFT) |
| LoRA rank / alpha | 16 / 16 |
| LoRA dropout | 0.05 |
| LoRA modules | all-linear |
| Epochs | 3 |
| Learning rate | 1e-5 |
| Batch size | 8 |
| Training data | electrocampbell/nebula-8lang-203k (203K pairs) |
| Trained on | Together AI |
## Evaluation
HumanEval (164 problems, Nebula→Python, Pass@1):
| Model | Raw | With Error Correction |
|---|---|---|
| nebula-8lang-1.5b | 45.1% | 79.3% |
| nebula-8lang-7b | 67.7% | 88.4% |
| nebula-8lang-14b (this model) | 57.9% (3090) / 89.0% (H100) | 88.4% |
MBPP (500 problems, Nebula→Python, Pass@1): 48.8%
Note: the 14B raw score on a 24 GB GPU is limited by the CPU/GPU memory split. On a 2x H100 80 GB hosted endpoint, raw Pass@1 jumps to 89.0%, beating the 7B by 21 percentage points.
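Back-of-the-envelope math shows why a 24 GB card forces a memory split (a sketch assuming 14B parameters; activations and KV cache are ignored):

```python
# Approximate weight memory for a 14B-parameter model.
# Assumption: exactly 14e9 parameters; activations/KV cache not counted.
PARAMS = 14e9
GiB = 1024**3

fp16_gib = PARAMS * 2 / GiB    # 2 bytes per weight -> ~26 GiB
int4_gib = PARAMS * 0.5 / GiB  # 0.5 bytes per weight -> ~6.5 GiB

print(f"fp16: {fp16_gib:.1f} GiB, int4: {int4_gib:.1f} GiB")
```

At fp16 the weights alone exceed 24 GiB, so some layers spill to CPU, which is what degrades the raw score on a 3090-class card.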
## Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("electrocampbell/nebula-8lang-14b")
model = AutoModelForCausalLM.from_pretrained(
    "electrocampbell/nebula-8lang-14b",
    torch_dtype="auto",
    device_map="auto",
)

system = (
    "You are a code translator. Given code in Nebula (a universal "
    "intermediate language), produce the equivalent idiomatic Python code. "
    "Output only the Python code, no explanations."
)
nebula_code = '''fn add(a, b): rt a + b'''

messages = [
    {"role": "system", "content": system},
    {"role": "user", "content": nebula_code},
]
inputs = tokenizer.apply_chat_template(
    messages, return_tensors="pt", add_generation_prompt=True
).to(model.device)

out = model.generate(inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(out[0][inputs.shape[1]:], skip_special_tokens=True))
```
To target a different language, replace "Python" in the system prompt with any of: JavaScript, TypeScript, Go, Swift, Kotlin, Rust, or C.
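Since the system prompt is the only thing that changes per target, a small helper can build it. This helper is hypothetical (not part of the release); it just templates the prompt string shown above:

```python
# Hypothetical helper: build the system prompt for any of the 8 targets.
TARGETS = ["Python", "JavaScript", "TypeScript", "Go",
           "Swift", "Kotlin", "Rust", "C"]

def make_system_prompt(target: str) -> str:
    if target not in TARGETS:
        raise ValueError(f"unsupported target language: {target}")
    return (
        "You are a code translator. Given code in Nebula (a universal "
        f"intermediate language), produce the equivalent idiomatic {target} "
        f"code. Output only the {target} code, no explanations."
    )
```

For example, `make_system_prompt("Rust")` yields the same instruction with Rust as the output language; everything else in the usage snippet stays unchanged.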
## Citation
If you use this model, please cite the Nebula project: https://github.com/colinc86/nebula
## License
Apache 2.0, inherited from the Qwen 2.5 base model.