---
license: apache-2.0
base_model: unsloth/Qwen2.5-Coder-32B-Instruct-bnb-4bit
language:
- en
library_name: unsloth
tags:
- amd
- rocm
- hip
- cuda
- code-generation
- lablab-ai
- ghost-coder
- mi300x
- unsloth
---
# Ghost-Coder: Autonomous CUDA-to-HIP Translator 👻
**Ghost-Coder** is a specialized, agent-ready LLM designed to bridge the gap between NVIDIA's CUDA and AMD's open ROCm ecosystem. Developed for the **Lablab.ai AMD Developer Hackathon (2026)**, this model serves as the "brain" of a self-healing agentic workflow that translates, compiles, and iterates on GPU kernels.
## 🚀 Overview
Ghost-Coder isn't just a translator; it’s an engineer. By fine-tuning **Qwen2.5-Coder-32B** specifically on the **CASS (CUDA-to-HIP)** mapping dataset, we've produced a model that understands the deep structural nuances of GPU programming, from shared-memory primitives to warp-level synchronization.
### 💎 Hardware & Framework
- **Training Hardware:** AMD Instinct™ MI300X VF (192GB HBM3)
- **Framework:** [Unsloth](https://github.com/unslothai/unsloth) (Optimized for 2x faster ROCm fine-tuning)
- **Optimization:** 4-bit QLoRA with a 4,096-token context window (see the loading sketch below).
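
For quick experimentation, the checkpoint can be loaded in 4-bit with Unsloth. This is a minimal sketch assuming the standard `FastLanguageModel` API; the model id shown is the base model from the metadata above, so swap in this repo's id for the fine-tuned weights:

```python
# Minimal 4-bit loading sketch (assumption: standard Unsloth FastLanguageModel API).
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen2.5-Coder-32B-Instruct-bnb-4bit",  # base; replace with the Ghost-Coder repo id
    max_seq_length=4096,  # matches the context window used during training
    load_in_4bit=True,    # 4-bit weights, as in the QLoRA setup
)
FastLanguageModel.for_inference(model)  # enable Unsloth's fast inference path
```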
## 🧠 Model Highlights
- **High-Fidelity Mapping:** Precise translation of `cuda*` APIs to their corresponding `hip*` counterparts (illustrated below).
- **Agentic Ready:** Optimized to parse `hipcc` compiler error logs and self-correct syntax or logic errors in real time.
- **Massive Scale:** Built on the 32B-parameter Qwen2.5-Coder foundation for stronger C++ reasoning than smaller 7B-class models.
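
To make the mapping concrete, here is an illustrative translation request that reuses the `model` and `tokenizer` from the loading sketch above. The prompt wording is an assumption, not a fixed template; a correct completion maps the `cuda*` calls and the kernel launch onto their `hip*` equivalents:

```python
# Illustrative translation call; the prompt format is an assumption.
cuda_src = r"""
__global__ void vec_add(float* a, const float* b, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) a[i] += b[i];
}

void launch(float* a, const float* b, int n) {
    float *d_a, *d_b;
    cudaMalloc(&d_a, n * sizeof(float));  // expected to become hipMalloc
    cudaMalloc(&d_b, n * sizeof(float));
    vec_add<<<(n + 255) / 256, 256>>>(d_a, d_b, n);
    cudaDeviceSynchronize();              // expected to become hipDeviceSynchronize
    cudaFree(d_a);
    cudaFree(d_b);
}
"""
messages = [{"role": "user", "content": f"Translate this CUDA code to HIP:\n{cuda_src}"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
out = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```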
## 🛠️ Training Specifications
To ensure strong generalization without overfitting, the model underwent a high-throughput training sprint (a configuration sketch follows the table):
| Parameter | Configuration |
| :--- | :--- |
| **Total Steps** | 200 (Optimized Sprint) |
| **Global Batch Size** | 64 |
| **Learning Rate** | 2e-4 |
| **VRAM Utilization** | ~158GB / 192GB |
| **Dataset** | 12,800+ Curated CUDA-to-HIP Pairs |
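
As a reproducibility aid, here is how those numbers could map onto a TRL `SFTTrainer` run. Only the table values are taken from the actual run; the per-device/gradient-accumulation split, the output path, and the `dataset` variable are assumptions:

```python
# Training sketch (assumptions: TRL SFTTrainer; 8 x 8 split of the global batch of 64).
from trl import SFTConfig, SFTTrainer

args = SFTConfig(
    max_steps=200,                   # "Optimized Sprint" step count from the table
    per_device_train_batch_size=8,   # 8 * 8 accumulation = global batch of 64 (split is an assumption)
    gradient_accumulation_steps=8,
    learning_rate=2e-4,
    output_dir="ghost-coder-qlora",  # illustrative path
)
trainer = SFTTrainer(
    model=model,            # base model wrapped with FastLanguageModel.get_peft_model(...) for LoRA
    train_dataset=dataset,  # the 12,800+ curated CUDA-to-HIP pairs (not bundled with this card)
    args=args,
)
trainer.train()
```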
## 🏁 Intended Use (The Ghost-Harness)
This model is designed to work within the **Ghost-Harness** agentic loop (sketched in code after the steps):
1. **Input:** User provides a raw `.cu` (CUDA) file.
2. **Action:** Ghost-Coder generates a `.cpp` (HIP) translation.
3. **Validation:** The harness runs `hipcc` on the output.
4. **Self-Healing:** If compilation fails, the error logs are fed back to Ghost-Coder for an iterative fix.
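
A minimal sketch of that loop, assuming a hypothetical `translate()` helper that wraps the generation call shown earlier; `hipcc` is the real ROCm compiler driver, while the retry budget and file layout here are illustrative:

```python
# Self-healing harness sketch; translate() is a hypothetical wrapper around model.generate.
import subprocess

def ghost_harness(cuda_source: str, max_attempts: int = 3) -> str | None:
    prompt = f"Translate this CUDA code to HIP:\n{cuda_source}"
    for _ in range(max_attempts):
        hip_source = translate(prompt)  # hypothetical helper: query Ghost-Coder
        with open("candidate.cpp", "w") as f:
            f.write(hip_source)
        # Validate with the real HIP compiler driver.
        result = subprocess.run(
            ["hipcc", "-c", "candidate.cpp", "-o", "candidate.o"],
            capture_output=True, text=True,
        )
        if result.returncode == 0:
            return hip_source  # compiles cleanly: done
        # Compilation failed: feed the error log back for an iterative fix.
        prompt = (
            "This HIP translation failed to compile.\n"
            f"Compiler errors:\n{result.stderr}\n"
            f"Code:\n{hip_source}\n"
            "Return a corrected version."
        )
    return None  # exhausted the retry budget
```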
## 📝 Acknowledgments
Special thanks to **AMD** for the world-class compute and **Lablab.ai** for hosting the "Build Across the AI Stack" challenge.
---
*Developed by Talha*