---
license: apache-2.0
base_model: unsloth/Qwen2.5-Coder-32B-Instruct-bnb-4bit
language:
- en
library_name: unsloth
tags:
- amd
- rocm
- hip
- cuda
- code-generation
- lablab-ai
- ghost-coder
- mi300x
- unsloth
---
# Ghost-Coder: Autonomous CUDA-to-HIP Translator
Ghost-Coder is a specialized, agent-ready LLM designed to bridge the gap between NVIDIA's CUDA and AMD's open ROCm ecosystem. Developed for the Lablab.ai AMD Developer Hackathon (2026), this model serves as the "brain" of a self-healing agentic workflow that translates, compiles, and iterates on GPU kernels.
## Overview
Ghost-Coder isn't just a translator; it's an engineer. By fine-tuning Qwen2.5-Coder-32B specifically on the CASS (CUDA-to-HIP) mapping dataset, we've enabled a model that understands the deep structural nuances of GPU programming, from shared memory primitives to warp-level synchronization.
## Hardware & Framework
- Training Hardware: AMD Instinct™ MI300X VF (192GB HBM3)
- Framework: Unsloth (Optimized for 2x faster ROCm fine-tuning)
- Optimization: 4-bit QLoRA with a 4096 context window.
## Model Highlights
- High-Fidelity Mapping: Precise translation of `cuda*` APIs to their corresponding `hip*` counterparts.
- Agentic Ready: Optimized to parse `hipcc` compiler error logs and self-correct syntax or logic errors in real time.
- Massive Scale: Leverages the 32B-parameter Qwen2.5-Coder foundation for superior C++ reasoning compared to smaller 7B models.
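To make the `cuda*` → `hip*` surface mapping concrete, here is a toy sketch of the kind of API renaming involved. The API names are real CUDA/HIP pairs, but the table and `hipify` helper below are illustrative only; they are not the model's learned mapping or AMD's hipify tooling:

```cpp
#include <map>
#include <string>

// A few of the one-to-one API renames a CUDA-to-HIP port involves.
// (Many warp- and block-level primitives, e.g. __syncthreads, carry
// over unchanged, which is part of what makes HIP porting tractable.)
static const std::map<std::string, std::string> kApiMap = {
    {"cudaMalloc",            "hipMalloc"},
    {"cudaMemcpy",            "hipMemcpy"},
    {"cudaFree",              "hipFree"},
    {"cudaDeviceSynchronize", "hipDeviceSynchronize"},
};

// Replace every known cuda* call in a source string with its hip* twin.
std::string hipify(std::string src) {
    for (const auto& [cu, hip] : kApiMap) {
        std::string::size_type pos = 0;
        while ((pos = src.find(cu, pos)) != std::string::npos) {
            src.replace(pos, cu.size(), hip);
            pos += hip.size();
        }
    }
    return src;
}
```

A pure string rewrite like this is exactly where rule-based tools stop; the model's value is in the cases this sketch cannot handle, such as restructuring launch syntax or reasoning about semantic differences between the runtimes.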
## Training Specifications
To ensure maximum generalization and prevent overfitting, the model underwent a high-throughput training sprint:
| Parameter | Configuration |
|---|---|
| Total Steps | 200 (Optimized Sprint) |
| Global Batch Size | 64 |
| Learning Rate | 2e-4 |
| VRAM Utilization | ~158GB / 192GB |
| Dataset | 12,800+ Curated CUDA-to-HIP Pairs |
## Intended Use (The Ghost-Harness)
This model is designed to work within the Ghost-Harness agentic loop:
1. Input: User provides a raw `.cu` (CUDA) file.
2. Action: Ghost-Coder generates a `.cpp` (HIP) translation.
3. Validation: The harness runs `hipcc` on the output.
4. Self-Healing: If compilation fails, the error logs are fed back to Ghost-Coder for an iterative fix.
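The loop above can be sketched as follows. `ask_ghost_coder` and `run_hipcc` are hypothetical stand-ins for the model call and the `hipcc` invocation; they are stubbed here (the "fix" lands on the second attempt) so the control flow compiles and runs without a GPU:

```cpp
#include <string>
#include <utility>

// Hypothetical sketch of the Ghost-Harness control flow; not the actual
// harness code. CompileResult mimics what a real hipcc wrapper would return.
struct CompileResult { bool ok; std::string log; };

static int g_calls = 0;  // toy state: pretend the repair lands on attempt 2
static std::string ask_ghost_coder(const std::string& /*prompt*/) {
    ++g_calls;
    return g_calls >= 2 ? "fixed hip source" : "broken hip source";
}
static CompileResult run_hipcc(const std::string& hip_src) {
    if (hip_src == "fixed hip source") return {true, ""};
    return {false, "error: expected ';' before '}' token"};
}

// The self-healing loop: translate, compile, feed errors back, retry.
static std::pair<bool, std::string> translate_with_repair(
        const std::string& cuda_src, int max_rounds = 3) {
    std::string hip = ask_ghost_coder("Translate this CUDA to HIP:\n" + cuda_src);
    for (int round = 0; round < max_rounds; ++round) {
        CompileResult r = run_hipcc(hip);
        if (r.ok) return {true, hip};  // clean compile: hand back the HIP file
        // Compilation failed: feed the hipcc log back for an iterative fix.
        hip = ask_ghost_coder("Fix these hipcc errors:\n" + r.log +
                              "\n\nSource:\n" + hip);
    }
    return {false, hip};  // out of repair rounds
}
```

Bounding the loop with `max_rounds` keeps a stubborn compile error from burning unlimited model calls; the real harness would surface the last error log to the user on failure.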
## Acknowledgments
Special thanks to AMD for the world-class compute and Lablab.ai for hosting the "Build Across the AI Stack" challenge.
Developed by Talha