---
license: apache-2.0
base_model: unsloth/Qwen2.5-Coder-32B-Instruct-bnb-4bit
language:
- en
library_name: unsloth
tags:
- amd
- rocm
- hip
- cuda
- code-generation
- lablab-ai
- ghost-coder
- mi300x
- unsloth
---
# Ghost-Coder: Autonomous CUDA-to-HIP Translator
Ghost-Coder is a specialized, agent-ready LLM designed to bridge the gap between NVIDIA's CUDA and AMD's open ROCm ecosystem. Developed for the Lablab.ai AMD Developer Hackathon (2026), this model serves as the "brain" of a self-healing agentic workflow that translates, compiles, and iterates on GPU kernels.
## Overview
Ghost-Coder isn't just a translator; it's an engineer. By fine-tuning Qwen2.5-Coder-32B specifically on the CASS (CUDA-to-HIP) mapping dataset, we've enabled a model that understands the deep structural nuances of GPU programming, from shared memory primitives to warp-level synchronization.
## Hardware & Framework
- Training Hardware: AMD Instinct™ MI300X VF (192GB HBM3)
- Framework: Unsloth (Optimized for 2x faster ROCm fine-tuning)
- Optimization: 4-bit QLoRA with a 4096 context window.
## Model Highlights
- High-Fidelity Mapping: Precise translation of `cuda*` APIs to their corresponding `hip*` counterparts.
- Agentic Ready: Optimized to parse `hipcc` compiler error logs and self-correct syntax or logic errors in real time.
- Massive Scale: Leverages the 32B-parameter Qwen2.5-Coder foundation for superior C++ reasoning compared to smaller 7B models.
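To make the `cuda*` → `hip*` surface mapping concrete, here is a toy sketch of the kind of API renaming involved. The API names are real CUDA/HIP pairs, but the table and `hipify` helper below are illustrative only; they are not the model's learned mapping or AMD's hipify tooling:

```cpp
#include <map>
#include <string>

// A few of the one-to-one API renames a CUDA-to-HIP port involves.
// (Many warp- and block-level primitives, e.g. __syncthreads, carry
// over unchanged, which is part of what makes HIP porting tractable.)
static const std::map<std::string, std::string> kApiMap = {
    {"cudaMalloc",            "hipMalloc"},
    {"cudaMemcpy",            "hipMemcpy"},
    {"cudaFree",              "hipFree"},
    {"cudaDeviceSynchronize", "hipDeviceSynchronize"},
};

// Replace every known cuda* call in a source string with its hip* twin.
std::string hipify(std::string src) {
    for (const auto& [cu, hip] : kApiMap) {
        std::string::size_type pos = 0;
        while ((pos = src.find(cu, pos)) != std::string::npos) {
            src.replace(pos, cu.size(), hip);
            pos += hip.size();
        }
    }
    return src;
}
```

A pure string rewrite like this is exactly where rule-based tools stop; the model's value is in the cases this sketch cannot handle, such as restructuring launch syntax or reasoning about semantic differences between the runtimes.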
## Training Specifications
To ensure maximum generalization and prevent overfitting, the model underwent a high-throughput training sprint:
| Parameter | Configuration |
|---|---|
| Total Steps | 200 (Optimized Sprint) |
| Global Batch Size | 64 |
| Learning Rate | 2e-4 |
| VRAM Utilization | ~158GB / 192GB |
| Dataset | 12,800+ Curated CUDA-to-HIP Pairs |
## Intended Use (The Ghost-Harness)
This model is designed to work within the Ghost-Harness agentic loop:
1. Input: User provides a raw `.cu` (CUDA) file.
2. Action: Ghost-Coder generates a `.cpp` (HIP) translation.
3. Validation: The harness runs `hipcc` on the output.
4. Self-Healing: If compilation fails, the error logs are fed back to Ghost-Coder for an iterative fix.
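The loop above can be sketched as follows. `ask_ghost_coder` and `run_hipcc` are hypothetical stand-ins for the model call and the `hipcc` invocation; they are stubbed here (the "fix" lands on the second attempt) so the control flow compiles and runs without a GPU:

```cpp
#include <string>
#include <utility>

// Hypothetical sketch of the Ghost-Harness control flow; not the actual
// harness code. CompileResult mimics what a real hipcc wrapper would return.
struct CompileResult { bool ok; std::string log; };

static int g_calls = 0;  // toy state: pretend the repair lands on attempt 2
static std::string ask_ghost_coder(const std::string& /*prompt*/) {
    ++g_calls;
    return g_calls >= 2 ? "fixed hip source" : "broken hip source";
}
static CompileResult run_hipcc(const std::string& hip_src) {
    if (hip_src == "fixed hip source") return {true, ""};
    return {false, "error: expected ';' before '}' token"};
}

// The self-healing loop: translate, compile, feed errors back, retry.
static std::pair<bool, std::string> translate_with_repair(
        const std::string& cuda_src, int max_rounds = 3) {
    std::string hip = ask_ghost_coder("Translate this CUDA to HIP:\n" + cuda_src);
    for (int round = 0; round < max_rounds; ++round) {
        CompileResult r = run_hipcc(hip);
        if (r.ok) return {true, hip};  // clean compile: hand back the HIP file
        // Compilation failed: feed the hipcc log back for an iterative fix.
        hip = ask_ghost_coder("Fix these hipcc errors:\n" + r.log +
                              "\n\nSource:\n" + hip);
    }
    return {false, hip};  // out of repair rounds
}
```

Bounding the loop with `max_rounds` keeps a stubborn compile error from burning unlimited model calls; the real harness would surface the last error log to the user on failure.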
## Acknowledgments
Special thanks to AMD for the world-class compute and Lablab.ai for hosting the "Build Across the AI Stack" challenge.
Developed by Talha