How to use with Unsloth Studio
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for lablab-ai-amd-developer-hackathon/Ghost-Coder-Qwen2.5-32B-LoR to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for lablab-ai-amd-developer-hackathon/Ghost-Coder-Qwen2.5-32B-LoR to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for lablab-ai-amd-developer-hackathon/Ghost-Coder-Qwen2.5-32B-LoR to start chatting
Load model with FastModel
pip install unsloth
from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
    model_name="lablab-ai-amd-developer-hackathon/Ghost-Coder-Qwen2.5-32B-LoR",
    max_seq_length=2048,
    load_in_4bit=True,  # matches the 4-bit QLoRA setup used for training
)
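Once the model is loaded, translation can be driven with an ordinary instruction prompt. A minimal sketch of a prompt builder follows; the instruction wording is an assumption for illustration, not the model's actual trained template:

```python
def build_translation_prompt(cuda_source: str) -> str:
    """Wrap a CUDA kernel in a CUDA-to-HIP translation instruction.

    The instruction text here is hypothetical; adjust it to whatever
    chat template the model was fine-tuned with.
    """
    return (
        "Translate the following CUDA source to HIP so it compiles with hipcc.\n"
        "Return only the translated code.\n\n"
        f"```cuda\n{cuda_source}\n```"
    )

prompt = build_translation_prompt(
    "__global__ void add(float *a) { a[threadIdx.x] += 1.0f; }"
)
# Feed `prompt` to tokenizer(...) and model.generate(...) as usual.
```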
Ghost-Coder: Autonomous CUDA-to-HIP Translator πŸ‘»

Ghost-Coder is a specialized, agent-ready LLM designed to bridge the gap between NVIDIA's CUDA and AMD's open ROCm ecosystem. Developed for the Lablab.ai AMD Developer Hackathon (2026), this model serves as the "brain" of a self-healing agentic workflow that translates, compiles, and iterates on GPU kernels.

πŸš€ Overview

Ghost-Coder isn't just a translator; it's an engineer. By fine-tuning Qwen2.5-Coder-32B on the CASS (CUDA-to-HIP) mapping dataset, we've built a model that understands the deep structural nuances of GPU programming, from shared memory primitives to warp-level synchronization.

πŸ’Ž Hardware & Framework

  • Training Hardware: AMD Instinctβ„’ MI300X VF (192GB HBM3)
  • Framework: Unsloth (Optimized for 2x faster ROCm fine-tuning)
  • Optimization: 4-bit QLoRA with a 4096 context window.

🧠 Model Highlights

  • High-Fidelity Mapping: Precise translation of cuda* APIs to their corresponding hip* counterparts.
  • Agentic Ready: Optimized to parse hipcc compiler error logs and self-correct syntax or logic errors in real-time.
  • Massive Scale: Leveraging the 32B parameter Qwen2.5-Coder foundation for superior C++ reasoning compared to smaller 7B models.

πŸ› οΈ Training Specifications

To ensure maximum generalization and prevent overfitting, the model underwent a high-throughput training sprint:

Parameter           Configuration
Total Steps         200 (Optimized Sprint)
Global Batch Size   64
Learning Rate       2e-4
VRAM Utilization    ~158 GB / 192 GB
Dataset             12,800+ Curated CUDA-to-HIP Pairs
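The numbers above are internally consistent: 200 steps at a global batch of 64 works out to 12,800 examples, i.e. roughly one pass over the 12,800+ pair dataset. A quick sanity check; the per-device batch / gradient-accumulation split below is an assumption, since only their product (64) is stated:

```python
# Hypothetical split of the stated global batch size of 64.
per_device_batch = 8        # assumption
grad_accum_steps = 8        # assumption
max_steps = 200             # from the table
dataset_size = 12_800       # "12,800+ Curated CUDA-to-HIP Pairs"

global_batch = per_device_batch * grad_accum_steps
examples_seen = global_batch * max_steps
epochs = examples_seen / dataset_size
print(global_batch, examples_seen, epochs)  # 64 12800 1.0
```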

🏁 Intended Use (The Ghost-Harness)

This model is designed to work within the Ghost-Harness agentic loop:

  1. Input: User provides a raw .cu (CUDA) file.
  2. Action: Ghost-Coder generates a .cpp (HIP) translation.
  3. Validation: The harness runs hipcc on the output.
  4. Self-Healing: If compilation fails, the error logs are fed back to Ghost-Coder for an iterative fix.
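The four steps above can be sketched as a single loop, with the model call and the hipcc invocation abstracted as callables. Function names and the retry budget are hypothetical, not the actual Ghost-Harness API:

```python
from typing import Callable, Tuple

def self_healing_translate(
    cuda_source: str,
    translate: Callable[[str], str],                 # e.g. a Ghost-Coder generate() wrapper
    compile_hip: Callable[[str], Tuple[bool, str]],  # runs hipcc, returns (ok, error_log)
    max_attempts: int = 3,
) -> str:
    """Translate CUDA to HIP, feeding compiler errors back until it compiles."""
    prompt = f"Translate this CUDA code to HIP:\n{cuda_source}"
    for _ in range(max_attempts):
        hip_source = translate(prompt)          # step 2: generate translation
        ok, log = compile_hip(hip_source)       # step 3: validate with hipcc
        if ok:
            return hip_source
        # Step 4: self-heal by appending the compiler log to the next prompt.
        prompt = (
            f"This HIP code failed to compile:\n{hip_source}\n"
            f"hipcc said:\n{log}\nFix it and return only the corrected code."
        )
    raise RuntimeError("translation did not compile within the retry budget")
```

In the real harness, compile_hip would shell out to hipcc via subprocess and capture stderr as the error log.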

πŸ“ Acknowledgments

Special thanks to AMD for the world-class compute and Lablab.ai for hosting the "Build Across the AI Stack" challenge.


Developed by Talha
