# Ghost-Coder: Qwen2.5-32B CUDA-to-HIP Translator
Ghost-Coder is a specialized LLM designed to bridge the gap between NVIDIA's proprietary CUDA and AMD's open ROCm ecosystem. This model is a fine-tuned version of Qwen2.5-Coder-32B-Instruct, optimized specifically for high-fidelity translation of GPU kernels.
Developed for the Lablab.ai AMD Developer Hackathon (2026).
## Model Highlights
- Specialization: Maps complex CUDA logic (memory management, warp primitives, kernels) to functional AMD HIP code.
- Hardware-Aware: Fine-tuned specifically for execution on AMD Instinct hardware.
- Agent-Ready: Designed to be the "brain" of an autonomous, self-healing compiler loop.
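To illustrate the kind of mapping involved, below is a minimal, hand-written sketch (not model output) of a CUDA kernel and its HIP counterpart. The kernel name and launch parameters are illustrative; the key pattern is that device code is largely unchanged while host-side API calls are renamed.

```cpp
// CUDA: vector add kernel; device code like this is typically portable as-is
__global__ void vecAdd(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}
// CUDA host side:
//   cudaMalloc(&d_a, bytes);
//   cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
//   vecAdd<<<blocks, threads>>>(d_a, d_b, d_c, n);

// HIP: same kernel body; host API calls are renamed cuda* -> hip*
//   #include <hip/hip_runtime.h>
//   hipMalloc(&d_a, bytes);
//   hipMemcpy(d_a, h_a, bytes, hipMemcpyHostToDevice);
//   hipLaunchKernelGGL(vecAdd, dim3(blocks), dim3(threads), 0, 0,
//                      d_a, d_b, d_c, n);
```

Simple renames like these are mechanical; the harder cases the model targets are warp-size differences (32 vs. 64 lanes) and warp primitives, where a one-to-one substitution is not sufficient.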
## Training Details
The model was fine-tuned using the Unsloth framework in a short, high-throughput sprint configuration, trading long training runs for rapid iteration.
- Hardware: AMD Instinct MI300X (192GB VRAM)
- Base Model: Qwen2.5-Coder-32B-Instruct (4-bit QLoRA)
- Dataset: Curated subset of CASS (CUDA-to-HIP mapping pairs)
- Context Length: 4096
- Training Steps: 200
- Global Batch Size: 64
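The hyperparameters above might translate into an Unsloth setup roughly like the following non-runnable configuration sketch. The LoRA rank, alpha, target modules, and batch-size split are assumptions not stated on this card; `cass_subset` is a hypothetical placeholder for the curated CASS data.

```python
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments

# Base model in 4-bit for QLoRA fine-tuning, 4096-token context
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen2.5-Coder-32B-Instruct",
    max_seq_length=4096,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,          # assumed LoRA rank
    lora_alpha=16, # assumed
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
)
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=cass_subset,  # hypothetical: curated CASS CUDA-to-HIP pairs
    args=TrainingArguments(
        max_steps=200,
        per_device_train_batch_size=8,
        gradient_accumulation_steps=8,  # 8 x 8 = global batch size 64
        output_dir="ghost-coder",
    ),
)
trainer.train()
```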
## Intended Use
Ghost-Coder is intended for use in the Ghost-Harness, an agentic workflow that:
- Translates CUDA source code to HIP.
- Attempts compilation via `hipcc`.
- Self-corrects based on compiler error feedback.
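The translate-compile-repair loop above can be sketched as follows. Here `translate` and `compile_hip` are hypothetical stand-ins for the model call and a `hipcc` invocation, passed in as callables so the control flow can be exercised without a GPU; in the real harness the compile step would shell out to `hipcc` and capture stderr.

```python
from typing import Callable, Optional, Tuple

def self_healing_translate(
    cuda_src: str,
    translate: Callable[[str, Optional[str]], str],  # model call: (source, last_error) -> HIP code
    compile_hip: Callable[[str], Tuple[bool, str]],  # compiler wrapper: hip_code -> (ok, error_log)
    max_attempts: int = 3,
) -> Optional[str]:
    """Translate CUDA to HIP, feeding compiler errors back until it compiles."""
    error = None
    for _ in range(max_attempts):
        # On retries, the last compiler error is included in the prompt
        hip_src = translate(cuda_src, error)
        ok, error = compile_hip(hip_src)
        if ok:
            return hip_src
    return None  # give up after max_attempts
```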
## Acknowledgements
Special thanks to AMD and Lablab.ai for providing the compute resources and the platform to build across the AI stack.
Created by Talha
## Model Tree
`muhammadtlha944/Ghost-Coder-Qwen2.5-32B-LoRA` is a LoRA adapter in the following lineage:
- Qwen/Qwen2.5-32B (base model)
- Qwen/Qwen2.5-Coder-32B (fine-tuned from the above)
- Qwen/Qwen2.5-Coder-32B-Instruct (fine-tuned from the above; direct base of this adapter)