lablab-ai-amd-developer-hackathon
/

Ghost-Coder-Qwen2.5-32B-LoR

code-generation

muhammadtlha944

Upload model trained with Unsloth

b9680f4 verified about 19 hours ago

	---
	license: apache-2.0
	base_model: unsloth/Qwen2.5-Coder-32B-Instruct-bnb-4bit
	language:
	- en
	library_name: unsloth
	tags:
	- amd
	- rocm
	- hip
	- cuda
	- code-generation
	- lablab-ai
	- ghost-coder
	- mi300x
	- unsloth
	---

	# Ghost-Coder: Autonomous CUDA-to-HIP Translator 👻

	Ghost-Coder is a specialized, agent-ready LLM designed to bridge the gap between NVIDIA's CUDA and AMD's open ROCm ecosystem. Developed for the Lablab.ai AMD Developer Hackathon (2026), this model serves as the "brain" of a self-healing agentic workflow that translates, compiles, and iterates on GPU kernels.

	## 🚀 Overview
	Ghost-Coder goes beyond API renaming. It is fine-tuned from Qwen2.5-Coder-32B on the CASS (CUDA-to-HIP) mapping dataset so that it can handle the structural details of GPU programming, from shared memory primitives to warp-level synchronization.

	### 💎 Hardware & Framework
	- Training Hardware: AMD Instinct™ MI300X VF (192GB HBM3)
	- Framework: [Unsloth](https://github.com/unslothai/unsloth) (Optimized for 2x faster ROCm fine-tuning)
	- Optimization: 4-bit QLoRA with a 4096-token context window.
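
The settings above suggest how the model might be loaded for inference. The following is a minimal sketch, assuming the repository can be loaded directly through Unsloth's `FastLanguageModel.from_pretrained`; the helper name `load_ghost_coder` and the `repo_id` argument are hypothetical, and the card itself does not document a loading procedure.

```python
# Settings taken from the card: 4-bit weights and a 4096-token context window.
MAX_SEQ_LENGTH = 4096
LOAD_IN_4BIT = True


def load_ghost_coder(repo_id: str):
    """Load the model and tokenizer for inference (hypothetical helper).

    `repo_id` is whatever Hub repository holds the weights; it is passed in
    rather than hard-coded because the card does not state a loading recipe.
    """
    # Imported lazily so this module can be read or imported without a GPU stack.
    from unsloth import FastLanguageModel

    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name=repo_id,
        max_seq_length=MAX_SEQ_LENGTH,
        load_in_4bit=LOAD_IN_4BIT,
    )
    # Switch Unsloth to its inference mode before generating.
    FastLanguageModel.for_inference(model)
    return model, tokenizer
```

Keeping `max_seq_length` equal to the training context window avoids feeding the model prompts longer than anything it saw during fine-tuning.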

	## 🧠 Model Highlights
	- High-Fidelity Mapping: Precise translation of `cuda*` runtime API calls to their `hip*` counterparts (for example, `cudaMalloc` to `hipMalloc`).
	- Agentic Ready: Optimized to parse `hipcc` compiler error logs and self-correct syntax or logic errors as part of an automated loop.
	- Massive Scale: Built on the 32B-parameter Qwen2.5-Coder foundation, chosen for stronger C++ reasoning than smaller 7B models.
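
To make the API mapping concrete, here is a minimal sketch of the simplest layer of a CUDA-to-HIP port: renaming runtime identifiers. It is not how Ghost-Coder works internally; it is a plain text substitution in the spirit of tools such as `hipify-perl`, with a deliberately tiny table. The function name `rename_cuda_api` is hypothetical.

```python
import re

# A small, illustrative subset of CUDA runtime names and their HIP equivalents.
CUDA_TO_HIP = {
    "cuda_runtime.h": "hip/hip_runtime.h",
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
    "cudaMemcpyDeviceToHost": "hipMemcpyDeviceToHost",
    "cudaFree": "hipFree",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
}

# Longest names first, and whole identifiers only, so `cudaMemcpy` never
# matches inside `cudaMemcpyHostToDevice` or inside a user-defined name.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(k) for k in sorted(CUDA_TO_HIP, key=len, reverse=True))
    + r")\b"
)


def rename_cuda_api(source: str) -> str:
    """Replace known CUDA identifiers in `source` with their HIP names."""
    return _PATTERN.sub(lambda m: CUDA_TO_HIP[m.group(1)], source)


if __name__ == "__main__":
    print(rename_cuda_api("cudaMemcpy(d_x, h_x, n, cudaMemcpyHostToDevice);"))
```

A table like this covers only one-to-one renames. Constructs without a direct equivalent are where a model that can read compiler feedback is expected to help.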

	## 🛠️ Training Specifications
	To limit overfitting, training was kept to a short, high-throughput sprint:

	| Parameter | Configuration |
	| :--- | :--- |
	| Total Steps | 200 (Optimized Sprint) |
	| Global Batch Size | 64 |
	| Learning Rate | 2e-4 |
	| VRAM Utilization | ~158 GB / 192 GB |
	| Dataset | 12,800+ Curated CUDA-to-HIP Pairs |
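
The figures in the table can be cross-checked with a little arithmetic. The sketch below assumes each training sample is one CUDA-to-HIP pair with no sequence packing, and treats "12,800+" as a lower bound on the dataset size; both are assumptions, not statements from the card.

```python
# Values copied from the training table.
TOTAL_STEPS = 200
GLOBAL_BATCH_SIZE = 64
DATASET_PAIRS_MIN = 12_800  # the card says "12,800+", so this is a lower bound
VRAM_USED_GB = 158
VRAM_TOTAL_GB = 192

# Samples processed over the whole run.
samples_seen = TOTAL_STEPS * GLOBAL_BATCH_SIZE

# With at least 12,800 pairs, the run covers at most this many epochs.
max_epochs = samples_seen / DATASET_PAIRS_MIN

# Fraction of the accelerator's memory in use during training.
vram_fraction = VRAM_USED_GB / VRAM_TOTAL_GB

if __name__ == "__main__":
    print(f"samples seen: {samples_seen}")
    print(f"epochs (upper bound): {max_epochs:.2f}")
    print(f"VRAM in use: {vram_fraction:.1%}")
```

Under these assumptions, 200 steps at a global batch size of 64 amount to 12,800 samples, which is at most one pass over the dataset.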

	## 🏁 Intended Use (The Ghost-Harness)
	This model is designed to work within the Ghost-Harness agentic loop:
	1. Input: User provides a raw `.cu` (CUDA) file.
	2. Action: Ghost-Coder generates a `.cpp` (HIP) translation.
	3. Validation: The harness runs `hipcc` on the output.
	4. Self-Healing: If compilation fails, the error logs are fed back to Ghost-Coder for an iterative fix.
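
The four steps above can be sketched as a small retry loop. This is an illustration under stated assumptions, not the actual Ghost-Harness code: `translate_with_retries`, `compile_with_hipcc`, the `generate` callback signature, and the retry limit are all hypothetical, and `generate` stands in for a call to the model.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import Callable, Optional, Tuple


def compile_with_hipcc(hip_source: str) -> Tuple[bool, str]:
    """Compile `hip_source` with `hipcc -c`; return (succeeded, compiler stderr)."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "kernel.cpp"
        src.write_text(hip_source)
        proc = subprocess.run(
            ["hipcc", "-c", str(src), "-o", str(src.with_suffix(".o"))],
            capture_output=True,
            text=True,
        )
        return proc.returncode == 0, proc.stderr


def translate_with_retries(
    cuda_source: str,
    generate: Callable[[str, Optional[str]], str],
    compile_fn: Callable[[str], Tuple[bool, str]] = compile_with_hipcc,
    max_attempts: int = 3,  # assumed limit; the card does not specify one
) -> Tuple[Optional[str], int]:
    """Return (HIP source or None, attempts used)."""
    feedback: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        # Step 2: ask the model for a translation, with any earlier errors.
        hip_source = generate(cuda_source, feedback)
        # Step 3: validate by compiling.
        ok, log = compile_fn(hip_source)
        if ok:
            return hip_source, attempt
        # Step 4: feed the failed attempt and its error log back to the model.
        feedback = f"Previous attempt:\n{hip_source}\n\nhipcc errors:\n{log}"
    return None, max_attempts
```

Passing the compiler in as `compile_fn` keeps the loop testable on machines without ROCm installed. A successful compile only shows that the code builds, not that the kernel produces correct results.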

	## 📝 Acknowledgments
	Special thanks to AMD for the world-class compute and Lablab.ai for hosting the "Build Across the AI Stack" challenge.

	---
	Developed by Talha