Nekochu committed · verified
Commit 0248d77 · 1 Parent(s): 15d69ed

Add gpu-space-cpu SKILL

Files changed (1): gpu-space-cpu/SKILL.md (added, +49 -0)
---
name: gpu-space-cpu
description: Convert GPU HuggingFace Spaces to CPU-only for free-tier deployment. Use when the user says "make this run on CPU", "convert to CPU Space", "remove GPU dependencies", "deploy without GPU", "free HuggingFace Space", or "port CUDA app to CPU". Also use when analyzing repos with bitsandbytes, flash-attn, or CUDA dependencies for CPU compatibility.
---

# GPU → CPU Space Demo Conversion

- Free-tier CPU Space = 2 vCPUs + 16 GB RAM + 50 GB non-persistent disk

## Workflow

1. Grep for GPU deps: `@spaces.GPU|bitsandbytes|flash-attn|triton|xformers|auto-gptq|exllama|apex|\.cuda\(|device.*cuda`
2. Remove GPU packages and rewrite code: `cuda` → `cpu`, `torch.float16` → `torch.float32` (or INT8, see Quantization below), remove `.half()`, `.cuda()`, `device_map="auto"`
3. Create: app.py (all logic; prefer recent built-in Gradio components, [v6+](https://www.gradio.app/guides/gradio-6-migration-guide)), README.md, requirements.txt
4. Test locally: `pip install -r requirements.txt && python app.py`, then test the API by enabling MCP (https://www.gradio.app/guides/building-mcp-server-with-gradio)
5. Deploy only after the local test passes

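Step 2 above is mechanical enough to sketch as a text rewrite. This is a minimal illustration, not part of the skill: `to_cpu` and the pattern list are hypothetical helpers, and real conversions still need manual review.

```python
import re

# Hypothetical helper: mechanically strip common GPU-only patterns
# from Python source text, per workflow step 2.
GPU_REWRITES = [
    (r"\.cuda\(\)", ""),                            # drop .cuda() calls
    (r"\.half\(\)", ""),                            # drop .half() calls
    (r",?\s*device_map\s*=\s*[\"']auto[\"']", ""),  # drop device_map="auto"
    (r"torch\.float16", "torch.float32"),           # fp16 is slow on CPU
    (r"([\"'])cuda\1", r"\g<1>cpu\1"),              # literal device strings
]

def to_cpu(src: str) -> str:
    for pattern, repl in GPU_REWRITES:
        src = re.sub(pattern, repl, src)
    return src

print(to_cpu('model = AutoModel.from_pretrained(mid, torch_dtype=torch.float16, device_map="auto").cuda()'))
# model = AutoModel.from_pretrained(mid, torch_dtype=torch.float32)
```

A regex pass like this only catches the literal patterns listed; anything dynamic (e.g. `device = args.device`) still needs a manual grep from step 1.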
## requirements.txt

```
--extra-index-url https://download.pytorch.org/whl/cpu
torch
```

Never pin transitive deps. No packages.txt needed (ffmpeg/git/cmake are pre-installed).

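A quick sanity check after installing, assuming torch imported from the CPU index above:

```python
import torch

# The CPU wheel reports no CUDA support; its version tag typically ends in "+cpu".
print(torch.__version__)
print(torch.cuda.is_available())  # expect False with the CPU wheel
```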
## README.md

```yaml
---
title: Name
emoji: X
sdk: gradio
sdk_version: 6.3.0
app_file: app.py
---
```

## Quantization (model too large)

1. ONNX Runtime INT8 (Optimum + `OMP_NUM_THREADS=2`): preferred, since onnxruntime is fast on CPU
2. TorchAO
3. torch.quantization.quantize_dynamic

Note: ONNX FP32 is faster than FP16 on CPU.

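Option 3 above can be sketched on a toy model (the `nn.Sequential` stack is an arbitrary stand-in, not the repo's model):

```python
import torch
import torch.nn as nn

# Dynamic INT8 quantization: Linear weights are stored as int8 and
# dequantized on the fly at inference time. CPU-only, no CUDA needed.
model = nn.Sequential(nn.Linear(16, 16), nn.ReLU(), nn.Linear(16, 4))
qmodel = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

out = qmodel(torch.randn(2, 16))
print(out.shape)  # torch.Size([2, 4])
```

Dynamic quantization only touches the layer types you list (here `nn.Linear`), so it is the lowest-effort option of the three but also the least aggressive.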
## Stop

- Model still >12 GB after INT8 → needs GPU
- Unacceptable INT8 quality loss → needs GPU
- 3 failed approaches → tell the user honestly