smolvlm-500m-ccmcp-v1
A GUI grounding model fine-tuned on the ScreenSpot dataset for Claude-compatible computer use.
Files
| File | Description | Size |
|---|---|---|
| mmproj-smolvlm-500m-ccmcp-v1-f16.gguf | Vision projector (F16) | 190.2 MB |
| smolvlm-500m-ccmcp-v1-Q4_K_M.gguf | Main model (Q4_K_M) | 289.2 MB |
| smolvlm-500m-ccmcp-v1-f16.gguf | Main model (F16) | 782.4 MB |
Training
- Base Model: HuggingFaceTB/SmolVLM-500M-Instruct
- Dataset: ScreenSpot GUI grounding (1,017 examples)
- Method: LoRA fine-tuning (r=16, alpha=32)
- Task: Predict click coordinates in Claude format
Output Format
{"action": "left_click", "coordinate": [847, 523]}
Usage with Ollama
# Modelfile
FROM ./smolvlm-500m-ccmcp-v1-Q4_K_M.gguf
FROM ./mmproj-smolvlm-500m-ccmcp-v1-f16.gguf
PARAMETER num_ctx 4096
PARAMETER temperature 0.1
SYSTEM "You are a GUI grounding assistant. Given a screenshot and instruction, output click coordinates as JSON."
ollama create smolvlm_500m_ccmcp_v1 -f Modelfile
ollama run smolvlm_500m_ccmcp_v1 "Click the Submit button ./screenshot.png"
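A model created this way can also be queried programmatically through Ollama's local HTTP API, which accepts base64-encoded images on the `/api/generate` endpoint. The sketch below assumes Ollama is running on its default port (11434) and that the model was created under the name used above; the function names `build_request` and `ask` are hypothetical.

```python
import base64
import json
import urllib.request

# Assumption: Ollama is serving on its default local address.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(image_bytes: bytes, instruction: str,
                  model: str = "smolvlm_500m_ccmcp_v1") -> dict:
    """Build the JSON payload for a single, non-streaming generate call."""
    return {
        "model": model,
        "prompt": instruction,
        # Ollama expects images as a list of base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def ask(image_path: str, instruction: str, url: str = OLLAMA_URL) -> str:
    """Send a screenshot and an instruction to Ollama; return the raw model text."""
    with open(image_path, "rb") as f:
        payload = build_request(f.read(), instruction)
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

For example, `ask("screenshot.png", "Click the Submit button")` should return the model's raw text, which is expected to contain the JSON action described under Output Format.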
License
Apache 2.0 (inherited from the base model)