# VLAC Service

A minimal HTTP API service that exposes the Vision-Language-Action-Critic (VLAC) model for use in SimpleVLA-RL training.
## Quick Start

### 1. Install Dependencies

```bash
pip install -r requirements_vlac_service.txt
```

### 2. Start the Service

```bash
python vlac_service.py --port 8111 --gpu-ids 0,1,2,3
```

### 3. Test the Service

```bash
python test_vlac_service.py --url http://localhost:8111
```
## Usage

### Command Line Options

```bash
python vlac_service.py --help
```

- `--port`: Port to run on (default: `8111`)
- `--host`: Host to bind to (default: `0.0.0.0`)
- `--ckpt-path`: Path to the VLAC checkpoint (default: `/home/zechen/SimpleVLA-RL/CKPT/VLAC`)
- `--gpu-ids`: Comma-separated GPU IDs (default: `"0"`)
- `--workers`: Number of workers (default: `1`)

### Environment Variables

- `VLAC_SAVE_INPUTS=1`: Save decoded images to `/tmp/vlac_debug/` for debugging
## API Endpoints

### Health Check

```bash
curl -X POST http://localhost:8111/healthcheck
```
### Pairwise Critic

```bash
curl -X POST http://localhost:8111/pairwise-critic \
  -H "Content-Type: application/json" \
  -d '{
    "task": "Pick up the bowl and put it in the box",
    "image_a": "<base64_image>",
    "image_b": "<base64_image>",
    "rich": false
  }'
```
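For example, the request body can be assembled in Python with only the standard library. The helper names below (`encode_image`, `build_pairwise_payload`) are illustrative, not part of the service; the field names match the request shown above.

```python
import base64
import json


def encode_image(path: str) -> str:
    """Read an image file and return its base64-encoded contents as text."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("ascii")


def build_pairwise_payload(task: str, image_a_b64: str, image_b_b64: str,
                           rich: bool = False) -> str:
    """Assemble the JSON body for a /pairwise-critic request."""
    return json.dumps({
        "task": task,
        "image_a": image_a_b64,
        "image_b": image_b_b64,
        "rich": rich,
    })
```

The resulting string can be sent with any HTTP client as the POST body, with `Content-Type: application/json`.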
### Done Detection

```bash
curl -X POST http://localhost:8111/done \
  -H "Content-Type: application/json" \
  -d '{
    "task": "Pick up the bowl and put it in the box",
    "first_frame": "<base64_image>",
    "prev_frame": "<base64_image>",
    "curr_frame": "<base64_image>",
    "reference": ["<base64_image>"]
  }'
```
### Trajectory Critic

```bash
curl -X POST http://localhost:8111/trajectory-critic \
  -H "Content-Type: application/json" \
  -d '{
    "task": "Pick up the bowl and put it in the box",
    "frames": ["<base64_image>", "<base64_image>"],
    "skip": 5,
    "ref_num": 6,
    "batch_size": 10,
    "think": false,
    "return_video": false
  }'
```
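As a sketch, the same request can be prepared from Python using only the standard library. The helper name and defaults below are illustrative; see `vlac_service_contract.md` for the authoritative request and response schema.

```python
import json
import urllib.request


def trajectory_critic_request(base_url: str, task: str, frames_b64: list,
                              skip: int = 5, ref_num: int = 6,
                              batch_size: int = 10) -> urllib.request.Request:
    """Build (but do not send) a POST request for /trajectory-critic."""
    body = json.dumps({
        "task": task,
        "frames": frames_b64,
        "skip": skip,
        "ref_num": ref_num,
        "batch_size": batch_size,
        "think": False,
        "return_video": False,
    }).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/trajectory-critic",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it is then `json.load(urllib.request.urlopen(req))` against a running service instance.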
## Integration with SimpleVLA-RL

The service is designed to be called from the verl training framework:

- During training: call `/done` after each step to determine episode termination
- At episode end: call `/trajectory-critic` to get value estimates for terminal rewards
- During evaluation: use the environment's `done` signal (skip VLAC)
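A minimal sketch of the per-step `/done` call during training, assuming the response carries a boolean `done` field (check `vlac_service_contract.md` for the real schema). `post_json` and `step_env` are hypothetical stand-ins for your HTTP helper and environment step, injected as callables so the loop is transport-agnostic.

```python
from typing import Callable


def run_episode(step_env: Callable[[], str],
                post_json: Callable[[str, dict], dict],
                task: str, first_frame: str, max_steps: int = 200) -> int:
    """Roll out one episode, querying /done after every step.

    step_env() advances the policy one step and returns the new frame
    (base64). post_json(endpoint, payload) posts to the service and
    returns the decoded JSON response. Returns the number of steps taken.
    """
    prev_frame = first_frame
    for step in range(1, max_steps + 1):
        curr_frame = step_env()
        resp = post_json("/done", {
            "task": task,
            "first_frame": first_frame,
            "prev_frame": prev_frame,
            "curr_frame": curr_frame,
        })
        if resp.get("done"):  # assumed response field, see the contract doc
            return step
        prev_frame = curr_frame
    return max_steps
```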
See `vlac_service_contract.md` for the full API specification.
## Architecture

- Single process, single GPU: each service instance uses one GPU, selected automatically
- Automatic batching: large requests are chunked into batches of ≤ 8 frames
- Image processing: all images are auto-resized to 448×448 and base64 encoded
- Simple deployment: no Docker or complex orchestration required
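The batching rule above can be pictured as a simple chunking helper (illustrative only; the actual implementation lives in `vlac_service.py`):

```python
def chunk_frames(frames: list, max_batch: int = 8) -> list:
    """Split a frame list into consecutive batches of at most max_batch
    items, mirroring the server-side chunking described above."""
    return [frames[i:i + max_batch] for i in range(0, len(frames), max_batch)]
```

So a 20-frame trajectory is processed as three batches of 8, 8, and 4 frames.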
## Troubleshooting

### Service won't start

- Check that the checkpoint path exists: `/home/zechen/SimpleVLA-RL/CKPT/VLAC`
- Verify GPU availability with `nvidia-smi`
- Check that all dependencies are installed
### Out of memory errors

- Reduce the batch size in requests
- Use fewer reference images
- Check GPU memory usage with `nvidia-smi`
### Slow responses

- Use fewer reference images in `/done` requests
- Reduce the `skip` parameter in `/trajectory-critic`
- Consider running multiple service instances on different GPUs
## Files

- `vlac_service.py`: Main service implementation
- `test_vlac_service.py`: Test script with sample requests
- `requirements_vlac_service.txt`: Python dependencies
- `vlac_service_contract.md`: Full API specification
- `guidelines.md`: Integration guidelines for SimpleVLA-RL
## Performance Notes

- GPU memory usage: ~20-30 GB during inference
- Typical latency:
  - `/healthcheck`: <10 ms
  - `/pairwise-critic`: ~200-500 ms
  - `/done`: ~300-800 ms (depends on the number of reference images)
  - `/trajectory-critic`: ~1-5 s (depends on trajectory length)

The service is optimized for the SimpleVLA-RL use case, where GPU memory is shared with the main training process.