
VLAC Service

A minimal HTTP API service that exposes the Vision-Language-Action-Critic (VLAC) model for use in SimpleVLA-RL training.

Quick Start

1. Install Dependencies

pip install -r requirements_vlac_service.txt

2. Start the Service

python vlac_service.py --port 8111 --gpu-ids 0,1,2,3

3. Test the Service

python test_vlac_service.py --url http://localhost:8111

Usage

Command Line Options

python vlac_service.py --help
  • --port: Port to run on (default: 8111)
  • --host: Host to bind to (default: 0.0.0.0)
  • --ckpt-path: Path to VLAC checkpoint (default: /home/zechen/SimpleVLA-RL/CKPT/VLAC)
  • --gpu-ids: Comma-separated GPU IDs (default: "0")
  • --workers: Number of workers (default: 1)

Environment Variables

  • VLAC_SAVE_INPUTS=1: Save decoded images to /tmp/vlac_debug/ for debugging
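
For example, to start the service with input saving enabled (flags taken from the options above; the port and GPU choice are illustrative):

```shell
VLAC_SAVE_INPUTS=1 python vlac_service.py --port 8111 --gpu-ids 0
# Decoded request images will appear under /tmp/vlac_debug/
```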

API Endpoints

Health Check

curl -X POST http://localhost:8111/healthcheck
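
If you launch the service from a script, it is useful to poll the healthcheck until the model has finished loading. A minimal stdlib-only sketch (the `wait_for_service` helper and its timeout values are illustrative, not part of the service):

```python
import time
import urllib.error
import urllib.request

def wait_for_service(url, timeout_s=120, interval_s=2.0):
    """Poll POST /healthcheck until the service responds with 200,
    or raise TimeoutError. Useful when launching from a training script."""
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        try:
            req = urllib.request.Request(
                f"{url}/healthcheck", data=b"", method="POST"
            )
            with urllib.request.urlopen(req, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # service not up yet; retry after a short sleep
        time.sleep(interval_s)
    raise TimeoutError(f"VLAC service at {url} did not become healthy")
```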

Pairwise Critic

curl -X POST http://localhost:8111/pairwise-critic \
  -H "Content-Type: application/json" \
  -d '{
    "task": "Pick up the bowl and put it in the box",
    "image_a": "<base64_image>",
    "image_b": "<base64_image>",
    "rich": false
  }'
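
The `<base64_image>` placeholders above are plain base64 strings of the raw image bytes. A stdlib-only Python client sketch (the helper names `encode_image` and `pairwise_critic` are illustrative; the payload fields mirror the curl example):

```python
import base64
import json
import urllib.request

def encode_image(path):
    """Read an image file and return the base64 string expected
    in the <base64_image> fields of the request payloads."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("ascii")

def pairwise_critic(url, task, image_a_path, image_b_path, rich=False):
    """POST the two encoded frames to /pairwise-critic and return
    the parsed JSON response."""
    payload = json.dumps({
        "task": task,
        "image_a": encode_image(image_a_path),
        "image_b": encode_image(image_b_path),
        "rich": rich,
    }).encode("utf-8")
    req = urllib.request.Request(
        f"{url}/pairwise-critic",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read())
```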

Done Detection

curl -X POST http://localhost:8111/done \
  -H "Content-Type: application/json" \
  -d '{
    "task": "Pick up the bowl and put it in the box",
    "first_frame": "<base64_image>",
    "prev_frame": "<base64_image>", 
    "curr_frame": "<base64_image>",
    "reference": ["<base64_image>"]
  }'

Trajectory Critic

curl -X POST http://localhost:8111/trajectory-critic \
  -H "Content-Type: application/json" \
  -d '{
    "task": "Pick up the bowl and put it in the box",
    "frames": ["<base64_image>", "<base64_image>"],
    "skip": 5,
    "ref_num": 6,
    "batch_size": 10,
    "think": false,
    "return_video": false
  }'

Integration with SimpleVLA-RL

The service is designed to be called from the verl training framework:

  1. During training: Call /done after each step to determine episode termination
  2. At episode end: Call /trajectory-critic to get value estimates for terminal rewards
  3. During evaluation: Use the environment's own done signal (skip VLAC)
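
The training-time steps above can be sketched as a caller-side episode loop. This is a hedged sketch: the `post` helper, the `env`/`policy` objects, and response keys such as `done` are assumptions, not the confirmed schema — see vlac_service_contract.md for the actual response fields:

```python
import json
import urllib.request

def post(url, endpoint, payload, timeout=60):
    """Minimal stdlib JSON POST helper (illustrative, not part of the service)."""
    req = urllib.request.Request(
        f"{url}{endpoint}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())

def run_episode(env, policy, url, task, max_steps=200):
    """Step the policy, calling /done after each step (step 1 above),
    then /trajectory-critic at episode end (step 2 above).
    env.render_b64() is a hypothetical hook returning a base64 frame."""
    first = curr = env.render_b64()
    frames = [curr]
    for _ in range(max_steps):
        env.step(policy(curr))
        prev, curr = curr, env.render_b64()
        frames.append(curr)
        result = post(url, "/done", {
            "task": task,
            "first_frame": first,
            "prev_frame": prev,
            "curr_frame": curr,
        })
        if result.get("done"):  # assumed response key
            break
    # Terminal value estimates for reward shaping
    return post(url, "/trajectory-critic", {"task": task, "frames": frames})
```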

See vlac_service_contract.md for the full API specification.

Architecture

  • Single process, single GPU: Each service instance runs in one process and uses one GPU, selected automatically from the --gpu-ids list
  • Automatic batching: Large requests are chunked into batches ≤ 8 frames
  • Image processing: All images auto-resized to 448×448, base64 encoded
  • Simple deployment: No Docker or complex orchestration required
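
The automatic batching above amounts to a simple chunking step. A sketch (the function name is illustrative; the limit of 8 frames per batch comes from the list above):

```python
def chunk_frames(frames, max_batch=8):
    """Split a frame list into consecutive batches of at most
    max_batch frames, mirroring the service's automatic batching."""
    return [frames[i:i + max_batch] for i in range(0, len(frames), max_batch)]
```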

Troubleshooting

Service won't start

  • Check that the checkpoint path exists: /home/zechen/SimpleVLA-RL/CKPT/VLAC
  • Verify GPU availability with nvidia-smi
  • Check that all dependencies are installed

Out of memory errors

  • Reduce batch size in requests
  • Use fewer reference images
  • Check GPU memory usage with nvidia-smi

Slow responses

  • Use fewer reference images in /done requests
  • Reduce skip parameter in /trajectory-critic
  • Consider running multiple service instances on different GPUs
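
For example, two independent instances on separate GPUs might be launched as follows (ports and GPU IDs are illustrative; each instance serves its own port):

```shell
python vlac_service.py --port 8111 --gpu-ids 0 &
python vlac_service.py --port 8112 --gpu-ids 1 &
# Point different training workers at different ports
```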

Files

  • vlac_service.py: Main service implementation
  • test_vlac_service.py: Test script with sample requests
  • requirements_vlac_service.txt: Python dependencies
  • vlac_service_contract.md: Full API specification
  • guidelines.md: Integration guidelines for SimpleVLA-RL

Performance Notes

  • GPU memory usage: ~20-30 GB during inference
  • Typical latency:
    • /healthcheck: <10ms
    • /pairwise-critic: ~200-500ms
    • /done: ~300-800ms (depends on reference images)
    • /trajectory-critic: ~1-5s (depends on trajectory length)

The service is optimized for the SimpleVLA-RL use case where GPU memory is shared with the main training process.