
VLAC Service

A minimal HTTP API service that exposes the Vision-Language-Action-Critic (VLAC) model for use in SimpleVLA-RL training.

Quick Start

1. Install Dependencies

pip install -r requirements_vlac_service.txt

2. Start the Service

python vlac_service.py --port 8111 --gpu-ids 0,1,2,3

3. Test the Service

python test_vlac_service.py --url http://localhost:8111

Usage

Command Line Options

python vlac_service.py --help
  • --port: Port to run on (default: 8111)
  • --host: Host to bind to (default: 0.0.0.0)
  • --ckpt-path: Path to VLAC checkpoint (default: /home/zechen/SimpleVLA-RL/CKPT/VLAC)
  • --gpu-ids: Comma-separated GPU IDs (default: "0")
  • --workers: Number of workers (default: 1)

Environment Variables

  • VLAC_SAVE_INPUTS=1: Save decoded images to /tmp/vlac_debug/ for debugging
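
For example, to start the service with input saving enabled (flags taken from the options above; the port and GPU choice are illustrative):

```shell
VLAC_SAVE_INPUTS=1 python vlac_service.py --port 8111 --gpu-ids 0
# Decoded request images will appear under /tmp/vlac_debug/
```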

API Endpoints

Health Check

curl -X POST http://localhost:8111/healthcheck
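
If you launch the service from a script, it is useful to poll the healthcheck until the model has finished loading. A minimal stdlib-only sketch (the `wait_for_service` helper and its timeout values are illustrative, not part of the service):

```python
import time
import urllib.error
import urllib.request

def wait_for_service(url, timeout_s=120, interval_s=2.0):
    """Poll POST /healthcheck until the service responds with 200,
    or raise TimeoutError. Useful when launching from a training script."""
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        try:
            req = urllib.request.Request(
                f"{url}/healthcheck", data=b"", method="POST"
            )
            with urllib.request.urlopen(req, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # service not up yet; retry after a short sleep
        time.sleep(interval_s)
    raise TimeoutError(f"VLAC service at {url} did not become healthy")
```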

Pairwise Critic

curl -X POST http://localhost:8111/pairwise-critic \
  -H "Content-Type: application/json" \
  -d '{
    "task": "Pick up the bowl and put it in the box",
    "image_a": "<base64_image>",
    "image_b": "<base64_image>",
    "rich": false
  }'
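
The `<base64_image>` placeholders above are plain base64 strings of the raw image bytes. A stdlib-only Python client sketch (the helper names `encode_image` and `pairwise_critic` are illustrative; the payload fields mirror the curl example):

```python
import base64
import json
import urllib.request

def encode_image(path):
    """Read an image file and return the base64 string expected
    in the <base64_image> fields of the request payloads."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("ascii")

def pairwise_critic(url, task, image_a_path, image_b_path, rich=False):
    """POST the two encoded frames to /pairwise-critic and return
    the parsed JSON response."""
    payload = json.dumps({
        "task": task,
        "image_a": encode_image(image_a_path),
        "image_b": encode_image(image_b_path),
        "rich": rich,
    }).encode("utf-8")
    req = urllib.request.Request(
        f"{url}/pairwise-critic",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read())
```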

Done Detection

curl -X POST http://localhost:8111/done \
  -H "Content-Type: application/json" \
  -d '{
    "task": "Pick up the bowl and put it in the box",
    "first_frame": "<base64_image>",
    "prev_frame": "<base64_image>", 
    "curr_frame": "<base64_image>",
    "reference": ["<base64_image>"]
  }'

Trajectory Critic

curl -X POST http://localhost:8111/trajectory-critic \
  -H "Content-Type: application/json" \
  -d '{
    "task": "Pick up the bowl and put it in the box",
    "frames": ["<base64_image>", "<base64_image>"],
    "skip": 5,
    "ref_num": 6,
    "batch_size": 10,
    "think": false,
    "return_video": false
  }'

Integration with SimpleVLA-RL

The service is designed to be called from the verl training framework:

  1. During training: Call /done after each step to determine episode termination
  2. At episode end: Call /trajectory-critic to get value estimates for terminal rewards
  3. During evaluation: Use the environment's own done signal (skip VLAC)
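
The training-time steps above can be sketched as a caller-side episode loop. This is a hedged sketch: the `post` helper, the `env`/`policy` objects, and response keys such as `done` are assumptions, not the confirmed schema — see vlac_service_contract.md for the actual response fields:

```python
import json
import urllib.request

def post(url, endpoint, payload, timeout=60):
    """Minimal stdlib JSON POST helper (illustrative, not part of the service)."""
    req = urllib.request.Request(
        f"{url}{endpoint}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())

def run_episode(env, policy, url, task, max_steps=200):
    """Step the policy, calling /done after each step (step 1 above),
    then /trajectory-critic at episode end (step 2 above).
    env.render_b64() is a hypothetical hook returning a base64 frame."""
    first = curr = env.render_b64()
    frames = [curr]
    for _ in range(max_steps):
        env.step(policy(curr))
        prev, curr = curr, env.render_b64()
        frames.append(curr)
        result = post(url, "/done", {
            "task": task,
            "first_frame": first,
            "prev_frame": prev,
            "curr_frame": curr,
        })
        if result.get("done"):  # assumed response key
            break
    # Terminal value estimates for reward shaping
    return post(url, "/trajectory-critic", {"task": task, "frames": frames})
```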

See vlac_service_contract.md for the full API specification.

Architecture

  • Single process, single GPU: Each service instance runs in one process and uses one GPU, selected automatically from the --gpu-ids list
  • Automatic batching: Large requests are chunked into batches ≤ 8 frames
  • Image processing: All images auto-resized to 448×448, base64 encoded
  • Simple deployment: No Docker or complex orchestration required
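
The automatic batching above amounts to a simple chunking step. A sketch (the function name is illustrative; the limit of 8 frames per batch comes from the list above):

```python
def chunk_frames(frames, max_batch=8):
    """Split a frame list into consecutive batches of at most
    max_batch frames, mirroring the service's automatic batching."""
    return [frames[i:i + max_batch] for i in range(0, len(frames), max_batch)]
```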

Troubleshooting

Service won't start

  • Check that the checkpoint path exists: /home/zechen/SimpleVLA-RL/CKPT/VLAC
  • Verify GPU availability with nvidia-smi
  • Check that all dependencies are installed

Out of memory errors

  • Reduce batch size in requests
  • Use fewer reference images
  • Check GPU memory usage with nvidia-smi

Slow responses

  • Use fewer reference images in /done requests
  • Reduce skip parameter in /trajectory-critic
  • Consider running multiple service instances on different GPUs
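
For example, two independent instances on separate GPUs might be launched as follows (ports and GPU IDs are illustrative; each instance serves its own port):

```shell
python vlac_service.py --port 8111 --gpu-ids 0 &
python vlac_service.py --port 8112 --gpu-ids 1 &
# Point different training workers at different ports
```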

Files

  • vlac_service.py: Main service implementation
  • test_vlac_service.py: Test script with sample requests
  • requirements_vlac_service.txt: Python dependencies
  • vlac_service_contract.md: Full API specification
  • guidelines.md: Integration guidelines for SimpleVLA-RL

Performance Notes

  • GPU memory usage: ~20-30 GB during inference
  • Typical latency:
    • /healthcheck: <10ms
    • /pairwise-critic: ~200-500ms
    • /done: ~300-800ms (depends on reference images)
    • /trajectory-critic: ~1-5s (depends on trajectory length)

The service is optimized for the SimpleVLA-RL use case where GPU memory is shared with the main training process.