VLAC Integration with SimpleVLA-RL
This document describes the integration of Vision-Language-Action-Critic (VLAC) into the SimpleVLA-RL training framework.
Overview
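The VLAC integration replaces the simulator's terminal success signal during training with VLAC-predicted values, while preserving the simulator signal for evaluation. This gives the learner a graded estimate of task progress instead of only the simulator's binary success flag. VLAC can end an episode as soon as it judges the task complete, and it assigns a 0-1 terminal value to episodes that run to the step limit.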
Architecture
SimpleVLA-RL Training Process:
┌─────────────────┐    HTTP/JSON     ┌─────────────────┐
│    Training     │◄────────────────►│  VLAC Service   │
│  (rob_rollout)  │   (done check,   │   (port 8111)   │
│                 │ terminal value)  │                 │
└─────────────────┘                  └─────────────────┘
Key Components
- VLAC Service (vlac_service.py): HTTP API exposing VLAC model functionality
- VLAC Client (verl/utils/vlac_client.py): Python client for service communication
- Enhanced Rollout (verl/workers/rollout/rob_rollout.py): Modified rollout with VLAC integration
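To make the HTTP/JSON exchange concrete, here is a sketch of how a client might build a request to the /done endpoint using only the Python standard library. The payload keys (first_frame, prev_frame, curr_frame, task) and the base64 encoding of image bytes are assumptions for illustration; the authoritative request schema is the one in vlac_service_contract.md.

```python
import base64
import json
import urllib.request


def build_done_request(service_url: str, first_frame: bytes, prev_frame: bytes,
                       curr_frame: bytes, task: str) -> urllib.request.Request:
    """Build (without sending) a JSON POST to the VLAC /done endpoint.

    Assumption: frames are already-encoded image bytes (e.g. PNG) and the
    service accepts them as base64 strings under these hypothetical keys.
    """
    def encode(frame: bytes) -> str:
        return base64.b64encode(frame).decode("ascii")

    payload = {
        "first_frame": encode(first_frame),
        "prev_frame": encode(prev_frame),
        "curr_frame": encode(curr_frame),
        "task": task,  # hypothetical field: natural-language task instruction
    }
    return urllib.request.Request(
        url=service_url.rstrip("/") + "/done",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Sending requires a running service, e.g.:
# req = build_done_request("http://localhost:8111", f0, f_prev, f_curr, "open the drawer")
# with urllib.request.urlopen(req, timeout=10) as resp:
#     result = json.loads(resp.read())
```

Building the request separately from sending it keeps the payload format easy to inspect and test without a live service.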
Usage
1. Start VLAC Service
# Start VLAC service (required for training)
python vlac_service.py --port 8111 --gpu-ids 0,1,2,3
2. Training with VLAC
# Run training with VLAC integration enabled
bash examples/run_openvla_oft_rl_vlac.sh
Key Variables (edit at top of script):
PROJECT_NAME='SimpleVLA-RL-VLAC'
EXPERIMENT_NAME='vlac-libero10-sftall_node1_trial'
SFT_MODEL_PATH="CKPT/Openvla-oft-SFT-libero10-trajall"
DATASET_NAME="libero_10"
VLAC_SERVICE_URL="http://localhost:8111"
Key VLAC Configuration:
+actor_rollout_ref.rollout.use_vlac=true
+actor_rollout_ref.rollout.vlac_service_url=$VLAC_SERVICE_URL
trainer.val_before_train=False # Avoid val_only issue
3. Evaluation (Environment Done)
# Run evaluation with environment done signal
bash examples/eval_openvla_oft_vlac.sh
Key Variables (edit at top of script):
PROJECT_NAME='SimpleVLA-RL-VLAC-Eval'
EXPERIMENT_NAME='vlac-libero10-sftall_node1_eval'
SFT_MODEL_PATH="CKPT/Openvla-oft-SFT-libero10-trajall"
DATASET_NAME="libero_10"
Key Configuration:
trainer.val_only=True # Pure evaluation mode
+actor_rollout_ref.rollout.use_vlac=false # Explicitly disable VLAC
Training Logic
Training Mode (use_vlac=true, val_only=false)
- Episode Step: After each action, collect trajectory frames
- Done Check: Call the VLAC /done endpoint with (first_frame, prev_frame, curr_frame)
  - If VLAC reports done=true: terminate the episode with reward = 1.0
  - Otherwise: continue the episode
- Max Steps: When max steps are reached:
  - Call the VLAC /trajectory-critic endpoint
  - Use the final value as the terminal reward (normalized to 0-1)
Evaluation Mode (val_only=true)
- Use the original environment done signal
- No VLAC service calls required
- Compute success rate using simulator feedback
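The two modes can be summarized as a single episode loop. This is a minimal sketch, not the rob_rollout implementation: env, client, and policy are hypothetical stand-ins (env.step is assumed to return (frame, env_done)), the client method names follow the VLACClient list below but their signatures are assumed, the simulator's done flag is assumed to be ignored in training mode, and raw critic values are assumed to lie in [0, 100] before normalization.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class EpisodeResult:
    vlac_done: bool         # True if VLAC declared the task complete
    terminal_reward: float  # RL reward (training) or 1.0/0.0 success flag (eval)
    steps: int              # number of environment steps taken


def normalize(value: float, raw_min: float = 0.0, raw_max: float = 100.0) -> float:
    """Scale a raw critic value into [0, 1] and clamp; the raw range is assumed."""
    scaled = (value - raw_min) / (raw_max - raw_min)
    return min(1.0, max(0.0, scaled))


def run_episode(env: Any, client: Any, policy: Any, max_steps: int,
                use_vlac: bool) -> EpisodeResult:
    """Hypothetical episode loop covering training and evaluation modes."""
    first_frame = prev_frame = env.reset()
    trajectory_frames: List[Any] = [first_frame]
    for step in range(1, max_steps + 1):
        curr_frame, env_done = env.step(policy(prev_frame))
        trajectory_frames.append(curr_frame)
        if use_vlac:
            # Training: VLAC decides termination; the simulator's done is ignored.
            if client.check_done(first_frame, prev_frame, curr_frame):
                return EpisodeResult(True, 1.0, step)
        elif env_done:
            # Evaluation: the simulator's own done signal marks success.
            return EpisodeResult(False, 1.0, step)
        prev_frame = curr_frame
    if use_vlac:
        # Max steps reached: score the whole trajectory and keep the final value.
        values = client.compute_trajectory_values(trajectory_frames)
        return EpisodeResult(False, normalize(values[-1]), max_steps)
    return EpisodeResult(False, 0.0, max_steps)
```

Keeping both modes in one loop makes the only difference explicit: which component is allowed to end the episode and where the terminal reward comes from.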
Integration Details
Environment Worker (env_worker)
New Features:
- Trajectory frame collection (trajectory_frames[])
- VLAC client initialization per worker process
- VLAC done detection after each step
- Terminal value computation at episode end
New Output Fields:
{
'vlac_done': bool, # Whether VLAC detected completion
'terminal_reward': float # VLAC-computed terminal reward (0-1)
}
Rollout Class (RobHFRollout)
New Configuration:
self.use_vlac = getattr(config, 'use_vlac', False)
self.vlac_service_url = getattr(config, 'vlac_service_url', 'http://localhost:8111')
New Batch Fields:
batch["vlac_done"] = torch.tensor(...) # VLAC termination flags
batch["terminal_reward"] = torch.tensor(...) # VLAC terminal rewards
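Each env worker produces per-episode values, which have to be flattened into batch order before they become the tensors above. A stdlib-only sketch, assuming every worker returns a list of dicts with the two output fields shown earlier; collate_vlac_fields is a hypothetical helper, and its lists are what would be handed to torch.tensor(...).

```python
from typing import Dict, List


def collate_vlac_fields(worker_outputs: List[List[Dict]]) -> Dict[str, list]:
    """Flatten per-worker episode dicts into batch columns, preserving order."""
    # Worker order first, then episode order within each worker.
    episodes = [ep for worker in worker_outputs for ep in worker]
    return {
        "vlac_done": [bool(ep["vlac_done"]) for ep in episodes],
        "terminal_reward": [float(ep["terminal_reward"]) for ep in episodes],
    }
```

Casting to bool and float up front keeps the column dtypes uniform even if a worker reports 0/1 integers.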
VLAC Client (VLACClient)
Key Methods:
- check_done(): Episode termination detection
- compute_trajectory_values(): Terminal value computation
- pairwise_critic(): Frame comparison (optional)
Error Handling:
- Graceful fallback if VLAC service unavailable
- Automatic retry logic for transient failures
- Timeout protection for long-running requests
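One way to combine these three behaviours is sketched below. It is a hypothetical simplification, not the verl/utils/vlac_client.py code: the transport is injected (a real HTTP call or a test stub), frames are assumed to be JSON-serializable already, transient failures are assumed to surface as OSError (which covers urllib's URLError and socket timeouts), the "done" response key is assumed, and the fallback on persistent failure is assumed to be "not done".

```python
import time
from typing import Any, Callable, Dict

# (endpoint, payload, timeout_seconds) -> decoded JSON response
Transport = Callable[[str, Dict[str, Any], float], Dict[str, Any]]


class VLACClientSketch:
    """Hypothetical client: retries transient errors, then falls back."""

    def __init__(self, transport: Transport, timeout: float = 10.0,
                 max_retries: int = 3, backoff: float = 0.5):
        if max_retries < 1:
            raise ValueError("max_retries must be at least 1")
        self.transport = transport      # performs the actual request
        self.timeout = timeout          # per-request timeout in seconds
        self.max_retries = max_retries  # total attempts before giving up
        self.backoff = backoff          # initial delay, doubled after each failure

    def _call(self, endpoint: str, payload: Dict[str, Any]) -> Dict[str, Any]:
        delay = self.backoff
        for attempt in range(self.max_retries):
            try:
                return self.transport(endpoint, payload, self.timeout)
            except OSError:  # URLError and socket timeouts are OSError subclasses
                if attempt == self.max_retries - 1:
                    raise
                time.sleep(delay)
                delay *= 2
        raise AssertionError("unreachable")

    def check_done(self, first_frame: Any, prev_frame: Any, curr_frame: Any) -> bool:
        payload = {"first_frame": first_frame, "prev_frame": prev_frame,
                   "curr_frame": curr_frame}
        try:
            result = self._call("/done", payload)
        except OSError:
            return False  # graceful fallback: treat the episode as not done
        return bool(result.get("done", False))
```

Falling back to "not done" lets a rollout continue to max steps if the service disappears, at which point the terminal-value call would face the same fallback decision.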
Configuration Options
| Parameter | Default | Description |
|---|---|---|
| actor_rollout_ref.rollout.use_vlac | false | Enable VLAC integration |
| actor_rollout_ref.rollout.vlac_service_url | http://localhost:8111 | VLAC service endpoint |
| trainer.val_only | false | Evaluation-only mode (disables VLAC) |
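Because val_only disables VLAC, whether VLAC is actually consulted depends on both flags. A minimal sketch of that resolution, assuming the rollout and trainer configs are attribute-style objects (SimpleNamespace stands in for them here); the getattr defaults mirror the RobHFRollout snippet above, and vlac_active is a hypothetical helper name.

```python
from types import SimpleNamespace
from typing import Any


def vlac_active(rollout_cfg: Any, trainer_cfg: Any) -> bool:
    """VLAC is consulted only when enabled and not in evaluation-only mode."""
    use_vlac = bool(getattr(rollout_cfg, "use_vlac", False))
    val_only = bool(getattr(trainer_cfg, "val_only", False))
    return use_vlac and not val_only


def vlac_service_url(rollout_cfg: Any) -> str:
    return getattr(rollout_cfg, "vlac_service_url", "http://localhost:8111")


# Settings of the training and evaluation scripts described under Usage.
train_uses_vlac = vlac_active(SimpleNamespace(use_vlac=True), SimpleNamespace(val_only=False))
eval_uses_vlac = vlac_active(SimpleNamespace(use_vlac=False), SimpleNamespace(val_only=True))
```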
Performance Considerations
GPU Memory Sharing
- VLAC service: ~20-30 GB during inference
- SimpleVLA-RL: ~60-70 GB during training
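- Combined: ~80-100 GB, so both processes fit on a single H100 80GB card only at the low end of these ranges; running the VLAC service on its own GPUs (--gpu-ids) avoids contention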
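The budget arithmetic for co-locating both processes on one card can be made explicit with a tiny helper. The figures are the ranges quoted above; fits_on_gpu and its headroom_gb parameter (an allowance for CUDA context and fragmentation) are hypothetical.

```python
def fits_on_gpu(service_gb: float, trainer_gb: float, gpu_gb: float = 80.0,
                headroom_gb: float = 0.0) -> bool:
    """True if both processes plus optional headroom fit in one GPU's memory."""
    return service_gb + trainer_gb + headroom_gb <= gpu_gb


# Ranges quoted above: VLAC ~20-30 GB, SimpleVLA-RL ~60-70 GB, H100 80 GB.
best_case = fits_on_gpu(20, 60)   # 80 GB total: exactly at capacity
worst_case = fits_on_gpu(30, 70)  # 100 GB total: exceeds capacity
```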
Latency Impact
- Done check: ~300-800ms per step (depends on frames)
- Terminal value: ~1-5s per episode (depends on trajectory length)
- Overall training throughput: ~10-20% slower due to VLAC calls
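Per-call latencies add up over an episode. A rough per-episode estimate, assuming one blocking /done call per step and no overlap between workers; the helper name and the 200-step episode in the example are hypothetical.

```python
def vlac_overhead_seconds(steps: int, done_check_s: float, terminal_s: float,
                          reached_max_steps: bool) -> float:
    """Serial VLAC time for one episode: one done check per step, plus a
    trajectory-critic call only if the episode ran to max steps."""
    return steps * done_check_s + (terminal_s if reached_max_steps else 0.0)


# A 200-step episode at 0.5 s per done check that hits max steps and then
# spends 4 s in the trajectory critic.
overhead = vlac_overhead_seconds(200, 0.5, 4.0, reached_max_steps=True)  # 104.0 s
```

Comparing this number against the policy's own per-episode time is a quick way to see whether batching or more service instances would pay off.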
Scaling
- Multiple VLAC service instances on different GPUs
- Load balancing across service instances
- Batch optimization for trajectory processing
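Round-robin is one simple way to spread requests across several service instances. This sketch assumes each instance was started separately on its own port (8111 and 8112 in the example are hypothetical) and that every instance can serve any request; RoundRobinURLs is not part of the codebase.

```python
import itertools
import threading
from typing import Iterable


class RoundRobinURLs:
    """Hypothetical helper: hand out service URLs in rotation (thread-safe)."""

    def __init__(self, urls: Iterable[str]):
        url_list = list(urls)
        if not url_list:
            raise ValueError("need at least one service URL")
        self._cycle = itertools.cycle(url_list)
        self._lock = threading.Lock()  # rollout threads may pick concurrently

    def pick(self) -> str:
        with self._lock:
            return next(self._cycle)


balancer = RoundRobinURLs(["http://localhost:8111", "http://localhost:8112"])
```

Round-robin ignores per-request cost, so long /trajectory-critic calls can still pile up on one instance; a least-busy policy would address that at the cost of tracking in-flight requests.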
Debugging & Monitoring
Service Health
# Check VLAC service status
curl -X POST http://localhost:8111/healthcheck
# Enable debug image saving
export VLAC_SAVE_INPUTS=1
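The curl probe can also be scripted, for example to wait for the service before launching training. This sketch mirrors the curl call (an empty-body POST to /healthcheck) and treats HTTP 200 as healthy; it assumes the service accepts an empty body, does not assume anything about the response body, and vlac_healthy is a hypothetical helper name.

```python
import http.client
import urllib.request


def vlac_healthy(base_url: str = "http://localhost:8111", timeout: float = 5.0) -> bool:
    """POST to /healthcheck like the curl command and report whether it returned 200."""
    req = urllib.request.Request(base_url.rstrip("/") + "/healthcheck",
                                 data=b"", method="POST")  # empty body, as with curl -X POST
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # URLError/HTTPError (non-2xx), refused connections and timeouts are all OSError.
        return False
```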
Training Logs
VLAC integration enabled. Service URL: http://localhost:8111
Training mode: True
VLAC detected task completion at step 45 (prob: 0.847)
Max steps reached, computing VLAC terminal value...
VLAC terminal value: 0.632
Common Issues
Service Connection Failed:
- Verify the VLAC service is running: ps aux | grep vlac_service
- Check service logs for errors
- Test service calls manually
Out of Memory:
- Reduce VLAC batch sizes in service
- Use fewer reference images
- Monitor GPU usage: nvidia-smi
Slow Training:
- Check VLAC service response times
- Reduce trajectory frame collection frequency
- Use multiple VLAC service instances
File Structure
SimpleVLA-RL/
├── vlac_service.py                    # VLAC HTTP service
├── test_vlac_service.py               # Service test suite
├── vlac_service_contract.md           # API specification
├── README_VLAC_SERVICE.md             # Service documentation
├── requirements_vlac_service.txt      # Service dependencies
├── examples/
│   ├── run_openvla_oft_rl_vlac.sh     # Training with VLAC
│   └── eval_openvla_oft_vlac.sh       # Evaluation script
└── verl/
    ├── utils/vlac_client.py           # VLAC service client
    └── workers/rollout/rob_rollout.py # Enhanced rollout worker
Next Steps
Performance Optimization:
- Implement request batching
- Add async processing
- Cache frequent computations
Robustness:
- Add circuit breaker pattern
- Implement request queuing
- Add health monitoring
Advanced Features:
- Reference frame caching
- Multi-task adaptation
- Progressive difficulty scaling
Support
For questions or issues with VLAC integration:
- Check service health endpoints
- Review training logs for VLAC messages
- Test the service manually with test_vlac_service.py
- Verify configuration parameters match the examples