# VLAC Integration with SimpleVLA-RL

This document describes the integration of Vision-Language-Action-Critic (VLAC) into the SimpleVLA-RL training framework.

## Overview

The VLAC integration replaces the simulator's terminal success signal during training with VLAC-predicted values, while preserving the simulator signal for evaluation. This enables more nuanced reward signals and better task progress estimation.

## Architecture

```
SimpleVLA-RL Training Process:

┌─────────────────┐     HTTP/JSON     ┌─────────────────┐
│    Training     │◄─────────────────►│  VLAC Service   │
│  (rob_rollout)  │   (done check,    │   (port 8111)   │
│                 │  terminal value)  │                 │
└─────────────────┘                   └─────────────────┘
```

### Key Components

1. **VLAC Service** (`vlac_service.py`): HTTP API exposing VLAC model functionality
2. **VLAC Client** (`verl/utils/vlac_client.py`): Python client for service communication
3. **Enhanced Rollout** (`verl/workers/rollout/rob_rollout.py`): Modified rollout with VLAC integration

## Usage

### 1. Start VLAC Service

```bash
# Start VLAC service (required for training)
python vlac_service.py --port 8111 --gpu-ids 0,1,2,3
```

### 2. Training with VLAC

```bash
# Run training with VLAC integration enabled
bash examples/run_openvla_oft_rl_vlac.sh
```

**Key Variables (edit at top of script):**

```bash
PROJECT_NAME='SimpleVLA-RL-VLAC'
EXPERIMENT_NAME='vlac-libero10-sftall_node1_trial'
SFT_MODEL_PATH="CKPT/Openvla-oft-SFT-libero10-trajall"
DATASET_NAME="libero_10"
VLAC_SERVICE_URL="http://localhost:8111"
```

**Key VLAC Configuration:**

```yaml
+actor_rollout_ref.rollout.use_vlac=true
+actor_rollout_ref.rollout.vlac_service_url=$VLAC_SERVICE_URL
trainer.val_before_train=False  # Avoid val_only issue
```
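Since the service is required for training, it can help to confirm it is reachable before launching a run. A minimal stdlib sketch against the `/healthcheck` endpoint (only the HTTP status code is assumed here, not any response schema):

```python
import urllib.request
import urllib.error

def vlac_service_ready(url: str = "http://localhost:8111", timeout: float = 5.0) -> bool:
    """Return True if the VLAC service answers its /healthcheck endpoint."""
    try:
        req = urllib.request.Request(f"{url}/healthcheck", data=b"", method="POST")
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Service down or unreachable -- training should not be started.
        return False
```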
### 3. Evaluation (Environment Done)

```bash
# Run evaluation with environment done signal
bash examples/eval_openvla_oft_vlac.sh
```

**Key Variables (edit at top of script):**

```bash
PROJECT_NAME='SimpleVLA-RL-VLAC-Eval'
EXPERIMENT_NAME='vlac-libero10-sftall_node1_eval'
SFT_MODEL_PATH="CKPT/Openvla-oft-SFT-libero10-trajall"
DATASET_NAME="libero_10"
```

**Key Configuration:**

```yaml
trainer.val_only=True  # Pure evaluation mode
+actor_rollout_ref.rollout.use_vlac=false  # Explicit disable
```

## Training Logic

### Training Mode (`use_vlac=true`, `val_only=false`)

1. **Episode Step**: After each action, collect trajectory frames
2. **Done Check**: Call the VLAC `/done` endpoint with (first_frame, prev_frame, curr_frame)
   - If VLAC says `done=true`: terminate the episode, reward = 1.0
   - Otherwise: continue the episode
3. **Max Steps**: When max steps are reached:
   - Call the VLAC `/trajectory-critic` endpoint
   - Use the final value as the terminal reward (normalized to 0-1)

### Evaluation Mode (`val_only=true`)

- Use the original environment `done` signal
- No VLAC service calls required
- Compute success rate using simulator feedback

## Integration Details

### Environment Worker (`env_worker`)

**New Features:**

- Trajectory frame collection (`trajectory_frames[]`)
- VLAC client initialization per worker process
- VLAC done detection after each step
- Terminal value computation at episode end

**New Output Fields:**

```python
{
    'vlac_done': bool,         # Whether VLAC detected completion
    'terminal_reward': float,  # VLAC-computed terminal reward (0-1)
}
```

### Rollout Class (`RobHFRollout`)

**New Configuration:**

```python
self.use_vlac = getattr(config, 'use_vlac', False)
self.vlac_service_url = getattr(config, 'vlac_service_url', 'http://localhost:8111')
```

**New Batch Fields:**

```python
batch["vlac_done"] = torch.tensor(...)        # VLAC termination flags
batch["terminal_reward"] = torch.tensor(...)  # VLAC terminal rewards
```
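Putting the training-mode logic together, the per-episode flow might look like the sketch below. `check_done` and `compute_trajectory_values` are the `VLACClient` method names used in this integration, but their exact signatures, and the `env`/`policy` interfaces, are assumptions for illustration:

```python
def run_episode(env, policy, client, max_steps: int):
    """Sketch of a training-mode episode with VLAC-driven termination.

    Returns (terminal_reward, vlac_done)."""
    frames = []                                  # trajectory frame collection
    obs = env.reset()
    frames.append(obs["image"])
    first_frame = frames[0]
    for _ in range(max_steps):
        # The environment's own done signal is ignored in training mode.
        obs, _env_done = env.step(policy(obs))
        frames.append(obs["image"])
        # Done check against (first_frame, prev_frame, curr_frame)
        if client.check_done(first_frame, frames[-2], frames[-1]):
            return 1.0, True                     # VLAC termination: reward = 1.0
    # Max steps reached: terminal value from the trajectory critic, in [0, 1]
    return client.compute_trajectory_values(frames)[-1], False
```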
### VLAC Client (`VLACClient`)

**Key Methods:**

- `check_done()`: Episode termination detection
- `compute_trajectory_values()`: Terminal value computation
- `pairwise_critic()`: Frame comparison (optional)

**Error Handling:**

- Graceful fallback if the VLAC service is unavailable
- Automatic retry logic for transient failures
- Timeout protection for long-running requests

## Configuration Options

| Parameter | Default | Description |
|-----------|---------|-------------|
| `use_vlac` | `false` | Enable VLAC integration |
| `vlac_service_url` | `http://localhost:8111` | VLAC service endpoint |
| `val_only` | `false` | Evaluation mode (disables VLAC) |

## Performance Considerations

### GPU Memory Sharing

- VLAC service: ~20-30 GB during inference
- SimpleVLA-RL: ~60-70 GB during training
- Total: fits comfortably on H100 80 GB cards

### Latency Impact

- Done check: ~300-800 ms per step (depends on frames)
- Terminal value: ~1-5 s per episode (depends on trajectory length)
- Overall training throughput: ~10-20% slower due to VLAC calls

### Scaling

- Multiple VLAC service instances on different GPUs
- Load balancing across service instances
- Batch optimization for trajectory processing

## Debugging & Monitoring

### Service Health

```bash
# Check VLAC service status
curl -X POST http://localhost:8111/healthcheck

# Enable debug image saving
export VLAC_SAVE_INPUTS=1
```

### Training Logs

```
VLAC integration enabled. Service URL: http://localhost:8111
Training mode: True
VLAC detected task completion at step 45 (prob: 0.847)
Max steps reached, computing VLAC terminal value...
VLAC terminal value: 0.632
```
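The retry behavior described under Error Handling could be implemented with a small helper along these lines (a sketch only; the actual `VLACClient` retry logic may differ):

```python
import time

def call_with_retry(fn, retries: int = 3, base_delay: float = 0.5):
    """Call fn(), retrying transient failures with exponential backoff.

    Re-raises the last error once the retry budget is exhausted."""
    for attempt in range(retries):
        try:
            return fn()
        except (ConnectionError, TimeoutError):
            if attempt == retries - 1:
                raise
            # Back off 0.5s, 1s, 2s, ... between attempts.
            time.sleep(base_delay * (2 ** attempt))
```

A call such as `call_with_retry(lambda: client.check_done(first, prev, curr))` then survives a dropped connection instead of failing the whole rollout on the first hiccup.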
### Common Issues

**Service Connection Failed:**

- Verify the VLAC service is running: `ps aux | grep vlac_service`
- Check the service logs for errors
- Test manual service calls

**Out of Memory:**

- Reduce VLAC batch sizes in the service
- Use fewer reference images
- Monitor GPU usage: `nvidia-smi`

**Slow Training:**

- Check VLAC service response times
- Reduce trajectory frame collection frequency
- Use multiple VLAC service instances

## File Structure

```
SimpleVLA-RL/
├── vlac_service.py                    # VLAC HTTP service
├── test_vlac_service.py               # Service test suite
├── vlac_service_contract.md           # API specification
├── README_VLAC_SERVICE.md             # Service documentation
├── requirements_vlac_service.txt      # Service dependencies
├── examples/
│   ├── run_openvla_oft_rl_vlac.sh     # Training with VLAC
│   └── eval_openvla_oft_vlac.sh       # Evaluation script
└── verl/
    ├── utils/vlac_client.py           # VLAC service client
    └── workers/rollout/rob_rollout.py # Enhanced rollout worker
```

## Next Steps

1. **Performance Optimization**:
   - Implement request batching
   - Add async processing
   - Cache frequent computations
2. **Robustness**:
   - Add a circuit-breaker pattern
   - Implement request queuing
   - Add health monitoring
3. **Advanced Features**:
   - Reference frame caching
   - Multi-task adaptation
   - Progressive difficulty scaling

## Support

For questions or issues with the VLAC integration:

1. Check the service health endpoint
2. Review training logs for VLAC messages
3. Test the service manually with `test_vlac_service.py`
4. Verify configuration parameters match the examples
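When the Slow Training checklist points at service response times, a quick latency probe against the healthcheck endpoint can quantify the overhead before digging deeper (a stdlib-only sketch; URL and port as configured above):

```python
import time
import urllib.request

def healthcheck_latency(url: str = "http://localhost:8111", n: int = 5) -> float:
    """Average round-trip time in seconds over n POSTs to /healthcheck."""
    total = 0.0
    for _ in range(n):
        start = time.perf_counter()
        req = urllib.request.Request(f"{url}/healthcheck", data=b"", method="POST")
        with urllib.request.urlopen(req, timeout=10) as resp:
            resp.read()
        total += time.perf_counter() - start
    return total / n
```

If the round trip alone approaches the per-step budget (~300-800 ms for a done check), scaling out to multiple service instances is likely the right next step.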