
VLAC Integration with SimpleVLA-RL

This document describes the integration of Vision-Language-Action-Critic (VLAC) into the SimpleVLA-RL training framework.

Overview

The VLAC integration replaces the simulator's terminal success signal during training with VLAC-predicted values, while preserving the simulator signal for evaluation. This enables more nuanced reward signals and better task progress estimation.

Architecture

SimpleVLA-RL Training Process:
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    HTTP/JSON     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   Training      │◄─────────────────►│   VLAC Service  β”‚
β”‚   (rob_rollout) β”‚    (done check,   β”‚   (port 8111)   β”‚
β”‚                 β”‚   terminal value) β”‚                 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Key Components

  1. VLAC Service (vlac_service.py): HTTP API exposing VLAC model functionality
  2. VLAC Client (verl/utils/vlac_client.py): Python client for service communication
  3. Enhanced Rollout (verl/workers/rollout/rob_rollout.py): Modified rollout with VLAC integration

Usage

1. Start VLAC Service

# Start VLAC service (required for training)
python vlac_service.py --port 8111 --gpu-ids 0,1,2,3

2. Training with VLAC

# Run training with VLAC integration enabled
bash examples/run_openvla_oft_rl_vlac.sh

Key Variables (edit at top of script):

PROJECT_NAME='SimpleVLA-RL-VLAC'
EXPERIMENT_NAME='vlac-libero10-sftall_node1_trial'
SFT_MODEL_PATH="CKPT/Openvla-oft-SFT-libero10-trajall"
DATASET_NAME="libero_10"
VLAC_SERVICE_URL="http://localhost:8111"

Key VLAC Configuration:

+actor_rollout_ref.rollout.use_vlac=true
+actor_rollout_ref.rollout.vlac_service_url=$VLAC_SERVICE_URL
trainer.val_before_train=False  # Avoid val_only issue

3. Evaluation (Environment Done)

# Run evaluation with environment done signal  
bash examples/eval_openvla_oft_vlac.sh

Key Variables (edit at top of script):

PROJECT_NAME='SimpleVLA-RL-VLAC-Eval'
EXPERIMENT_NAME='vlac-libero10-sftall_node1_eval'
SFT_MODEL_PATH="CKPT/Openvla-oft-SFT-libero10-trajall"
DATASET_NAME="libero_10"

Key Configuration:

trainer.val_only=True  # Pure evaluation mode
+actor_rollout_ref.rollout.use_vlac=false  # Explicit disable

Training Logic

Training Mode (use_vlac=true, val_only=false)

  1. Episode Step: After each action, collect trajectory frames
  2. Done Check: Call VLAC /done endpoint with (first_frame, prev_frame, curr_frame)
    • If VLAC says done=true: terminate episode, reward = 1.0
    • Otherwise: continue episode
  3. Max Steps: When max steps reached:
    • Call VLAC /trajectory-critic endpoint
    • Use final value as terminal reward (normalized to 0-1)
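The training-mode flow above can be sketched as a small loop. This is an illustrative sketch only: `run_episode`, the env/client method names, and the frame handling are assumptions based on this document, not the actual `rob_rollout.py` code.

```python
# Sketch of training-mode episode logic: VLAC done check each step,
# trajectory-value fallback at max steps. Names are hypothetical.

def run_episode(env, vlac_client, max_steps):
    """Roll out one episode, using VLAC for done detection and terminal value."""
    first_frame = env.reset()
    prev_frame = first_frame
    trajectory_frames = [first_frame]

    for step in range(max_steps):
        # env.step() stands in for executing one action and grabbing the new frame.
        curr_frame = env.step()
        trajectory_frames.append(curr_frame)

        # Done check: VLAC compares (first, previous, current) frames.
        if vlac_client.check_done(first_frame, prev_frame, curr_frame):
            return 1.0, step + 1          # VLAC says done -> reward 1.0

        prev_frame = curr_frame

    # Max steps reached: fall back to VLAC's trajectory value, clipped to [0, 1].
    value = vlac_client.compute_trajectory_values(trajectory_frames)
    return min(max(value, 0.0), 1.0), max_steps
```

The clipping at the end reflects the "normalized to 0-1" note above.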

Evaluation Mode (val_only=true)

  • Use original environment done signal
  • No VLAC service calls required
  • Compute success rate using simulator feedback

Integration Details

Environment Worker (env_worker)

New Features:

  • Trajectory frame collection (trajectory_frames[])
  • VLAC client initialization per worker process
  • VLAC done detection after each step
  • Terminal value computation at episode end

New Output Fields:

{
  'vlac_done': bool,         # Whether VLAC detected completion
  'terminal_reward': float   # VLAC-computed terminal reward (0-1)
}

Rollout Class (RobHFRollout)

New Configuration:

self.use_vlac = getattr(config, 'use_vlac', False)
self.vlac_service_url = getattr(config, 'vlac_service_url', 'http://localhost:8111')

New Batch Fields:

batch["vlac_done"] = torch.tensor(...)      # VLAC termination flags
batch["terminal_reward"] = torch.tensor(...)  # VLAC terminal rewards
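One plausible way the per-episode worker outputs turn into these batch tensors is a simple collation step. The field names follow this document; the function itself is a hypothetical sketch, not the actual rollout code.

```python
import torch

def collate_vlac_fields(episode_outputs):
    """Collate per-episode env_worker dicts into VLAC batch tensors."""
    return {
        # One termination flag per episode in the batch.
        "vlac_done": torch.tensor(
            [ep["vlac_done"] for ep in episode_outputs], dtype=torch.bool
        ),
        # One terminal reward in [0, 1] per episode.
        "terminal_reward": torch.tensor(
            [ep["terminal_reward"] for ep in episode_outputs], dtype=torch.float32
        ),
    }
```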

VLAC Client (VLACClient)

Key Methods:

  • check_done(): Episode termination detection
  • compute_trajectory_values(): Terminal value computation
  • pairwise_critic(): Frame comparison (optional)

Error Handling:

  • Graceful fallback if VLAC service unavailable
  • Automatic retry logic for transient failures
  • Timeout protection for long-running requests
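A minimal client sketch showing the retry/timeout/fallback behavior listed above. The endpoint path, payload shape, and default values are assumptions; the real client lives in verl/utils/vlac_client.py and its API may differ.

```python
import json
import time
import urllib.request

class VLACClient:
    """Hypothetical HTTP client with retries, timeouts, and graceful fallback."""

    def __init__(self, base_url="http://localhost:8111",
                 timeout=10.0, retries=2, backoff=0.5):
        self.base_url = base_url.rstrip("/")
        self.timeout = timeout
        self.retries = retries
        self.backoff = backoff

    def _post(self, endpoint, payload):
        data = json.dumps(payload).encode()
        req = urllib.request.Request(
            self.base_url + endpoint, data=data,
            headers={"Content-Type": "application/json"})
        last_err = None
        for attempt in range(self.retries + 1):
            try:
                with urllib.request.urlopen(req, timeout=self.timeout) as resp:
                    return json.loads(resp.read())
            except OSError as err:  # covers URLError, timeouts, refused connections
                last_err = err
                if attempt < self.retries:
                    time.sleep(self.backoff * (2 ** attempt))
        raise last_err

    def check_done(self, first_frame, prev_frame, curr_frame):
        # Frames would be serialized images (e.g. base64 strings) in practice.
        try:
            resp = self._post("/done", {"first_frame": first_frame,
                                        "prev_frame": prev_frame,
                                        "curr_frame": curr_frame})
            return bool(resp.get("done", False))
        except OSError:
            return False  # graceful fallback: keep the episode running
```

The fallback choice (treat an unreachable service as "not done") keeps training alive rather than crashing a rollout on a transient outage.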

Configuration Options

Parameter          Default                  Description
use_vlac           false                    Enable VLAC integration
vlac_service_url   http://localhost:8111    VLAC service endpoint
val_only           false                    Evaluation mode (disables VLAC)

Performance Considerations

GPU Memory Sharing

  • VLAC service: ~20-30 GB during inference
  • SimpleVLA-RL: ~60-70 GB during training
  • Total: ~80-100 GB combined, which can exceed a single H100 80 GB card; place the service on separate GPUs (see --gpu-ids)

Latency Impact

  • Done check: ~300-800ms per step (depends on frames)
  • Terminal value: ~1-5s per episode (depends on trajectory length)
  • Overall training throughput: ~10-20% slower due to VLAC calls

Scaling

  • Multiple VLAC service instances on different GPUs
  • Load balancing across service instances
  • Batch optimization for trajectory processing
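One simple way to spread requests across several service instances is round-robin selection over their URLs. The pool class below is a hypothetical helper for illustration, not part of the codebase.

```python
import itertools

class VLACServicePool:
    """Rotate through multiple VLAC service URLs, one per call."""

    def __init__(self, urls):
        self._cycle = itertools.cycle(urls)

    def next_url(self):
        # Each worker would build its client against the URL returned here.
        return next(self._cycle)
```

Per-worker sticky assignment (worker i always uses urls[i % len(urls)]) is an equally simple alternative that avoids shared state.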

Debugging & Monitoring

Service Health

# Check VLAC service status
curl -X POST http://localhost:8111/healthcheck

# Enable debug image saving
export VLAC_SAVE_INPUTS=1

Training Logs

VLAC integration enabled. Service URL: http://localhost:8111
Training mode: True
VLAC detected task completion at step 45 (prob: 0.847)
Max steps reached, computing VLAC terminal value...
VLAC terminal value: 0.632

Common Issues

Service Connection Failed:

  • Verify VLAC service is running: ps aux | grep vlac_service
  • Check service logs for errors
  • Test manual service calls

Out of Memory:

  • Reduce VLAC batch sizes in service
  • Use fewer reference images
  • Monitor GPU usage: nvidia-smi

Slow Training:

  • Check VLAC service response times
  • Reduce trajectory frame collection frequency
  • Use multiple VLAC service instances

File Structure

SimpleVLA-RL/
β”œβ”€β”€ vlac_service.py                    # VLAC HTTP service
β”œβ”€β”€ test_vlac_service.py              # Service test suite
β”œβ”€β”€ vlac_service_contract.md          # API specification
β”œβ”€β”€ README_VLAC_SERVICE.md            # Service documentation
β”œβ”€β”€ requirements_vlac_service.txt     # Service dependencies
β”œβ”€β”€ examples/
β”‚   β”œβ”€β”€ run_openvla_oft_rl_vlac.sh   # Training with VLAC
β”‚   └── eval_openvla_oft_vlac.sh     # Evaluation script
└── verl/
    β”œβ”€β”€ utils/vlac_client.py         # VLAC service client
    └── workers/rollout/rob_rollout.py # Enhanced rollout worker

Next Steps

  1. Performance Optimization:
    • Implement request batching
    • Add async processing
    • Cache frequent computations
  2. Robustness:
    • Add circuit breaker pattern
    • Implement request queuing
    • Add health monitoring
  3. Advanced Features:
    • Reference frame caching
    • Multi-task adaptation
    • Progressive difficulty scaling

Support

For questions or issues with VLAC integration:

  1. Check service health endpoints
  2. Review training logs for VLAC messages
  3. Test service manually with test_vlac_service.py
  4. Verify configuration parameters match examples