VLAC Integration with SimpleVLA-RL

This document describes the integration of Vision-Language-Action-Critic (VLAC) into the SimpleVLA-RL training framework.

Overview

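The VLAC integration replaces the simulator's terminal success signal during training with VLAC-predicted values, while keeping the simulator signal for evaluation. VLAC can end an episode as soon as it judges the task complete. Otherwise, at the step limit, it assigns a progress value in the 0-1 range. As a result, episodes that time out can still earn partial reward instead of a flat binary outcome.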

Architecture

SimpleVLA-RL Training Process:

```
┌─────────────────┐     HTTP/JSON     ┌─────────────────┐
│   Training      │◄─────────────────►│   VLAC Service  │
│   (rob_rollout) │    (done check,   │   (port 8111)   │
│                 │   terminal value) │                 │
└─────────────────┘                   └─────────────────┘
```

Key Components

- VLAC Service (`vlac_service.py`): HTTP API exposing VLAC model functionality
- VLAC Client (`verl/utils/vlac_client.py`): Python client for communicating with the service
- Enhanced Rollout (`verl/workers/rollout/rob_rollout.py`): rollout worker modified to call VLAC

Usage

1. Start VLAC Service

```bash
# Start VLAC service (required for training)
python vlac_service.py --port 8111 --gpu-ids 0,1,2,3
```

2. Training with VLAC

```bash
# Run training with VLAC integration enabled
bash examples/run_openvla_oft_rl_vlac.sh
```

Key Variables (edit at the top of the script):

```bash
PROJECT_NAME='SimpleVLA-RL-VLAC'
EXPERIMENT_NAME='vlac-libero10-sftall_node1_trial'
SFT_MODEL_PATH="CKPT/Openvla-oft-SFT-libero10-trajall"
DATASET_NAME="libero_10"
VLAC_SERVICE_URL="http://localhost:8111"
```

Key VLAC Configuration:

```
+actor_rollout_ref.rollout.use_vlac=true
+actor_rollout_ref.rollout.vlac_service_url=$VLAC_SERVICE_URL
trainer.val_before_train=False  # Skip the validation pass before training (avoids the val_only issue)
```

3. Evaluation (Environment Done)

```bash
# Run evaluation with environment done signal
bash examples/eval_openvla_oft_vlac.sh
```

Key Variables (edit at the top of the script):

```bash
PROJECT_NAME='SimpleVLA-RL-VLAC-Eval'
EXPERIMENT_NAME='vlac-libero10-sftall_node1_eval'
SFT_MODEL_PATH="CKPT/Openvla-oft-SFT-libero10-trajall"
DATASET_NAME="libero_10"
```

Key Configuration:

```
trainer.val_only=True  # Pure evaluation mode
+actor_rollout_ref.rollout.use_vlac=false  # Explicit disable
```

Training Logic

Training Mode (`use_vlac=true`, `val_only=false`)

1. Episode step: after each action, append the new frame to the collected trajectory frames.
2. Done check: call the VLAC `/done` endpoint with (first_frame, prev_frame, curr_frame).
   - If VLAC returns `done=true`: terminate the episode with reward = 1.0.
   - Otherwise: continue the episode.
3. Max steps: when the step limit is reached without a VLAC done signal:
   - Call the VLAC `/trajectory-critic` endpoint on the collected frames.
   - Use the final value, normalized to 0-1, as the terminal reward.
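The training-mode logic above can be sketched as a single episode loop. This is a minimal sketch, not the code in `rob_rollout.py`: the environment step and the two VLAC calls are passed in as plain callables (`env_step`, `check_done`, `trajectory_critic` are hypothetical names). The critic's raw output is assumed to lie on a 0-100 scale, so the normalization is a divide-and-clamp; the actual normalization is not specified in this document.

```python
from typing import Any, Callable, Dict, List, Sequence


def run_training_episode(
    first_frame: Any,
    env_step: Callable[[int], Any],
    check_done: Callable[[Any, Any, Any], bool],
    trajectory_critic: Callable[[Sequence[Any]], List[float]],
    max_steps: int,
    value_scale: float = 100.0,  # assumption: raw critic values lie in [0, value_scale]
) -> Dict[str, Any]:
    """Run one episode in which VLAC, not the simulator, decides termination and reward."""
    trajectory_frames: List[Any] = [first_frame]
    for step in range(1, max_steps + 1):
        curr_frame = env_step(step)          # apply the policy's action, observe the new frame
        prev_frame = trajectory_frames[-1]
        trajectory_frames.append(curr_frame)
        # Done check on (first_frame, prev_frame, curr_frame)
        if check_done(trajectory_frames[0], prev_frame, curr_frame):
            return {"vlac_done": True, "terminal_reward": 1.0, "steps": step}
    # Step limit reached: score the whole trajectory and use its final value
    values = trajectory_critic(trajectory_frames)
    raw = values[-1] if values else 0.0
    terminal_reward = min(max(raw / value_scale, 0.0), 1.0)  # normalize and clamp to [0, 1]
    return {"vlac_done": False, "terminal_reward": terminal_reward, "steps": max_steps}
```

Passing the VLAC calls in as callables keeps the control flow testable without a running service; in the real worker they would be backed by the HTTP client.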

Evaluation Mode (`val_only=true`)

- Use the original environment done signal
- No VLAC service calls are required
- Compute the success rate from simulator feedback

Integration Details

Environment Worker (`env_worker`)

New Features:

- Trajectory frame collection (`trajectory_frames[]`)
- VLAC client initialization in each worker process
- VLAC done detection after each step
- Terminal value computation at episode end

New Output Fields:

```python
{
  'vlac_done': bool,         # Whether VLAC detected completion
  'terminal_reward': float   # VLAC-computed terminal reward (0-1)
}
```

Rollout Class (`RobHFRollout`)

New Configuration:

```python
self.use_vlac = getattr(config, 'use_vlac', False)
self.vlac_service_url = getattr(config, 'vlac_service_url', 'http://localhost:8111')
```

New Batch Fields:

```python
batch["vlac_done"] = torch.tensor(...)        # VLAC termination flags
batch["terminal_reward"] = torch.tensor(...)  # VLAC terminal rewards
```
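Before those tensors can be built, the per-environment outputs have to be gathered into columns. A minimal sketch of that collation, assuming each worker returns a dict with the two documented keys; the function name `collate_vlac_fields` and the 0.0 default for a missing reward are hypothetical. It returns plain Python lists, which the rollout would then wrap with `torch.tensor(...)` as shown above.

```python
from typing import Any, Dict, List, Sequence


def collate_vlac_fields(outputs: Sequence[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Gather 'vlac_done' and 'terminal_reward' from per-env outputs, preserving env order."""
    vlac_done = [bool(o.get("vlac_done", False)) for o in outputs]
    # Assumption: a missing reward (e.g. after a service fallback) counts as 0.0
    terminal_reward = [float(o.get("terminal_reward", 0.0)) for o in outputs]
    return {"vlac_done": vlac_done, "terminal_reward": terminal_reward}
```

Keeping the collation separate from tensor creation makes the ordering guarantee (one entry per environment, in batch order) easy to check in isolation.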

VLAC Client (`VLACClient`)

Key Methods:

- `check_done()`: episode termination detection
- `compute_trajectory_values()`: terminal value computation
- `pairwise_critic()`: frame comparison (optional)

Error Handling:

- Graceful fallback if the VLAC service is unavailable
- Automatic retries for transient failures
- Timeouts for long-running requests
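The error handling listed above (bounded retries, per-request timeouts, and a fallback value instead of an exception) can be sketched with the standard library alone. This is not the actual `VLACClient`: the class name `VLACClientSketch`, the JSON payload keys, the base64 frame encoding, and the fallback values are all assumptions. Only the endpoint paths `/done` and `/trajectory-critic` come from this document, and the method names mirror the `VLACClient` methods listed above with assumed signatures.

```python
import base64
import json
import time
import urllib.request
from typing import Any, Dict, List, Sequence


class VLACClientSketch:
    """Hypothetical VLAC HTTP client: retries transient failures, then falls back."""

    def __init__(self, base_url: str = "http://localhost:8111", timeout: float = 10.0,
                 retries: int = 3, backoff: float = 0.5):
        self.base_url = base_url.rstrip("/")
        self.timeout = timeout          # per-request timeout in seconds
        self.retries = max(1, retries)  # always make at least one attempt
        self.backoff = backoff          # base delay for exponential backoff

    def _post(self, path: str, payload: Dict[str, Any]) -> Dict[str, Any]:
        req = urllib.request.Request(
            self.base_url + path,
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(req, timeout=self.timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))

    def _call(self, path: str, payload: Dict[str, Any], fallback: Dict[str, Any]) -> Dict[str, Any]:
        for attempt in range(self.retries):
            try:
                return self._post(path, payload)
            except (OSError, ValueError) as exc:  # URLError and timeouts are OSError; bad JSON is ValueError
                if attempt == self.retries - 1:
                    print(f"VLAC {path} failed after {self.retries} attempts: {exc}; using fallback")
                    return fallback
                time.sleep(self.backoff * 2 ** attempt)
        return fallback

    @staticmethod
    def _encode(frame: bytes) -> str:
        # Assumption: frames are sent as base64-encoded image bytes
        return base64.b64encode(frame).decode("ascii")

    def check_done(self, first_frame: bytes, prev_frame: bytes, curr_frame: bytes) -> bool:
        payload = {"first_frame": self._encode(first_frame),
                   "prev_frame": self._encode(prev_frame),
                   "curr_frame": self._encode(curr_frame)}
        # Fallback: report "not done", so the episode simply continues
        return bool(self._call("/done", payload, {"done": False}).get("done", False))

    def compute_trajectory_values(self, frames: Sequence[bytes]) -> List[float]:
        payload = {"frames": [self._encode(f) for f in frames]}
        # Fallback: no values; the caller chooses the default terminal reward
        result = self._call("/trajectory-critic", payload, {"values": []})
        return [float(v) for v in result.get("values", [])]
```

Isolating the HTTP call in `_post` means the retry and fallback paths can be exercised by overriding that one method, without a running service.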

Configuration Options

| Parameter | Default | Description |
|---|---|---|
| `use_vlac` | `false` | Enable VLAC integration |
| `vlac_service_url` | `http://localhost:8111` | VLAC service endpoint |
| `val_only` | `false` | Evaluation mode (disables VLAC) |
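Because `val_only` overrides `use_vlac`, whether VLAC is actually active is a conjunction of the two options. A minimal sketch of that resolution, reusing the `getattr` defaults shown for `RobHFRollout`; the function name `resolve_vlac_settings` is hypothetical, and passing `val_only` as a separate argument is a simplification of how the trainer and rollout configs are wired together.

```python
from types import SimpleNamespace
from typing import Any, Tuple


def resolve_vlac_settings(rollout_config: Any, val_only: bool) -> Tuple[bool, str]:
    """Return (vlac_active, service_url); evaluation always uses the simulator's done signal."""
    use_vlac = getattr(rollout_config, "use_vlac", False)
    url = getattr(rollout_config, "vlac_service_url", "http://localhost:8111")
    return bool(use_vlac) and not val_only, url


cfg = SimpleNamespace(use_vlac=True)
print(resolve_vlac_settings(cfg, val_only=False))  # (True, 'http://localhost:8111')
print(resolve_vlac_settings(cfg, val_only=True))   # (False, 'http://localhost:8111')
```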

Performance Considerations

GPU Memory Sharing

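- VLAC service: ~20-30 GB during inference
- SimpleVLA-RL: ~60-70 GB during training
- Combined on one GPU: ~80-100 GB, which is at or above the 80 GB of an H100, so the two do not fit comfortably on a single card; running the VLAC service on separate GPUs (selected with `--gpu-ids`) avoids this.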

Latency Impact

- Done check: ~300-800 ms per step (depends on frames)
- Terminal value: ~1-5 s per episode (depends on trajectory length)
- Overall training throughput: ~10-20% slower due to VLAC calls
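The per-step done check dominates for long episodes, while the trajectory-critic call is paid only when an episode hits the step limit. A back-of-the-envelope estimator built from the latency ranges above; the function name is hypothetical, and it assumes calls are made sequentially with no batching or overlap across environments.

```python
from typing import Tuple

DONE_LATENCY_S = (0.3, 0.8)      # per-step done check (range from this section)
TERMINAL_LATENCY_S = (1.0, 5.0)  # one trajectory-critic call (range from this section)


def vlac_overhead_seconds(steps: int, hit_max_steps: bool) -> Tuple[float, float]:
    """(low, high) VLAC time for one episode: one done check per step, plus a
    trajectory-critic call only when the episode ran to the step limit."""
    low = steps * DONE_LATENCY_S[0]
    high = steps * DONE_LATENCY_S[1]
    if hit_max_steps:
        low += TERMINAL_LATENCY_S[0]
        high += TERMINAL_LATENCY_S[1]
    return low, high
```

For example, an episode that VLAC ends at step 45 (as in the sample log under Training Logs) would spend roughly 13.5-36 s in done checks alone.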

Scaling

- Multiple VLAC service instances on different GPUs
- Load balancing across service instances
- Batch optimization for trajectory processing
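One simple way to spread requests over several service instances is client-side round-robin. A minimal sketch, assuming each instance is started with its own `--port` (and `--gpu-ids`) and is reachable at its own URL; the class name and the second port in the example are hypothetical.

```python
import itertools
import threading
from typing import Iterable


class RoundRobinURLs:
    """Thread-safe round-robin over several VLAC service base URLs."""

    def __init__(self, urls: Iterable[str]):
        self._urls = list(urls)
        if not self._urls:
            raise ValueError("at least one service URL is required")
        self._cycle = itertools.cycle(self._urls)
        self._lock = threading.Lock()

    def next_url(self) -> str:
        with self._lock:  # guard the shared iterator; iterators are not documented as thread-safe
            return next(self._cycle)


pool = RoundRobinURLs(["http://localhost:8111", "http://localhost:8112"])
print([pool.next_url() for _ in range(3)])
# ['http://localhost:8111', 'http://localhost:8112', 'http://localhost:8111']
```

Because the environment workers run as separate processes, each would hold its own rotator; an even simpler alternative is a static assignment such as `urls[worker_index % len(urls)]`.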

Debugging & Monitoring

Service Health

```bash
# Check VLAC service status
curl -X POST http://localhost:8111/healthcheck

# Enable debug image saving
export VLAC_SAVE_INPUTS=1
```
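For scripted checks, for example before launching training, the same probe can be issued from Python. A minimal standard-library sketch that mirrors the `curl` command above (POST to `/healthcheck`); treating any HTTP 200 as healthy is an assumption about the service's response, and the function name is hypothetical.

```python
import urllib.request


def is_service_healthy(base_url: str = "http://localhost:8111", timeout: float = 5.0) -> bool:
    """POST to /healthcheck and report whether the service answered with HTTP 200."""
    req = urllib.request.Request(base_url.rstrip("/") + "/healthcheck", data=b"", method="POST")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # covers URLError/HTTPError, refused connections, and timeouts
        return False


if __name__ == "__main__":
    print("VLAC service healthy:", is_service_healthy())
```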

Training Logs

```
VLAC integration enabled. Service URL: http://localhost:8111
Training mode: True
VLAC detected task completion at step 45 (prob: 0.847)
Max steps reached, computing VLAC terminal value...
VLAC terminal value: 0.632
```
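When monitoring a run, it can help to pull the VLAC events out of these log lines. A minimal sketch with regular expressions matched to the sample messages above; if the actual log wording differs, the patterns need adjusting, and the function name is hypothetical.

```python
import re
from typing import Dict, Iterable, List

DONE_RE = re.compile(r"VLAC detected task completion at step (\d+) \(prob: ([0-9.]+)\)")
VALUE_RE = re.compile(r"VLAC terminal value: ([0-9.]+)")


def summarize_vlac_log(lines: Iterable[str]) -> Dict[str, List]:
    """Collect early-termination steps/probabilities and terminal values from log lines."""
    summary: Dict[str, List] = {"done_steps": [], "done_probs": [], "terminal_values": []}
    for line in lines:
        if m := DONE_RE.search(line):
            summary["done_steps"].append(int(m.group(1)))
            summary["done_probs"].append(float(m.group(2)))
        elif m := VALUE_RE.search(line):
            summary["terminal_values"].append(float(m.group(1)))
    return summary
```

Counting `done_steps` against `terminal_values` gives a quick ratio of episodes VLAC ended early versus episodes that ran to the step limit.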

Common Issues

Service Connection Failed:

- Verify the VLAC service is running: `ps aux | grep vlac_service`
- Check the service logs for errors
- Call the service manually (e.g. the health check above or `test_vlac_service.py`)

Out of Memory:

- Reduce VLAC batch sizes in the service
- Use fewer reference images
- Monitor GPU usage with `nvidia-smi`

Slow Training:

- Check VLAC service response times
- Reduce trajectory frame collection frequency
- Use multiple VLAC service instances
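Reducing frame collection frequency can be as simple as keeping every k-th frame before calling `/trajectory-critic`. A minimal sketch; the function name and the choice to always keep the first and last frames (so the critic still sees the start state and the final state) are assumptions. The per-step done check still needs the current first, previous, and current frames, so this applies only to the frames sent to the trajectory critic.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def subsample_frames(frames: Sequence[T], every: int) -> List[T]:
    """Keep every `every`-th frame, always including the first and the last."""
    if every < 1:
        raise ValueError("every must be >= 1")
    kept = list(frames[::every])
    if frames and (len(frames) - 1) % every != 0:
        kept.append(frames[-1])  # the final frame carries the end-of-episode state
    return kept
```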

File Structure

```
SimpleVLA-RL/
├── vlac_service.py                     # VLAC HTTP service
├── test_vlac_service.py                # Service test suite
├── vlac_service_contract.md            # API specification
├── README_VLAC_SERVICE.md              # Service documentation
├── requirements_vlac_service.txt       # Service dependencies
├── examples/
│   ├── run_openvla_oft_rl_vlac.sh      # Training with VLAC
│   └── eval_openvla_oft_vlac.sh        # Evaluation script
└── verl/
    ├── utils/vlac_client.py            # VLAC service client
    └── workers/rollout/rob_rollout.py  # Enhanced rollout worker
```

Next Steps

- Performance optimization:
  - Implement request batching
  - Add async processing
  - Cache frequent computations
- Robustness:
  - Add a circuit breaker pattern
  - Implement request queuing
  - Add health monitoring
- Advanced features:
  - Reference frame caching
  - Multi-task adaptation
  - Progressive difficulty scaling
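The circuit breaker listed under Robustness would stop a worker from paying the full retry-and-timeout cost on every step while the service is down. A minimal sketch of the pattern, not an existing component: after `failure_threshold` consecutive failures the breaker opens and calls return the caller's fallback immediately. Once `reset_timeout` seconds pass, the next call is let through as a trial; success closes the breaker and failure reopens it. All names and thresholds are hypothetical.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Open after N consecutive failures; let a trial call through after a cool-down."""

    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def is_open(self) -> bool:
        if self._opened_at is None:
            return False
        # Half-open: once the cool-down has elapsed, the next call is allowed as a trial
        return self._clock() - self._opened_at < self.reset_timeout

    def call(self, fn: Callable[[], T], fallback: T) -> T:
        if self.is_open():
            return fallback                      # skip the request entirely
        try:
            result = fn()
        except Exception:                        # any failure counts toward opening the breaker
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # (re)open the breaker
            return fallback
        self._failures = 0                       # a success closes the breaker
        self._opened_at = None
        return result
```

Injecting `clock` keeps the cool-down logic testable without real waiting.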

Support

For questions or issues with VLAC integration:

- Check the service health endpoints
- Review training logs for VLAC messages
- Test the service manually with `test_vlac_service.py`
- Verify that configuration parameters match the examples