---
license: gemma
language:
- en
tags:
- vision-language-action
- humanoid-robotics
- telepathy
- multimodal
- robotics-control
- lora
- pytorch
base_model: lerobot/pi05_base
datasets:
- lerobot/svla_so101_pickplace
library_name: transformers
pipeline_tag: other
author: "Libo Wang"
---
# Sigma: The Key for Vision–Language–Action Models toward Telepathy

**Model:** [Veltraxor/Sigma](https://huggingface.co/Veltraxor/Sigma) · **Base policy:** [lerobot/pi05_base](https://huggingface.co/lerobot/pi05_base) · **Dataset:** [lerobot/svla_so101_pickplace](https://huggingface.co/datasets/lerobot/svla_so101_pickplace)

Sigma is a **telepathy-style Vision–Language–Action (VLA) model** built on top of `lerobot/pi05_base`.
It adds a semantic “telepathy” path and LoRA adapters that steer continuous robot control using internal **semantic memory** and **intent states**, while keeping the original π0.5 backbone weights intact and recoverable.

---
## 1. Summary

- **Base policy**: `lerobot/pi05_base` (π0.5)
- **Author**: **Libo Wang**
- **GPU for training**: single RTX 4090 (24 GB)
- **Data**: `lerobot/svla_so101_pickplace`
- **Objective**:
  make a π0.5-style VLA **use internal semantic and intent states** to refine continuous control, rather than only imitating trajectories.

Sigma keeps the perception and control structure of π0.5 and introduces an additional pathway that:

- fuses **vision, language, and robot state** into a shared latent sequence,
- maintains a **semantic state** `m_t` and an **intent vector** `z_intent` over time,
- converts them into **telepathy factors** that modulate the policy’s action outputs as residual corrections.
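
The card does not say how `m_t` accumulates information or how `z_intent` is read out. The sketch below is an illustration only. It assumes a gated running average for the memory and a unit-normalised read-out for the intent. `update_semantic_memory`, `intent_from_memory`, `roll_semantic_state`, and `gate` are hypothetical names and are not part of the Sigma code.

```python
import math
from typing import List, Sequence, Tuple


def update_semantic_memory(m_prev: Sequence[float], h_t: Sequence[float],
                           gate: float) -> List[float]:
    """Assumed gated running update: m_t = (1 - gate) * m_{t-1} + gate * h_t."""
    if not 0.0 <= gate <= 1.0:
        raise ValueError("gate must lie in [0, 1]")
    if len(m_prev) != len(h_t):
        raise ValueError("memory and latent must have the same width")
    return [(1.0 - gate) * m + gate * h for m, h in zip(m_prev, h_t)]


def intent_from_memory(m_t: Sequence[float]) -> List[float]:
    """Hypothetical intent read-out: the unit-length direction of the memory."""
    norm = math.sqrt(sum(x * x for x in m_t))
    if norm == 0.0:
        return [0.0] * len(m_t)
    return [x / norm for x in m_t]


def roll_semantic_state(latents: Sequence[Sequence[float]],
                        gate: float = 0.5) -> Tuple[List[float], List[float]]:
    """Run the memory over pooled latents h_1..h_T, starting from m_0 = 0."""
    m_t = [0.0] * len(latents[0])
    for h_t in latents:
        m_t = update_semantic_memory(m_t, h_t, gate)
    return m_t, intent_from_memory(m_t)


if __name__ == "__main__":
    pooled = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # toy pooled latents, one per step
    m_t, z_intent = roll_semantic_state(pooled, gate=0.5)
    print(m_t, z_intent)  # recent steps weigh more: m_t = [0.375, 0.5]
```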

---

## 2. Architecture at a Glance

Sigma can be seen as **π0.5 + telepathic head + LoRA adapters**:

- **Vision / state stream**
  - reuses the π0.5 encoders for images and robot state;
  - adds FiLM-style modulation of the vision tokens, driven by the telepathy factors.
- **Language–semantic stream**
  - feeds text, vision, and state tokens into a shared MLLM backbone;
  - derives:
    - a **semantic memory** `m_t` that accumulates information across time steps,
    - an **intent vector** `z_intent`,
    - pooled **semantic factors** aligned with the text embedding space.
- **Action stream (three branches)**
  - treats the π0.5 outputs as the **baseline**:
    - action vector (per step),
    - action chunk (short horizon),
    - action trajectory (full horizon);
  - learns **residual actions**, driven by the telepathy factors, on all three branches.

The resulting policy still *looks like* π0.5 from the outside (same inputs, same output types), but its actions are now corrected by an internal telepathy pathway that is aware of **deep semantics and associative intent**.
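
The following plain-Python sketch illustrates two of the mechanisms in the architecture overview:

- FiLM-style modulation of vision tokens;
- residual corrections on the three action branches.

It rests on two assumptions: the FiLM scale and shift are linear functions of the telepathy factors, and each action branch is a flat list of numbers. `film_modulate`, `apply_branch_residuals`, `w_gamma`, and `w_beta` are hypothetical names. They do not describe Sigma's actual layers.

```python
from typing import Dict, List, Sequence

Matrix = List[List[float]]


def linear(weights: Matrix, x: Sequence[float]) -> List[float]:
    """Matrix-vector product y = W x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def film_modulate(tokens: Sequence[Sequence[float]], tau: Sequence[float],
                  w_gamma: Matrix, w_beta: Matrix) -> List[List[float]]:
    """FiLM: every token x becomes (1 + gamma) * x + beta, with gamma, beta = f(tau).

    One (gamma, beta) pair is shared by all tokens; with gamma = beta = 0
    the tokens pass through unchanged.
    """
    gamma = linear(w_gamma, tau)
    beta = linear(w_beta, tau)
    return [[(1.0 + g) * x + b for x, g, b in zip(tok, gamma, beta)] for tok in tokens]


def apply_branch_residuals(base: Dict[str, List[float]],
                           residual: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Add a learned residual to each baseline branch (vector, chunk, trajectory)."""
    return {name: [b + r for b, r in zip(base[name], residual[name])] for name in base}


if __name__ == "__main__":
    tokens = [[1.0, 2.0]]  # one vision token of width 2
    tau = [1.0]            # a single telepathy factor
    print(film_modulate(tokens, tau, w_gamma=[[0.5], [0.0]], w_beta=[[0.0], [1.0]]))
    base = {"vector": [1.0], "chunk": [1.0, 2.0], "trajectory": [1.0, 2.0, 3.0]}
    res = {"vector": [0.5], "chunk": [0.0, -1.0], "trajectory": [0.0, 0.0, 0.1]}
    print(apply_branch_residuals(base, res))
```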

---

## 3. Training Setup

### 3.1 Dataset & preprocessing

- **Upstream dataset**: `lerobot/svla_so101_pickplace`
- **Task**: pick-and-place manipulation with multi-frame RGB, robot state, and continuous actions.

A preprocessing script (`dataset_preprocess_sigma_vla.py`):

- segments episodes with a sliding window of horizon `T = 16`,
- filters out windows with a near-zero action norm to remove static segments,
- packs vision frames, robot state, and action targets at three scales (vector, chunk, trajectory) into tensor batches,
- exports three sharded files:

```text
storage/sigma_pickplace/shard_00000.pt
storage/sigma_pickplace/shard_00001.pt
storage/sigma_pickplace/shard_00002.pt
```

These shards are the **only** data used for Sigma training and evaluation.
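
The following sketch shows the windowing and filtering steps for the action stream only. Vision frames and robot state would be windowed the same way. Several details are illustrative assumptions, not values taken from `dataset_preprocess_sigma_vla.py`:

- the stride of 1;
- the mean-norm criterion and its `min_norm` threshold;
- the chunk length of 4.

```python
import math
from typing import Dict, List, Sequence

Action = List[float]


def sliding_windows(actions: Sequence[Action], horizon: int = 16,
                    stride: int = 1) -> List[List[Action]]:
    """Cut an episode into overlapping windows of `horizon` consecutive steps."""
    return [list(actions[s:s + horizon])
            for s in range(0, len(actions) - horizon + 1, stride)]


def mean_action_norm(window: Sequence[Action]) -> float:
    """Average L2 norm of the actions inside one window."""
    return sum(math.sqrt(sum(a * a for a in act)) for act in window) / len(window)


def three_scale_targets(window: List[Action], chunk_len: int = 4) -> Dict[str, list]:
    """Split one window into per-step, short-chunk, and full-horizon targets."""
    return {
        "vector": window[0],          # action at the first step
        "chunk": window[:chunk_len],  # short horizon
        "trajectory": window,         # full horizon T
    }


def preprocess_episode(actions: Sequence[Action], horizon: int = 16,
                       min_norm: float = 1e-3, chunk_len: int = 4) -> List[Dict[str, list]]:
    """Drop near-static windows (mean action norm < min_norm), keep the rest as targets."""
    return [three_scale_targets(w, chunk_len)
            for w in sliding_windows(actions, horizon)
            if mean_action_norm(w) >= min_norm]


if __name__ == "__main__":
    episode = [[0.0, 0.0]] * 16 + [[1.0, 0.0]] * 4  # 16 static steps, then motion
    kept = preprocess_episode(episode)
    print(len(sliding_windows(episode)), "windows,", len(kept), "kept")
```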
### 3.2 LoRA fine-tuning (Sigma training)

Training is performed on a **single RTX 4090** using `train_sigma_telepathy_vla_lora.py`:

```bash
python train_sigma_telepathy_vla_lora.py \
  --base_model_id lerobot/pi05_base \
  --dataset_dir /workspace/storage/sigma_pickplace \
  --output_dir /workspace/storage/sigma_lora_out \
  --batch_size 4 \
  --gradient_accumulation_steps 4 \
  --max_steps 300 \
  --dtype bf16
```

Key aspects:

- freeze the backbone weights from `lerobot/pi05_base`;
- attach **LoRA** adapters to the attention projections (q, k, v, o) and to the telepathy heads;
- jointly optimize:
  - **three control losses**:
    - `L_act_vec` for per-step action vectors,
    - `L_act_chk` for short-horizon chunks,
    - `L_act_trj` for full trajectories;
  - **semantic & telepathy regularizers**:
    - alignment of semantic factors with text embeddings,
    - regularization of the telepathy factor norm `tau_l2`.
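
For reference, a standard LoRA adapter leaves the base weight `W0` frozen and adds a trainable low-rank product. The layer therefore computes `W0 x + (alpha / r) B A x`. The sketch below shows this for a single projection and then wraps the q, k, v, o projections. The rank, `alpha`, and initialisation scale are placeholders, because this card does not state the values Sigma uses. `LoRALinear` and `wrap_attention` are illustrative names.

```python
import random
from typing import Dict, List, Sequence

Matrix = List[List[float]]


def matvec(m: Matrix, x: Sequence[float]) -> List[float]:
    """Matrix-vector product."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


class LoRALinear:
    """Frozen weight W0 plus a trainable low-rank update scaled by alpha / rank."""

    def __init__(self, w0: Matrix, rank: int, alpha: float, seed: int = 0):
        d_out, d_in = len(w0), len(w0[0])
        rng = random.Random(seed)
        self.w0 = w0                                     # frozen, never updated
        self.a = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.b = [[0.0] * rank for _ in range(d_out)]    # zero init: output starts at W0 x
        self.scaling = alpha / rank

    def __call__(self, x: Sequence[float]) -> List[float]:
        base = matvec(self.w0, x)
        delta = matvec(self.b, matvec(self.a, x))        # B (A x), at most rank-r
        return [y + self.scaling * d for y, d in zip(base, delta)]

    def merged_weight(self) -> Matrix:
        """Fold the adapter into one matrix: W0 + (alpha / rank) * B A."""
        rank = len(self.a)
        return [
            [w + self.scaling * sum(self.b[i][k] * self.a[k][j] for k in range(rank))
             for j, w in enumerate(row)]
            for i, row in enumerate(self.w0)
        ]


def wrap_attention(projections: Dict[str, Matrix], rank: int = 8,
                   alpha: float = 16.0) -> Dict[str, LoRALinear]:
    """Attach one adapter to each attention projection (e.g. q, k, v, o)."""
    return {name: LoRALinear(w, rank, alpha, seed=i)
            for i, (name, w) in enumerate(sorted(projections.items()))}


if __name__ == "__main__":
    eye = [[1.0, 0.0], [0.0, 1.0]]
    attn = wrap_attention({"q": eye, "k": eye, "v": eye, "o": eye}, rank=1, alpha=2.0)
    print(attn["q"]([3.0, 4.0]))  # B is zero at init, so this equals W0 x
```

Initialising `B` to zero means training starts exactly at the frozen π0.5 behaviour. Folding the product back into `W0` (as `merged_weight` does) is one way to obtain a single set of weights after training.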

All LoRA and telepathy parameters are stored under:

```text
storage/sigma_lora_out/
  sigma_telepathy_heads.pt
  adapter_config.json
  adapter_model.bin
  ...
```

### 3.3 Telepathy-aware training logic

Two key training mechanisms are implemented inside the loss:

- **Telepathic Residual Action Focusing (TRAF)**
  Focuses learning on *residual actions* instead of full actions. It also uses **hard-sample mining** (top-k error segments) to allocate more of the gradient budget to difficult humanoid control windows.
- **Telepathic Semantic Alignment Curriculum (TSAC)**
  Gradually increases the weights of semantic memory–text alignment and intent–telepathy alignment. Early on, action regression remains the primary objective.
  Late in training, Sigma is encouraged to let its **internal semantic/intent structure** drive the residual corrections.
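
The following plain-Python sketch gives one possible reading of TRAF and TSAC:

- residual targets are the gap between the ground-truth action and the π0.5 baseline;
- the top-k windows by error receive a larger loss weight;
- the alignment terms are ramped in linearly.

The boost factor, the linear ramp and its length, and the maximum weights are all assumptions. The function names are hypothetical, and this is not the loss implemented in `train_sigma_telepathy_vla_lora.py`.

```python
from typing import List, Sequence


def residual_targets(target: Sequence[float], base: Sequence[float]) -> List[float]:
    """TRAF-style target: what the baseline action misses, not the full action."""
    return [t - b for t, b in zip(target, base)]


def hard_sample_weights(errors: Sequence[float], k: int, boost: float = 2.0) -> List[float]:
    """Weight `boost` for the k windows with the largest error, 1.0 for the rest."""
    ranked = sorted(range(len(errors)), key=lambda i: errors[i], reverse=True)
    hardest = set(ranked[:k])
    return [boost if i in hardest else 1.0 for i in range(len(errors))]


def traf_loss(pred_residuals, base_actions, target_actions,
              k: int, boost: float = 2.0) -> float:
    """Weighted mean of per-window residual MSE, with the hardest windows up-weighted."""
    errors = []
    for pred, base, target in zip(pred_residuals, base_actions, target_actions):
        res = residual_targets(target, base)
        errors.append(sum((p - r) ** 2 for p, r in zip(pred, res)) / len(res))
    weights = hard_sample_weights(errors, k, boost)
    return sum(w * e for w, e in zip(weights, errors)) / sum(weights)


def tsac_weight(step: int, total_steps: int, max_weight: float,
                ramp_frac: float = 0.5) -> float:
    """Alignment weight: 0 at step 0, rising linearly to max_weight at ramp_frac of training."""
    ramp_steps = max(1, int(total_steps * ramp_frac))
    return max_weight * min(1.0, step / ramp_steps)


def total_loss(action_loss: float, sem_text_loss: float, intent_loss: float,
               step: int, total_steps: int,
               max_sem: float = 0.1, max_intent: float = 0.1) -> float:
    """Action regression always counts fully; the alignment terms are phased in."""
    return (action_loss
            + tsac_weight(step, total_steps, max_sem) * sem_text_loss
            + tsac_weight(step, total_steps, max_intent) * intent_loss)


if __name__ == "__main__":
    for step in (0, 75, 150, 300):
        print(step, tsac_weight(step, total_steps=300, max_weight=0.1))
```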

---

## 4. Inference-time Telepathy Adapter

A lightweight adapter (`sigma_adapter.py`) controls how much the telepathy residuals are allowed to modify the baseline π0.5 actions. It:

- reads:
  - baseline π0.5 actions (`base_action_vector`, …),
  - Sigma residuals,
  - telepathy diagnostics (norms, cosine alignments);
- computes a **risk-aware scaling factor** `scale` in the range `[min_scale, max_scale]`;
- blends the baseline action and the residual:

```python
action = base_action + scale * telepathy_residual
```

If the residuals are too large or misaligned, `scale` is pushed toward `min_scale`. With `min_scale = 0.0` (as in the usage example in Section 6.2), this effectively reverts to π0.5 behavior.
If the residuals are moderate and well aligned, `scale` approaches `max_scale` (1.0 in that example), enabling telepathy-enhanced control.
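
The adapter's exact risk formula is not given here. The sketch below is one hypothetical choice that reproduces the described behavior:

- the risk grows with the residual-to-baseline norm ratio and with misalignment (1 minus a cosine score);
- `scale` decays from `max_scale` toward `min_scale` as the risk grows.

`min_scale`, `max_scale`, and `risk_temperature` mirror the constructor arguments in Section 6.2. `risk_score`, `telepathy_scale`, and `blend` are illustrative names.

```python
import math
from typing import List, Sequence


def l2(v: Sequence[float]) -> float:
    """Euclidean norm."""
    return math.sqrt(sum(x * x for x in v))


def risk_score(base: Sequence[float], residual: Sequence[float], alignment: float,
               eps: float = 1e-8) -> float:
    """Hypothetical risk: relative residual size plus misalignment (1 - cosine)."""
    norm_ratio = l2(residual) / (l2(base) + eps)
    return norm_ratio + (1.0 - alignment)


def telepathy_scale(risk: float, min_scale: float = 0.0, max_scale: float = 1.0,
                    risk_temperature: float = 1.0) -> float:
    """Map risk >= 0 into [min_scale, max_scale]; zero risk gives max_scale."""
    gate = math.exp(-max(risk, 0.0) / risk_temperature)
    return min_scale + (max_scale - min_scale) * gate


def blend(base: Sequence[float], residual: Sequence[float], alignment: float,
          **scale_kwargs: float) -> List[float]:
    """action = base_action + scale * telepathy_residual."""
    scale = telepathy_scale(risk_score(base, residual, alignment), **scale_kwargs)
    return [b + scale * r for b, r in zip(base, residual)]


if __name__ == "__main__":
    base = [1.0, 0.0]
    print(blend(base, [0.1, 0.0], alignment=0.9))   # small, aligned: most of it applied
    print(blend(base, [10.0, 0.0], alignment=-1.0)) # large, misaligned: almost suppressed
```

A lower `risk_temperature` makes the adapter more conservative, because the same risk then shrinks `scale` faster.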

---

## 5. Evaluation Protocol

Evaluation uses `eval_sigma_vla_rollout.py` in **offline closed-loop replay**:

- both Sigma and the baseline:
  - use the *same* preprocessed shards (`shard_0000x.pt`),
  - share the *same* telepathy heads file `sigma_telepathy_heads.pt`;
- **only Sigma**:
  - loads the LoRA weights,
  - activates the telepathy residuals and the adapter in its control output.

### 5.1 CHECK A – telepathy geometry & alignment sanity

CHECK A verifies that the **telepathy geometry is identical** between the experimental (Sigma) and control (baseline) runs:

- `heads_tensors = 325`
- `mean ≈ 0.002`, `std ≈ 0.107`, `rms ≈ 0.107`: statistics of the telepathy head weights
- `avg_tau_l2 ≈ 51.6`: average L2 norm of the telepathy factors
- `avg_semantic_text_alignment ≈ 0.13`: alignment between semantic factors and text embeddings

These numbers match between Sigma and the π0.5 baseline. Differences in behavior therefore cannot be attributed to changed telepathy parameters or to a different text-alignment geometry.
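
The three weight statistics are linked. With the population standard deviation, `rms² = mean² + std²`, so `0.002² + 0.107²` gives an RMS of about 0.107, consistent with the reported values.

The sketch below computes these statistics over flattened tensors and compares two runs. It assumes that `heads_tensors` counts tensors. The function names and the tolerance are hypothetical and are not taken from `eval_sigma_vla_rollout.py`.

```python
import math
from typing import Dict, Iterable, Sequence


def head_stats(tensors: Iterable[Sequence[float]]) -> Dict[str, float]:
    """Mean, population std and RMS over every entry of every (flattened) tensor."""
    tensors = list(tensors)
    values = [v for t in tensors for v in t]
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    rms = math.sqrt(sum(v * v for v in values) / n)
    return {"n_tensors": len(tensors), "mean": mean, "std": std, "rms": rms}


def same_geometry(a: Dict[str, float], b: Dict[str, float], rel_tol: float = 1e-6) -> bool:
    """True when two runs report the same tensor count and matching weight statistics."""
    return a["n_tensors"] == b["n_tensors"] and all(
        math.isclose(a[k], b[k], rel_tol=rel_tol, abs_tol=1e-12)
        for k in ("mean", "std", "rms"))


if __name__ == "__main__":
    # Consistency check on the reported CHECK A values: rms^2 = mean^2 + std^2
    print(round(math.sqrt(0.002 ** 2 + 0.107 ** 2), 3))
```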
### 5.2 CHECK B – multiscale control & telepathy metrics

CHECK B defines and reports:

- `mse_vec`: per-step action vector MSE (fine-grained control precision)
- `mse_chk`: short-chunk MSE (local motion consistency)
- `mse_trj`: full-trajectory MSE (long-horizon tracking)
- `tau_l2`: telepathy factor norm (activation strength)
- `sem_align`: semantic alignment (e.g., cosine similarity) between semantic factors and text embeddings
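
The following is a minimal sketch of how the CHECK B metrics could be computed for a single window. It makes these assumptions:

- the per-step vector is the first action of the window;
- the chunk is the window's first `chunk_len` steps (4 here, a placeholder);
- `tau_l2` is the Euclidean norm of the telepathy factor vector;
- `sem_align` is cosine similarity.

The card does not specify how its scripts average over samples and batches, so this is not the evaluation code itself.

```python
import math
from typing import Dict, Sequence

Window = Sequence[Sequence[float]]  # [T][action_dim]


def mse(pred: Window, target: Window) -> float:
    """Mean squared error over every action dimension of every step."""
    sq = [(p - t) ** 2 for ps, ts in zip(pred, target) for p, t in zip(ps, ts)]
    return sum(sq) / len(sq)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def check_b_metrics(pred: Window, target: Window, tau: Sequence[float],
                    sem_factor: Sequence[float], text_emb: Sequence[float],
                    chunk_len: int = 4) -> Dict[str, float]:
    """Multiscale control errors plus telepathy diagnostics for one window."""
    return {
        "mse_vec": mse(pred[:1], target[:1]),                  # first step only
        "mse_chk": mse(pred[:chunk_len], target[:chunk_len]),  # short chunk
        "mse_trj": mse(pred, target),                          # whole window
        "tau_l2": math.sqrt(sum(x * x for x in tau)),          # factor norm
        "sem_align": cosine(sem_factor, text_emb),
    }


if __name__ == "__main__":
    pred = [[0.0]] * 6
    target = [[1.0], [0.0], [0.0], [0.0], [2.0], [2.0]]
    print(check_b_metrics(pred, target, tau=[3.0, 4.0],
                          sem_factor=[1.0, 0.0], text_emb=[2.0, 0.0]))
```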

On the same 723 samples (181 batches):

- Sigma shows **consistently lower `mse_vec`, `mse_chk`, and `mse_trj`** than the baseline,
- while **`tau_l2` and `sem_align` remain similar** between the two models.

This pattern supports the interpretation that Sigma **uses the same semantic / telepathy geometry more effectively**. It converts that geometry into tangible gains in control accuracy rather than merely altering the embedding space.

---

## 6. How to Use Sigma

> ⚠️ You need access to `lerobot/pi05_base` and the preprocessed shards (or an equivalent environment) to reproduce the full experiments.

### 6.1 Installation (example)

```bash
# base environment
pip install "transformers>=4.40.0" accelerate torch torchvision
pip install lerobot

# clone this repository (example path)
git clone https://github.com/Veltraxor/Sigma.git
cd Sigma
```
### 6.2 Loading Sigma on top of π0.5

```python
import torch

# Module path used by recent LeRobot releases; older versions may differ.
from lerobot.policies.pi05.modeling_pi05 import PI05Policy

from sigma_telepathy_vla import SigmaTelepathyVLA  # model definition (sigma_telepathy_vla.py)
from sigma_adapter import SigmaTelepathyAdapter    # inference-time adapter (sigma_adapter.py)

device = "cuda"
dtype = torch.bfloat16

# 1. Load the frozen base π0.5 policy
base_policy = PI05Policy.from_pretrained("lerobot/pi05_base")

# 2. Build Sigma on top of the base policy (LoRA weights + telepathy heads)
sigma_policy = SigmaTelepathyVLA.from_base(
    base_policy=base_policy,
    lora_dir="./storage/sigma_lora_out",
    telepathy_heads_path="./storage/sigma_lora_out/sigma_telepathy_heads.pt",
    device=device,
    dtype=dtype,
)

# 3. Optional runtime adapter that scales the telepathy residual
adapter = SigmaTelepathyAdapter(
    min_scale=0.0,
    max_scale=1.0,
    risk_temperature=1.0,
)

# 4. Single-batch forward pass (offline replay).
# vis_obs_tensor, robot_state_tensor and list_of_text_prompts are placeholders:
# build them from the preprocessed shards before running this step.
batch = {
    "vis_obs": vis_obs_tensor.to(device),          # [B, T, C, H, W]
    "robot_state": robot_state_tensor.to(device),  # [B, T, D_state]
    "texts": list_of_text_prompts,                 # length B
}

with torch.no_grad():
    out = sigma_policy(**batch, use_telepathy=True)
    blended_action = adapter(
        base_action_vector=out["base_action_vector"],
        telepathy_residual=out["telepathy_residual_vector"],
        telepathy_factors=out["telepathy_factors"],
    )
```
---

## 7. Repository Layout (typical)

A typical Sigma repository / model card includes:

```text
README.md                          # this file
sigma_env.example                  # example env file for HF tokens and paths
dataset_preprocess_sigma_vla.py
train_sigma_telepathy_vla_lora.py
eval_sigma_vla_rollout.py
sigma_telepathy_vla.py             # model definition
sigma_adapter.py                   # inference-time adapter
storage/
  sigma_pickplace/
    shard_00000.pt
    shard_00001.pt
    shard_00002.pt
  sigma_lora_out/
    sigma_telepathy_heads.pt
    adapter_config.json
    adapter_model.bin
    ...
logs/
  sigma_eval_report.json
  sigma_eval_checkA.json
  sigma_eval_checkB.json
```

You can adapt this layout to your own environment. The key assumption is that **Sigma is always loaded as a LoRA + telepathy delta on top of `lerobot/pi05_base`**.

---

## 8. Intended Use, Risks, and Limitations

- **Intended use**:
  Sigma is intended for **research and experimentation** on:
  - semantic / telepathy-style control in VLA systems,
  - offline trajectory analysis and simulation,
  - early-stage humanoid / manipulator control studies.
- **Not intended for**:
  - direct deployment on physical robots **without additional safety layers**;
  - safety-critical or human-facing applications.
- **Known limitations**:
  - trained only on `svla_so101_pickplace`;
  - evaluated only in offline replay;
  - telepathy path tuned for a single task family and embodiment.

Users should treat Sigma as a **proof of concept** that shows how “deep semantic + associative intent” can be engineered into residual control. It is not a generic controller.

---

## 9. Author & Acknowledgements

- **Author**: **Libo Wang**
- Base policy and dataset by the **Physical Intelligence / LeRobot** teams.
- Trained on a single RTX 4090 GPU; the scripts are structured to be portable to other single-GPU or multi-GPU setups with minimal changes.

---

## 10. Citation

If you use Sigma, please cite both the original π0.5 / OpenPI work and this Sigma extension.

**π0.5 / OpenPI:**

```bibtex
@article{openpi2024,
  title  = {Open-World Robotic Manipulation with Vision-Language-Action Models},
  author = {{Physical Intelligence}},
  year   = {2024},
  url    = {https://github.com/Physical-Intelligence/openpi}
}
```

**Sigma (example entry):**

```bibtex
@article{sigma2025,
  title  = {Sigma: The Key for Vision--Language--Action Models toward Telepathy},
  author = {Wang, Libo},
  year   = {2025},
  note   = {Telepathy-style extension of lerobot/pi05_base},
  url    = {https://huggingface.co/Veltraxor/Sigma}
}
```