---
license: apache-2.0
language:
- en
library_name: transformers
pipeline_tag: image-text-to-text
tags:
- robotics
- failure-detection
- manipulation
- vision-language
- multi-view
- internvl
base_model: OpenGVLab/InternVL3-8B
---

# Guardian — Multi-View VLM for Robotic Planning & Execution Failure Detection (Vanilla variant)

**Guardian** is a vision-language model fine-tuned for **unified planning and execution verification** in robotic manipulation. Given an instruction and one or more images of the robot scene, it predicts whether a proposed plan is correct (planning verification) or whether a subtask was successfully executed (execution verification).

This checkpoint (`guardian-vanilla`) is the **vanilla** variant: it is trained and runs inference **without** chain-of-thought reasoning, emitting only the final `<answer>` and `<category>` tokens. This makes it ~6× faster at inference than the thinking variant, at a small accuracy cost (see Table IV of the paper). The richer CoT counterpart (`guardian-thinking`) is released at [`paulpacaud/guardian-thinking`](https://huggingface.co/paulpacaud/guardian-thinking).

| Project page | Paper | Code | Data |
|---|---|---|---|
| [di.ens.fr/willow/research/guardian](https://www.di.ens.fr/willow/research/guardian/) | [arXiv:2512.01946](https://arxiv.org/abs/2512.01946) | [GitHub](https://github.com/) | [🤗 Guardian collection](https://huggingface.co/collections/paulpacaud/robotic-failure-detection-dataset-and-model-guardian) |

## Model summary

- **Architecture**: InternVL3-8B (Qwen2.5-7B LLM + InternViT-300M-448px-V2.5), fine-tuned with LoRA (rank 16) on the LLM only; the visual encoder and MLP connector are kept frozen.
- **Capabilities**:
  - **Planning verification** — from an initial scene image and a proposed list of subtasks, decide whether the plan is correct.
  - **Execution verification** — from before/after observations of a subtask (single-view or multi-view), decide whether the subtask succeeded.
  - **Vanilla mode** — direct prediction, no reasoning trace.
- **Output format**:
  - Vanilla: `<answer> True|False </answer> <category> ... </category>` (see the parsing sketch below)
- **Training data**: FailCoT (RLBench-Fail + BridgeDataV2-Fail), ~30K planning + execution failures. See the paper *Scaling Cross-Environment Failure Reasoning Data for Vision-Language Robotic Manipulation* (Pacaud et al., 2026).
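
Since the vanilla variant emits only these two tags, the raw completion can be parsed with a couple of regular expressions. The sketch below is purely illustrative (the `Guardian` wrapper in the repo already returns the parsed values for you); the helper name and the example category string are made up:

```python
import re

def parse_guardian_output(text: str) -> tuple[bool | None, str | None]:
    """Extract the verdict and failure category from a raw
    '<answer> ... </answer> <category> ... </category>' completion."""
    answer_match = re.search(r"<answer>\s*(True|False)\s*</answer>", text)
    category_match = re.search(r"<category>\s*(.*?)\s*</category>", text, re.DOTALL)
    answer = answer_match.group(1) == "True" if answer_match else None
    category = category_match.group(1) if category_match else None
    return answer, category

print(parse_guardian_output("<answer> False </answer> <category> wrong object grasped </category>"))
# (False, 'wrong object grasped')
```
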
## Quick start

The simplest way to run Guardian is the lightweight wrapper shipped in the Guardian repo (`examples/guardian.py`):

```python
from examples.guardian import Guardian

guardian = Guardian(
    model_path="<path>/guardian-vanilla",
    thinking=False,
)

# Planning verification: 1 image of the initial scene
answer, category = guardian.verify_plan(
    img_paths_list=["/path/to/start_img.png"],
    task_instruction="stack the red cup on the blue cup",
    plan=str([
        "grasp red cup",
        "move grasped object on top of blue cup",
        "release",
    ]),
)

# Execution verification: 2, 6, or 8 images (before/after, possibly multi-view)
answer, category = guardian.verify_subtask(
    img_paths_list=[
        "/path/to/start_left.png",
        "/path/to/start_right.png",
        "/path/to/start_wrist.png",
        "/path/to/end_left.png",
        "/path/to/end_right.png",
        "/path/to/end_wrist.png",
    ],
    task_instruction="stack the red cup on the blue cup",
    subtask_instruction="grasp red cup",
)
```

For execution verification, the wrapper accepts:
- **2 images** — single-view: `[start, end]`
- **6 images** — three views: `[start_left, start_right, start_wrist, end_left, end_right, end_wrist]`
- **8 images** — four views, ordered the same way (all start views first, then the matching end views).
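
If you keep per-view before and after frames separately, a small helper can assemble `img_paths_list` in that order. This is a convenience sketch, not part of the released wrapper, and the file names are arbitrary:

```python
def build_img_paths_list(start_views: list[str], end_views: list[str]) -> list[str]:
    """Concatenate start and end frames in the order Guardian expects:
    [start_view_1, ..., start_view_k, end_view_1, ..., end_view_k].
    Both lists must use the same view order (e.g. left, right, wrist)."""
    if len(start_views) != len(end_views):
        raise ValueError("Each start view needs a matching end view.")
    if len(start_views) not in (1, 3, 4):
        raise ValueError("Guardian expects 2, 6, or 8 images in total.")
    return start_views + end_views

# Three views (left, right, wrist) -> 6 images in total
img_paths_list = build_img_paths_list(
    start_views=["start_left.png", "start_right.png", "start_wrist.png"],
    end_views=["end_left.png", "end_right.png", "end_wrist.png"],
)
```
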
See [`docs/RUN_DEMO.md`](https://github.com/) in the Guardian repo for the full demo.

## Downloading the checkpoint

```bash
hf download paulpacaud/guardian-vanilla \
  --local-dir ./data/failure_forge/models/guardian-vanilla
```

The codebase expects the checkpoint to live under `./data/failure_forge/models/guardian-vanilla/`.
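
If you prefer to download from Python instead of the CLI, `huggingface_hub.snapshot_download` achieves the same thing; the target directory below simply mirrors the path the codebase expects:

```python
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="paulpacaud/guardian-vanilla",
    local_dir="./data/failure_forge/models/guardian-vanilla",
)
```
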
## Evaluation

Guardian is evaluated on three real-robot OOD benchmarks bundled at [`paulpacaud/Guardian-FailCoT-OOD-datasets`](https://huggingface.co/datasets/paulpacaud/Guardian-FailCoT-OOD-datasets) — UR5-Fail, RoboFail, RoboVQA — plus the in-distribution test splits of FailCoT (RLBench-Fail / BridgeDataV2-Fail). Reproduce the numbers by following [`docs/Offline_VQA_Evaluation.md`](https://github.com/) in the Guardian repo.
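
For a quick sanity check outside the full evaluation harness, you can loop over any labelled split and compare Guardian's verdict with the ground truth. The snippet below is only a sketch: the sample fields (`img_paths`, `task`, `subtask`, `label`) are placeholders for however your annotations are stored, not the schema of the released datasets:

```python
from examples.guardian import Guardian

guardian = Guardian(model_path="<path>/guardian-vanilla", thinking=False)

def execution_accuracy(samples: list[dict]) -> float:
    """Fraction of samples whose predicted success/failure verdict matches the label."""
    correct = 0
    for sample in samples:
        answer, _category = guardian.verify_subtask(
            img_paths_list=sample["img_paths"],
            task_instruction=sample["task"],
            subtask_instruction=sample["subtask"],
        )
        correct += int(answer == sample["label"])
    return correct / len(samples)
```
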
## Intended use

Guardian is designed as a plug-and-play verification module for robotic manipulation pipelines (e.g. as the verifier in 3D-LOTUS++): at each planning step or subtask boundary, query Guardian; on a failure, trigger replanning or re-execution. Use the vanilla variant when inference latency matters more than peak accuracy.
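
As a rough illustration of that control flow (this is not code from the Guardian or 3D-LOTUS++ repositories), the loop below verifies the plan once, then checks each subtask after execution and retries it when Guardian flags a failure. `planner`, `robot`, and `capture_images` are placeholders for your own stack:

```python
from examples.guardian import Guardian

guardian = Guardian(model_path="<path>/guardian-vanilla", thinking=False)

def run_with_verification(task_instruction, planner, robot, capture_images, max_retries=2):
    # 1) Propose a plan and re-plan once if Guardian rejects it.
    plan = planner.plan(task_instruction)
    ok, category = guardian.verify_plan(
        img_paths_list=capture_images(),
        task_instruction=task_instruction,
        plan=str(plan),
    )
    if not ok:
        plan = planner.replan(task_instruction, feedback=category)

    # 2) Execute each subtask and verify it from before/after images.
    for subtask in plan:
        for _ in range(max_retries + 1):
            before = capture_images()   # e.g. [left, right, wrist] views
            robot.execute(subtask)
            after = capture_images()
            ok, category = guardian.verify_subtask(
                img_paths_list=before + after,
                task_instruction=task_instruction,
                subtask_instruction=subtask,
            )
            if ok:
                break  # subtask succeeded, move on
            # otherwise retry; escalate (full re-plan, ask for help, ...) if retries run out
```
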
## Citation

```bibtex
@misc{pacaud2026guardian_failcot,
  title         = {Scaling Cross-Environment Failure Reasoning Data for Vision-Language Robotic Manipulation},
  author        = {Paul Pacaud and Ricardo Garcia and Shizhe Chen and Cordelia Schmid},
  year          = {2026},
  eprint        = {2512.01946},
  archivePrefix = {arXiv},
  primaryClass  = {cs.RO}
}
```

If you specifically build on the earlier Guardian workshop paper:

```bibtex
@inproceedings{pacaud2025guardian,
  title     = {Guardian: Detecting Robotic Planning and Execution Errors with Vision-Language Models},
  author    = {Paul Pacaud and Ricardo Garcia Pinel and Shizhe Chen and Cordelia Schmid},
  booktitle = {Workshop on Making Sense of Data in Robotics: Composition, Curation, and Interpretability at Scale at CoRL 2025},
  year      = {2025},
  url       = {https://openreview.net/forum?id=wps46mtC9B}
}
```

## License

Released under the Apache 2.0 license, inheriting the license of the InternVL3-8B base model.