Merged Pour Coke Dataset (With Videos)
This dataset combines three pour coke datasets for disturbance recovery research:
- pour_coke_static - Clean demonstrations (episodes 0-49)
- pour_coke_perturbate_source - Source perturbations on black cup (episodes 50-69)
- pour_coke_perturbate_target - Target perturbations on white cup (episodes 70-89)
Dataset Information
- Total Episodes: 90
- Total Frames: 53,644
- Robot: SO-101 Follower (6-DOF arm)
- FPS: 15
- LeRobot Version: v3.0
- Videos: Included (3 camera views, MP4 format)
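As a quick sanity check on the figures above, the recording time they imply can be derived from the frame count and frame rate. This is a minimal sketch using only the totals listed here; the variable names are illustrative and not part of the dataset.

```python
# Totals copied from the Dataset Information list above.
TOTAL_FRAMES = 53_644
TOTAL_EPISODES = 90
FPS = 15

# Duration implied by the frame count at the stated frame rate.
total_seconds = TOTAL_FRAMES / FPS
mean_frames_per_episode = TOTAL_FRAMES / TOTAL_EPISODES
mean_episode_seconds = mean_frames_per_episode / FPS

print(f"Total duration: {total_seconds / 60:.1f} min")
print(
    f"Mean episode length: {mean_frames_per_episode:.0f} frames "
    f"({mean_episode_seconds:.1f} s)"
)
```

Under these numbers the dataset holds roughly an hour of demonstrations, with an average episode of about 40 seconds. Individual episode lengths may differ from the mean.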
Task Description
Pick up black cup and pour into white cup
Episode Breakdown
Bucket A: Clean Demonstrations (Episodes 0-49)
- 50 episodes of clean, unperturbed demonstrations
- Static environment with no disturbances
- Baseline behavior for comparison
Bucket B1: Source Perturbations (Episodes 50-69)
- 20 episodes with perturbations to the black cup (source object)
- Grasp disturbances during approach phase
- Tests recovery from source object displacement
Bucket B2: Target Perturbations (Episodes 70-89)
- 20 episodes with perturbations to the white cup (target object)
- Target disturbances during alignment phase
- Tests recovery from target object displacement
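The episode ranges above can be turned into a small lookup for labeling or filtering samples by bucket. This is a sketch: the function name `bucket_for_episode` is hypothetical and not part of the dataset or of LeRobot; only the ranges and bucket labels come from this breakdown.

```python
def bucket_for_episode(episode_index: int) -> str:
    """Return the bucket label ("A", "B1", or "B2") for an episode index.

    The ranges follow the Episode Breakdown in this README.
    """
    if 0 <= episode_index <= 49:
        return "A"   # clean demonstrations
    if 50 <= episode_index <= 69:
        return "B1"  # source (black cup) perturbations
    if 70 <= episode_index <= 89:
        return "B2"  # target (white cup) perturbations
    raise ValueError(f"episode_index out of range 0-89: {episode_index}")


# Example: count how many episodes fall into each bucket.
counts = {}
for ep in range(90):
    label = bucket_for_episode(ep)
    counts[label] = counts.get(label, 0) + 1
print(counts)
```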
Robot Configuration
The SO-101 robot has 6 degrees of freedom:
- shoulder_pan - Base rotation (yaw)
- shoulder_lift - Shoulder pitch
- elbow_flex - Elbow pitch
- wrist_flex - Wrist pitch
- wrist_roll - Wrist rotation (roll)
- gripper - Gripper open/close (0 = closed, higher = open)
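When inspecting a sample it can help to attach joint names to the 6-element vectors. The sketch below assumes that the order of values in `action` and `observation.state` matches the order of the list above; this README does not state that explicitly, so verify it against the dataset metadata before relying on it. The helper `name_joints` is hypothetical.

```python
# Joint names in the order listed above.
# ASSUMPTION: the 6-element vectors in the dataset use this same order.
JOINT_NAMES = [
    "shoulder_pan",
    "shoulder_lift",
    "elbow_flex",
    "wrist_flex",
    "wrist_roll",
    "gripper",
]


def name_joints(values):
    """Pair a 6-element joint vector with the joint names."""
    if len(values) != len(JOINT_NAMES):
        raise ValueError(
            f"expected {len(JOINT_NAMES)} values, got {len(values)}"
        )
    return dict(zip(JOINT_NAMES, values))


# Example with made-up values (not taken from the dataset).
named = name_joints([0.0, -10.0, 25.0, 5.0, 0.0, 12.5])
print(named["gripper"])
```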
Camera Views
The dataset includes 3 synchronized camera views:
- shoulder_base_wide_view - Wide angle view from shoulder base
- workspace_variable_view - Adjustable workspace overview
- wrist_roll_top_down - Top-down view from wrist
All videos are stored as MP4 files at 15 FPS, 640x480 resolution.
Data Format
Each sample contains:
{
'action': [float32] * 6, # Target joint positions
'observation.state': [float32] * 6, # Current joint positions
'timestamp': float32, # Time in seconds
'frame_index': int64, # Frame index within episode
'episode_index': int64, # Episode identifier (0-89)
'index': int64, # Global frame index
'task_index': int64 # Task identifier (0)
}
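A lightweight check against the schema above can catch malformed samples early, for example after custom preprocessing. This is a sketch: `check_sample` and `EXPECTED_KEYS` are hypothetical names, and the check covers only the keys, vector lengths, and episode range documented here, not dtypes.

```python
# Keys documented in the Data Format section above.
EXPECTED_KEYS = {
    "action",
    "observation.state",
    "timestamp",
    "frame_index",
    "episode_index",
    "index",
    "task_index",
}


def check_sample(sample: dict) -> None:
    """Raise if a sample does not match the documented layout."""
    missing = EXPECTED_KEYS - sample.keys()
    if missing:
        raise KeyError(f"missing keys: {sorted(missing)}")
    # Both vectors hold one value per joint (6 in total).
    for key in ("action", "observation.state"):
        if len(sample[key]) != 6:
            raise ValueError(f"{key} should have 6 values, got {len(sample[key])}")
    if not 0 <= sample["episode_index"] <= 89:
        raise ValueError(f"episode_index out of range: {sample['episode_index']}")


# Example with a hand-built sample (values are made up).
example = {
    "action": [0.0] * 6,
    "observation.state": [0.0] * 6,
    "timestamp": 0.0,
    "frame_index": 0,
    "episode_index": 0,
    "index": 0,
    "task_index": 0,
}
check_sample(example)  # passes silently
```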
Video Organization
Videos are organized by camera and episode chunk:
videos/
├── observation.images.shoulder_base_wide_view/
│   ├── chunk-000/  # Episodes 0-49
│   ├── chunk-050/  # Episodes 50-69
│   └── chunk-070/  # Episodes 70-89
├── observation.images.workspace_variable_view/
│   ├── chunk-000/
│   ├── chunk-050/
│   └── chunk-070/
└── observation.images.wrist_roll_top_down/
    ├── chunk-000/
    ├── chunk-050/
    └── chunk-070/
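Given the layout above, the chunk directory for a camera and episode can be computed rather than hard-coded. This is a sketch under the assumption that each chunk is named after the first episode it contains, as the comments in the tree suggest; the helper `chunk_dir` is hypothetical, and file names inside each chunk are not covered here.

```python
from pathlib import PurePosixPath

# Camera names from the Camera Views section.
CAMERAS = (
    "shoulder_base_wide_view",
    "workspace_variable_view",
    "wrist_roll_top_down",
)

# (first episode, last episode, chunk directory), per the tree above.
CHUNKS = (
    (0, 49, "chunk-000"),
    (50, 69, "chunk-050"),
    (70, 89, "chunk-070"),
)


def chunk_dir(camera: str, episode_index: int) -> PurePosixPath:
    """Return the relative chunk directory for a camera and episode."""
    if camera not in CAMERAS:
        raise ValueError(f"unknown camera: {camera}")
    for first, last, name in CHUNKS:
        if first <= episode_index <= last:
            return PurePosixPath("videos") / f"observation.images.{camera}" / name
    raise ValueError(f"episode_index out of range 0-89: {episode_index}")


print(chunk_dir("wrist_roll_top_down", 72))
```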
Usage
Load the dataset
from datasets import load_dataset
# Load the full dataset
dataset = load_dataset("bencxr/merged_pour_coke_with_videos")
# Access training split
train = dataset['train']
# Get a sample
sample = train[0]
print(f"Episode: {sample['episode_index']}")
print(f"Action: {sample['action']}")
print(f"State: {sample['observation.state']}")
Access videos
Videos are stored separately in the videos/ directory. To access them:
from pathlib import Path
from huggingface_hub import snapshot_download
import cv2
# Download the entire dataset including videos
dataset_path = snapshot_download(
repo_id="bencxr/merged_pour_coke_with_videos",
repo_type="dataset"
)
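# Open one video file
video_path = Path(dataset_path) / "videos/observation.images.shoulder_base_wide_view/chunk-000/file-000.mp4"
cap = cv2.VideoCapture(str(video_path))
if not cap.isOpened():
    raise FileNotFoundError(f"Could not open video: {video_path}")
# Read the first frame; ret is False if no frame could be decoded
ret, frame = cap.read()
# Release the file handle when done
cap.release()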
Using with LeRobot
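This dataset follows the LeRobot v3.0 format and can be used with the LeRobot library:
# Import path used by recent LeRobot releases; older releases exposed
# this class under lerobot.common.datasets.lerobot_dataset instead
from lerobot.datasets.lerobot_dataset import LeRobotDataset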
dataset = LeRobotDataset("bencxr/merged_pour_coke_with_videos")
Use Cases
This dataset is designed for:
- Disturbance Recovery Research: Study how policies handle unexpected object displacements
- Robust Policy Learning: Train policies that generalize to perturbations
- Stage-Aware Reward Modeling (SARM): Learn task-stage-specific reward models
- Behavior Cloning with Recovery: Train policies with both clean and disturbed demonstrations
- Sim-to-Real Transfer: Evaluate robustness to real-world disturbances
Citation
If you use this dataset, please cite:
@dataset{merged_pour_coke_2026,
title={Merged Pour Coke Dataset with Disturbance Recovery},
author={Dhabaria, Anjali and others},
year={2026},
publisher={HuggingFace},
howpublished={\url{https://huggingface.co/datasets/bencxr/merged_pour_coke_with_videos}}
}
Source Datasets
This is a merged version of:
- anjalidhabaria/pour_coke_static
- anjalidhabaria/pour_coke_perturbate_source
- anjalidhabaria/pour_coke_perturbate_target
License
Apache 2.0
Acknowledgments
Original datasets collected by Anjali Dhabaria. This merged version was created for disturbance recovery research in robotic manipulation.
Created
Dataset merged and prepared on 2026-01-31.