Habitat 3.0 Social Rearrangement β€” Independent Baseline (No Communication)

Trained weights for the independent policy baseline on the Social Rearrangement task from Habitat 3.0. Two embodied agents, a Boston Dynamics Spot robot and a humanoid, cooperate to rearrange objects across 37 HSSD scenes.

Agents can only observe each other's relative GPS position. They share no explicit messages, no shared context, and no coordination protocol: just two independent PPO policies learning not to get in each other's way.

For the learned communication variant that achieves 2.3x the task success, see hab3-social-rearrange-fabric.

This work is part of the thesis "Scalable Multi-Agent Coordination Using a Shared-Context Architecture for Embodied Robotics" by Benjamin Kubwimana.

What's in this repo

| File | Description |
|------|-------------|
| `model.pth` | Final checkpoint after 100M frames (65 MB) |
| `training_curve.log` | Raw training log with per-update metrics |

The checkpoint contains the full model state dict for both agents.
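The per-agent split can be verified directly from the state dict. Below is a minimal sketch; the `agent_0.` / `agent_1.` key prefixes are an assumption about the habitat-baselines layout, so inspect the actual keys first:

```python
from math import prod

# Hypothetical helper: tally parameter counts by agent prefix.
# The "agent_0." / "agent_1." key scheme is an assumption, not the
# verified checkpoint layout.
def params_per_agent(shapes):
    """shapes: dict mapping parameter name -> tuple of tensor dims."""
    totals = {}
    for name, dims in shapes.items():
        agent = name.split(".", 1)[0]
        totals[agent] = totals.get(agent, 0) + prod(dims)
    return totals

# Against the real file this would look something like:
#   ckpt = torch.load("model.pth", map_location="cpu")
#   shapes = {k: tuple(v.shape) for k, v in ckpt["state_dict"].items()}
#   params_per_agent(shapes)  # expect ~8.4M per agent
```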

Task overview

Each episode drops the two agents into an HSSD home scene with a set of objects that need to be moved to goal locations. The task is structured as a PDDL planning problem with four subgoal stages:

  • Stage 1.1: Agent 0 (Spot) picks up its target object
  • Stage 1.2: Agent 0 places the object at the goal
  • Stage 2.1: Agent 1 (Humanoid) picks up its target object
  • Stage 2.2: Agent 1 places the object at the goal

Full success (pddl_success) requires both agents to complete all their subgoals within 750 timesteps.
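The success criterion above can be sketched as a small predicate. Stage names and the dict layout here are illustrative, not the exact habitat-lab measure names:

```python
# Illustrative stage names; the real measures live in habitat-lab's
# PDDL task definitions.
STAGES = ("stage_1_1", "stage_1_2", "stage_2_1", "stage_2_2")
MAX_STEPS = 750

def pddl_success(stage_done, num_steps):
    """Full success: all four subgoals completed within the step budget."""
    return num_steps <= MAX_STEPS and all(stage_done.get(s, False) for s in STAGES)
```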

Architecture

Both agents use a hierarchical RL policy:

  • High-level: Neural network (ResNet18 visual encoder → 2-layer LSTM) that selects which skill to execute
  • Low-level: Oracle navigation + learned manipulation skills (pick, place, nav_to_obj, etc.)

The high-level policy is what gets trained. Low-level skills use privileged oracle information (perfect pathfinding, etc.).

Observations per agent:

  • Depth camera image
  • Binary is_holding flag
  • GPS+compass to object start/goal positions
  • Relative GPS to the other agent
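For orientation, the per-agent observation dict roughly looks like the sketch below. The sensor names and resolutions are illustrative assumptions; the actual values come from the Habitat task config:

```python
# Illustrative per-agent observation shapes; actual sensor names and
# resolutions are set by the Habitat task config, not this sketch.
obs_shapes = {
    "head_depth": (256, 256, 1),    # depth camera image
    "is_holding": (1,),             # binary grasp flag
    "obj_start_gps_compass": (2,),  # polar offset to object start
    "obj_goal_gps_compass": (2,),   # polar offset to goal
    "other_agent_gps": (2,),        # relative GPS to the other agent
}
```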

Trainer: DD-PPO (Decentralized Distributed PPO) across 7 GPUs, 18 environments per GPU (126 parallel environments total).

Training details

| Parameter | Value |
|-----------|-------|
| Total frames | 100M |
| Updates | 6,277 |
| Batch size | 128 steps × 126 envs |
| Learning rate | 2.5e-4 |
| PPO epochs | 1 |
| Mini-batches | 2 |
| Clip param | 0.2 |
| Discount (γ) | 0.99 |
| GAE (τ) | 0.95 |
| Entropy coef | 0.0001 |
| Max grad norm | 0.2 |
| Backbone | ResNet18 |
| RNN | 2-layer LSTM |
| Trainable params | ~8.4M per agent |
| Wall time | ~40 hours (including restarts) |
| Throughput | ~697 fps |
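These numbers are worth cross-checking when reproducing the run. A quick sanity check, pure arithmetic with no Habitat install needed:

```python
# Cross-check the table: rollout size, update count, and throughput.
envs = 7 * 18                                # GPUs x envs per GPU = 126
frames_per_update = 128 * envs               # 16,128 frames per PPO update
updates = 100_000_000 // frames_per_update   # ~6,200 updates
fps = 100_000_000 / (40 * 3600)              # ~694 fps over 40 hours
```

The ~6,200 figure sits slightly below the 6,277 updates logged; the gap plausibly comes from the restarts noted in the wall-time row. The ~694 fps matches the reported ~697 fps.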

Final metrics

| Metric | Value |
|--------|-------|
| Reward | 15.5 |
| Full task success | 27.7% |
| Stage 1.1 (Spot picks) | 85.6% |
| Stage 1.2 (Spot places) | 46.4% |
| Stage 2.1 (Humanoid picks) | 81.6% |
| Stage 2.2 (Humanoid places) | 49.4% |
| Collision rate | 29.5% |
| Cooperation reward | +1.04 |

Training curve

| Frames | Reward | Task success | Collisions | Cooperation |
|--------|--------|--------------|------------|-------------|
| 1.6M (upd 100) | 1.8 | 0.0% | 91% | -0.42 |
| 8M (upd 500) | 5.2 | 1.2% | 68% | -0.28 |
| 16M (upd 1000) | 7.1 | 3.4% | 58% | -0.20 |
| 48M (upd 3000) | 10.5 | 11% | 47% | +0.26 |
| 96M (upd 6000) | 15.0 | 26% | 30% | +0.99 |
| 100M (upd 6277) | 15.5 | 27.7% | 29.5% | +1.04 |

Cooperation reward flips from negative to positive around 25M frames; that's roughly when the agents stop bumping into each other and start implicitly coordinating.

How to evaluate

Requires Habitat 3.0 (v0.3.3) with habitat-baselines and habitat-sim installed.

```bash
python -u -m habitat_baselines.run \
    --config-name=social_rearrange/pop_play.yaml \
    habitat_baselines.evaluate=True \
    habitat_baselines.eval.should_load_ckpt=True \
    habitat_baselines.eval_ckpt_path_dir=model.pth \
    habitat_baselines.test_episode_count=50 \
    habitat_baselines.num_environments=1 \
    habitat.dataset.data_path=data/datasets/hab3_episodes/val/social_rearrange.json.gz \
    habitat.dataset.scenes_dir=data/scene_datasets \
    'habitat_baselines.eval.video_option=["disk"]'
```

Known quirks

The official hab3_episodes dataset has some inconsistencies:

  • ~39% of episodes reference objects in name_to_receptacle that aren't in the episode's rigid_objs list
  • Some episode spawn positions don't land on the navmesh

We patched kinematic_relationship_manager.py and rearrange_sim.py to handle these gracefully (skipping missing objects, falling back to random navigable points). Without these patches, training crashes intermittently on the affected episodes.
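A minimal sketch of the two fallbacks, assuming habitat-sim's `PathFinder.is_navigable` / `get_random_navigable_point` API. The function names are hypothetical, not the actual patched code, and the episode's objects are modeled here as a simple mapping:

```python
# Hypothetical shapes for the two patches described above.

def resolve_receptacle(name, rigid_objs):
    """Skip name_to_receptacle entries whose object is missing from
    the episode's rigid objects; the caller drops None results."""
    return rigid_objs.get(name)  # None -> relationship is ignored

def safe_spawn(pathfinder, desired_pos):
    """Fall back to a random navigable point when the desired spawn
    position does not land on the navmesh."""
    if pathfinder.is_navigable(desired_pos):
        return desired_pos
    return pathfinder.get_random_navigable_point()
```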

Citation

```bibtex
@mastersthesis{kubwimana2026scalable,
  title  = {Scalable Multi-Agent Coordination Using a Shared-Context Architecture for Embodied Robotics},
  author = {Kubwimana, Benjamin},
  year   = {2026},
  school = {Georgia Institute of Technology},
  note   = {Baseline model: \url{https://huggingface.co/edge-inference/hab3-social-rearrange-baseline}}
}
```

Built on the Habitat 3.0 platform:

```bibtex
@inproceedings{puig2023habitat3,
  title     = {Habitat 3.0: A Co-Habitat for Humans, Avatars and Robots},
  author    = {Puig, Xavier and Undersander, Eric and Szot, Andrew and Cote, Mikael Dallaire and Batra, Dhruv and Berges, Vincent-Pierre and others},
  booktitle = {ICLR},
  year      = {2024}
}
```

License

MIT. The underlying Habitat platform and HSSD scenes have their own licenses; see the Habitat 3.0 repo for details.
