# Habitat 3.0 Social Rearrangement: Independent Baseline (No Communication)
Trained weights for the independent-policy baseline on the Social Rearrangement task from Habitat 3.0. Two embodied agents, a Boston Dynamics Spot robot and a humanoid, cooperate to rearrange objects across 37 HSSD scenes.
Agents can only observe each other's relative GPS position. They share no explicit messages, no shared context, and no coordination protocol: just two independent PPO policies learning not to get in each other's way.
For the learned communication variant that achieves 2.3x the task success, see `hab3-social-rearrange-fabric`.
This work is part of the thesis "Scalable Multi-Agent Coordination Using a Shared-Context Architecture for Embodied Robotics" by Benjamin Kubwimana.
## What's in this repo
| File | Description |
|---|---|
| `model.pth` | Final checkpoint after 100M frames (65 MB) |
| `training_curve.log` | Raw training log with per-update metrics |
The checkpoint contains the full model state dict for both agents.
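The exact key layout depends on the Habitat-Baselines version. As a hedged sketch, assuming parameters are namespaced with per-agent prefixes such as `agent_0.` / `agent_1.` (an assumed naming scheme, not verified against this checkpoint), the combined state dict could be split per agent like this:

```python
# Sketch: split a combined multi-agent state dict into per-agent dicts.
# The "agent_0." / "agent_1." prefixes are an ASSUMED naming scheme,
# not verified against this checkpoint.

def split_by_agent(state_dict):
    """Partition parameters by the prefix before the first dot."""
    per_agent = {}
    for key, value in state_dict.items():
        prefix, _, rest = key.partition(".")
        per_agent.setdefault(prefix, {})[rest] = value
    return per_agent

# Hypothetical keys for illustration only:
fake_sd = {
    "agent_0.visual_encoder.conv1.weight": "w0",
    "agent_0.lstm.weight_ih_l0": "w1",
    "agent_1.visual_encoder.conv1.weight": "w2",
}
split = split_by_agent(fake_sd)
print(sorted(split))          # ['agent_0', 'agent_1']
print(len(split["agent_0"]))  # 2
```

With a real checkpoint one would first do `torch.load("model.pth", map_location="cpu")` and locate the state dict inside the loaded object.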
## Task overview
Each episode drops the two agents into an HSSD home scene with a set of objects that need to be moved to goal locations. The task is structured as a PDDL planning problem with four subgoal stages:
- Stage 1.1: Agent 0 (Spot) picks up its target object
- Stage 1.2: Agent 0 places the object at the goal
- Stage 2.1: Agent 1 (Humanoid) picks up its target object
- Stage 2.2: Agent 1 places the object at the goal
Full success (`pddl_success`) requires both agents to complete all of their subgoals within 750 timesteps.
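As a minimal sketch of this success criterion (the real check lives in Habitat's PDDL task logic; stage names here are illustrative), full success is simply "all four stages done within the horizon":

```python
# Illustrative version of the success criterion described above; not the
# actual Habitat PDDL evaluator.

MAX_STEPS = 750  # episode horizon from the task definition

def pddl_success(stages: dict, num_steps: int) -> bool:
    """All four subgoal stages completed within the episode horizon."""
    required = ("stage_1_1", "stage_1_2", "stage_2_1", "stage_2_2")
    return num_steps <= MAX_STEPS and all(stages.get(s, False) for s in required)

print(pddl_success({"stage_1_1": True, "stage_1_2": True,
                    "stage_2_1": True, "stage_2_2": True}, 600))  # True
print(pddl_success({"stage_1_1": True, "stage_1_2": False,
                    "stage_2_1": True, "stage_2_2": True}, 600))  # False
```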
## Architecture
Both agents use a hierarchical RL policy:
- High-level: Neural network (ResNet18 visual encoder → 2-layer LSTM) that selects which skill to execute
- Low-level: Oracle navigation + learned manipulation skills (pick, place, nav_to_obj, etc.)
The high-level policy is the only trained component. Low-level skills rely on privileged oracle information (perfect pathfinding, etc.).
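The hierarchical control flow can be sketched as follows. This is a stand-in skeleton, not the Habitat-Baselines API: the function names, the skill list, and the deterministic "policy" are all illustrative.

```python
# Hedged sketch of the hierarchical loop described above: a trained
# high-level policy picks a skill, and fixed low-level skills (oracle nav,
# learned pick/place) execute it. All names are illustrative.

SKILLS = ["nav_to_obj", "pick", "nav_to_goal", "place", "wait"]

def high_level_policy(obs, hidden):
    """Stand-in for the ResNet18 -> LSTM skill selector (the trained part).
    A real policy returns skill logits; here we just cycle deterministically."""
    idx = hidden % len(SKILLS)
    return SKILLS[idx], hidden + 1

def run_episode(num_decisions=5):
    hidden, trace = 0, []
    for _ in range(num_decisions):
        skill, hidden = high_level_policy(None, hidden)
        trace.append(skill)  # a low-level skill would run here until it terminates
    return trace

print(run_episode())  # ['nav_to_obj', 'pick', 'nav_to_goal', 'place', 'wait']
```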
Observations per agent:
- Depth camera image
- Binary `is_holding` flag
- GPS+compass to object start/goal positions
- Relative GPS to the other agent
Trainer: DD-PPO (Decentralized Distributed PPO) across 7 GPUs, 18 environments per GPU (126 parallel environments total).
## Training details
| Parameter | Value |
|---|---|
| Total frames | 100M |
| Updates | 6,277 |
| Batch size | 128 steps × 126 envs |
| Learning rate | 2.5e-4 |
| PPO epochs | 1 |
| Mini-batches | 2 |
| Clip param | 0.2 |
| Discount (γ) | 0.99 |
| GAE (λ) | 0.95 |
| Entropy coef | 0.0001 |
| Max grad norm | 0.2 |
| Backbone | ResNet18 |
| RNN | 2-layer LSTM |
| Trainable params | ~8.4M per agent |
| Wall time | ~40 hours (including restarts) |
| Throughput | ~697 fps |
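The rollout numbers above are internally consistent, as a quick check shows:

```python
# Consistency check on the hyperparameters in the table above.
steps_per_rollout = 128
num_envs = 7 * 18          # 7 GPUs x 18 environments each
frames_per_update = steps_per_rollout * num_envs

print(num_envs)            # 126
print(frames_per_update)   # 16128
# 100M frames / 16128 frames-per-update ~= 6200 updates, close to the
# reported 6,277 (the small gap is plausibly due to restarts/rounding).
print(round(100_000_000 / frames_per_update))  # 6200
```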
## Final metrics
| Metric | Value |
|---|---|
| Reward | 15.5 |
| Full task success | 27.7% |
| Stage 1.1 (Spot picks) | 85.6% |
| Stage 1.2 (Spot places) | 46.4% |
| Stage 2.1 (Human picks) | 81.6% |
| Stage 2.2 (Human places) | 49.4% |
| Collision rate | 29.5% |
| Cooperation reward | +1.04 |
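A back-of-envelope reading of these numbers: full success requires both "place" stages (and each place implies the corresponding pick). If the two agents' outcomes were independent, the joint rate would be roughly the product of the two place rates, which comes out below the observed 27.7%. This is a rough sanity check on the reported table, not a claim from the thesis.

```python
# If the two agents' place stages were independent, the joint success rate
# would be the product of the per-agent place rates from the table above.
spot_places = 0.464
human_places = 0.494
independent_joint = spot_places * human_places
print(f"{independent_joint:.1%}")  # 22.9%
# Observed full success is 27.7%, above this independence estimate,
# consistent with episodes being jointly easy or hard for both agents.
```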
## Training curve
| Frames | Reward | Task success | Collisions | Cooperation |
|---|---|---|---|---|
| 1.6M (upd 100) | 1.8 | 0.0% | 91% | -0.42 |
| 8M (upd 500) | 5.2 | 1.2% | 68% | -0.28 |
| 16M (upd 1000) | 7.1 | 3.4% | 58% | -0.20 |
| 48M (upd 3000) | 10.5 | 11% | 47% | +0.26 |
| 96M (upd 6000) | 15.0 | 26% | 30% | +0.99 |
| 100M (upd 6277) | 15.5 | 27.7% | 29.5% | +1.04 |
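The sign flip can also be bracketed from the table itself. Linear interpolation between the 16M and 48M rows puts the zero crossing near 30M frames; this is a coarse estimate from these widely spaced samples, and the per-update log in `training_curve.log` would give a tighter number.

```python
# Estimate where the cooperation reward crosses zero, using the two
# table rows that bracket the sign flip (coarse samples only).
f0, c0 = 16e6, -0.20   # 16M-frames row
f1, c1 = 48e6, +0.26   # 48M-frames row

crossing = f0 + (f1 - f0) * (-c0) / (c1 - c0)
print(f"{crossing / 1e6:.1f}M frames")  # 29.9M frames
```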
The cooperation reward flips from negative to positive around 25M frames; that is roughly when the agents stop bumping into each other and start implicitly coordinating.
## How to evaluate
Requires Habitat 3.0 (v0.3.3) with `habitat-baselines` and `habitat-sim` installed.
```bash
python -u -m habitat_baselines.run \
  --config-name=social_rearrange/pop_play.yaml \
  habitat_baselines.evaluate=True \
  habitat_baselines.eval.should_load_ckpt=True \
  habitat_baselines.eval_ckpt_path_dir=model.pth \
  habitat_baselines.test_episode_count=50 \
  habitat_baselines.num_environments=1 \
  habitat.dataset.data_path=data/datasets/hab3_episodes/val/social_rearrange.json.gz \
  habitat.dataset.scenes_dir=data/scene_datasets \
  'habitat_baselines.eval.video_option=["disk"]'
```
## Known quirks
The official `hab3_episodes` dataset has some inconsistencies:
- ~39% of episodes reference objects in `name_to_receptacle` that aren't in the episode's `rigid_objs` list
- Some episode spawn positions don't land on the navmesh

We patched `kinematic_relationship_manager.py` and `rearrange_sim.py` to handle these gracefully (skip missing objects, fall back to random navigable points). Without these patches, training crashes intermittently on certain episodes.
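The actual patches live in the Habitat source files named above; as a hedged illustration of the two workarounds (function and field names here are hypothetical, not the real patch):

```python
# Illustrative sketch of the two workarounds described above.
# Function and field names are hypothetical, not the actual patch.

def filter_receptacle_map(name_to_receptacle, rigid_objs):
    """Drop receptacle entries whose object is missing from rigid_objs."""
    known = set(rigid_objs)
    return {obj: rec for obj, rec in name_to_receptacle.items() if obj in known}

def safe_spawn(position, is_navigable, sample_navigable_point):
    """Fall back to a random navigable point if the spawn is off-navmesh."""
    return position if is_navigable(position) else sample_navigable_point()

# Example: one dangling receptacle reference gets skipped.
mapping = {"cup_1": "table_1", "ghost_obj": "shelf_2"}
print(filter_receptacle_map(mapping, ["cup_1"]))  # {'cup_1': 'table_1'}
```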
## Citation
```bibtex
@mastersthesis{kubwimana2026scalable,
  title  = {Scalable Multi-Agent Coordination Using a Shared-Context Architecture for Embodied Robotics},
  author = {Kubwimana, Benjamin},
  year   = {2026},
  school = {Georgia Institute of Technology},
  note   = {Baseline model: \url{https://huggingface.co/edge-inference/hab3-social-rearrange-baseline}}
}
```
Built on the Habitat 3.0 platform:
```bibtex
@inproceedings{puig2023habitat3,
  title     = {Habitat 3.0: A Co-Habitat for Humans, Avatars and Robots},
  author    = {Puig, Xavier and Undersander, Eric and Szot, Andrew and Cote, Mikael Dallaire and Batra, Dhruv and Berges, Vincent-Pierre and others},
  booktitle = {ICLR},
  year      = {2024}
}
```
## License
MIT. The underlying Habitat platform and HSSD scenes have their own licenses; see the Habitat 3.0 repo for details.