metadata

title: SpatialBench
emoji: 🧩
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 5.23.3
app_file: app.py
pinned: true
short_description: Do LLMs Build Spatial World Models? Evidence from Maze Tasks

SpatialBench

Evaluation platform for "Do LLMs Build Spatial World Models? Evidence from Grid-World Maze Tasks" (ICLR 2026 Workshop).

Three tasks probe whether LLMs construct internal spatial representations:

| Task | Type | Description |
| --- | --- | --- |
| Maze Navigation | Planning | Find the shortest path from start to goal |
| Sequential Point Reuse | Reasoning | Q3 = Q0: do models reuse earlier computation? |
| Compositional Distance | Reasoning | Compose corner→center distances for Q2 |
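
For readers who want to check a Maze Navigation answer by hand, a reference shortest path on a grid maze can be computed with breadth-first search. The sketch below is illustrative only and is not taken from the SpatialBench code: the maze encoding (`#` for walls, any other character for open cells), the 4-connected movement rule, and the function name `shortest_path` are all assumptions.

```python
from collections import deque


def shortest_path(grid, start, goal):
    """Return one shortest list of (row, col) cells from start to goal, or None.

    Assumed encoding (hypothetical, not from SpatialBench): grid is a list of
    equal-length strings, '#' marks a wall, and moves are up/down/left/right.
    """
    rows, cols = len(grid), len(grid[0])
    parent = {start: None}  # doubles as the visited set
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            # Walk the parent links back to the start, then reverse.
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nxt = (r + dr, c + dc)
            inside = 0 <= nxt[0] < rows and 0 <= nxt[1] < cols
            if inside and grid[nxt[0]][nxt[1]] != "#" and nxt not in parent:
                parent[nxt] = cell
                queue.append(nxt)
    return None  # goal is unreachable


maze = [
    "S.#",
    ".##",
    "..G",
]
path = shortest_path(maze, (0, 0), (2, 2))
print(path)           # cells visited, start and goal included
print(len(path) - 1)  # number of moves
```

Because breadth-first search explores cells in order of distance from the start, the first time it reaches the goal it has found a path with the fewest moves. The same distance computation could serve as a ground truth for distance-style questions, under the same assumed encoding.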

Models evaluated: Gemini 2.5 Flash, GPT-5 Mini, Claude Haiku 4.5, DeepSeek Chat.