# ROMAN: Scene Graph Node Relevance Classification
Given a 3D scene graph and a natural-language navigation constraint (e.g., "avoid the kitchen"), classify each node as relevant or non-relevant to the constraint. The relevance labels feed cost-function generation for a probabilistic-roadmap (PRM) path planner.
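The per-node relevance scores are ultimately consumed as traversal costs by the planner. A minimal sketch of that idea (the linear form and the `base_cost`/`penalty` parameters are illustrative assumptions, not the repo's actual cost function):

```python
def edge_cost(p_relevant: float, base_cost: float = 1.0, penalty: float = 10.0) -> float:
    """Illustrative mapping from a node's relevance probability to a PRM
    traversal cost: nodes flagged by an avoidance constraint become
    expensive to pass through. The linear form is an assumption here,
    not the repo's actual cost function."""
    return base_cost + penalty * p_relevant

# A node the classifier marks relevant to "avoid the kitchen"
# costs more to traverse than an unrelated node:
assert edge_cost(0.9) > edge_cost(0.1)
```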
## Models in This Repo

### V12 GNN Models (SAGEConv / GCNConv)

Lightweight GNN models with enriched categorical features, trained on `train_ready_v4_dedup.jsonl`.
| Model | Path | Params | Test F1 | HO F1 |
|---|---|---|---|---|
| SAGEConv | `v12/SAGE/best.pt` | 1.25M | 0.924 | 0.949 |
| GCNConv | `v12/GCN/best.pt` | 0.80M | 0.921 | 0.946 |
Features (1262 dims): spatial(10) + node_type(3) + floor(4) + material(15) + affordance(78) + parent_room_emb(384) + label_emb(384) + tiled_instr(384)
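The node feature vector is the concatenation of these groups; a quick sanity check that the per-group widths add up to the model's `in_channels=1262`:

```python
# Per-group feature widths as listed above (values from this README)
feature_dims = {
    "spatial": 10,
    "node_type": 3,
    "floor": 4,
    "material": 15,
    "affordance": 78,
    "parent_room_emb": 384,  # all-MiniLM-L6-v2 sentence embedding
    "label_emb": 384,
    "tiled_instr": 384,      # instruction embedding tiled to every node
}

total = sum(feature_dims.values())
assert total == 1262  # matches SceneGraphSAGE(in_channels=1262)
```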
### V11-Token ModernBERT Models
ModernBERT-base backbone with per-token classification (Provence-style text serialization).
| Model | Path | Trainable Params | Test F1 | HO F1 |
|---|---|---|---|---|
| LoRA r=8 | `v11t/lora_r8/model_best.pt` | 1.7M | 0.910 | 0.949 |
| LoRA r=16 | `v11t/lora_r16/model_best.pt` | 3.4M | 0.897 | 0.951 |
| Full fine-tune | `v11t/full/model_best.pt` | 149.0M | 0.907 | 0.958 |
## Dataset

`dataset/v4_dedup_enriched_1024/`: pre-tokenized HF Dataset (ModernBERT tokenizer, enriched text mode, max_length=1024).

Source: `train_ready_v4_dedup.jsonl` (7,911 records, 88 Matterport3D scenes, 783K nodes, 7.6% relevant).
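A sketch for recomputing the node count and relevant fraction from the JSONL source. The field names (`nodes`, `relevant`) are illustrative assumptions about the record schema, not verified against the actual file:

```python
import json

def relevance_stats(jsonl_path: str):
    """Count nodes and the fraction labeled relevant across all records.
    The field names ('nodes', 'relevant') are assumed for illustration;
    check the actual schema of train_ready_v4_dedup.jsonl."""
    total = relevant = 0
    with open(jsonl_path) as f:
        for line in f:
            record = json.loads(line)
            for node in record["nodes"]:
                total += 1
                relevant += int(node["relevant"])
    return total, (relevant / total if total else 0.0)
```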
## Quick Start

### Download

```python
from huggingface_hub import snapshot_download, hf_hub_download

# Download the entire repo
snapshot_download("Catkamakura/roman-scene-graph", local_dir="roman-scene-graph")

# Or download specific files:
# V12 SAGE model only
hf_hub_download("Catkamakura/roman-scene-graph", "v12/SAGE/best.pt", local_dir=".")

# V11T full fine-tune only
hf_hub_download("Catkamakura/roman-scene-graph", "v11t/full/model_best.pt", local_dir=".")
```
### V12 GNN Inference

```python
import torch

import text_encoders
from SceneGraphDatasetV4E import SceneGraphDatasetV4E
from train_v12_sage_vs_gcn import SceneGraphSAGE, forward_with_tiled_instr

# Load the trained SAGEConv model
model = SceneGraphSAGE(in_channels=1262, hidden_channels_arr=[256] * 3, out_channels=64)
model.load_state_dict(torch.load("v12/SAGE/best.pt", weights_only=True))
model.eval()

# Pre-computed sentence embeddings + enriched dataset
te = text_encoders.DictTextEncoder("embeddings/sentence-transformers/all-MiniLM-L6-v2_embeddings_False.pkl")
ds = SceneGraphDatasetV4E("your_input.jsonl", text_encoder=te, include_parent_room=True)
ds.encode_all_node_features()
ds.all_graphs_make_x()

with torch.no_grad():
    graph = ds[0]
    scores = forward_with_tiled_instr(model, graph, "cpu")
    probs = torch.sigmoid(scores)
    relevant = probs > 0.5
```
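Downstream, the boolean mask is typically converted back to node indices for the planner. A small follow-on sketch using the same 0.5 threshold (the example probabilities are made up):

```python
import torch

probs = torch.tensor([0.92, 0.10, 0.63, 0.05])  # example sigmoid outputs
relevant = probs > 0.5                           # boolean per-node mask
relevant_idx = relevant.nonzero(as_tuple=True)[0].tolist()
# relevant_idx == [0, 2]: indices of the nodes the classifier flags
```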
### V11T ModernBERT Inference

```python
import torch
from datasets import Dataset

from model_modernbert import SceneGraphModernBERT

# Load the fully fine-tuned ModernBERT model
model = SceneGraphModernBERT(backbone="answerdotai/ModernBERT-base", train_mode="full")
model.load_state_dict(torch.load("v11t/full/model_best.pt", weights_only=True))
model.eval()

# Pre-tokenized dataset (ModernBERT tokenizer, max_length=1024)
dataset = Dataset.load_from_disk("dataset/v4_dedup_enriched_1024")
```
## Training

```bash
# V12 GNN
python train_v12_sage_vs_gcn.py \
    --model both \
    --dataset_path training_data_v2/train_ready_v4_dedup.jsonl \
    --dictFile embeddings/sentence-transformers/all-MiniLM-L6-v2_embeddings_False.pkl

# V11T ModernBERT
./run_v11_token_v4.sh
```
## Code Dependencies

| File | Purpose |
|---|---|
| `SceneGraphDatasetV4E.py` | Enriched dataset loader (floor/material/affordance) |
| `train_v12_sage_vs_gcn.py` | V12 training + SceneGraphSAGE model |
| `model_v2.py` | SceneGraphGCNv2 model |
| `model_modernbert.py` | SceneGraphModernBERT model |
| `SceneGraphDatasetBERT.py` | BERT dataset loader |
| `text_encoders/` | DictTextEncoder for pre-computed embeddings |
| `loss_v2.py` | Loss functions |
| `embeddings/sentence-transformers/all-MiniLM-L6-v2_embeddings_False.pkl` | Pre-computed embeddings |