---
license: cc-by-nc-4.0
task_categories:
  - image-segmentation
tags:
  - glass-surface-detection
  - rgb-d
  - scene-understanding
  - pytorch
pretty_name: RGBD-GSD-Net (RGB-D Glass Surface Detection Network)
---

# RGBD-GSD-Net — RGB-D Glass Surface Detection Network

Pre-trained weights for the model introduced in:

> **Leveraging RGB-D Data with Cross-Modal Context Mining for Glass Surface Detection**  
> Jiaying Lin\*, Yuen-Hei Yeung\*, Shuquan Ye, Rynson W. H. Lau  
> AAAI 2025  
> [arXiv](https://arxiv.org/abs/2206.11250) · [Project Page](https://jiaying.link/aaai2025-rgbdglass/) · [Dataset (RGBD-GSD)](https://huggingface.co/datasets/garrying/RGBD-GSD)

## Model Summary

RGBD-GSD-Net detects glass surfaces by jointly processing RGB images and depth maps. It introduces two novel modules:

- **Cross-Modal Context Mining (CCM)**: adaptively learns individual and mutual context features from RGB and depth information.
- **Depth-Missing Aware Attention (DAA)**: explicitly exploits spatial locations where depth is missing (a strong indicator of glass surfaces) to guide detection.

The backbone is a ResNeXt encoder shared across both modalities.
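
The DAA module relies on the fact that depth sensors often return no reading on glass. The sketch below computes only that missing-depth cue as a binary mask; it is not the paper's DAA module. It assumes missing depth is encoded as `0` or NaN, which is common for consumer RGB-D sensors but is not stated in this card. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def depth_missing_mask(depth: Sequence[Sequence[float]], invalid: float = 0.0) -> List[List[int]]:
    """Return a binary mask: 1 where depth is missing, 0 elsewhere.

    Assumption (hypothetical encoding): a missing reading is stored as
    `invalid` (0.0 by default) or as NaN.
    """
    return [
        [1 if (d == invalid or math.isnan(d)) else 0 for d in row]
        for row in depth
    ]


def missing_ratio(mask: Sequence[Sequence[int]]) -> float:
    """Fraction of pixels flagged as missing depth (0.0 for an empty mask)."""
    total = sum(len(row) for row in mask)
    return sum(sum(row) for row in mask) / total if total else 0.0


if __name__ == "__main__":
    depth = [[1.2, 0.0, 0.0], [1.1, float("nan"), 2.3]]
    mask = depth_missing_mask(depth)
    print(mask)                  # [[0, 1, 1], [0, 1, 0]]
    print(missing_ratio(mask))   # 0.5
```

A high missing ratio inside a region is a hint, not proof, of glass: depth can also be missing on dark, specular, or distant surfaces, which is why the network combines this cue with RGB context.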

| File | Description |
|------|-------------|
| `best.pth` | Best checkpoint (204 MB), saved as `{'model': state_dict, ...}` |
| `results/our_best_results.zip` | Model predictions on the RGBD-GSD test set |
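
The card does not document the layout of `results/our_best_results.zip`. The sketch below assumes the predictions are stored as image files (for example PNG masks) and simply lists them. The archive can be fetched the same way as the checkpoint, e.g. `huggingface-cli download garrying/RGBD-GSD-Net results/our_best_results.zip --local-dir ./weights`. The helper name and the suffix list are hypothetical.

```python
import zipfile
from pathlib import PurePosixPath
from typing import List

# Assumed image formats for prediction masks; adjust once the archive is inspected.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".bmp"}


def list_prediction_files(zip_path: str) -> List[str]:
    """Return the sorted names of image members in a predictions archive."""
    with zipfile.ZipFile(zip_path) as zf:
        names = [
            info.filename
            for info in zf.infolist()
            # skip directory entries and non-image files
            if not info.is_dir()
            and PurePosixPath(info.filename).suffix.lower() in IMAGE_SUFFIXES
        ]
    return sorted(names)


if __name__ == "__main__":
    for name in list_prediction_files("./weights/results/our_best_results.zip"):
        print(name)
```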

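## Loading the Weights

Download the checkpoint:

```bash
huggingface-cli download garrying/RGBD-GSD-Net best.pth --local-dir ./weights
```

Then load it into the network from the code release:

```python
import torch
from networks.your_network import RGBDGlassNet   # from the code release

model = RGBDGlassNet()
# The checkpoint is a dict; the network weights are stored under the "model" key.
# PyTorch >= 2.6 defaults to weights_only=True; if the extra entries fail to load,
# retry with weights_only=False (only for files you trust).
checkpoint = torch.load("./weights/best.pth", map_location="cpu")
model.load_state_dict(checkpoint["model"])
model.eval()   # inference mode: disables dropout, uses stored batch-norm statistics
```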
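
If `load_state_dict` reports missing and unexpected keys that differ only by a leading `module.`, the weights were probably saved from a model wrapped in `nn.DataParallel` or `DistributedDataParallel`. It is not known whether `best.pth` has this prefix; the hypothetical helper below is a sketch that only matters if it does.

```python
from collections import OrderedDict
from typing import Any, Mapping


def strip_prefix(state_dict: Mapping[str, Any], prefix: str = "module.") -> "OrderedDict[str, Any]":
    """Return a copy of `state_dict` with `prefix` removed from keys that start with it.

    Keys without the prefix are kept unchanged, and key order is preserved.
    """
    return OrderedDict(
        (key[len(prefix):] if key.startswith(prefix) else key, value)
        for key, value in state_dict.items()
    )
```

Usage, continuing the loading snippet: `model.load_state_dict(strip_prefix(checkpoint["model"]))`.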

## Training Dataset

This model was trained and evaluated on **RGBD-GSD**, the first large-scale RGB-D glass surface detection dataset:
- 3,009 RGB-D images with binary glass surface masks and depth maps
- Available at [garrying/RGBD-GSD](https://huggingface.co/datasets/garrying/RGBD-GSD)
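
This card does not list the evaluation metrics. As an illustration only, the sketch below compares a predicted probability map with a binary ground-truth mask using intersection-over-union (IoU) and mean absolute error (MAE), two metrics commonly used for glass and mirror segmentation. The 0.5 threshold, the `[0, 1]` probability range, and the convention that two empty masks score an IoU of 1.0 are assumptions; the function names are hypothetical.

```python
from typing import List, Sequence

Grid = Sequence[Sequence[float]]


def _check_same_shape(a: Grid, b: Grid) -> None:
    """Raise ValueError if two 2-D grids differ in shape (zip would silently truncate)."""
    if len(a) != len(b) or any(len(x) != len(y) for x, y in zip(a, b)):
        raise ValueError("prediction and ground truth must have the same shape")


def binarize(pred: Grid, threshold: float = 0.5) -> List[List[int]]:
    """Turn per-pixel glass probabilities into a 0/1 mask."""
    return [[1 if p >= threshold else 0 for p in row] for row in pred]


def iou(pred_bin: Grid, gt_bin: Grid) -> float:
    """Intersection-over-union of two binary masks (1.0 if both are empty)."""
    _check_same_shape(pred_bin, gt_bin)
    inter = union = 0
    for pr, gr in zip(pred_bin, gt_bin):
        for p, g in zip(pr, gr):
            inter += int(p) & int(g)
            union += int(p) | int(g)
    return inter / union if union else 1.0


def mae(pred: Grid, gt: Grid) -> float:
    """Mean absolute per-pixel error between a probability map and a binary mask."""
    _check_same_shape(pred, gt)
    diffs = [abs(p - g) for pr, gr in zip(pred, gt) for p, g in zip(pr, gr)]
    return sum(diffs) / len(diffs) if diffs else 0.0


if __name__ == "__main__":
    pred = [[0.9, 0.2], [0.6, 0.1]]
    gt = [[1, 0], [0, 0]]
    print(iou(binarize(pred), gt))   # 0.5
    print(round(mae(pred, gt), 4))   # 0.25
```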

## Citation

```bibtex
@article{aaai2025_rgbdglass,
  author    = {Lin, Jiaying and Yeung, Yuen-Hei and Ye, Shuquan and Lau, Rynson W.H.},
  title     = {Leveraging RGB-D Data with Cross-Modal Context Mining for Glass Surface Detection},
  journal   = {AAAI},
  year      = {2025},
}
```

## License

Non-commercial use only — [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/).