---
license: cc-by-nc-4.0
task_categories:
  - image-segmentation
tags:
  - glass-surface-detection
  - semantic-segmentation
  - scene-understanding
  - pytorch
pretty_name: GlassSemNet (Glass Semantic Network)
---

# GlassSemNet — Glass Semantic Network

Pre-trained weights for **GlassSemNet**, introduced in:

> **Exploiting Semantic Relations for Glass Surface Detection**  
> Jiaying Lin, Yuen-Hei Yeung, Rynson W. H. Lau  
> NeurIPS 2022  
> [Paper](https://openreview.net/forum?id=WrIrYMCZgbb) · [Project Page](https://jiaying.link/neurips2022-gsds/) · [Dataset (GSD-S)](https://huggingface.co/datasets/garrying/GSD-S)

## Model Summary

GlassSemNet detects glass surfaces by exploiting semantic relations between the glass region and its surrounding scene context. It uses a dual-backbone design:

- **Spatial backbone (SegFormer)**: extracts multi-scale spatial features.
- **Semantic backbone (ResNet-50 + DeepLabV3+)**: encodes 43-class semantic scene features into compact per-class encodings.
- **Semantic-Aware Attention (SAA)**: fuses spatial and semantic features at three scales using the semantic encodings as guidance.
- **Cross-modal Context Aggregation (CCA)**: aggregates cross-scale context at the deepest level.
- **UPerNet decoder**: combines the fused multi-scale features into the final glass surface prediction.
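
To make the idea of semantic guidance concrete, the following is a toy sketch of attention in which every spatial position attends over a set of per-class encodings. It is not the SAA module from the paper or the code release: the function name `semantic_guided_fusion`, the tensor shapes, and the residual connection are all assumptions chosen for illustration, and NumPy stands in for PyTorch.

```python
import numpy as np

def semantic_guided_fusion(spatial, class_encodings):
    """Toy cross-attention: spatial positions attend over per-class encodings.

    spatial:         (H, W, C) spatial feature map
    class_encodings: (K, C) one encoding per semantic class (K = 43 in GSD-S)
    returns:         (H, W, C) fused feature map
    """
    h, w, c = spatial.shape
    queries = spatial.reshape(h * w, c)
    # Scaled dot-product scores between every position and every class encoding.
    scores = queries @ class_encodings.T / np.sqrt(c)   # (H*W, K)
    scores -= scores.max(axis=1, keepdims=True)         # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)       # softmax over classes
    context = weights @ class_encodings                 # (H*W, C)
    # Residual connection (an assumption) keeps the original spatial signal.
    return (queries + context).reshape(h, w, c)

rng = np.random.default_rng(0)
fused = semantic_guided_fusion(
    rng.normal(size=(12, 12, 64)),   # hypothetical feature-map size
    rng.normal(size=(43, 64)),       # 43 class encodings
)
print(fused.shape)  # (12, 12, 64)
```

The output keeps the spatial layout of the input, so a fusion step of this kind could in principle be applied at each scale before decoding.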

| File | Description |
|------|-------------|
| `GlassSemNet.pth` | Best checkpoint (917 MB), saved as a raw `state_dict` |

## Loading the Weights

```python
import torch
from model.GlassSemNet import GlassSemNet   # from the code release

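# Build the network, then load the raw state_dict onto the CPU.
model = GlassSemNet()
state_dict = torch.load("GlassSemNet.pth", map_location="cpu")
model.load_state_dict(state_dict)
model.eval()  # inference mode: disables dropout, uses stored batch-norm statistics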
```

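Download the checkpoint (the command below saves it as `./weights/GlassSemNet.pth`, so adjust the checkpoint path in the other snippets accordingly):
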
```bash
huggingface-cli download garrying/GlassSemNet GlassSemNet.pth --local-dir ./weights
```
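
The same download can be scripted from Python. This is a minimal sketch, assuming the `huggingface_hub` package is installed; `download_checkpoint` is a hypothetical helper name, and the repository ID and file name are taken from the command above.

```python
def download_checkpoint(local_dir="./weights"):
    """Fetch GlassSemNet.pth from the Hub and return its local path."""
    # Imported lazily so that defining the helper does not require the package.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(
        repo_id="garrying/GlassSemNet",
        filename="GlassSemNet.pth",
        local_dir=local_dir,
    )
```

Calling `download_checkpoint()` returns the path of the downloaded file, which can be passed straight to `torch.load`.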

## Inference

```bash
python predict.py -c GlassSemNet.pth -i /path/to/images/ -o /path/to/output/
```

Images are resized to **384 × 384** internally. Predictions are post-processed with CRF refinement and thresholded to produce binary glass surface masks.
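
The thresholding step can be sketched as follows. This is an illustration under stated assumptions, not the logic of `predict.py`: the helper name `to_binary_mask`, the 0.5 threshold, the nearest-neighbour resize back to the original image size, and the 0/255 output encoding are all hypothetical, and CRF refinement is omitted.

```python
import numpy as np

def to_binary_mask(prob_map, out_hw, threshold=0.5):
    """Resize a probability map with nearest-neighbour sampling, then threshold it.

    prob_map:  (h, w) array of glass probabilities in [0, 1]
    out_hw:    (H, W) size of the original image
    threshold: hypothetical cut-off; the value used by predict.py is not stated here
    """
    h, w = prob_map.shape
    out_h, out_w = out_hw
    # Map every output pixel back to its nearest source pixel.
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    resized = prob_map[rows[:, None], cols[None, :]]
    return (resized >= threshold).astype(np.uint8) * 255

prob = np.zeros((384, 384), dtype=np.float32)
prob[:, 192:] = 0.9                          # pretend the right half is glass
mask = to_binary_mask(prob, out_hw=(480, 640))
print(mask.shape, mask[0, 0], mask[0, -1])   # (480, 640) 0 255
```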

## Training Dataset

This model was trained and evaluated on **GSD-S**, the first glass surface detection dataset with semantic annotations:

- 4,519 images (3,511 train / 1,008 test) with binary glass masks, instance segmentation maps, and 43-class semantic labels
- Available at [garrying/GSD-S](https://huggingface.co/datasets/garrying/GSD-S)

## Citation

```bibtex
@inproceedings{neurips2022:gsds2022,
  author    = {Lin, Jiaying and Yeung, Yuen-Hei and Lau, Rynson W.H.},
  title     = {Exploiting Semantic Relations for Glass Surface Detection},
  booktitle = {NeurIPS},
  year      = {2022},
}
```

## License

Non-commercial use only — [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/).