Tingman and nielsr (HF Staff) committed on
Commit d1657c1 · verified · 1 Parent(s): 89719c2

Enhance model card with pipeline tag, project page, and comprehensive details (#1)

- Enhance model card with pipeline tag, project page, and comprehensive details (ffc485eaf9312d06574cf8b259d7986383f7d893)


Co-authored-by: Niels Rogge <nielsr@users.noreply.huggingface.co>

Files changed (1): README.md (+57 -6)
README.md CHANGED
---
license: gpl-3.0
pipeline_tag: image-to-image
---

# MatchAttention: Matching the Relative Positions for High-Resolution Cross-View Matching

[📄 Paper](https://arxiv.org/abs/2510.14260) - [🌐 Project Page](https://tingmanyan.github.io/MatchAttention/) - [🤗 Demo](https://huggingface.co/spaces/Tingman/MatchStereo) - [💻 Code](https://github.com/TingmanYan/MatchAttention)

MatchAttention is a contiguous and differentiable sliding-window attention mechanism that enables long-range connections, explicit matching, and linear complexity. Applied to stereo matching and optical flow, it achieves real-time, state-of-the-art performance.

## Introduction

This paper proposes MatchAttention, an attention mechanism that dynamically matches relative positions. Given a query, the relative position determines the attention sampling center of the key-value pairs; continuous and differentiable sliding-window attention sampling is achieved by the proposed BilinearSoftmax. The relative positions are embedded into the feature channels and iteratively updated through residual connections across layers.

Since the relative position is exactly the learning target for cross-view matching, an efficient hierarchical cross-view decoder, MatchDecoder, is designed with MatchAttention as its core component. To handle cross-view occlusions, gated cross-MatchAttention and a consistency-constrained loss are proposed. These two components jointly mitigate the impact of occlusions in both the forward and backward passes, allowing the model to focus on learning matching relationships.

When applied to stereo matching, MatchStereo-B ranked 1st in average error on the public Middlebury benchmark and requires only 29 ms for KITTI-resolution inference. MatchStereo-T can process 4K UHD images in 0.1 seconds using only 3 GB of GPU memory. The proposed models also achieve state-of-the-art performance on the KITTI 2012, KITTI 2015, ETH3D, and Spring flow datasets. The combination of high accuracy and low computational complexity makes real-time, high-resolution, high-accuracy cross-view matching practical.
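
The sampling idea can be illustrated with a toy 1-D NumPy sketch (this is not the paper's implementation, which operates on 2-D feature maps in PyTorch; names and the 1-D simplification are ours): each query `i` attends to a small window of keys/values centered at the continuous position `i + rel_pos[i]`, with linear (the 1-D analogue of bilinear) interpolation making the sampling differentiable in the relative position.

```python
import numpy as np

def bilinear_softmax_attention_1d(q, k, v, rel_pos, window=3):
    """Toy 1-D sketch of sliding-window attention with a continuous,
    per-query sampling center (illustrative only, not the paper's code)."""
    n, d = q.shape
    out = np.zeros_like(v)
    half = window // 2
    for i in range(n):
        center = i + rel_pos[i]          # continuous sampling center
        base = int(np.floor(center))
        frac = center - base             # interpolation weight in [0, 1)
        scores, vals = [], []
        for o in range(-half, half + 1):
            j0 = int(np.clip(base + o, 0, n - 1))
            j1 = int(np.clip(base + o + 1, 0, n - 1))
            # linear interpolation between neighboring keys/values (clamped)
            k_s = (1 - frac) * k[j0] + frac * k[j1]
            v_s = (1 - frac) * v[j0] + frac * v[j1]
            scores.append(float(q[i] @ k_s) / np.sqrt(d))
            vals.append(v_s)
        w = np.exp(np.array(scores) - max(scores))  # stable softmax over window
        w /= w.sum()
        out[i] = w @ np.stack(vals)
    return out
```

Because the sampled keys/values depend smoothly on `rel_pos`, gradients can flow back into the relative positions, which is what lets them be updated across layers.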
## Key Features

* **Efficient and Scalable**: Linear complexity enables high-resolution processing with low GPU memory; MatchStereo-T processes 4K UHD images in 0.1 seconds using only 3 GB of GPU memory.
* **State-of-the-Art Performance**: Ranked 1st in average error on the Middlebury stereo benchmark, with state-of-the-art results on the KITTI 2012, KITTI 2015, ETH3D, and Spring flow datasets.
* **Explainable Occlusion Handling**: Gated cross-MatchAttention and a consistency-constrained loss mitigate the impact of occlusions, letting the model focus on learning robust matching relationships.

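
The consistency idea can be sketched with a standard left-right disparity check (an illustrative stand-in for occlusion detection in general, not the paper's exact consistency-constrained loss):

```python
import numpy as np

def left_right_consistency(disp_left, disp_right, thresh=1.0):
    """Flag left-view pixels whose disparity disagrees with the
    corresponding right-view disparity by more than `thresh`;
    inconsistent pixels are typically treated as occluded."""
    h, w = disp_left.shape
    xs = np.broadcast_to(np.arange(w), (h, w))
    # a left pixel at column x maps to column x - d in the right view
    x_right = np.clip(np.rint(xs - disp_left).astype(int), 0, w - 1)
    disp_right_warped = np.take_along_axis(disp_right, x_right, axis=1)
    return np.abs(disp_left - disp_right_warped) <= thresh  # True = consistent
```

Masking the loss with such a map keeps occluded pixels from dominating training, which is the role the gated attention and consistency loss play in MatchDecoder.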
## Model Weights

The following pre-trained model weights are available:

| Model | Params | Resolution | FLOPs | GPU Mem | Latency | Checkpoint |
| :---- | :----- | :--------- | :---- | :------ | :------ | :--------- |
| MatchStereo-T | 8.78M | 1536x1536 | 0.34T | 1.45G | 38ms | [Hugging Face](https://huggingface.co/Tingman/MatchAttention/blob/main/matchstereo_tiny_fsd.pth) |
| MatchStereo-S | 25.2M | 1536x1536 | 0.98T | 1.73G | 45ms | [Hugging Face](https://huggingface.co/Tingman/MatchAttention/blob/main/matchstereo_small_fsd.pth) |
| MatchStereo-B | 75.5M | 1536x1536 | 3.59T | 2.94G | 75ms | [Hugging Face](https://huggingface.co/Tingman/MatchAttention/blob/main/matchstereo_base_fsd.pth) |
| MatchFlow-B | 75.5M | 1536x1536 | 3.60T | 3.22G | 77ms | [Hugging Face](https://huggingface.co/Tingman/MatchAttention/blob/main/matchflow_base_sintel.pth) |

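
The checkpoints above can be fetched directly from the Hub; a minimal helper that builds the direct-download URLs from the table (filenames are taken from the table above; actually loading the weights additionally requires PyTorch and the repository code):

```python
REPO_ID = "Tingman/MatchAttention"

# checkpoint filenames as listed in the table above
CHECKPOINTS = {
    "MatchStereo-T": "matchstereo_tiny_fsd.pth",
    "MatchStereo-S": "matchstereo_small_fsd.pth",
    "MatchStereo-B": "matchstereo_base_fsd.pth",
    "MatchFlow-B": "matchflow_base_sintel.pth",
}

def checkpoint_url(model: str) -> str:
    """Direct download URL for a listed checkpoint on the Hugging Face Hub."""
    return f"https://huggingface.co/{REPO_ID}/resolve/main/{CHECKPOINTS[model]}"
```

With `huggingface_hub` installed, the same files can also be retrieved via `hf_hub_download(repo_id=REPO_ID, filename=CHECKPOINTS["MatchStereo-T"])`.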
## Sample Usage

To get started with inference using the provided scripts, follow the instructions in the [GitHub repository](https://github.com/TingmanYan/MatchAttention). Here is an example of running stereo matching on custom images from the command line:

```bash
# Clone the repository and install dependencies first, as per the GitHub README,
# then run from the MatchAttention directory.

# Run stereo matching on custom images
python run_img.py --img0_dir images/left/ --img1_dir images/right/ --output_path outputs --checkpoint_path checkpoints/matchstereo_tiny_fsd.pth --no_compile
```

For other tasks such as optical flow, benchmark evaluation, or the local Gradio demo, refer to the [GitHub repository](https://github.com/TingmanYan/MatchAttention).

## Citation

If you find our work helpful, please cite our paper:

```bibtex
@article{yan2025matchattention,
  title={MatchAttention: Matching the Relative Positions for High-Resolution Cross-View Matching},
  author={Tingman Yan and Tao Liu and Xilian Yang and Qunfei Zhao and Zeyang Xia},
  journal={arXiv preprint arXiv:2510.14260},
  year={2025}
}
```

## Acknowledgement

We would like to thank the authors of [UniMatch](https://github.com/autonomousvision/unimatch), [RAFT-Stereo](https://github.com/princeton-vl/RAFT-Stereo), [MetaFormer](https://github.com/sail-sg/metaformer), and [TransNeXt](https://github.com/DaiShiResearch/TransNeXt) for releasing their code, and the authors of [FoundationStereo](https://github.com/NVlabs/FoundationStereo) for releasing the FSD dataset.

## Contact

Please reach out to [Tingman Yan](mailto:tingmanyan@dlut.edu.cn) with any questions.