---
license: cc-by-4.0
library_name: pytorch
tags:
- computer-vision
- object-tracking
- spiking-neural-networks
- visual-streaming-perception
- energy-efficient
- cvpr-2025
pipeline_tag: object-detection
---

# ViStream: Law-of-Charge-Conservation Inspired Spiking Neural Network for Visual Streaming Perception

**ViStream** is an energy-efficient framework for Visual Streaming Perception (VSP). It builds on Spiking Neural Networks (SNNs) whose neurons have the Law of Charge Conservation (LoCC) property.

## Model Details

### Model Description

- **Developed by:** Kang You, Ziling Wei, Jing Yan, Boning Zhang, Qinghai Guo, Yaoyu Zhang, Zhezhi He
- **Model type:** Spiking Neural Network for Visual Streaming Perception
- **Language(s):** PyTorch implementation
- **License:** CC-BY-4.0
- **Paper:** [CVPR 2025](https://openaccess.thecvf.com/content/CVPR2025/papers/You_VISTREAM_Improving_Computation_Efficiency_of_Visual_Streaming_Perception_via_Law-of-Charge-Conservation_CVPR_2025_paper.pdf)
- **Repository:** [GitHub](https://github.com/Intelligent-Computing-Research-Group/ViStream)

### Model Architecture

ViStream introduces two key innovations:
1. **Law of Charge Conservation (LoCC)** property in ST-BIF neurons
2. **Differential Encoding (DiffEncode)** scheme for temporal optimization
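
To make the charge-conservation idea concrete, here is a minimal sketch of a generic integrate-and-fire neuron with reset-by-subtraction and bipolar (positive and negative) spikes. It is not the ST-BIF neuron from the ViStream code base: the function name `run_if_neuron`, the threshold, and the input sequence are illustrative assumptions. What it shows is the bookkeeping identity that total input charge equals the threshold times the net spike count plus the charge left in the membrane potential.

```python
def run_if_neuron(inputs, threshold=1.0):
    """Toy integrate-and-fire neuron with soft reset and bipolar spikes.

    Hypothetical helper for illustration only; not the ViStream ST-BIF code.
    Returns the emitted spikes (+1, 0, or -1 per step) and the final potential.
    """
    v = 0.0
    spikes = []
    for x in inputs:
        v += x  # integrate the incoming charge
        if v >= threshold:
            spike = 1
        elif v <= -threshold:
            spike = -1
        else:
            spike = 0
        v -= spike * threshold  # reset by subtraction keeps the leftover charge
        spikes.append(spike)
    return spikes, v


# Made-up input sequence (values chosen to be exact in floating point).
inputs = [0.5, 0.75, -0.25, 1.0, -1.5, 0.25]
threshold = 1.0
spikes, residual = run_if_neuron(inputs, threshold)

# Charge bookkeeping: everything that came in was either emitted as spikes
# or is still stored in the membrane potential.
charge_in = sum(inputs)
charge_out = threshold * sum(spikes) + residual
assert charge_in == charge_out
print(spikes, residual)
```

Because the reset subtracts the threshold instead of zeroing the potential, no charge is discarded, which is the property the identity relies on.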

Together, these reduce computation substantially while keeping accuracy equivalent to the ANN counterparts.
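
One intuition for why encoding differences can save computation on video: for a linear layer, the output on the current frame equals the output on the previous frame plus the layer applied to the frame difference, and consecutive frames often differ in only a few positions. The plain-Python sketch below illustrates that identity. It is a toy under stated assumptions, not the DiffEncode scheme from the paper; `linear`, `W`, and the frame values are made up for illustration.

```python
def linear(W, x):
    """Matrix-vector product that skips zero inputs and counts multiply-adds."""
    out = [0.0] * len(W)
    ops = 0
    for j, xj in enumerate(x):
        if xj == 0:
            continue  # a zero input contributes nothing, so no work is done
        for i in range(len(W)):
            out[i] += W[i][j] * xj
            ops += 1
    return out, ops


# Hypothetical weights and two consecutive "frames" that differ in one element.
W = [[1.0, 2.0, 0.5, -1.0],
     [0.0, 1.0, -2.0, 3.0]]
frame_prev = [4.0, 1.0, 2.0, 8.0]
frame_curr = [4.0, 1.0, 3.0, 8.0]

# Full recomputation on the new frame.
y_full, ops_full = linear(W, frame_curr)

# Incremental update: reuse the previous output and process only the difference.
y_prev, _ = linear(W, frame_prev)
delta = [c - p for c, p in zip(frame_curr, frame_prev)]
y_delta, ops_delta = linear(W, delta)
y_incremental = [a + b for a, b in zip(y_prev, y_delta)]

assert y_incremental == y_full
print(ops_full, ops_delta)  # prints: 8 2
```

The identity holds only for linear operations. How ViStream carries the increments through nonlinear, stateful spiking neurons is described in the paper.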

## Uses

### Direct Use

ViStream can be directly used for:
- **Multiple Object Tracking (MOT)**
- **Single Object Tracking (SOT)**
- **Video Object Segmentation (VOS)**
- **Multiple Object Tracking and Segmentation (MOTS)**
- **Pose Tracking**

### Downstream Use

The model can be fine-tuned for various visual streaming perception tasks in:
- Autonomous driving
- UAV navigation
- AR/VR applications
- Real-time surveillance

## Bias, Risks, and Limitations

### Limitations
- Reaching the maximum energy benefit requires hardware-specific optimization
- Performance may vary with the frame rate of the input stream
- Limited to visual perception tasks

### Recommendations
- Test thoroughly on target hardware before deployment
- Consider the computational constraints of edge devices
- Validate performance on domain-specific datasets
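
For the first recommendation, a small timing harness is often enough to get an initial per-frame latency number on the target device. The sketch below is generic Python and assumes a hypothetical `process_frame` callable standing in for whatever inference entry point you use; it is not part of the ViStream API.

```python
import statistics
import time


def benchmark(process_frame, frames, warmup=3):
    """Return per-frame latencies in milliseconds for a callable over a frame list."""
    for frame in frames[:warmup]:
        process_frame(frame)  # warm-up runs are not timed
    latencies_ms = []
    for frame in frames:
        start = time.perf_counter()
        process_frame(frame)
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return latencies_ms


def dummy_process_frame(frame):
    # Placeholder workload; replace with real model inference.
    return sum(frame)


# Made-up "frames" so the sketch runs without a model or a video source.
frames = [[float(i)] * 1000 for i in range(20)]
latencies = benchmark(dummy_process_frame, frames)
print(f"median: {statistics.median(latencies):.3f} ms, max: {max(latencies):.3f} ms")
```

When timing on a GPU, call `torch.cuda.synchronize()` before reading the clock, since CUDA kernels run asynchronously.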

## How to Get Started with the Model

```python
from huggingface_hub import hf_hub_download
import torch

# Download the checkpoint from the Hugging Face Hub
checkpoint_path = hf_hub_download(
    repo_id="AndyBlocker/ViStream",
    filename="checkpoint-90.pth"
)

# Load the checkpoint on CPU; building the model itself requires the ViStream implementation
checkpoint = torch.load(checkpoint_path, map_location='cpu')
```

For complete usage examples, see the [GitHub repository](https://github.com/Intelligent-Computing-Research-Group/ViStream).
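
`torch.load` returns whatever object was saved, and this card does not document the checkpoint's layout, so it helps to look at the top-level keys before loading weights. The helper below is a sketch: the wrapper keys it tries (`model`, `state_dict`, `model_state_dict`) are common conventions that have not been confirmed for `checkpoint-90.pth`, and `find_state_dict` is a hypothetical name.

```python
def find_state_dict(checkpoint, candidate_keys=("model", "state_dict", "model_state_dict")):
    """Return the nested weight dictionary if the checkpoint wraps one, else the checkpoint.

    The candidate keys are common conventions and an assumption here.
    """
    if isinstance(checkpoint, dict):
        for key in candidate_keys:
            if key in checkpoint and isinstance(checkpoint[key], dict):
                return checkpoint[key]
    return checkpoint


# Stand-in for the object returned by torch.load(...); keys and values are made up.
fake_checkpoint = {"epoch": 90, "model": {"conv1.weight": [0.1, 0.2], "fc.bias": [0.0]}}

print(sorted(fake_checkpoint.keys()))
state_dict = find_state_dict(fake_checkpoint)
for name in list(state_dict)[:5]:
    print(name)
```

With the real file, pass the `checkpoint` object loaded in the snippet above, then hand the result to `load_state_dict` on the model class from the repository.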

## Training Details

### Training Data

The model was trained on multiple datasets for various visual streaming perception tasks including object tracking, video object segmentation, and pose tracking.

### Training Procedure

**Training Details:**
- Framework: PyTorch
- Optimization: Energy-efficient SNN training with Law of Charge Conservation
- Architecture: ResNet-based backbone with spike quantization layers

## Evaluation

The model performs competitively across multiple visual streaming perception tasks while using significantly less energy than traditional ANN-based approaches. Detailed evaluation results are available in the [CVPR 2025 paper](https://openaccess.thecvf.com/content/CVPR2025/papers/You_VISTREAM_Improving_Computation_Efficiency_of_Visual_Streaming_Perception_via_Law-of-Charge-Conservation_CVPR_2025_paper.pdf).
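
Energy comparisons between ANNs and SNNs are commonly estimated from operation counts: multiply-accumulate (MAC) operations for the ANN and spike-triggered accumulate (AC) operations for the SNN. The sketch below shows that arithmetic with per-operation energies often quoted for 45 nm hardware (4.6 pJ per 32-bit MAC, 0.9 pJ per 32-bit AC). Whether the paper uses this exact model is an assumption, and the operation counts are placeholders, not measured ViStream results.

```python
E_MAC_PJ = 4.6  # picojoules per multiply-accumulate (commonly quoted 45 nm figure)
E_AC_PJ = 0.9   # picojoules per accumulate (commonly quoted 45 nm figure)
PJ_TO_MJ = 1e-9  # 1 pJ = 1e-12 J = 1e-9 mJ


def ann_energy_mj(num_macs):
    """Estimated ANN energy in millijoules from a MAC count."""
    return num_macs * E_MAC_PJ * PJ_TO_MJ


def snn_energy_mj(num_synaptic_ops):
    """Estimated SNN energy in millijoules from a synaptic-operation (AC) count."""
    return num_synaptic_ops * E_AC_PJ * PJ_TO_MJ


# Placeholder operation counts per frame; NOT taken from the paper.
ann_macs = 4e9
snn_ops = 2e9

ann_mj = ann_energy_mj(ann_macs)
snn_mj = snn_energy_mj(snn_ops)
print(f"ANN: {ann_mj:.2f} mJ, SNN: {snn_mj:.2f} mJ, ratio: {ann_mj / snn_mj:.1f}x")
```

Estimates like this ignore memory access and data movement, so they are best read as a rough comparison and not as a measurement on real hardware.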

## Model Card Authors

Kang You, Ziling Wei, Jing Yan, Boning Zhang, Qinghai Guo, Yaoyu Zhang, Zhezhi He

## Model Card Contact

For questions about this model, please open an issue in the [GitHub repository](https://github.com/Intelligent-Computing-Research-Group/ViStream).

## Citation

```bibtex
@inproceedings{you2025vistream,
  title={VISTREAM: Improving Computation Efficiency of Visual Streaming Perception via Law-of-Charge-Conservation Inspired Spiking Neural Network},
  author={You, Kang and Wei, Ziling and Yan, Jing and Zhang, Boning and Guo, Qinghai and Zhang, Yaoyu and He, Zhezhi},
  booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
  pages={8796--8805},
  year={2025}
}
```