---
title: Civic Pulse — Crowd Counting
emoji: 👥
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false
---

# Civic Pulse — Tactical Crowd Intelligence

A full-stack AI drone monitoring dashboard built on **P2PNet** (ICCV 2021).
FastAPI backend + React/Vite frontend with real-time WebSocket video streaming.

## 🚀 Live Deployment (Free Tier)

| Component | Platform | URL |
|-----------|----------|-----|
| **Frontend** | Vercel | `https://crowd-counting.vercel.app` |
| **Backend API** | FastAPI (Docker/HuggingFace Spaces) | `Set your deployed backend URL in frontend/.env.production` |
| **Model Weights** | HuggingFace Hub | `Set HF_WEIGHTS_REPO to your deployed weights repo` |

> ⚠️ The HF Space may sleep after 15 min of inactivity. Open the app about 30 seconds before your demo.
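
If you want to script that warm-up instead of opening the app by hand, a minimal sketch is shown below. The URL is a placeholder; substitute the backend URL you configured in `frontend/.env.production`.

```python
# Minimal warm-up sketch: poll the backend until the sleeping Space responds.
# BACKEND_URL is a placeholder, not a real deployment.
import time
import urllib.error
import urllib.request

BACKEND_URL = "https://your-space.hf.space"  # placeholder

def wake_backend(url: str, retries: int = 10, delay: float = 3.0) -> bool:
    """Return True once the backend answers an HTTP request."""
    for _ in range(retries):
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                if resp.status < 500:
                    return True
        except urllib.error.HTTPError:
            return True   # got an HTTP response, so the server is up
        except OSError:
            pass          # Space still waking up; wait and retry
        time.sleep(delay)
    return False

if __name__ == "__main__":
    print("backend awake:", wake_backend(BACKEND_URL))
```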

## ⚡ Quick Local Setup

```bash
# Backend
pip install -r requirements.txt
uvicorn api:app --reload

# Frontend (in another terminal)
cd frontend
npm install
npm run dev
```

---

# P2PNet (ICCV 2021 Oral Presentation)

This repository contains the official PyTorch implementation of **P2PNet**, as described in [Rethinking Counting and Localization in Crowds: A Purely Point-Based Framework](https://arxiv.org/abs/2107.12746).

A brief introduction to P2PNet (in Chinese) can be found at [机器之心 (almosthuman)](https://mp.weixin.qq.com/s?__biz=MzA3MzI4MjgzMw==&mid=2650827826&idx=3&sn=edd3d66444130fb34a59d08fab618a9e&chksm=84e5a84cb392215a005a3b3424f20a9d24dc525dcd933960035bf4b6aa740191b5ecb2b7b161&mpshare=1&scene=1&srcid=1004YEOC7HC9daYRYeUio7Xn&sharer_sharetime=1633675738338&sharer_shareid=7d375dccd3b2f9eec5f8b27ee7c04883&version=3.1.16.5505&platform=win#rd).

The code is tested with PyTorch 1.5.0 and may not run with other versions.

## Visualized demos for P2PNet
<img src="vis/congested1.png" width="1000"/>
<img src="vis/congested2.png" width="1000"/>
<img src="vis/congested3.png" width="1000"/>

## The network
The overall architecture of P2PNet. Built upon a VGG16 backbone, it first introduces an upsampling path to obtain a fine-grained feature map, then exploits two parallel branches to simultaneously predict a set of point proposals and their confidence scores.

<img src="vis/net.png" width="1000"/>

## Comparison with state-of-the-art methods
P2PNet achieves state-of-the-art performance on several challenging datasets with varying densities.

| Methods | Venue | SHTechPartA <br> MAE/MSE | SHTechPartB <br> MAE/MSE | UCF_CC_50 <br> MAE/MSE | UCF_QNRF <br> MAE/MSE |
|:----:|:----:|:----:|:----:|:----:|:----:|
| CAN | CVPR'19 | 62.3/100.0 | 7.8/12.2 | 212.2/**243.7** | 107.0/183.0 |
| Bayesian+ | ICCV'19 | 62.8/101.8 | 7.7/12.7 | 229.3/308.2 | 88.7/154.8 |
| S-DCNet | ICCV'19 | 58.3/95.0 | 6.7/10.7 | 204.2/301.3 | 104.4/176.1 |
| SANet+SPANet | ICCV'19 | 59.4/92.5 | 6.5/**9.9** | 232.6/311.7 | -/- |
| DUBNet | AAAI'20 | 64.6/106.8 | 7.7/12.5 | 243.8/329.3 | 105.6/180.5 |
| SDANet | AAAI'20 | 63.6/101.8 | 7.8/10.2 | 227.6/316.4 | -/- |
| ADSCNet | CVPR'20 | <u>55.4</u>/97.7 | <u>6.4</u>/11.3 | 198.4/267.3 | **71.3**/**132.5** |
| ASNet | CVPR'20 | 57.78/<u>90.13</u> | -/- | <u>174.84</u>/<u>251.63</u> | 91.59/159.71 |
| AMRNet | ECCV'20 | 61.59/98.36 | 7.02/11.00 | 184.0/265.8 | 86.6/152.2 |
| AMSNet | ECCV'20 | 56.7/93.4 | 6.7/10.2 | 208.4/297.3 | 101.8/163.2 |
| DM-Count | NeurIPS'20 | 59.7/95.7 | 7.4/11.8 | 211.0/291.5 | 85.6/<u>148.3</u> |
| **Ours** | - | **52.74**/**85.06** | **6.25**/**9.9** | **172.72**/256.18 | <u>85.32</u>/154.5 |

Comparison on the [NWPU-Crowd](https://www.crowdbenchmark.com/resultdetail.html?rid=81) dataset.

| Methods | MAE[O] | MSE[O] | MAE[L] | MAE[S] |
|:----:|:----:|:----:|:----:|:----:|
| MCNN | 232.5 | 714.6 | 220.9 | 1171.9 |
| SANet | 190.6 | 491.4 | 153.8 | 716.3 |
| CSRNet | 121.3 | 387.8 | 112.0 | <u>522.7</u> |
| PCC-Net | 112.3 | 457.0 | 111.0 | 777.6 |
| CANNet | 110.0 | 495.3 | 102.3 | 718.3 |
| Bayesian+ | 105.4 | 454.2 | 115.8 | 750.5 |
| S-DCNet | 90.2 | 370.5 | **82.9** | 567.8 |
| DM-Count | <u>88.4</u> | 388.6 | 88.0 | **498.0** |
| **Ours** | **77.44** | **362** | <u>83.28</u> | 553.92 |

The overall performance for both counting and localization, measured by nAP$_{\delta}$ (higher is better); the last row averages over thresholds $\delta$ from 0.05 to 0.50 in steps of 0.05.

| nAP$_{\delta}$ | SHTechPartA | SHTechPartB | UCF_CC_50 | UCF_QNRF | NWPU_Crowd |
|:----:|:----:|:----:|:----:|:----:|:----:|
| $\delta=0.05$ | 10.9\% | 23.8\% | 5.0\% | 5.9\% | 12.9\% |
| $\delta=0.25$ | 70.3\% | 84.2\% | 54.5\% | 55.4\% | 71.3\% |
| $\delta=0.50$ | 90.1\% | 94.1\% | 88.1\% | 83.2\% | 89.1\% |
| $\delta=\{0.05:0.05:0.50\}$ | 64.4\% | 76.3\% | 54.3\% | 53.1\% | 65.0\% |

Comparison of localization performance in terms of F1-measure on NWPU-Crowd.

| Method | F1-Measure | Precision | Recall |
|:----:|:----:|:----:|:----:|
| FasterRCNN | 0.068 | 0.958 | 0.035 |
| TinyFaces | 0.567 | 0.529 | 0.611 |
| RAZ | 0.599 | 0.666 | 0.543 |
| Crowd-SDNet | 0.637 | 0.651 | 0.624 |
| PDRNet | 0.653 | 0.675 | 0.633 |
| TopoCount | 0.692 | 0.683 | **0.701** |
| D2CNet | <u>0.700</u> | **0.741** | 0.662 |
| **Ours** | **0.712** | <u>0.729</u> | <u>0.695</u> |

## Installation
* Clone this repo into a directory named P2PNET_ROOT
* Organize your datasets as described in the next section
* Install the Python dependencies. We use Python 3.6.5 and PyTorch 1.5.0:
```bash
pip install -r requirements.txt
```

## Organize the counting dataset
We use a list file to pair every image in a counting dataset with its ground-truth annotation file. With the dataset organized as recommended below, each line of the list file contains an image path followed by the corresponding annotation path:
```
train/scene01/img01.jpg train/scene01/img01.txt
train/scene01/img02.jpg train/scene01/img02.txt
...
train/scene02/img01.jpg train/scene02/img01.txt
```
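
As a convenience, here is a small sketch that generates these list files from the layout shown in the next subsection. It assumes every `.jpg` has a sibling `.txt` annotation; the `DATA_ROOT` path is a placeholder.

```python
# Sketch: build {split}.list by pairing each image with its sibling .txt file.
# Paths written to the list are relative to DATA_ROOT, matching the format above.
from pathlib import Path

def write_list(data_root: str, split: str) -> None:
    root = Path(data_root)
    lines = []
    for img in sorted((root / split).rglob("*.jpg")):
        ann = img.with_suffix(".txt")
        if ann.exists():
            lines.append(f"{img.relative_to(root)} {ann.relative_to(root)}")
    (root / f"{split}.list").write_text("\n".join(lines) + "\n")

for split in ("train", "test"):
    write_list("/path/to/DATA_ROOT", split)
```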

### Dataset structure
```
DATA_ROOT/
|->train/
|   |->scene01/
|   |->scene02/
|   |->...
|->test/
|   |->scene01/
|   |->scene02/
|   |->...
|->train.list
|->test.list
```
`DATA_ROOT` is the directory containing your counting datasets.

### Annotation format
Each image is annotated by a single txt file containing one head point per line. Pixel coordinates are 0-indexed. The expected format of each line is:
```
x1 y1
x2 y2
...
```
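
For reference, a minimal sketch that reads one such file into an `(N, 2)` array; the ground-truth count is simply the number of points.

```python
# Sketch: parse an annotation file into an (N, 2) float array of head points.
# Coordinates are 0-indexed pixel positions, one "x y" pair per line.
import numpy as np

def load_points(txt_path: str) -> np.ndarray:
    points = []
    with open(txt_path) as f:
        for line in f:
            if line.strip():
                x, y = map(float, line.split())
                points.append((x, y))
    return np.asarray(points, dtype=np.float32).reshape(-1, 2)

# len(load_points("train/scene01/img01.txt")) == ground-truth head count
```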

## Training

The network can be trained using the `train.py` script. For training on SHTechPartA, use:

```bash
CUDA_VISIBLE_DEVICES=0 python train.py --data_root $DATA_ROOT \
    --dataset_file SHHA \
    --epochs 3500 \
    --lr_drop 3500 \
    --output_dir ./logs \
    --checkpoints_dir ./weights \
    --tensorboard_dir ./logs \
    --lr 0.0001 \
    --lr_backbone 0.00001 \
    --batch_size 8 \
    --eval_freq 1 \
    --gpu_id 0
```
By default, the model is evaluated periodically on the validation set (here after every epoch, per `--eval_freq 1`). Note that with `--lr_drop` set equal to `--epochs`, the learning-rate drop never takes effect during this run.

## Testing

A model trained on SHTechPartA (MAE **51.96**) is available under `./weights`. Run the following command to launch a visualization demo:

```bash
CUDA_VISIBLE_DEVICES=0 python run_test.py --weight_path ./weights/SHTechA.pth --output_dir ./logs/
```
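
At inference time, the crowd count is obtained by thresholding the confidence of each point proposal. A sketch of that post-processing step follows; the output key names and the 0.5 threshold follow the DETR-style convention this repo borrows, so verify them against `run_test.py`.

```python
# Sketch of P2PNet post-processing (key names are assumptions; see run_test.py):
# keep proposals whose "person" probability exceeds a threshold; the crowd
# count is simply the number of surviving points.
import torch

def postprocess(outputs: dict, threshold: float = 0.5):
    probs = torch.softmax(outputs["pred_logits"], dim=-1)[0, :, 1]  # person prob
    keep = probs > threshold
    points = outputs["pred_points"][0][keep]  # (N, 2) predicted head positions
    return points, int(keep.sum())
```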

## Civic Pulse Application

The supported application stack in this repository is:

- FastAPI backend in `api.py`
- React/Vite frontend in `frontend/`

The application loads `weights/SHTechA.pth` by default.
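
To consume the real-time stream outside the bundled frontend, something like the sketch below can work. The `/ws` path and the message format are assumptions here; check `api.py` for the actual endpoint and payload.

```python
# Hypothetical WebSocket client for the live stream (endpoint path /ws and
# the message format are assumptions; confirm against api.py).
import asyncio
import websockets  # pip install websockets

async def watch(url: str = "ws://localhost:8000/ws") -> None:
    async with websockets.connect(url) as ws:
        while True:
            message = await ws.recv()  # e.g. a JSON payload or a binary frame
            print(f"received {len(message)} bytes")

if __name__ == "__main__":
    asyncio.run(watch())
```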

For this application, use the pretrained P2PNet weights directly for inference. Manual point labeling and fine-tuning on a single image are not part of the recommended workflow: one hand-labeled image is far too little data for stable fine-tuning and will not improve model quality.

## Acknowledgements

- Part of the code is borrowed from the [C^3 Framework](https://github.com/gjy3035/C-3-Framework).
- We refer to [DETR](https://github.com/facebookresearch/detr) to implement our matching strategy.

## Citing P2PNet

If you find P2PNet useful in your project, please consider citing:

```BibTeX
@inproceedings{song2021rethinking,
  title={Rethinking Counting and Localization in Crowds: A Purely Point-Based Framework},
  author={Song, Qingyu and Wang, Changan and Jiang, Zhengkai and Wang, Yabiao and Tai, Ying and Wang, Chengjie and Li, Jilin and Huang, Feiyue and Wu, Yang},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year={2021}
}
```

## Related works from Tencent Youtu Lab
- [AAAI2021] To Choose or to Fuse? Scale Selection for Crowd Counting. ([paper link](https://ojs.aaai.org/index.php/AAAI/article/view/16360) & [codes](https://github.com/TencentYoutuResearch/CrowdCounting-SASNet))
- [ICCV2021] Uniformity in Heterogeneity: Diving Deep into Count Interval Partition for Crowd Counting. ([paper link](https://arxiv.org/abs/2107.12619) & [codes](https://github.com/TencentYoutuResearch/CrowdCounting-UEPNet))