## Description
Visual anomaly detection is critical in industrial manufacturing, but traditional methods often rely on extensive normal datasets and custom models, limiting scalability. Recent advances in large-scale visual-language models have significantly improved zero-/few-shot anomaly detection. However, these approaches may not fully exploit hierarchical features, potentially missing nuanced details. We introduce a window self-attention mechanism based on the CLIP model, combined with learnable prompts, to process multi-level features within a Soldier-Officer Window self-Attention (SOWA) framework. Our method has been tested on five benchmark datasets, where it leads on 18 out of 20 metrics compared with existing state-of-the-art techniques.
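The core idea is to adapt frozen hierarchical CLIP features with lightweight window self-attention adapters. The sketch below is a minimal, hypothetical illustration of that pattern, not the repository's actual implementation; the class name, window size, head count, and mean fusion are assumptions for illustration.

```python
import torch
import torch.nn as nn


class WindowSelfAttention(nn.Module):
    """Self-attention restricted to non-overlapping spatial windows.

    A minimal sketch of the window-attention adapter idea; the real
    SOWA modules differ in detail.
    """

    def __init__(self, dim: int, window: int = 4, heads: int = 4):
        super().__init__()
        self.window = window
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, H, W, C) feature map from one frozen CLIP encoder stage.
        b, h, w, c = x.shape
        ws = self.window
        # Partition into (ws x ws) windows -> (B * num_windows, ws*ws, C).
        x = x.view(b, h // ws, ws, w // ws, ws, c)
        x = x.permute(0, 1, 3, 2, 4, 5).reshape(-1, ws * ws, c)
        q = self.norm(x)
        y, _ = self.attn(q, q, q)
        x = x + y  # residual: the frozen features pass through unchanged
        # Reverse the window partition back to (B, H, W, C).
        x = x.view(b, h // ws, w // ws, ws, ws, c)
        return x.permute(0, 1, 3, 2, 4, 5).reshape(b, h, w, c)


if __name__ == "__main__":
    # Hypothetical usage on four hierarchical feature maps (H1..H4).
    feats = [torch.randn(2, 16, 16, 768) for _ in range(4)]  # dummy stages
    adapters = nn.ModuleList(WindowSelfAttention(768) for _ in feats)
    fused = torch.stack([a(f) for a, f in zip(adapters, feats)]).mean(0)
    print(fused.shape)  # torch.Size([2, 16, 16, 768])
```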
## Installation

### Pip
```bash
# clone project
git clone https://github.com/huzongxiang/sowa
cd sowa

# [OPTIONAL] create conda environment
conda create -n sowa python=3.9
conda activate sowa

# install pytorch according to the instructions at
# https://pytorch.org/get-started/

# install requirements
pip install -r requirements.txt
```
### Conda
```bash
# clone project
git clone https://github.com/huzongxiang/sowa
cd sowa

# create conda environment and install dependencies
conda env create -f environment.yaml -n sowa

# activate conda environment
conda activate sowa
```
## How to run

Train a model with the default configuration:
```bash
# train on CPU
python src/train.py trainer=cpu data=sowa_visa model=sowa_hfwa

# train on GPU
python src/train.py trainer=gpu data=sowa_visa model=sowa_hfwa
```
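Since the entry point takes Hydra-style overrides, the configuration can also be composed programmatically. The snippet below is a hedged sketch: the `configs/` directory and `train` root config name are assumptions based on the common Lightning-Hydra template layout and may differ in this repository.

```python
from hydra import initialize, compose
from omegaconf import OmegaConf

# Compose the training config the same way `src/train.py` would.
# `config_path` and `config_name` are assumed; adjust to the repo layout.
with initialize(version_base=None, config_path="configs"):
    cfg = compose(
        config_name="train",
        overrides=["trainer=gpu", "data=sowa_visa", "model=sowa_hfwa"],
    )

print(OmegaConf.to_yaml(cfg))  # inspect the fully resolved configuration
```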
## Results
Comparison with few-shot (K=4) anomaly detection methods on the MVTec-AD, Visa, BTAD, DAGM, and DTD-Synthetic datasets.
| Metric | Dataset | WinCLIP | April-GAN | Ours |
|---|---|---|---|---|
| AC AUROC | MVTec-AD | 95.2±1.3 | 92.8±0.2 | 96.8±0.3 |
| | Visa | 87.3±1.8 | 92.6±0.4 | 92.9±0.2 |
| | BTAD | 87.0±0.2 | 92.1±0.2 | 94.8±0.2 |
| | DAGM | 93.8±0.2 | 96.2±1.1 | 98.9±0.3 |
| | DTD-Synthetic | 98.1±0.2 | 98.5±0.1 | 99.1±0.0 |
| AC AP | MVTec-AD | 97.3±0.6 | 96.3±0.1 | 98.3±0.3 |
| | Visa | 88.8±1.8 | 94.5±0.3 | 94.5±0.2 |
| | BTAD | 86.8±0.0 | 95.2±0.5 | 95.5±0.7 |
| | DAGM | 83.8±1.1 | 86.7±4.5 | 95.2±1.7 |
| | DTD-Synthetic | 99.1±0.1 | 99.4±0.0 | 99.6±0.0 |
| AS AUROC | MVTec-AD | 96.2±0.3 | 95.9±0.0 | 95.7±0.1 |
| | Visa | 97.2±0.2 | 96.2±0.0 | 97.1±0.0 |
| | BTAD | 95.8±0.0 | 94.4±0.1 | 97.1±0.0 |
| | DAGM | 93.8±0.1 | 88.9±0.4 | 96.9±0.0 |
| | DTD-Synthetic | 96.8±0.2 | 96.7±0.0 | 98.7±0.0 |
| AS AUPRO | MVTec-AD | 89.0±0.8 | 91.8±0.1 | 92.4±0.2 |
| | Visa | 87.6±0.9 | 90.2±0.1 | 91.4±0.0 |
| | BTAD | 66.6±0.2 | 78.2±0.1 | 81.2±0.2 |
| | DAGM | 82.4±0.3 | 77.8±0.9 | 94.4±0.1 |
| | DTD-Synthetic | 90.1±0.5 | 92.2±0.0 | 96.6±0.1 |
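For reference, AC (anomaly classification) metrics are computed from image-level scores, while AS (anomaly segmentation) metrics are computed from pixel-level anomaly maps; AUPRO additionally averages the overlap per connected defect region. A minimal sketch of the image-level metrics using scikit-learn (an assumed dependency here, with dummy data):

```python
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score

# Dummy image-level ground truth (1 = anomalous) and predicted scores.
labels = np.array([0, 0, 1, 1, 1])
scores = np.array([0.10, 0.35, 0.80, 0.62, 0.91])

print("AC AUROC:", roc_auc_score(labels, scores))
print("AC AP:   ", average_precision_score(labels, scores))

# AS AUROC is the same computation applied to flattened pixel data,
# e.g. roc_auc_score(mask.ravel(), anomaly_map.ravel()).
```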
Performance comparison on the MVTec-AD and Visa datasets.
| Method | Source | MVTec-AD AC AUROC | MVTec-AD AS AUROC | MVTec-AD AS PRO | Visa AC AUROC | Visa AS AUROC | Visa AS PRO |
|---|---|---|---|---|---|---|---|
| SPADE | arXiv 2020 | 84.8±2.5 | 92.7±0.3 | 87.0±0.5 | 81.7±3.4 | 96.6±0.3 | 87.3±0.8 |
| PaDiM | ICPR 2021 | 80.4±2.4 | 92.6±0.7 | 81.3±1.9 | 72.8±2.9 | 93.2±0.5 | 72.6±1.9 |
| PatchCore | CVPR 2022 | 88.8±2.6 | 94.3±0.5 | 84.3±1.6 | 85.3±2.1 | 96.8±0.3 | 84.9±1.4 |
| WinCLIP | CVPR 2023 | 95.2±1.3 | 96.2±0.3 | 89.0±0.8 | 87.3±1.8 | 97.2±0.2 | 87.6±0.9 |
| April-GAN | CVPR 2023 VAND workshop | 92.8±0.2 | 95.9±0.0 | 91.8±0.1 | 92.6±0.4 | 96.2±0.0 | 90.2±0.1 |
| PromptAD | CVPR 2024 | 96.6±0.9 | 96.5±0.2 | - | 89.1±1.7 | 97.4±0.3 | - |
| InCTRL | CVPR 2024 | 94.5±1.8 | - | - | 87.7±1.9 | - | - |
| SOWA | Ours | 96.8±0.3 | 95.7±0.1 | 92.4±0.2 | 92.9±0.2 | 97.1±0.0 | 91.4±0.0 |
Comparison with few-shot anomaly detection methods on the MVTec-AD, Visa, BTAD, DAGM, and DTD-Synthetic datasets.
## Visualization
Visualization results under the few-shot setting (K=4).
## Mechanism
Hierarchical results on the MVTec-AD dataset: a set of images showing the model's actual outputs and illustrating how the different layers (H1 to H4) process different feature modes. Each row is a different sample; the columns show the original image, the segmentation mask, the heatmap, the feature outputs from H1 to H4, and their fusion.

## Inference Speed
Inference performance comparison of different methods on a single NVIDIA RTX 3070 (8 GB) GPU.
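For reproducing such measurements, a common pattern is to warm the model up and synchronize the GPU around a timed loop. The sketch below is a generic latency harness; the model and input shape are placeholders, not the SOWA pipeline.

```python
import time

import torch


def measure_latency(model: torch.nn.Module, x: torch.Tensor,
                    iters: int = 100) -> float:
    """Return the mean forward-pass latency in milliseconds."""
    model.eval()
    with torch.no_grad():
        for _ in range(10):               # warm-up iterations
            model(x)
        if x.is_cuda:
            torch.cuda.synchronize()      # flush pending kernels first
        start = time.perf_counter()
        for _ in range(iters):
            model(x)
        if x.is_cuda:
            torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters * 1e3


# Placeholder model and input; substitute the actual detector and size.
device = "cuda" if torch.cuda.is_available() else "cpu"
net = torch.nn.Conv2d(3, 8, 3).to(device)
x = torch.randn(1, 3, 224, 224, device=device)
print(f"{measure_latency(net, x):.2f} ms")
```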
## Citation
Please cite the following paper if this work helps your project:
```bibtex
@article{hu2024sowa,
  title={SOWA: Adapting Hierarchical Frozen Window Self-Attention to Visual-Language Models for Better Anomaly Detection},
  author={Hu, Zongxiang and Zhang, Zhaosheng},
  journal={arXiv preprint arXiv:2407.03634},
  year={2024}
}
```
