arxiv:2107.06278

Per-Pixel Classification is Not All You Need for Semantic Segmentation

Published on Jul 13, 2021
Abstract

A unified mask classification model, MaskFormer, achieves superior performance for both semantic and panoptic segmentation tasks, especially when the number of classes is large.

AI-generated summary

Modern approaches typically formulate semantic segmentation as a per-pixel classification task, while instance-level segmentation is handled with an alternative mask classification. Our key insight: mask classification is sufficiently general to solve both semantic- and instance-level segmentation tasks in a unified manner using the exact same model, loss, and training procedure. Following this observation, we propose MaskFormer, a simple mask classification model which predicts a set of binary masks, each associated with a single global class label prediction. Overall, the proposed mask classification-based method simplifies the landscape of effective approaches to semantic and panoptic segmentation tasks and shows excellent empirical results. In particular, we observe that MaskFormer outperforms per-pixel classification baselines when the number of classes is large. Our mask classification-based method outperforms both current state-of-the-art semantic (55.6 mIoU on ADE20K) and panoptic segmentation (52.7 PQ on COCO) models.
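The summary above describes MaskFormer's core formulation: a set of binary masks, each paired with one global class prediction. For semantic segmentation, the per-pixel class scores are obtained by marginalizing over the predicted masks. Below is a minimal NumPy sketch of that inference step; the shapes, random inputs, and the omission of the "no object" class are simplifying assumptions, not the paper's implementation.

```python
import numpy as np

# Hypothetical sizes: N predicted masks (queries), K classes, an H x W image.
N, K, H, W = 100, 150, 4, 4

rng = np.random.default_rng(0)
class_logits = rng.normal(size=(N, K))     # one class prediction per mask
mask_logits = rng.normal(size=(N, H, W))   # one binary mask per query

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Semantic inference: marginalize class probabilities over the N masks.
# score[k, h, w] = sum_i p_i(class = k) * m_i(h, w)
class_probs = softmax(class_logits, axis=-1)   # (N, K)
mask_probs = sigmoid(mask_logits)              # (N, H, W)
scores = np.einsum("nk,nhw->khw", class_probs, mask_probs)

# Per-pixel label map: argmax over classes at each pixel.
semantic_map = scores.argmax(axis=0)           # (H, W)
```

Because the same mask-plus-class outputs can instead be kept separate per mask, the identical model also yields instance-level (and hence panoptic) predictions, which is the unification the abstract claims.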

