Abstract

Teaser image


Deep learning has led to remarkable strides in scene understanding, with panoptic segmentation emerging as a key holistic scene interpretation task. However, the performance of panoptic segmentation is severely impacted in the presence of out-of-distribution (OOD) objects, i.e., categories of objects that deviate from the training distribution. To overcome this limitation, we propose Panoptic Out-of-Distribution Segmentation for joint pixel-level semantic in-distribution and out-of-distribution classification with instance prediction. We extend two established panoptic segmentation benchmarks, Cityscapes and BDD100K, with out-of-distribution instance segmentation annotations, propose suitable evaluation metrics, and present multiple strong baselines. Importantly, we propose the novel PoDS architecture with a shared backbone, an OOD contextual module for learning global and local OOD object cues, and dual symmetrical decoders with task-specific heads that employ our alignment-mismatch strategy for better OOD generalization. Combined with our data augmentation strategy, this approach facilitates progressive learning of out-of-distribution objects while maintaining in-distribution performance. We perform extensive evaluations that demonstrate that our proposed PoDS network effectively addresses the main challenges and substantially outperforms the baselines.

What is Panoptic Out-of-Distribution Segmentation?

Overview of our task

Recent advances in deep learning have substantially improved the ability of autonomous systems to interpret their surroundings. Central to these advances is panoptic segmentation, which integrates semantic segmentation with instance segmentation, providing a holistic understanding of the environment. However, a significant challenge is that these models yield overconfident predictions for object categories outside the distribution they were trained on, known as out-of-distribution (OOD) objects. Segmenting these OOD objects poses a major challenge as they can vary significantly in appearance and semantics, include fine-grained details, and share visual characteristics with in-distribution objects, leading to ambiguity. Moreover, learning to jointly segment OOD objects and in-distribution categories is extremely challenging. Given the potential consequences of autonomous systems malfunctioning due to unexpected inputs, it is crucial to ensure their safe and robust deployment.

To directly address these challenges at the task level, we introduce panoptic out-of-distribution segmentation, which focuses on holistic scene understanding while effectively segmenting OOD objects. The proposed task aims to predict the semantic segmentation of stuff classes and the instance segmentation of thing classes, as well as an OOD class. An object is considered OOD if it is not present in the training distribution but appears during testing or deployment. Thus, panoptic out-of-distribution segmentation aims to assign each pixel \(i\) of an input image to an output pair \((c_i, \kappa_i) \in (C \cup O) \times N\). Here, \(C\) denotes the known semantic classes, \(O\) represents the out-of-distribution class such that \(C \cap O = \emptyset\), and \(N\) is the total number of instances. \(C\) is further divided into stuff labels \(C^S\) (e.g., sidewalks) and thing labels \(C^T\) (e.g., pedestrians). In this task, the variable \(c_i\) can be a semantic or OOD class, and \(\kappa_i\) indicates the corresponding instance ID; for stuff classes, \(\kappa_i\) is not applicable.
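To make the output format concrete, below is a minimal sketch of how a panoptic out-of-distribution prediction can be represented as paired per-pixel maps for \(c_i\) and \(\kappa_i\). The class IDs, image size, and the convention of using 0 for "no instance" are illustrative assumptions, not values prescribed by the paper.

```python
import numpy as np

# A toy panoptic out-of-distribution output: a per-pixel class map c_i and
# instance map kappa_i. Class IDs, image size, and the use of 0 for
# "no instance" are illustrative placeholders, not values from the paper.
STUFF = {0: "road", 1: "sidewalk"}      # C^S: stuff classes (no instance ID)
THING = {2: "car", 3: "pedestrian"}     # C^T: thing classes (with instance IDs)
OOD_CLASS = 255                         # O: out-of-distribution label, disjoint from C

H, W = 4, 6
semantic = np.zeros((H, W), dtype=np.int32)   # c_i for every pixel (all "road" here)
instance = np.zeros((H, W), dtype=np.int32)   # kappa_i; 0 means "no instance" (stuff)

semantic[1:3, 1:3] = 3
instance[1:3, 1:3] = 1                         # a pedestrian, instance 1
semantic[2:4, 4:6] = OOD_CLASS
instance[2:4, 4:6] = 2                         # an unknown (OOD) object, instance 2

# Every pixel carries a (c_i, kappa_i) pair; stuff pixels carry no instance ID.
for c, k in zip(semantic.ravel(), instance.ravel()):
    assert c in STUFF or c in THING or c == OOD_CLASS
    if c in STUFF:
        assert k == 0
```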



Technical Approach

Overview of the PoDS architecture

As the first approach to addressing the task of Panoptic Out-of-Distribution Segmentation, we propose the PoDS architecture. PoDS builds on top of a base panoptic segmentation network with a shared backbone and task-specific decoders (purple) by incorporating modules specifically designed to embed out-of-distribution capabilities based on prior knowledge of in-distribution classes. We incorporate an OOD contextual module (blue) that complements the robust in-distribution semantic features of the shared backbone with both global discriminatory and fine local OOD object representations. Subsequently, we introduce an additional task-specific decoder (green), equipped with dynamic modules, alongside the existing ones. This design allows for adaptive integration of OOD features while preserving the in-distribution features of the high-performing base panoptic network. The dual task-specific decoder configuration further benefits from our novel alignment-mismatch loss, which encourages learning finer distinctions between in-distribution semantic classes and what lies outside them by balancing consensus and divergence between the two decoders. Furthermore, we incorporate a data augmentation strategy to facilitate the training of our novel modules. Please refer to our paper for more details.
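For illustration, here is a minimal, heavily simplified PyTorch-style sketch of this structure under our own assumptions: the placeholder backbone, the design of the OOD contextual module, the channel widths, and the consensus/divergence form of the loss are illustrative stand-ins rather than the authors' implementation, and the instance-prediction heads of the actual decoders are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class OODContextualModule(nn.Module):
    """Stand-in for the OOD contextual module: fuses a global context vector
    with local per-pixel features (illustrative, not the paper's exact design)."""
    def __init__(self, channels: int):
        super().__init__()
        self.global_proj = nn.Linear(channels, channels)
        self.local_conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        g = self.global_proj(feats.mean(dim=(2, 3)))   # global discriminatory cue
        l = self.local_conv(feats)                     # fine local cue
        return l + g[:, :, None, None]


class PoDSSketch(nn.Module):
    """Shared backbone + OOD contextual module + dual symmetrical decoders."""
    def __init__(self, num_classes: int, channels: int = 64):
        super().__init__()
        self.backbone = nn.Sequential(                 # placeholder shared backbone
            nn.Conv2d(3, channels, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.ood_context = OODContextualModule(channels)
        # One decoder keeps in-distribution behavior; the other adaptively
        # integrates OOD features and predicts an extra OOD class.
        self.id_decoder = nn.Conv2d(channels, num_classes, 1)
        self.ood_decoder = nn.Conv2d(channels, num_classes + 1, 1)

    def forward(self, x):
        feats = self.backbone(x)
        ood_feats = self.ood_context(feats)
        id_logits = self.id_decoder(feats)
        ood_logits = self.ood_decoder(feats + ood_feats)
        return id_logits, ood_logits


def alignment_mismatch_style_loss(id_logits, ood_logits, ood_mask):
    """Illustrative consensus/divergence objective: push the two decoders to
    agree on in-distribution pixels and diverge on pixels flagged as OOD
    (a stand-in for the alignment-mismatch loss, not the exact formulation)."""
    p_id = F.softmax(id_logits, dim=1)
    p_ood = F.softmax(ood_logits[:, :-1], dim=1)             # drop the OOD channel
    diff = (p_ood - p_id).pow(2).mean(dim=1, keepdim=True)   # per-pixel disagreement
    in_dist, ood = (~ood_mask).float(), ood_mask.float()
    agree = (diff * in_dist).sum() / in_dist.sum().clamp(min=1.0)
    diverge = (diff * ood).sum() / ood.sum().clamp(min=1.0)
    return agree - diverge


if __name__ == "__main__":
    model = PoDSSketch(num_classes=19)
    x = torch.randn(1, 3, 128, 256)
    id_logits, ood_logits = model(x)
    ood_mask = torch.zeros(1, 1, *id_logits.shape[2:], dtype=torch.bool)
    print(id_logits.shape, ood_logits.shape,
          alignment_mismatch_style_loss(id_logits, ood_logits, ood_mask))
```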

Video

Code

Coming soon...

Publications

If you find our work useful, please consider citing our paper:

Rohit Mohan, Kiran Kumaraswamy, Juana Valeria Hurtado, Kürsat Petek, and Abhinav Valada
Panoptic Out-of-Distribution Segmentation

(PDF) (BibTeX)

Authors

Acknowledgment

This work was funded by the German Research Foundation Emmy Noether Program grant number 468878300 and an academic grant from NVIDIA.