Hartwig Adam

Affiliations

Google DeepMind

3,514 total citations

papers (21)

Axial-DeepLab: Stand-Alone Axial-Attention for Panoptic Segmentation

ECCV 2020, arXiv

789 citations

Panoptic-DeepLab: A Simple, Strong, and Fast Baseline for Bottom-Up Panoptic Segmentation

CVPR 2020, arXiv

662 citations

MaX-DeepLab: End-to-End Panoptic Segmentation With Mask Transformers

CVPR 2021, arXiv

597 citations

VideoPoet: A Large Language Model for Zero-Shot Video Generation

ICML 2024, arXiv

420 citations

VIP-DeepLab: Learning Visual Perception With Depth-Aware Video Panoptic Segmentation

CVPR 2021, arXiv

165 citations

Naive-Student: Leveraging Semi-Supervised Learning in Video Sequences for Urban Scene Segmentation

ECCV 2020, arXiv

118 citations

MnasFPN: Learning Latency-Aware Pyramid Architecture for Object Detection on Mobile Devices

CVPR 2020, arXiv

Improving Zero-Shot Generalization and Robustness of Multi-Modal Models

CVPR 2023, arXiv

TubeFormer-DeepLab: Video Mask Transformer

CVPR 2022, arXiv

Learning View-Disentangled Human Pose Representation by Contrastive Cross-View Mutual Information Maximization

CVPR 2021, arXiv

Alternating Gradient Descent and Mixture-of-Experts for Integrated Multimodal Perception

NeurIPS 2023, arXiv

Distilling Vision-Language Models on Millions of Videos

CVPR 2024, arXiv

k-Means Mask Transformer

ECCV 2022, arXiv

Contextualized Spatio-Temporal Contrastive Learning With Self-Supervision

CVPR 2022, arXiv

Exploring Fine-Grained Audiovisual Categorization with the SSW60 Dataset

ECCV 2022, arXiv

Unified Visual Relationship Detection with Vision and Language Models

ICCV 2023, arXiv

papers (21)

Axial-DeepLab: Stand-Alone Axial-Attention for Panoptic Segmentation

Panoptic-DeepLab: A Simple, Strong, and Fast Baseline for Bottom-Up Panoptic Segmentation

MaX-DeepLab: End-to-End Panoptic Segmentation With Mask Transformers

VideoPoet: A Large Language Model for Zero-Shot Video Generation

VIP-DeepLab: Learning Visual Perception With Depth-Aware Video Panoptic Segmentation

Naive-Student: Leveraging Semi-Supervised Learning in Video Sequences for Urban Scene Segmentation

Fashionpedia: Ontology, Segmentation, and an Attribute Localization Dataset

CMT-DeepLab: Clustering Mask Transformers for Panoptic Segmentation

View-Invariant Probabilistic Embedding for Human Pose

Adaptive Transformers for Robust Few-Shot Cross-Domain Face Anti-Spoofing

VideoPrism: A Foundational Visual Encoder for Video Understanding

MnasFPN: Learning Latency-Aware Pyramid Architecture for Object Detection on Mobile Devices

Improving Zero-Shot Generalization and Robustness of Multi-Modal Models

TubeFormer-DeepLab: Video Mask Transformer

Learning View-Disentangled Human Pose Representation by Contrastive Cross-View Mutual Information Maximization

Alternating Gradient Descent and Mixture-of-Experts for Integrated Multimodal Perception

Distilling Vision-Language Models on Millions of Videos

k-Means Mask Transformer

Contextualized Spatio-Temporal Contrastive Learning With Self-Supervision

Exploring Fine-Grained Audiovisual Categorization with the SSW60 Dataset

Unified Visual Relationship Detection with Vision and Language Models
