FaceXFormer: A Unified Transformer for Facial Analysis

37 citations · Ranked #73 of 2,701 papers in ICCV 2025

Abstract

In this work, we introduce FaceXFormer, an end-to-end unified transformer model capable of performing ten facial analysis tasks within a single framework. These tasks include face parsing, landmark detection, head pose estimation, attribute prediction, age, gender, and race estimation, facial expression recognition, face recognition, and face visibility prediction. Traditional face analysis approaches rely on task-specific architectures and pre-processing techniques, limiting scalability and integration. In contrast, FaceXFormer employs a transformer-based encoder-decoder architecture in which each task is represented as a learnable token, enabling seamless multi-task processing within a unified model. To enhance efficiency, we introduce FaceX, a lightweight decoder with a novel bi-directional cross-attention mechanism that jointly processes face and task tokens to learn robust and generalized facial representations. We train FaceXFormer on ten diverse face perception datasets and evaluate it against both specialized and multi-task models across multiple benchmarks, demonstrating state-of-the-art or competitive performance. Additionally, we analyze the impact of various components of FaceXFormer on performance, assess real-world robustness in "in-the-wild" settings, and conduct a computational performance evaluation. To the best of our knowledge, FaceXFormer is the first model capable of handling ten facial analysis tasks while maintaining real-time performance at 33.21 FPS. Code: https://github.com/Kartik-3004/facexformer
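
The abstract's core mechanism, one learnable token per task refined by a lightweight decoder in which face tokens and task tokens cross-attend to each other, can be illustrated with a short sketch. What follows is a minimal, hypothetical PyTorch rendering of that idea; the module names, dimensions, and block layout are assumptions made for illustration and do not reproduce the authors' implementation, which is available at the repository linked above.

```python
# Illustrative sketch of the task-token idea behind FaceXFormer (PyTorch).
# All names, dimensions, and layer choices are assumptions, not the
# authors' code; see the linked repository for the real implementation.
import torch
import torch.nn as nn

class BiDirectionalCrossAttention(nn.Module):
    """Hypothetical bi-directional cross-attention block: task tokens
    attend to face tokens, then face tokens attend to task tokens."""
    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.task_to_face = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.face_to_task = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_task = nn.LayerNorm(dim)
        self.norm_face = nn.LayerNorm(dim)

    def forward(self, face_tokens, task_tokens):
        # Direction 1: task tokens query the face features.
        t, _ = self.task_to_face(task_tokens, face_tokens, face_tokens)
        task_tokens = self.norm_task(task_tokens + t)
        # Direction 2: face tokens query the updated task tokens.
        f, _ = self.face_to_task(face_tokens, task_tokens, task_tokens)
        face_tokens = self.norm_face(face_tokens + f)
        return face_tokens, task_tokens

class UnifiedFaceDecoder(nn.Module):
    """One learnable token per task; each refined token would feed a
    task-specific prediction head (parsing, landmarks, age, etc.)."""
    def __init__(self, num_tasks: int = 10, dim: int = 256, depth: int = 2):
        super().__init__()
        self.task_tokens = nn.Parameter(torch.randn(num_tasks, dim))
        self.blocks = nn.ModuleList(
            [BiDirectionalCrossAttention(dim) for _ in range(depth)]
        )

    def forward(self, face_tokens):  # face_tokens: (B, N, dim) from encoder
        B = face_tokens.size(0)
        task_tokens = self.task_tokens.unsqueeze(0).expand(B, -1, -1)
        for block in self.blocks:
            face_tokens, task_tokens = block(face_tokens, task_tokens)
        return task_tokens  # (B, num_tasks, dim): one embedding per task

# Usage: refined token i is routed to the head for task i.
decoder = UnifiedFaceDecoder()
out = decoder(torch.randn(2, 196, 256))  # e.g., 14x14 patch features
print(out.shape)  # torch.Size([2, 10, 256])
```

The design point this sketch tries to capture is that all ten tasks share one decoder pass: instead of ten task-specific networks, each task only contributes a single query token, which is how a unified model can stay lightweight enough for real-time inference.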

Citation History

Jan 24, 2026: 0
Jan 26, 2026: 0
Jan 27, 2026: 0
Feb 3, 2026: 36 (+36)
Feb 13, 2026: 37 (+1)