| Title |
Design and Performance Analysis of a Cross-attention Transformer Model for Single-person 3D Keypoint Detection |
| Authors |
신인영(In-Yeong Shin) ; 이승호(Seung-Ho Lee) |
| DOI |
https://doi.org/10.5573/ieie.2026.63.4.78 |
| Keywords |
Feature fusion; 3D; Pose estimation; Deep learning; Transformers |
| Abstract |
In this paper, we propose a novel Spatio-Temporal Feature Fusion Transformer model based on a cross-attention mechanism that simultaneously maximizes the accuracy and computational efficiency of single-person 3D pose estimation. Conventional 2D keypoint-based approaches often suffer from instability in 3D reconstruction due to noise and jitter inherent in the input 2D keypoint sequences. To address this issue, we introduce the Discrete Cosine Transform (DCT) as a preprocessing step: by filtering out unnecessary high-frequency components of the time-series data, this method effectively suppresses noise and preserves the temporal continuity of motion. Furthermore, we use a spatial transformer to embed the geometric relationships among human joints into vectors and apply a cross-attention structure to integrate these with temporal features. This fusion model enhances estimation precision by learning spatial structural information and temporal dynamic information in a mutually complementary manner. Consequently, this study aims to demonstrate that the synergy between frequency-domain preprocessing and the spatio-temporal integrated attention mechanism leads to significant improvements in both the robustness and accuracy of single-person 3D pose estimation. |
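The DCT preprocessing described in the abstract can be illustrated with a minimal sketch. The paper does not specify its exact parameters, so the function name `dct_lowpass`, the array layout `(T, J, 2)`, and the number of retained coefficients are assumptions for illustration; the idea is simply to transform each joint trajectory to the frequency domain, zero out the high-frequency coefficients that carry jitter, and invert the transform.

```python
import numpy as np
from scipy.fft import dct, idct

def dct_lowpass(keypoints, keep):
    """Suppress jitter in a 2D keypoint sequence via DCT truncation.

    keypoints: (T, J, 2) array of 2D joint coordinates over T frames
               (layout assumed for this sketch).
    keep: number of low-frequency DCT coefficients to retain per trajectory.
    """
    # Transform each joint/coordinate trajectory along the time axis.
    coeffs = dct(keypoints, axis=0, norm="ortho")
    # Zero out high-frequency components, which mostly carry noise.
    coeffs[keep:] = 0.0
    # Inverse transform back to a smoothed time-domain trajectory.
    return idct(coeffs, axis=0, norm="ortho")
```

Because motion trajectories are dominated by low-frequency content, truncating the spectrum removes frame-to-frame jitter while preserving the overall motion, which is what gives the downstream 3D lifting a temporally continuous input.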
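The cross-attention fusion of spatial and temporal features can likewise be sketched in simplified form. The paper does not publish its layer definitions, so this single-head NumPy version, with temporal tokens as queries and per-joint spatial embeddings as keys and values, is an assumed illustration of the general mechanism rather than the authors' architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(temporal, spatial, Wq, Wk, Wv):
    """Single-head cross-attention fusing temporal and spatial features.

    temporal: (T, d) per-frame temporal features (queries).
    spatial:  (J, d) per-joint spatial embeddings (keys/values).
    Wq, Wk, Wv: (d, d) projection matrices.
    Returns (T, d): each frame's feature enriched with joint structure.
    """
    Q = temporal @ Wq
    K = spatial @ Wk
    V = spatial @ Wv
    # Scaled dot-product attention: each frame attends over all joints.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    return softmax(scores, axis=-1) @ V
```

Letting the temporal stream query the spatial embeddings is one way to realize the "mutually complementary" learning the abstract describes: each frame's representation is re-weighted by its compatibility with the geometric structure of the skeleton.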