Title
WINter-ViT: Window Interaction Vision Transformer with Head-Aware Attention
Authors
김주명 (Ju-Myung Kim); 김재혁 (Jae-Hyeok Kim); 박소윤 (So-Yun Park); 유진우 (Jin-Woo Yoo)
DOI
https://doi.org/10.5370/KIEE.2025.74.9.1581
Keywords
Image Classification; Vision Transformer; Computer Vision; Deep Learning
Abstract
While the Swin Transformer effectively reduces computational cost through window-based attention, it struggles to model global dependencies across windows. Prior work, such as the Refined Transformer, attempts to overcome this limitation by incorporating CBAM-style channel and spatial attention mechanisms; however, these sequential attention operations often introduce representational bias by overemphasizing specific features. To address this, we propose two key components: (1) an Efficient Head Self-Attention (EHSA) module, which dynamically calibrates the relative contribution of each attention head within a window, and (2) a Hierarchical Local-to-Global Spatial Attention (HLSA) module, which captures long-range interactions across windows in a hierarchical manner. Integrating these modules into a Swin-T backbone improves both local detail modeling and global context aggregation. Experiments on ImageNet-1K and ImageNet-100 show that our model surpasses the Refined Transformer and other window-based approaches in accuracy while maintaining comparable computational efficiency. These results validate the effectiveness of our design in enhancing local-global interaction within Vision Transformers.
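The abstract specifies only the high-level behavior of EHSA (dynamically calibrating each attention head's contribution within a window); the exact formulation is not given here. The following is a minimal PyTorch sketch of that idea, assuming a squeeze-and-excitation-style gate over attention heads on top of standard window attention. The class and member names (HeadAwareWindowAttention, head_gate) are hypothetical illustrations, not the paper's verified implementation.

```python
# Minimal sketch of head-aware window attention, assuming an SE-style
# gate over heads. Hypothetical design, not the paper's exact EHSA.
import torch
import torch.nn as nn


class HeadAwareWindowAttention(nn.Module):
    def __init__(self, dim, num_heads, window_size):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.scale = self.head_dim ** -0.5
        self.window_size = window_size  # tokens per window = window_size**2
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        # Gate predicting one scalar weight per head from the
        # window-pooled features (squeeze-and-excitation over heads).
        self.head_gate = nn.Sequential(
            nn.Linear(dim, num_heads),
            nn.Sigmoid(),
        )

    def forward(self, x):
        # x: (num_windows * B, N, C) with N tokens per window
        B_, N, C = x.shape
        qkv = self.qkv(x).reshape(B_, N, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)   # each: (B_, heads, N, d)
        attn = (q @ k.transpose(-2, -1)) * self.scale
        attn = attn.softmax(dim=-1)
        out = attn @ v                          # (B_, heads, N, d)
        # Calibrate each head's relative contribution inside the window.
        gate = self.head_gate(x.mean(dim=1))    # (B_, heads)
        out = out * gate[:, :, None, None]
        out = out.transpose(1, 2).reshape(B_, N, C)
        return self.proj(out)


# Quick shape check on a batch of 7x7 windows (Swin-T stage-1 sizes).
if __name__ == "__main__":
    attn = HeadAwareWindowAttention(dim=96, num_heads=3, window_size=7)
    tokens = torch.randn(4, 49, 96)             # 4 windows, 49 tokens each
    print(attn(tokens).shape)                   # torch.Size([4, 49, 96])
```

In the full model as described, an HLSA stage would then aggregate these window-level outputs hierarchically to recover cross-window, global context; that stage is omitted from this sketch.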