
References

1. A. Krizhevsky, I. Sutskever and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” Advances in Neural Information Processing Systems (NeurIPS), vol. 25, 2012. DOI: 10.1145/3065386
2. K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556, Sept. 2014. DOI: 10.48550/arXiv.1409.1556
3. C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke and A. Rabinovich, “Going deeper with convolutions,” Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 1-9, 2015. DOI: 10.1109/CVPR.2015.7298594
4. K. He, X. Zhang, S. Ren and J. Sun, “Deep residual learning for image recognition,” Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 770-778, 2016. DOI: 10.1109/CVPR.2016.90
5. G. Huang, Z. Liu, L. van der Maaten and K. Q. Weinberger, “Densely connected convolutional networks,” Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 4700-4708, 2017. DOI: 10.1109/CVPR.2017.243
6. A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser and I. Polosukhin, “Attention is all you need,” Advances in Neural Information Processing Systems (NeurIPS), vol. 30, 2017. DOI: 10.48550/arXiv.1706.03762
7. A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit and N. Houlsby, “An image is worth 16×16 words: Transformers for image recognition at scale,” arXiv preprint arXiv:2010.11929, Oct. 2020. DOI: 10.48550/arXiv.2010.11929
8. Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin and B. Guo, “Swin transformer: Hierarchical vision transformer using shifted windows,” Proc. IEEE/CVF Int. Conf. on Computer Vision (ICCV), pp. 10012-10022, 2021. DOI: 10.1109/ICCV48922.2021.00988
9. D. Yu and J. Yu, “A study on improving image classification performance using Vision Transformer with window attention in a refined feature space,” The Transactions of the Korean Institute of Electrical Engineers, vol. 73, no. 6, pp. 1004-1011, 2024. DOI: 10.5370/KIEE.2024.73.6.1004
10. S. Woo, J. Park, J. Y. Lee and I. S. Kweon, “CBAM: Convolutional block attention module,” Proc. European Conf. on Computer Vision (ECCV), pp. 3-19, 2018. DOI: 10.1007/978-3-030-01234-2_1
11. Y. Si, H. Xu, X. Zhu, W. Zhang, Y. Dong, Y. Chen and H. Li, “SCSA: Exploring the synergistic effects between spatial and channel attention,” Neurocomputing, vol. 634, art. no. 129866, Jun. 2025. DOI: 10.1016/j.neucom.2024.129866
12 
J. Deng, W. Dong, R. Socher, L. J. Li, K. Li and L. Fei-Fei, “ImageNet: A large-scale hierarchical image database,” Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 248-255, 2009. DOI:10.1109/CVPR.2009.5206848DOI
13 
H. Touvron, M. Cord, M. Douze, F. Massa, A. Sablayrolles and H. Jégou, “Training data-efficient image transformers & distillation through attention,” Proc. Int. Conf. on Machine Learning (ICML), pp. 10347-10357, Jul. 2021. DOI:10.48550/arXiv.2012.12877DOI
14 
L. Yuan, Y. Chen, T. Wang, W. Yu, Y. Shi, Z. H. Jiang, F. E. H. Tay, J. Feng and S. Yan, “Tokens-to-token ViT: Training vision transformers from scratch on ImageNet,” Proc. IEEE/CVF Int. Conf. on Computer Vision (ICCV), pp. 558-567, 2021. DOI:10.1109/ICCV48922.2021.00061DOI
15 
K. Han, A. Xiao, E. Wu, J. Guo, C. Xu and Y. Wang, “Transformer in transformer,” Advances in Neural Information Processing Systems (NeurIPS), vol. 34, pp. 15908-15919, Dec. 2021. DOI:10.48550/arXiv.2103.00112DOI
16 
J. Guo, K. Han, H. Wu, Y. Tang, X. Chen, Y. Wang and C. Xu, “CMT: Convolutional neural networks meet vision transformers,” Proc. IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 12175-12185, 2022. DOI:10.1109/CVPR52688.2022.01187DOI
17 
H. Wu, B. Xiao, N. Codella, M. Liu, X. Dai, L. Yuan and L. Zhang, “CvT: Introducing convolutions to vision transformers,” Proc. IEEE/CVF Int. Conf. on Computer Vision (ICCV), pp. 22-31, 2021. DOI:10.1109/ICCV48922.2021.00011DOI
18 
X. Dong, J. Bao, D. Chen, W. Zhang, N. Yu, L. Yuan, D. Chen and B. Guo, “CSWin Transformer: A general vision transformer backbone with cross-shaped windows,” Proc. IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 12124-12134, 2022. DOI:10.1109/CVPR52688.2022.01180DOI
19 
S. Wu, T. Wu, H. Tan and G. Guo, “Pale Transformer: A general vision transformer backbone with pale-shaped attention,” Proc. AAAI Conf. on Artificial Intelligence, vol. 36, no. 3, pp. 2731-2739, Jun. 2022. DOI:10.1609/aaai.v36i3.20244.DOI
20 
Q. Zhang, Y. Xu, J. Zhang and D. Tao, “VSA: Learning varied-size window attention in vision transformers,” Proc. European Conf. on Computer Vision (ECCV), pp. 466-483, Oct. 2022. DOI:10.1007/978-3-031-19806-9_27DOI
21 
T. Yu, G. Zhao, P. Li and Y. Yu, “BOAT: Bilateral local attention vision transformer,” arXiv preprint arXiv:2201.13027, Jan. 2022. DOI:10.48550/arXiv.2201.13027DOI
22 
J. Hu, L. Shen and G. Sun, “Squeeze-and-excitation networks,” Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 7132-7141, 2018. DOI:10.1109/CVPR.2018.00745DOI
23 
Q. Wang, B. Wu, P. Zhu, P. Li, W. Zuo and Q. Hu, “ECA-Net: Efficient channel attention for deep convolutional neural networks,” Proc. IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 11534-11542, 2020. DOI:10.1109/CVPR42600.2020.01155DOI
24 
W. Xu and Y. Wan, “ELA: Efficient local attention for deep convolutional neural networks,” arXiv preprint arXiv:2403.01123, Mar. 2024. DOI:10.48550/arXiv.2403.01123DOI
25 
Y. Xu, Q. Zhang, J. Zhang and D. Tao, “ViTAE: Vision transformer advanced by exploring intrinsic inductive bias,” Advances in Neural Information Processing Systems (NeurIPS), vol. 34, pp. 28522-28535, Dec. 2021. DOI:10.48550/arXiv.2106.03348DOI
26 
W. Wang, E. Xie, X. Li, D. P. Fan, K. Song, D. Liang, T. Lu, P. Luo and L. Shao, “Pyramid vision transformer: A versatile backbone for dense prediction without convolutions,” Proc. IEEE/CVF Int. Conf. on Computer Vision (ICCV), pp. 568-578, 2021. DOI:10.1109/ICCV48922.2021.00062DOI
27 
C. F. R. Chen, Q. Fan and R. Panda, “CrossViT: Cross-attention multi-scale vision transformer for image classification,” Proc. IEEE/CVF Int. Conf. on Computer Vision (ICCV), pp. 357-366, 2021. DOI:10.1109/ICCV48922.2021.00042DOI
28 
X. Chu, Z. Tian, Y. Wang, B. Zhang, H. Ren, X. Wei, H. Xia and C. Shen, “Twins: Revisiting the design of spatial attention in vision transformers,” Advances in Neural Information Processing Systems (NeurIPS), vol. 34, pp. 9355-9366, Dec. 2021. DOI:10.48550/arXiv.2104.13840DOI
29 
J. Yang, C. Li, P. Zhang, X. Dai, B. Xiao, L. Yuan and J. Gao, “Focal self-attention for local-global interactions in vision transformers,” arXiv preprint arXiv:2107.00641, Jul. 2021. DOI:10.48550/arXiv.2107.00641DOI
30 
X. Pan, T. Ye, Z. Xia, S. Song and G. Huang, “Slide-transformer: Hierarchical vision transformer with local self-attention,” Proc. IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 2082-2091, 2023. DOI:10.1109/CVPR52729.2023.00213DOI
31 
J. Fang, L. Xie, X. Wang, X. Zhang, W. Liu and Q. Tian, “MSG-Transformer: Exchanging local spatial information by manipulating messenger tokens,” Proc. IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 12063-12072, 2022. DOI:10.1109/CVPR52688.2022.01177DOI