Title |
Design of Tensor Core Architecture for Softmax Acceleration |
Authors |
김성우(Sungwoo Kim) ; 노원우(Won Woo Ro) |
DOI |
https://doi.org/10.5573/ieie.2025.62.8.3 |
Keywords |
GPU; Tensor core; Softmax |
Abstract |
Large Language Models (LLMs) require increasing computational resources as they process more tokens, with Multi-Head Attention (MHA) intensifying the demand for matrix multiplication and softmax operations. While GPUs utilize Tensor Cores for efficient matrix multiplication, softmax still relies on CUDA Cores, creating a performance bottleneck due to thread-synchronization overhead and lower computational throughput. This paper proposes the Soft-Tensor Core (Soft-TC), a novel approach that integrates a Lookup Table (LUT) and a reciprocal unit into Tensor Cores to accelerate softmax computation. By leveraging Tensor Cores’ high throughput and warp-level execution, Soft-TC improves both exponential-function evaluation and summation efficiency. This optimization extends GPU acceleration beyond matrix multiplication, significantly improving LLM inference and training performance. |
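To make the LUT-plus-reciprocal idea concrete, the following is a minimal software sketch (not the paper's hardware design; all names, table sizes, and ranges are illustrative assumptions) of a softmax that replaces the exponential with a precomputed lookup table and uses a single reciprocal followed by multiplies, mirroring the structure the abstract attributes to Soft-TC.

```python
import math

# Illustrative parameters (assumptions, not from the paper):
# after max-subtraction every input is <= 0, so the LUT only needs to
# cover a bounded negative range; values below LUT_MIN underflow to 0.
LUT_SIZE = 1024
LUT_MIN, LUT_MAX = -20.0, 0.0
STEP = (LUT_MAX - LUT_MIN) / (LUT_SIZE - 1)
EXP_LUT = [math.exp(LUT_MIN + i * STEP) for i in range(LUT_SIZE)]

def lut_exp(x: float) -> float:
    """Approximate exp(x) for x <= 0 with a nearest-entry LUT lookup."""
    if x <= LUT_MIN:
        return 0.0
    idx = round((x - LUT_MIN) / STEP)
    return EXP_LUT[min(idx, LUT_SIZE - 1)]

def softmax_lut(scores):
    m = max(scores)                       # max-subtraction for numerical stability
    exps = [lut_exp(s - m) for s in scores]
    inv_sum = 1.0 / sum(exps)             # one reciprocal, then per-element multiplies
    return [e * inv_sum for e in exps]

probs = softmax_lut([2.0, 1.0, 0.5])
```

In hardware, the appeal of this structure is that the LUT read and the final multiplies map onto high-throughput datapaths, while only one true division (the reciprocal) is needed per softmax row; the accuracy of the result is bounded by the LUT resolution (`STEP` here).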