Title |
Design of Tensor Core Architecture for Softmax Acceleration |
Authors |
김성우(Sungwoo Kim) ; 노원우(Won Woo Ro) |
DOI |
https://doi.org/10.5573/ieie.2025.62.8.3 |
Keywords |
GPU; Tensor core; Softmax |
Abstract |
Large Language Models (LLMs) require increasing computational resources as they process more tokens, with Multi-Head Attention (MHA) intensifying the demand for matrix multiplication and softmax operations. While GPUs utilize Tensor Cores for efficient matrix multiplication, softmax still relies on CUDA Cores, creating a performance bottleneck due to thread-synchronization overhead and lower computational throughput. This paper proposes the Soft-Tensor Core (Soft-TC), a novel approach that integrates a Lookup Table (LUT) and a reciprocal unit into Tensor Cores to accelerate softmax computation. By leveraging Tensor Cores’ high throughput and warp-level execution, Soft-TC improves both exponential-function evaluation and summation efficiency. This optimization extends GPU acceleration beyond matrix multiplication, significantly improving LLM inference and training performance. |
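To make the LUT-plus-reciprocal idea concrete, the following is a minimal software sketch (not the paper's hardware design; all names, table sizes, and ranges are illustrative assumptions) of a softmax that replaces the exponential with a precomputed lookup table and uses a single reciprocal followed by multiplies, mirroring the structure the abstract attributes to Soft-TC.

```python
import math

# Illustrative parameters (assumptions, not from the paper):
# after max-subtraction every input is <= 0, so the LUT only needs to
# cover a bounded negative range; values below LUT_MIN underflow to 0.
LUT_SIZE = 1024
LUT_MIN, LUT_MAX = -20.0, 0.0
STEP = (LUT_MAX - LUT_MIN) / (LUT_SIZE - 1)
EXP_LUT = [math.exp(LUT_MIN + i * STEP) for i in range(LUT_SIZE)]

def lut_exp(x: float) -> float:
    """Approximate exp(x) for x <= 0 with a nearest-entry LUT lookup."""
    if x <= LUT_MIN:
        return 0.0
    idx = round((x - LUT_MIN) / STEP)
    return EXP_LUT[min(idx, LUT_SIZE - 1)]

def softmax_lut(scores):
    m = max(scores)                       # max-subtraction for numerical stability
    exps = [lut_exp(s - m) for s in scores]
    inv_sum = 1.0 / sum(exps)             # one reciprocal, then per-element multiplies
    return [e * inv_sum for e in exps]

probs = softmax_lut([2.0, 1.0, 0.5])
```

In hardware, the appeal of this structure is that the LUT read and the final multiplies map onto high-throughput datapaths, while only one true division (the reciprocal) is needed per softmax row; the accuracy of the result is bounded by the LUT resolution (`STEP` here).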