Iraj Moghaddasi1
                     Byeong-Gyu Nam1
               
                           
                        (Department of Computer Science and Engineering, Chungnam National University, Daejeon
                        305-764, Korea)
                        
 
            
            
            Copyright © The Institute of Electronics and Information Engineers(IEIE)
            
            
            
            
            
               
                  
Index Terms
               
                DNN accelerator,  lifetime,  inference engine,  aging,  NBTI,  safety-critical,  serial processing
             
            
          
         
            
                  I. Introduction	
               Deep neural networks (DNNs) are increasingly used in safety-critical domains such as
                  automotive [1] and aerospace [2], where dependability is paramount because failures can have dire consequences,
                  such as loss of human life or significant environmental harm. Among dependability
                  attributes, lifetime resilience [3] emerges as a primary concern for intelligent edge devices in safety-critical applications
                  and demands particular attention. Meanwhile, the ever-increasing complexity of DNN models
                  poses challenges for edge devices with limited resources. Consequently, previous research
                  on DNN deployment has focused primarily on computation efficiency,
                  for example through compression and quantization, which reduce the error resilience of special-purpose
                  accelerators by eliminating intrinsic redundancy [4,5]. 
               
               While continuous scaling of CMOS feature size has enabled high-performance
                  and computation-efficient processing platforms, it has also significantly threatened the
                  lifetime resilience of modern DNN inference engines, primarily through aging.
                  Bias Temperature Instability (BTI), which includes NBTI and PBTI, stands out as a dominant
                  aging factor in advanced CMOS technologies [6,7]. BTI causes timing degradation through continual electrical stress on transistors,
                  leading to performance reduction, timing errors, and eventually loss of lifetime. Several
                  operational parameters, such as temperature, voltage, and stress, affect BTI degradation
                  [8,9]. Conventional conservative guard banding is ill-suited to modern
                  hardware accelerators because it sacrifices performance by reserving a safety
                  margin of more than 20% over the lifetime [10]. State-of-the-art research has instead enhanced the lifetime of DNN
                  accelerators through approximation techniques and temperature mitigation, at the cost of
                  a slight reduction in accuracy [11]. Other leading efforts have extended the lifetime of DNN accelerators
                  by reducing stress in activation and weight memories using aging-aware data encoding
                  [12], iterative power gating, and memory bank switching techniques [13]. Separately, choosing an appropriate data representation has become an important research direction
                  in designing DNN accelerators [18]. Conventional number systems appear suboptimal for the design of specialized DNN
                  accelerators, and diverse alternative number systems have been explored for
                  them [17]. 
               
               This research investigates aging mitigation for lifetime resilience enhancement by
                  reducing stress and increasing redundancy through a Redundant Number System
                  (RDNS) [19]. Adapting RDNS to DNN acceleration faces a serious obstacle: the redundancy and
                  overheads that arise from computing in RDNS. To manage these overheads, we propose
                  Binary Signed Digit (BSD)-serial processing as an alternative to bit-serial processing elements (PEs) in
                  the conventional binary number system (BNS) [14-16]. In brief, serial processing can increase efficiency through 1) dynamic precision
                  adjustment, 2) active computation pruning, and 3) circuit design simplification. 
               
               Overall, we focus on combining the concepts of (a) serial processing and (b) computing
                  in RDNS to evaluate their joint impact on the lifetime resilience
                  of DNN accelerators. We evaluate the lifetime of BSD-serial PEs against
                  conventional bit-serial PEs operating in the BNS. According to the literature, execution
                  elements are among the most timing-critical units when evaluating the overall lifetime
                  of a processor [20]. To assess the effectiveness of RDNS computing for lifetime extension, we use
                  a cross-layer workflow to evaluate BTI in the data path of the DNN accelerator. The
                  proposed approach jointly exploits input (activation and weight) and number-system
                  impacts on lifetime resilience. Experimental results demonstrate that the proposed design
                  improves lifetime resilience, via stress and degradation mitigation, while preserving
                  computational efficiency. Computing in RDNS can therefore help meet the strict lifetime
                  constraints of safety-critical domains. 
               
               The major contributions can be summarized as follows:
               1) For the first time, we explore number-system and workload effects on the BTI stress
                  of the accelerator data path. Computing in RDNS yields 36% lower stress on average
                  across diverse workloads.
               
               2) We evaluate the lifetime of PEs considering all affecting factors, i.e., stress,
                  temperature, and voltage variations. BSD-serial processing yields an average 35.5%
                  longer lifetime, measured as mean time to failure (MTTF), than the baseline.
               
               3) We introduced a cross-layer workflow to evaluate aging in the DNN accelerator data
                  path.
               
               The rest of this paper is organized as follows. Section II presents the preliminaries.
                  The proposed RDNS-based processing approach is overviewed in Section III. Section
                  IV clarifies the detailed architecture of processing elements. Section V presents
                  the experimental results and comparison over the baseline. Finally, Section VI concludes
                  the paper.
               
             
            
                  II. Background	
               NBTI shifts the threshold voltage of PMOS transistors, which increases critical-path
                  delays and eventually causes timing errors. NBTI occurs when PMOS transistors
                  are under stress with a negative voltage bias ($V_{GS}=-V_{dd}$). The two major theories
                  that explain the NBTI process are the reaction-diffusion (RD) and trapping/de-trapping
                  (TD) models [8]. Both models describe NBTI as a two-stage process with stress and recovery phases.
                  The long-term NBTI degradation model based on RD theory is described by [21]:
               
               
               where $A$ depends on operational parameters such as temperature ($T$) and voltage ($V_{dd}$),
                  $K$ is a fitting parameter, $Y$ is stress, $t$ is service time, and $n$ is assumed to lie between
                  $1/4$ and $1/6$ depending on the diffusing species [21]. In this work, we use the RD model with $n=1/6$ for NBTI aging evaluation. According to [22], $A\left(T,\,V_{dd}\right)$ can be calculated as:
               
               
               where $t_{ox}$ is the effective oxide thickness and $K_{B}$ is the Boltzmann constant ($8.6\times
                  10^{-5}\,eV/K$), while $E_{0}=0.1897\,eV$ and $B=0.075\,eV\,nm/V$ are fitting parameters. As
                  seen, the main workload factor in NBTI aging is stress, which can be evaluated from the internal
                  nodes’ duty cycles or signal probability ($SP$). For PBTI, $SP$ is the ratio of the
                  time with logic one at a gate input to the total service time, while the probability of
                  zeros determines the NBTI-induced degradation in PMOS devices. Thus, both $SP$ and $1-SP$
                  are needed to cover NBTI and PBTI. These factors are given to the model to
                  estimate the delay degradation. 
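
               As a minimal sketch of how stress could be extracted from a simulated waveform, the snippet below computes the signal probability of a node from a list of sampled logic values, one sample per clock cycle. The function names and the sample trace are illustrative assumptions and are not taken from the evaluation flow of this work.

```python
def signal_probability(samples):
    """Fraction of samples at logic one (SP); samples is a sequence of 0/1 values."""
    if not samples:
        raise ValueError("need at least one sample")
    return sum(samples) / len(samples)


def bti_stress(samples):
    """Return the pair (SP, 1 - SP) so that both BTI polarities are covered."""
    sp = signal_probability(samples)
    return sp, 1.0 - sp


# Hypothetical trace of one gate input over eight clock cycles.
trace = [1, 1, 0, 1, 0, 1, 1, 1]
sp, one_minus_sp = bti_stress(trace)
```

               For this trace the node sits at logic one in six of eight cycles, so $SP=0.75$ and $1-SP=0.25$.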
               
               Also, the operating temperature of a chip can be calculated from [23]:
               
               
               where $T_{chip}$ is the average temperature, $T_{a}$ is the ambient temperature ($T_{a}=25^{\circ}C$),
                  $P_{tot}$ is the total power consumption, A (in $cm^{2}$) is the chip area, and $R_{\theta
                  }$ is the equivalent thermal resistance. 
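
               The relation can be sketched numerically under the assumption that it takes the common lumped steady-state form $T_{chip}=T_{a}+R_{\theta}\,P_{tot}/A$. That form, the function name, and the $R_{\theta}$ value below are assumptions made for illustration, with $R_{\theta}$ expressed in $^{\circ}C\,cm^{2}/W$.

```python
def chip_temperature(p_tot_w, area_cm2, r_theta=1.0, t_ambient_c=25.0):
    """Average chip temperature in deg C, assuming T_chip = T_a + R_theta * P_tot / A.

    r_theta (deg C * cm^2 / W) is a hypothetical placeholder, not a measured value.
    """
    if area_cm2 <= 0:
        raise ValueError("area must be positive")
    return t_ambient_c + r_theta * p_tot_w / area_cm2


# Illustrative inputs only: 1 mW dissipated over 1e-4 cm^2 (0.01 mm^2).
t_chip = chip_temperature(p_tot_w=1e-3, area_cm2=1e-4)
```

               Under this assumed form the temperature rise depends only on power density, so two designs whose power and area grow in the same proportion would run at similar temperatures.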
               
               In this study, lifetime is defined and estimated as the period after which $\Delta V_{th}$
                  reaches 10% of the initial nominal value [24].
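
               To make the lifetime criterion concrete, the sketch below assumes a power-law degradation of the form $\Delta V_{th}=C\,(Y\,t)^{n}$, where $C$ lumps the $A(T,V_{dd})$ and $K$ terms. This functional form, the value of $C$, and the function names are assumptions for illustration and are not necessarily the exact model used in this work. Lifetime is then the time at which $\Delta V_{th}$ reaches a given limit.

```python
def delta_vth(stress, t, c, n=1 / 6):
    """Threshold-voltage shift under the assumed model dVth = c * (stress * t) ** n."""
    return c * (stress * t) ** n


def lifetime(stress, dvth_limit, c, n=1 / 6):
    """Time at which the assumed model reaches dvth_limit (same time unit as t)."""
    if stress <= 0 or c <= 0:
        raise ValueError("stress and c must be positive")
    return (dvth_limit / c) ** (1.0 / n) / stress


C_HYPOTHETICAL = 0.02  # lumped constant, placeholder value (V per time-unit ** n)
DVTH_LIMIT = 0.03      # example failure criterion in volts

# Two illustrative duty cycles: a more stressed and a less stressed design.
life_high_stress = lifetime(0.71, DVTH_LIMIT, C_HYPOTHETICAL)
life_low_stress = lifetime(0.52, DVTH_LIMIT, C_HYPOTHETICAL)
gain = life_low_stress / life_high_stress - 1.0
```

               Under this assumed form, lifetime is inversely proportional to stress, so the relative gain equals the inverse ratio of the two duty cycles and does not depend on the choice of $C$.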
               
             
            
                  III. BSD-serial Processing Approach	
               The target operation of the PEs can be described by $\sum _{i=0}^{l-1}W_{i}\times A_{i}$,
                  where $W_{i}$ and $A_{i}$ represent weights and activations, respectively. The conventional binary
                  bit-serial PE (Bit-serial) is the baseline of this study. Compared to fixed-precision
                  PEs, serial processing can improve performance by 2.33X on average via on-the-fly
                  per-layer bit-precision adjustment [14]. Here, activations arrive bit-serially and weights arrive bit-parallel, both in binary.
                  In each cycle, a bit column of input activations is bitwise ANDed with the corresponding
                  parallel weight bits to produce partial products. The partial products are then fed to
                  a compressor to generate the sum of products. Fig. 1(a) shows the computing approach of the conventional bit-serial architecture. The serial engine
                  processes inputs in loops of $p$ cycles, where $p$ is the activation precision in
                  bits. In the first cycle of a loop, the MSBs of $l$ concurrent input activations are
                  ANDed with the weights, which produces $l$ $p$-bit terms. The compressor sums these terms into a partial
                  sum using an adder tree. In each of the remaining $p-1$ cycles of the loop, the accumulator
                  shifts the previous residual by one bit while accumulating the new sum of products.
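
               The shift-and-accumulate loop described above can be sketched behaviorally as follows. This is a software model under stated assumptions (unsigned activations, signed integer weights, MSB-first order), not the RTL of the PE, and the function name is hypothetical.

```python
def bit_serial_dot(weights, activations, precision):
    """Dot product with bit-parallel weights and MSB-first bit-serial activations.

    weights: signed integers; activations: unsigned integers below 2**precision.
    """
    if len(weights) != len(activations):
        raise ValueError("weights and activations must have equal length")
    if any(a < 0 or a >= (1 << precision) for a in activations):
        raise ValueError("activation does not fit in the given precision")
    acc = 0
    for k in range(precision - 1, -1, -1):            # one cycle per bit, MSB first
        column = [(a >> k) & 1 for a in activations]   # current bit of every activation
        # AND each activation bit with its weight, then compress into a sum of products.
        sum_of_products = sum(w if bit else 0 for w, bit in zip(weights, column))
        acc = (acc << 1) + sum_of_products             # shift the residual, accumulate
    return acc


result = bit_serial_dot([3, -2, 5], [6, 1, 7], precision=3)
```

               With these inputs the loop runs for three cycles and returns the same value as the direct sum $3\times 6-2\times 1+5\times 7$.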
                  As shown in Fig. 1(b), the processing approach of the BSD-serial PE mirrors that of Bit-serial but replaces BNS
                  with RDNS in the input activations, partial products, partial sums, and generated output.
                  We keep radix 2 with the corresponding redundant digit set $\left\{-1,0,1\right\}$.
                  Each digit is represented by a pair of bits, one positive and one negative,
                  with arithmetic values of 0 or 1 and -1 or 0, respectively. For example, $N=+11$
                  can be represented as $1011$ or $110\overline{1}$, which in positive/negative representation
                  reads as ($N^{+}=1011$, $N^{-}=1111$) or ($N^{+}=1100$, $N^{-}=1110$).
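
               The $+11$ example can be checked with a short behavioral sketch. The encoding used here is inferred from that example and is an assumption: a negative bit of 1 contributes 0 and a negative bit of 0 contributes -1, so each digit equals $p+n-1$. The function names are hypothetical, and the serial loop simply reuses the shift-and-accumulate idea with signed digits.

```python
def bsd_digits(pos_bits, neg_bits):
    """Digits in {-1, 0, 1}, MSB first, from positive/negative bit strings.

    Assumed encoding: digit = positive bit + negative bit - 1.
    """
    if len(pos_bits) != len(neg_bits):
        raise ValueError("bit strings must have equal length")
    return [int(p) + int(n) - 1 for p, n in zip(pos_bits, neg_bits)]


def bsd_value(digits):
    """Integer value of an MSB-first radix-2 signed-digit list."""
    value = 0
    for d in digits:
        value = 2 * value + d
    return value


def bsd_serial_dot(weights, activation_digits):
    """Digit-serial dot product: one signed digit of every activation per cycle."""
    n_digits = len(activation_digits[0])
    acc = 0
    for k in range(n_digits):
        # Each partial product is -w, 0, or +w depending on the current digit.
        acc = 2 * acc + sum(w * d[k] for w, d in zip(weights, activation_digits))
    return acc


a = bsd_digits("1011", "1111")   # digits 1, 0, 1, 1
b = bsd_digits("1100", "1110")   # digits 1, 1, 0, -1
dot = bsd_serial_dot([3, -2], [a, b])
```

               Both encodings decode to 11, which illustrates the redundancy of the representation: the same value has more than one digit pattern.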
               
               The BSD-serial processing approach performs inference with a minimized latency (or
                  maximized frequency) independent of weight bit-precision, based on limited carry propagation
                  in RDNS adders. Fig. 2 illustrates an N-digit RDNS adder, featuring an overall architecture similar to carry-save
                  addition.
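
               Limited carry propagation can be illustrated with the classic textbook rule for radix-2 signed-digit addition, in which each sum digit depends only on the operand digits at positions $i$, $i-1$, and $i-2$. This is a behavioral sketch of that general technique, not the gate-level adder of Fig. 2, and the function names are hypothetical.

```python
def bsd_add(x, y):
    """Add two equal-length signed-digit numbers (digits in {-1, 0, 1}, LSB first).

    Returns len(x) + 1 digits, LSB first, all in {-1, 0, 1}.
    """
    if len(x) != len(y):
        raise ValueError("operands must have equal length")
    n = len(x)
    p = [a + b for a, b in zip(x, y)]   # position sums in [-2, 2]
    t = [0] * (n + 1)                   # t[i] is the transfer entering position i
    w = [0] * n                         # interim sum digits
    for i in range(n):
        # The lower neighbor decides the sign of the interim digit, so that
        # w[i] + t[i] can never leave the digit set.
        lower_nonneg = i == 0 or p[i - 1] >= 0
        if p[i] == 2:
            t[i + 1], w[i] = 1, 0
        elif p[i] == 1:
            t[i + 1], w[i] = (1, -1) if lower_nonneg else (0, 1)
        elif p[i] == -1:
            t[i + 1], w[i] = (0, -1) if lower_nonneg else (-1, 1)
        elif p[i] == -2:
            t[i + 1], w[i] = -1, 0
    return [w[i] + t[i] for i in range(n)] + [t[n]]


def value_lsb_first(digits):
    """Integer value of an LSB-first radix-2 signed-digit list."""
    return sum(d * (1 << i) for i, d in enumerate(digits))


total = bsd_add([1, 1, 1], [1, 1, 1])   # 7 + 7
```

               Because no transfer ripples past its immediate neighbor, the delay of such an adder does not grow with the number of digits, which is the property the BSD-serial approach relies on.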
               
               
                     
                     
Fig. 1. Different processing approaches in accelerator PEs: (a) BNS Bit-serial; (b)
                        RDNS BSD-serial.
                     
                   
               
                     
                     
Fig. 2. N-digit RDNS adder based on carry-save addition.
                   
             
            
                  IV. Detailed Design of PEs	
               The BSD-serial PE architecture is composed of three main subunits: an RDNS partial
                  product generator (R-PPG), an RDNS compressor (R-C), and an RDNS accumulator (R-AC).
                  Fig. 3 illustrates the overall BSD-serial architecture, which extends the binary bit-serial PE to
                  RDNS with 16-bit precision activation and weight inputs. Activations arrive in RDNS
                  while weights are in binary 2’s complement. In the R-C and R-AC, binary adders are replaced
                  with carry-save adders, which decrease latency by eliminating carry propagation.
                  The R-PPG receives 16 ${\times}$ 16-bit synapse weights in binary and 16 BSD-serial activations.
                  It generates 16 partial products, each in 16-digit BSD form, and feeds them to inputs $X_{0}$
                  to $X_{15}$ of the R-C, which generates the compressed partial sum. The R-AC operates in 16-cycle
                  loops, starting with a zero residual, and accumulates the compressed partial sums produced
                  by the R-C. 
               
               
                     
                     
Fig. 3. Overall BSD-Serial PE Architecture.
                   
             
            
                  Fig. 4 summarizes the proposed cross-layer workflow, including the tasks required in different
                  layers for lifetime evaluation. We first describe the Bit-serial
                  and BSD-serial architectures in RTL and synthesize them using a 28 nm cell library
                  at a nominal voltage of 0.9 V to produce the netlist and standard delay format (SDF) files.
                  Table 1 summarizes the reports generated by the Design Compiler synthesis tool.
                  In parallel, different DNN models, e.g., ResNet18 on the ImageNet dataset, are deployed
                  in Python to profile the evaluation benchmarks, including weights and activations. Next,
                  post-synthesis cycle-accurate simulations are run on the benchmarks to produce timing
                  reports and value change dump (VCD) outputs. The VCDs are then analyzed to generate power
                  and stress (signal probability) reports using a power analysis tool (PrimeTime). After
                  that, the maximum temperature is estimated from the previously measured power and
                  area using the HotSpot tool based on Eq. (3). Finally, we use MATLAB to predict degradation and lifetime for the different architectures
                  and workloads considering all factors.
               
               
                     
                     
Table 1. Summary of synthesis results for one PE
                  
                  
                        
                           
                              | Feature \ Architecture | Bit-Serial | BSD-Serial | Improvement | 
                        
                              | Area (mm²) | 0.002780 | 0.003720 | - 25 % | 
                        
                              | Leakage Power | 25 μW | 27.6 μW | - 10 % | 
                        
                              | Dynamic Power | 246 μW | 310 μW | - 26 % | 
                        
                              | Cycle Time (Latency) | 2.81 ns | 1.37 ns | + 51 % | 
                        
                              | Maximum Frequency | 355 MHz | 730 MHz | + 106 % | 
                        
                              | Max. Performance/Area | 127697 | 196236 | + 53 % | 
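
               The derived rows of Table 1 can be reproduced from the measured ones. The sketch below assumes that the maximum frequency is the reciprocal of the cycle time and that "Max. Performance/Area" is the maximum frequency in MHz divided by the area in mm²; the latter reading is an assumption inferred from the reported numbers, and the function names are hypothetical.

```python
def max_frequency_mhz(cycle_time_ns):
    """Maximum clock frequency in MHz for a given cycle time in ns."""
    return 1e3 / cycle_time_ns


def perf_per_area(freq_mhz, area_mm2):
    """Assumed metric: maximum frequency (MHz) per unit area (mm^2)."""
    return freq_mhz / area_mm2


f_bit_serial = max_frequency_mhz(2.81)   # close to the reported 355 MHz
f_bsd_serial = max_frequency_mhz(1.37)   # close to the reported 730 MHz

# Using the rounded frequencies reported in the table.
ppa_bit_serial = perf_per_area(355, 0.002780)
ppa_bsd_serial = perf_per_area(730, 0.003720)
ppa_gain = ppa_bsd_serial / ppa_bit_serial - 1.0
```

               Under these assumptions the computed ratio corresponds to a gain of about 53%, in line with the last row of the table.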
                     
                  
                
               
                     
                     
Fig. 4. The proposed cross-layer evaluation workflow.
                   
               This workflow is cross-layer because it analyzes interactions across multiple layers.
                  It differs from conventional ASIC design flows mainly in that it adds BTI stress estimation
                  and lifetime prediction tasks. In this study, LeNet5 on the MNIST dataset is used
                  for the primary evaluation. VGG16 and ResNet18 are applied to the ImageNet dataset
                  for supplementary evaluation and to extend the experimental results.
               
               
                     1. Experimental Results
                  In this section, we first evaluate the effects of architecture and number system
                     on the internal stress of the PEs while running different DNN models. Fig. 5 shows histograms of the stress ($SP$) variations for the target designs
                     running different DNNs. Computing in RDNS makes the stress and recovery
                     phases more balanced. Moreover, the average stress variations among 16 different
                     filters of ResNet18 for 10 different input images are illustrated in Fig. 6. On average, the BSD-serial architecture exhibits 36% lower stress than
                     the conventional Bit-serial PE. As mentioned, both $SP$ and $1-SP$ are considered in the
                     comparison. 
                  
                  Next, we examine how stress affects the aging and lifetime estimates
                     of the Bit-serial and BSD-serial designs. For clarity, we calculate
                     the $\mathrm{V}_{\mathrm{th}}$ degradation for successive runs of LeNet5, VGG16, and
                     ResNet18 on the introduced PEs. We assume the worst-case temperature and a fixed $V_{dd}$
                     across PE matrices, so that only stress varies across DNN executions. According
                     to the synthesis results, the temperature is almost equal for the Bit-serial and BSD-serial
                     designs because area and power increase almost equally, per Eq. (3). Fig. 7 illustrates the NBTI-induced aging degradation when running different DNNs, in the form
                     of the $\mathrm{V}_{\mathrm{th}}$ shift, which starts from 0 and increases up to 0.03
                     V (10% of the initial value). Moreover, Fig. 8 illustrates the lifetime improvement in MTTF considering stress variations among
                     the different designs. As seen, the BSD-serial design extends the lifetime by 35.5% compared
                     to the baseline Bit-serial design by balancing BTI stress.
                  
                  
                        
                        
Fig. 5. The histograms of stress variations among architectures running LeNet5, VGG16
                           (MNIST) and ResNet18 (ImageNet).
                        
                      
                  
                        
                        
Fig. 6. BTI stress variations among different architectures, filters, and input images
                           running ResNet-18.
                        
                      
                  
                        
                        
Fig. 7. NBTI degradation in target architectures running different DNN filters considering
                           stress variations.
                        
                      
                  
                        
                        
Fig. 8. Lifetime comparison among designs running LeNet5, VGG16, and ResNet18 filters
                           considering stress variations.
                        
                      
                
               
                     2. Discussion
                  Table 2 summarizes the evaluation results obtained from cycle-accurate
                     post-synthesis simulations of diverse DNN models and datasets. The
                     BSD-serial design exhibits more balanced BTI stress than the baseline, in the
                     form of a 36% reduction in duty cycle, i.e., signal probability (SP). Owing to this decrease
                     in stress, the BSD-serial PE mitigates $V_{th}$ degradation relative to the baseline
                     (7%), which yields a lifetime extension of almost 35.5% over the bit-serial baseline.
                  
                  
                        
                        
Table 2. Summary of simulation results
                     
                     
                           
                              
                                 | Architecture \ DNN | LeNet5 | VGG16 | ResNet18 | Overall | 
                           
                                 | BTI Stress - Duty Cycle (SP) | 
                           
                                 | Bit-serial | 0.75 | 0.70 | 0.68 | 0.71 | 
                           
                                 | BSD-serial | 0.52 | 0.52 | 0.53 | 0.52 | 
                           
                                 | Improvement | 37% | 35% | 33% | 36% | 
                           
                                 | Degradation (mV) during 10 years | 
                           
                                 | Bit-serial | 29.58 | 29.23 | 29.14 | 29.31 | 
                           
                                 | BSD-serial | 27.83 | 27.86 | 27.90 | 27.86 | 
                           
                                 | Improvement | - | - | - | 32 (%) | 
                           
                                 | Lifetime (year) | 
                           
                                 | Bit-serial | 10.89 | 11.69 | 11.92 | 11.50 | 
                           
                                 | BSD-serial | 15.69 | 15.61 | 15.47 | 15.59 | 
                           
                                 | Improvement | - | - | - | 35.5 (%) | 
                        
                     
                   
                
             
            
                  VI. Conclusion	
               Hardware accelerators have shown high computation efficiency, making them the first
                  choice for DNN acceleration on edge devices. Treating the array of processing elements (PEs),
                  with its multiply-accumulate functionality, as the heart of a DNN accelerator, this
                  research combines computing in RDNS with a serial processing approach to extend
                  lifetime resilience and investigates the BTI stress variations on the accelerator’s
                  data path. The proposed technique extends the lifetime by 35.5% compared to the conventional
                  bit-serial PE by reducing stress by 36% on average. 
               
               For future research, we aim to extend computing in RDNS, and the BSD-serial computing
                  concept in particular, to emerging DNN accelerators
                  with systolic array architectures to improve lifetime resilience. Additionally, we
                  will explore operating voltage scaling, leveraging the performance gained by computing
                  in RDNS to further improve lifetime resilience. Ultimately, we intend to expand
                  the RDNS capability in lifetime extension to support both the training and inference computing
                  phases.
               
             
          
         
            
                  ACKNOWLEDGMENTS
               
                  				This work was supported in part by the research fund of Chungnam National University,
                  and in part by the National Research Foundation of Korea (NRF) grant funded by the
                  Korean government (MSIT) (No. 2022R1A5A8026986).
                  			
               
             
            
                  
                     References
                  
                     
                        
                        [1] S. Alcaide, L. Kosmidis, C. Hernandez, and J. Abella, “High-integrity GPU designs
                           for critical real-time automotive systems,” in 2019 Design, Automation & Test in Europe
                           Conference & Exhibition (DATE), 2019, pp. 824-829.

                        [2] C. Adams, A. Spain, J. Parker, M. Hevert, J. Roach, and D. Cotten, “Towards an integrated
                           GPU accelerated SoC as a flight computer for small satellites,” in 2019 IEEE Aerospace
                           Conference, 2019, pp. 1-7.

                        [3] I. Moghaddasi, S. Gorgin, and J.-A. Lee, “Dependable DNN Accelerator for Safety-critical
                           Systems: A Review on the Aging Perspective,” IEEE Access, 2023.

                        [4] A. Arunachalam, S. Kundu, A. Rahat, S. Banerjee, and K. Basu, “Fault Resilience of
                           DNN Accelerators for Compressed Sensor Inputs,” in 2022 IEEE Computer Society Annual
                           Symposium on VLSI (ISVLSI), 2022, pp. 329-332.

                        [5] M. Riera, J. M. Arnau, and A. Gonzalez, “DNN pruning with principal component analysis
                           and connection importance estimation,” Journal of Systems Architecture, vol. 122,
                           p. 102336, 2022.

                        [6] D. S. Huang et al., “Comprehensive device and product level reliability studies on
                           advanced CMOS technologies featuring 7nm high-k metal gate FinFET transistors,” in
                           2018 IEEE International Reliability Physics Symposium (IRPS), 2018, pp. 6F-7.

                        [7] C. Liu et al., “Systematical study of 14nm FinFET reliability: From device level stress
                           to product HTOL,” in 2015 IEEE International Reliability Physics Symposium, 2015,
                           pp. 2F-3.

                        [8] I. Hill, P. Chanawala, R. Singh, S. A. Sheikholeslam, and A. Ivanov, “CMOS Reliability
                           from Past to Future: A Survey of Requirements, Trends, and Prediction Methods,” IEEE
                           Transactions on Device and Materials Reliability, vol. 22, no. 1, pp. 1-18, Mar. 2022.
                           doi: 10.1109/TDMR.2021.3131345

                        [9] I. Moghaddasi, A. Fouman, M. E. Salehi, and M. Kargahi, “Instruction-level NBTI stress
                           estimation and its application in runtime aging prediction for embedded processors,”
                           IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol.
                           38, no. 8, pp. 1427-1437, 2018.

                        [10] W. Wang, S. Yang, S. Bhardwaj, S. Vrudhula, F. Liu, and Y. Cao, “The impact of NBTI
                           effect on combinational circuit: Modeling, simulation, and analysis,” IEEE Transactions
                           on Very Large Scale Integration (VLSI) Systems, vol. 18, no. 2, pp. 173-183, 2009.

                        [11] G. Zervakis et al., “Thermal-aware design for approximate DNN accelerators,” IEEE
                           Transactions on Computers, vol. 71, no. 10, pp. 2687-2697, 2022.

                        [12] M. A. Hanif and M. Shafique, “DNN-Life: An Energy-Efficient Aging Mitigation Framework
                           for Improving the Lifetime of On-Chip Weight Memories in Deep Neural Network Hardware
                           Architectures,” in Proceedings of Design, Automation and Test in Europe (DATE),
                           Feb. 2021, pp. 729-734. doi: 10.23919/DATE51398.2021.9473943

                        [13] N. Landeros Muñoz, A. Valero, R. G. Tejero, and D. Zoni, “Gated-CNN: Combating NBTI
                           and HCI aging effects in on-chip activation memories of Convolutional Neural Network
                           accelerators,” Journal of Systems Architecture, vol. 128, Jul. 2022. doi: 10.1016/j.sysarc.2022.102553

                        [14] P. Judd, J. Albericio, T. Hetherington, T. M. Aamodt, and A. Moshovos, “Stripes: Bit-serial
                           deep neural network computing,” in 2016 49th Annual IEEE/ACM International Symposium
                           on Microarchitecture (MICRO), 2016, pp. 1-12.

                        [15] J. Lee, C. Kim, S. Kang, D. Shin, S. Kim, and H.-J. Yoo, “UNPU: An energy-efficient
                           deep neural network accelerator with fully variable weight bit precision,” IEEE Journal
                           of Solid-State Circuits, vol. 54, no. 1, pp. 173-185, 2018.

                        [16] M. Capra, F. Conti, and M. Martina, “A Multi-Precision Bit-Serial Hardware Accelerator
                           IP for Deep Learning Enabled Internet-of-Things,” in 2021 IEEE International Midwest
                           Symposium on Circuits and Systems (MWSCAS), 2021, pp. 192-197.

                        [17] V. Sakellariou, V. Paliouras, I. Kouretas, H. Saleh, and T. Stouraitis, “A multiplier-Free
                           RNS-Based CNN accelerator exploiting bit-Level sparsity,” IEEE Transactions on Emerging
                           Topics in Computing, pp. 1-16, 2023. doi: 10.1109/TETC.2023.3301590

                        [18] G. Alsuhli, V. Sakellariou, H. Saleh, M. Al-Qutayri, B. Mohammad, and T. Stouraitis,
                           “Conventional Number Systems for DNN Architectures,” in Number Systems for Deep Neural
                           Network Architectures, Springer, 2023, pp. 17-25.

                        [19] G. Jaberipur, “Redundant number system-based arithmetic circuits,” in Arithmetic Circuits
                           for DSP Applications, pp. 273-312, 2017.

                        [20] F. Oboril, F. Firouzi, S. Kiamehr, and M. Tahoori, “Reducing NBTI-induced processor
                           wearout by exploiting the timing slack of instructions,” in Proceedings of the eighth
                           IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis,
                           2012, pp. 443-452.

                        [21] Y. Chen, A. Calimera, E. Macii, and M. Poncino, “Characterizing the activity factor
                           in NBTI aging models for embedded cores,” in Proceedings of the 25th edition on Great
                           Lakes Symposium on VLSI, 2015, pp. 75-78.

                        [22] V. B. Kleeberger, M. Barke, C. Werner, D. Schmitt-Landsiedel, and U. Schlichtmann,
                           “A compact model for NBTI degradation and recovery under use-profile variations and
                           its application to aging analysis of digital integrated circuits,” Microelectronics
                           Reliability, vol. 54, no. 6, pp. 1083-1089, 2014.

                        [23] M. Pedram and S. Nazarian, “Thermal modeling, analysis, and management in VLSI circuits:
                           Principles and methods,” Proceedings of the IEEE, vol. 94, no. 8, pp. 1487-1501, 2006.

                        [24] J. W. McPherson, “Time-to-failure modeling,” in Reliability Physics
                           and Engineering: Time-To-Failure Modeling, pp. 37-49, 2013.

 
                   
                
             
            
            
               			Iraj Moghaddasi  received the B.Sc. degree in computer engineering from Shahid
               Beheshti University, Tehran, in September 2000, the M.Sc. degree in computer engineering
               from the Iran University of Science and Technology, Tehran, in May 2003, and the Ph.D.
               degree from the School of Electrical and Computer Engineering, University of Tehran,
               Tehran, in September 2018. From 2019 to 2021, he was a Research Associate with the
               Iran Telecommunication Research Center (ITRC). He is currently a Postdoctoral Researcher
               at Chungnam National University, Daejeon, Korea. His research interests include computer
               architecture, reliable and high-performance computing, hardware modeling & architectural
               exploration of DNN edge accelerators, embedded systems, and processing in memory for
               computation-efficient machine learning.
               		
            
            
            
               			Byeong-Gyu Nam (Senior Member, IEEE) received his B.S. degree (summa cum laude)
               in computer engineering from Kyungpook National University, Daegu, Korea, in 1999,
               and his M.S. and Ph.D. degrees in electrical engineering and computer science from Korea Advanced
               Institute of Science and Technology (KAIST), Daejeon, Korea, in 2001 and 2007, respectively.
               Dr. Nam is currently with Chungnam National University, Daejeon, Korea, as a professor.
               His current interests include machine learning processors, graphics processors, and
               low-power SoC design. He has served as the Chair of the Digital Architectures and
               Systems (DAS) subcommittee of ISSCC from 2017 to 2019 and was a member of the TPC
               for IEEE ISSCC, IEEE A-SSCC, IEEE COOL Chips, and ASP-DAC. He served as an Associate
               Editor of the IEIE Journal of Semiconductor Technology and Science (JSTS) and a Guest
               Editor for the IEEE Journal of Solid-State Circuits (JSSC) in 2013.