Chung Goohyung1, Cho Kyoungub1, Oh Taehyoun*
(Department of Electronic Engineering, Kwangwoon University, 615, Bima, 20, Gwangun-ro, Nowon-gu, Seoul 139-701, Korea)
                        
 
               
             
            
            
            Copyright © The Institute of Electronics and Information Engineers(IEIE)
            
            
            
            
            
               
                  
Index Terms
               
                CMOS, IO transceiver, scalable, delay compensation, pre-emphasis, FIR driver
             
            
          
         
            
                  I. INTRODUCTION
               Recently, demand for high-resolution displays has increased the per-pin data rate
                  required for high-throughput chip-to-chip links. The data transmission speed for a display
                  varies (even during real-time operation) depending on the image contents, and the interface
                  circuits must cover various standards within a restricted power and noise budget. For the
                  interface scheme to operate at various speeds, the high-speed digital logic should be scalable.
                  In [1], multiple correlated clock phases are used to solve the hold-time violation (HTV) problem
                  and achieve scalable operation. However, that approach requires a multi-phase clock, and keeping
                  the phases equally spaced at high speed is an issue. A scheme that adaptively selects the clock
                  polarity after detecting an HTV has been suggested in [2]. Since cascaded digital logic may
                  require multiple selections along the path, multiple adaptive loops need to be implemented to
                  remove all HTVs in that scheme.
               
               In this paper, we theoretically analyze the mechanism of HTV events at multiple data
                  speeds and propose an efficient design methodology that avoids HTV over scalable data
                  speeds. All high-speed digital paths in our transceiver are designed to be scalable
                  via a delay matching technique. The transceiver covers the speed range of 2.65
                  Gb/s-6.4 Gb/s, which meets various standards such as DP1.4 (5.4 Gb/s), LPDDR5 (6.4
                  Gb/s), SATA3 (6 Gb/s) and XAUI (3.125 Gb/s). The measured performance is compared
                  to that of similar applications [3,4]. In addition, the half-rate design of the drivers
                  and samplers in the front-end reduces the power significantly.
               
             
            
                  II. ARCHITECTURE
               Fig. 1 presents the proposed 2-channel transceiver, which operates at scalable data speeds.
                  The pseudo-random bit sequence (PRBS) generators produce 18 lanes of 147-356 Mb/s parallel
                  signals with pattern lengths of $2^{23}-1$ and $2^{31}-1$. The following 18:2 serializers
                  assemble them into 2 lanes of 1.325-3.2 Gb/s EVEN/ODD data. As shown in Fig. 1(b), the tap
                  signal generator delays the half-rate D$_{\mathrm{ODD}}$/D$_{\mathrm{EVEN}}$ signals to
                  generate the PRE/MAIN/POST signals. The tap signal generator consists of consecutive latches,
                  and each latch stage delays the data by 0.5 UI of the data speed. As shown in the timing
                  diagram of Fig. 1(b), the PRE/MAIN/POST signals are taken from the nodes at which the three
                  tap signals are aligned and are provided to the drivers. As shown in Fig. 1(c), our
                  voltage-mode driver consumes one quarter of the current, and therefore power, of a
                  current-mode driver. The driver consists of three taps (PRE/MAIN/POST) with 3, 15, and
                  7 segments, respectively. The number of ON segments of each tap is adjusted through
                  PU SEG, so that the pre-emphasis amplitude can be tuned for various channel losses.
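
The effect of the segment settings on the output levels can be illustrated with a small behavioral sketch. It is not the authors' circuit: it assumes that every ON segment contributes an equal unit of drive, that the PRE and POST taps are subtracted from the MAIN tap (the conventional pre-emphasis polarity, which the text does not state), and that the output is normalized to the total number of ON segments. The name `fir_driver_levels` is hypothetical.

```python
def fir_driver_levels(bits, pre_on=0, main_on=15, post_on=0):
    """Normalized output level per bit for a 3-tap FIR driver model.

    Assumption (not from the paper): each ON segment adds one equal unit
    of drive, and the PRE/POST taps are subtracted from the MAIN tap.
    """
    # Tap sizes taken from the text: PRE/MAIN/POST have 3/15/7 segments.
    if not (0 <= pre_on <= 3 and 0 <= main_on <= 15 and 0 <= post_on <= 7):
        raise ValueError("ON-segment count exceeds the tap size (3/15/7)")
    total = pre_on + main_on + post_on
    if total == 0:
        raise ValueError("at least one segment must be ON")

    symbols = [1 if b else -1 for b in bits]  # NRZ mapping: 1 -> +1, 0 -> -1
    levels = []
    for n, main in enumerate(symbols):
        # PRE carries the next bit and POST the previous bit; outside the
        # given sequence the line is assumed to hold its end value.
        pre = symbols[n + 1] if n + 1 < len(symbols) else main
        post = symbols[n - 1] if n > 0 else main
        levels.append((main_on * main - pre_on * pre - post_on * post) / total)
    return levels


if __name__ == "__main__":
    # 15 MAIN and 5 POST segments ON: transitions get full swing,
    # repeated bits are de-emphasized to half swing.
    print(fir_driver_levels([0, 0, 1, 1, 1, 0], pre_on=0, main_on=15, post_on=5))
```

In this model, turning on more POST segments lowers the level of repeated bits relative to transition bits, which is the knob used to match the pre-emphasis to the channel loss.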
               
               
                     Fig. 1. (a) Block diagram of the 2-channel TRx; (b) tap signal generator; (c) FIR driver; (d) sampler.
 
               The equalizing drivers generate 2.65-6.4 Gb/s differential non-return-to-zero (NRZ)
                  signals. Along the path, all high-speed logic is scalable because HTV is avoided
                  by matching delays across the various speeds. In our receivers, 1-stage continuous-time
                  linear equalizers (CTLEs) mitigate the channel inter-symbol interference and improve
                  the bit-error rate (BER). As shown in Fig. 1(d), each sampler consists of a strong-arm
                  latch followed by an SR latch, which converts the return-to-zero (RZ) output of the
                  strong-arm latch into an NRZ signal. The strong-arm latch compares the voltage levels of
                  the differential inputs (V$_{\mathrm{in,p}}$, V$_{\mathrm{in,n}}$), which carry 2.65-6.4 Gb/s
                  data, at the rising edge of the recovered clock from the CDR. If V$_{\mathrm{in,p}}$ is
                  larger than V$_{\mathrm{in,n}}$, OUTP is resolved as 1, and if V$_{\mathrm{in,p}}$ is
                  smaller than V$_{\mathrm{in,n}}$, OUTP is resolved as 0.
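
The RZ-to-NRZ conversion performed by the sampler can be sketched behaviorally as follows. This is a simplified, level-based model rather than the transistor-level circuit: it assumes that both strong-arm outputs reset to 0 while the clock is low (a real latch may reset to the opposite rail) and that a tie between the inputs resolves to 0. All names are hypothetical.

```python
def strong_arm_rz(vin_p, vin_n, clk):
    """Return the (OUTP, OUTN) pair of the strong-arm latch model.

    Assumption: both outputs are 0 while clk is low (RZ behavior).
    A tie (vin_p == vin_n) is resolved as 0, which is a modeling choice.
    """
    if not clk:
        return (0, 0)  # reset phase: outputs return to zero
    return (1, 0) if vin_p > vin_n else (0, 1)


class SRLatch:
    """Holds the last decision so that the sampler output is NRZ."""

    def __init__(self, q=0):
        self.q = q

    def update(self, s, r):
        if s and not r:
            self.q = 1
        elif r and not s:
            self.q = 0
        return self.q  # (0, 0) keeps the previous value


def sample_nrz(waveform):
    """waveform: iterable of (vin_p, vin_n, clk); returns the NRZ output."""
    latch = SRLatch()
    return [latch.update(*strong_arm_rz(vp, vn, clk)) for vp, vn, clk in waveform]


if __name__ == "__main__":
    wave = [(0.6, 0.4, 1), (0.6, 0.4, 0), (0.4, 0.6, 1), (0.4, 0.6, 0), (0.7, 0.3, 1)]
    print(sample_nrz(wave))  # the decision is held during each reset phase
```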
               
               The following 2:18 deserializers parallelize the EVEN/ODD data into 18 lanes $\times$
                  147-356 Mb/s signals, and the PRBS checkers detect errors in the received signals.
                  The BER counters can count up to $2^{40}$ errors, and the error count can be monitored
                  in real time via the serial-to-parallel interface (SPI). The scalable logic allows
                  data operation at various speeds, up to a maximum set by the clock speed constraint.
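
A PRBS generator and checker pair of the kind described above can be sketched in software. The text gives only the pattern lengths, so the generator polynomials are an assumption here: the sketch uses the commonly used $x^{23}+x^{18}+1$ and $x^{31}+x^{28}+1$ (without output inversion). The checker is a self-synchronizing one that predicts each bit from earlier received bits; whether the chip's checker works this way is not stated. Function names are hypothetical.

```python
# Assumed generator polynomials (not stated in the paper):
# order -> (a, b) meaning x^a + x^b + 1.
PRBS_TAPS = {23: (23, 18), 31: (31, 28)}


def prbs_bits(order, count, seed=1):
    """Generate `count` bits satisfying bit[n] = bit[n-a] XOR bit[n-b]."""
    a, b = PRBS_TAPS[order]
    mask = (1 << order) - 1
    state = seed & mask
    if state == 0:
        raise ValueError("seed must be non-zero (all-zero state locks up)")
    out = []
    for _ in range(count):
        new = ((state >> (a - 1)) ^ (state >> (b - 1))) & 1
        state = ((state << 1) | new) & mask  # bit 0 holds the newest bit
        out.append(new)
    return out


def count_prbs_errors(rx, order):
    """Self-synchronizing check: compare each bit with its prediction."""
    a, b = PRBS_TAPS[order]
    return sum(1 for n in range(a, len(rx)) if rx[n] != rx[n - a] ^ rx[n - b])


if __name__ == "__main__":
    tx = prbs_bits(23, 1000)
    print(count_prbs_errors(tx, 23))   # clean link
    rx = list(tx)
    rx[100] ^= 1                       # inject one bit error
    print(count_prbs_errors(rx, 23))   # one bit error is flagged three times
```

Note that a self-synchronizing checker of this form flags a single bit error three times (once directly and once through each feedback tap), so a raw flag count overstates the number of bit errors by a factor of three for isolated errors.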
               
               Fig. 2 illustrates the delay matching technique for scalable-speed operation of the high-speed
                  logic in our architecture. Fig. 2(a) shows a typical case of consecutive positive-edge-triggered
                  flip-flops (FFs) that share a single clock source. The input clock is inverted for FF2 because
                  PVT variation and line delay mismatch can cause a timing mismatch between data and clock at FF2,
                  which may lead to setup- and hold-time violations. The data delay $t_{d}$ and the clock delay
                  $t_{c}$ arise from the propagation delay of the combinational logic that implements logical
                  functions (e.g., muxing, demuxing, clock dividing) or from the clock-to-Q delay. In all cases,
                  $t_{d}$ and $t_{c}$ depend not on the data speed or clock speed but on the propagation
                  delay of the logic circuits. Fig. 2(b) is a simple implementation of the logic blocks as
                  inverter chains, used to find the changes in $t_{d}$ and $t_{c}$ under PVT variation. The
                  values of $t_{d}$ and $t_{c}$ are 468.9 ps and 157.8 ps at 3.2 Gb/s, typical corner, 27℃, and
                  Table 1 summarizes the data and clock delays over corners and temperatures. In Table 1,
                  $t_{d,corner}$ and $t_{c,corner}$ are the data delay and clock delay at each corner and
                  temperature, and $t_{d,var}$ and $t_{c,var}$ are defined as $t_{d,var}= t_{d,corner}- t_{d}$
                  and $t_{c,var}= t_{c,corner}- t_{c}$. When Clock B is located at the optimal point of Data B
                  at the typical corner, 27℃, the deviation of Clock B from the optimal point of Data B at
                  another corner is $\left| t_{d,var}- t_{c,var}\right|$. As stated in Table 1, the maximum
                  value of $\left| t_{d,var}- t_{c,var}\right|$ at the maximum data rate (3.2 Gb/s) of our
                  circuit is 95 ps, or 0.3 UI, at the ss corner, 120℃. Usually, an eye opening of over 0.8 UI is
                  secured in a digital circuit. Since the delay difference due to PVT variation does not vary
                  with data speed, the lower the data speed, the smaller the fraction of 1 UI that
                  $\left| t_{d,var}- t_{c,var}\right|$ occupies. As a result, this circuit reduces the HTV
                  due to PVT variation.
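
The deviation figures can be recomputed from the corner delays reported in Table 1 using the definitions above. The short script below is only a bookkeeping sketch; the variable and function names are hypothetical, and the 1.6 Gb/s case is added purely to illustrate how the same deviation occupies a smaller fraction of the UI at a lower data rate.

```python
T_D_TYP_PS, T_C_TYP_PS = 468.9, 157.8  # typical corner, 27 C (from the text)
MAX_RATE_BPS = 3.2e9                   # maximum half-rate data speed

# (corner, temperature in C) -> (t_d_corner, t_c_corner) in ps, from Table 1
CORNER_DELAYS_PS = {
    ("tt", -40): (460.6, 156.9),
    ("tt", 120): (485.3, 161.7),
    ("ss", -40): (590.5, 200.0),
    ("ss", 120): (607.5, 201.4),
    ("ff", -40): (368.4, 124.7),
    ("ff", 120): (398.2, 132.9),
}


def clock_deviation_ps(t_d_corner, t_c_corner):
    """|t_d,var - t_c,var| with the *_var terms defined against typical."""
    t_d_var = t_d_corner - T_D_TYP_PS
    t_c_var = t_c_corner - T_C_TYP_PS
    return abs(t_d_var - t_c_var)


def deviation_ui(deviation_ps, data_rate_bps=MAX_RATE_BPS):
    """Express a deviation in ps as a fraction of one UI at the given rate."""
    return deviation_ps * 1e-12 * data_rate_bps


deviations = {k: clock_deviation_ps(*v) for k, v in CORNER_DELAYS_PS.items()}
worst = max(deviations, key=deviations.get)

if __name__ == "__main__":
    for key, dev in deviations.items():
        print(key, round(dev, 1), "ps", round(deviation_ui(dev), 2), "UI")
    print("worst case:", worst)
    print("same deviation at 1.6 Gb/s:", round(deviation_ui(deviations[worst], 1.6e9), 3), "UI")
```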
               
               
                     Table 1. Data and Clock Delay with PVT variations

| Corner | Temperature [℃] | $t_{d,corner}$ [ps] | $t_{c,corner}$ [ps] | $t_{d,var}$ [ps] | $t_{c,var}$ [ps] | $\lvert t_{d,var}- t_{c,var}\rvert$ [ps] | $\lvert t_{d,var}- t_{c,var}\rvert$ [UI @ 3.2 Gb/s] |
|---|---|---|---|---|---|---|---|
| tt | -40 | 460.6 | 156.9 | -8.3 | -0.9 | 7.4 | 0.02 |
| tt | 120 | 485.3 | 161.7 | 16.4 | 3.9 | 12.5 | 0.04 |
| ss | -40 | 590.5 | 200 | 121.6 | 42.2 | 79.4 | 0.25 |
| ss | 120 | 607.5 | 201.4 | 138.6 | 43.6 | 95 | 0.3 |
| ff | -40 | 368.4 | 124.7 | -100.5 | -33.1 | 67.4 | 0.22 |
| ff | 120 | 398.2 | 132.9 | -70.7 | -24.9 | 45.8 | 0.15 |
                
               
                     Fig. 2. Simulation testbench for delay matching technique: (a) typical flip-flop logic where timing issues occur at various speeds; (b) logic blocks made up of inverter chains; (c) timing illustration of Data B and Clock B.
 
               Fig. 3(a) shows a timing diagram for the case in which $t_{d}$ is $3\alpha $, where $\alpha $ is
                  assumed to be 0.5 UI at the maximum speed for illustration purposes. To avoid HTV, $t_{c}$
                  can be delayed by $\alpha $ (method 1) or by $3\alpha $ (method 2). Fig. 3(b) shows that both
                  method 1 and method 2 operate without HTV at 2/3 of the maximum data speed. If both the
                  data and clock speeds are halved, as shown in Fig. 3(c), method 1 results in HTV.
                  Method 2 avoids HTV over the continuous wide range of data rates between the maximum
                  data speed and 0.5 ${\times}$ the maximum data speed.
               
               
                     Fig. 3. Illustration of delay matching techniques for scalable speed: (a) Timing diagram for maximum speed ( $t_{d}$ > $t_{c}$ case); (b) Timing diagram for 2/3 speed of the maximum ( $t_{d}$ > $t_{c}$ case); (c) Timing diagram for 0.5 speed of the maximum ( $t_{d}$ > $t_{c}$ case); (d) Timing diagram for maximum speed ( $t_{d}$ < $t_{c}$ case); (e) Timing diagram for 2/3 speed of the maximum ( $t_{d}$ < $t_{c}$ case); (f) Timing diagram for 0.5 speed of the maximum ( $t_{d}$ < $t_{c}$ case).
 
               Because the clock trigger timing remains at the point that gives the optimal BER with method 2,
                  the consecutive-FF scheme operates without HTV regardless of the data speed. Fig. 3(d)-(f) show
                  the case in which $t_{c}$ is $3\alpha $ and the delay compensation is applied to
                  $t_{d}$, by $\alpha $ and $3\alpha $, respectively. Similarly, method 2 ($t_{d}$=$3\alpha $)
                  avoids HTV at various data speeds.
               
               In Fig. 2(a), the speed of the input data is the same as that of the input clock, and the value
                  of $\alpha $ at the maximum data rate (3.2 Gb/s) is set to 156.25 ps. For method 1,
                  $t_{d}=3\alpha $ and $t_{c} = \alpha $; for method 2, $t_{d}=3\alpha $ and $t_{c}=3\alpha $.
                  In the simulation, a pattern checker detects errors and calculates the BER at each
                  frequency as the input clock speed is swept over 0.1-3.2 GHz. Fig. 4 shows the resulting
                  BERs of method 1 and method 2 over this wide range of data rates. With method 2, the BER is
                  close to 0 across 0.1-3.2 GHz, whereas with method 1 the BER increases near 0.5 ${\times}$
                  the maximum data rate.
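
The trend in this sweep can be reproduced with a simple timing model, sketched below. It is not the authors' simulation: it assumes that Data B toggles $t_{d}$ after the launching clock edge, that FF2 samples on the inverted clock (half a period after launch, delayed by $t_{c}$), and that a violation occurs whenever the sampling edge falls within a guard band of 0.1 UI around a data transition. The guard band is an assumption derived from the 0.8 UI eye opening mentioned earlier, and the function names are hypothetical.

```python
ALPHA_PS = 156.25  # 0.5 UI at the 3.2 Gb/s maximum rate (from the text)


def sampling_phase_ui(t_d_ps, t_c_ps, period_ps):
    """Position of the FF2 sampling edge inside the data eye, in UI (0..1).

    Data B toggles at t_d (mod T); the inverted Clock B rises at t_c + T/2.
    A phase of 0.5 means the edge sits at the center of the eye.
    """
    return ((t_c_ps + period_ps / 2 - t_d_ps) % period_ps) / period_ps


def has_htv(t_d_ps, t_c_ps, period_ps, guard_ui=0.1):
    """True when the sampling edge is within guard_ui of a data transition."""
    phase = sampling_phase_ui(t_d_ps, t_c_ps, period_ps)
    return min(phase, 1 - phase) < guard_ui


def failing_frequencies_mhz(t_d_ps, t_c_ps, freqs_mhz):
    """Clock frequencies (MHz) at which the model predicts a violation."""
    return [f for f in freqs_mhz if has_htv(t_d_ps, t_c_ps, 1e6 / f)]


SWEEP_MHZ = range(100, 3201, 100)  # 0.1-3.2 GHz in 100 MHz steps
method1 = failing_frequencies_mhz(3 * ALPHA_PS, 1 * ALPHA_PS, SWEEP_MHZ)
method2 = failing_frequencies_mhz(3 * ALPHA_PS, 3 * ALPHA_PS, SWEEP_MHZ)

if __name__ == "__main__":
    print("method 1 fails at (MHz):", method1)
    print("method 2 fails at (MHz):", method2)
```

Under these assumptions, method 1 fails only in a band centered on half the maximum clock frequency, while method 2 keeps the sampling edge at the center of the eye at every frequency because $t_{c}-t_{d}=0$ is independent of the period.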
               
               
                     Fig. 4. Simulation results of BER versus frequency for method 1 and method 2.
 
               Fig. 5(a) and (b) show the circuits of the 3:1 serializer in the transmitter and the 1:3
                  deserializer in the receiver, where timing issues occur at the second FF of the consecutive
                  FFs driven by a single clock source. In the serializer, as shown in Fig. 5(a), the mux has to
                  use a divided-by-3 clock, and $t_{d}$ is larger than $t_{c}$. For scalable operation, the
                  delay compensation should be applied to $t_{c}$ by adding a chain of buffers.
                  In the deserializer, on the other hand, $t_{d}$ is smaller than $t_{c}$, so the compensation
                  is applied to $t_{d}$ in the same manner, as shown in Fig. 5(b). For the deserializer, the
                  delay compensation buffers can be placed at either A or B.
                  Choosing A would affect the timing at FF1, so B is the better choice.
               
               
                     Fig. 5. Delay matching techniques used in our transceiver IP: (a) 3:1 Serializer; (b) 1:3 Deserializer.
 
             
            
                  III. MEASUREMENT
               Fig. 6 presents the measurement results of our transceiver at 3.2 Gb/s and 6.4 Gb/s. A Tektronix
                  TDS6154C was used to measure the Tx eye performance, and the built-in BER counter
                  in the Rx measures the BER while sweeping the sampler clock phase horizontally. The estimated
                  parasitic loading of the Tx output, PAD, and channel is 7.5 pF, which results in a 17.8 dB
                  channel loss at the Nyquist rate. The measured vertical eye openings for channel 1 and
                  channel 2 are 94.8 mV/993 mV and 59.5 mV/997 mV, respectively, at 6.4 Gb/s without
                  pre-emphasis, as shown in Fig. 6(a) and (b). With pre-emphasis on, the vertical eye openings
                  improve to 221.6 mV/577.8 mV and 185.1 mV/534.3 mV. Fig. 6(c) shows the Rx horizontal bathtub
                  curves measured with the built-in BER counter in our IP at 3.2 Gb/s and 6.4 Gb/s, with and
                  without pre-emphasis. The horizontal eye opening is improved by 0.23 UI and 0.25 UI at
                  $10^{-9}$ BER.


                     Fig. 6. Measurement results of our transceiver at 3.2 Gb/s and 6.4 Gb/s: (a) Tx output eye opening w/ and w/o FIR at channel1 (6.4 Gb/s); (b) Tx output eye opening w/ and w/o FIR at channel2 (6.4 Gb/s); (c) Rx bathtub curves w/ and w/o FIR for 3.2 and 6.4 Gb/s (channel1).

               Our transceiver has been fabricated in a 65 nm CMOS process and occupies a die area of
                  1.02 $\mathrm{mm}^{2}$. Fig. 7 shows the layout of our IP and the measurement setup. Table 2
                  summarizes the measured performance of our transceiver and compares it with prior art.
                  The proposed transceiver shows successful data transmission in measurement over the entire
                  speed range of 2.65 Gb/s - 6.4 Gb/s owing to the scalable design technique. Our transceiver
                  consumes 72 mW/ch from a 1.2 V power supply.
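
For a rough comparison across different data rates, the power per channel can be normalized to an energy per bit. The snippet below is only an arithmetic sketch: it assumes that each power figure in Table 2 was measured at the maximum data rate of the corresponding design, which the table does not state, so the resulting numbers should be read as indicative rather than as measured results. The function name is hypothetical.

```python
def energy_per_bit_pj(power_mw_per_ch, data_rate_gbps):
    """Energy per bit in pJ: mW divided by Gb/s gives pJ/bit."""
    if data_rate_gbps <= 0:
        raise ValueError("data rate must be positive")
    return power_mw_per_ch / data_rate_gbps


# (power in mW/ch, assumed operating rate in Gb/s) taken from Table 2.
# Assumption: power is quoted at the top of each data-rate range.
DESIGNS = {
    "[3]": (129, 6.6),
    "[4]": (56, 4.0),
    "This work": (72, 6.4),
}

if __name__ == "__main__":
    for name, (power_mw, rate_gbps) in DESIGNS.items():
        print(name, round(energy_per_bit_pj(power_mw, rate_gbps), 2), "pJ/bit")
```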
               
               
                     Fig. 7. Layout for 2-ch transceivers (1.02 $\mathrm{mm}^{2}$) and measurement setup.
 
               
                     Table 2. Comparison Table
                  
                        
                           
                              | 
                                 
                              								
                               
                              							
                             | 
                           
                                 
                              								
                               [3] 
                              							
                            | 
                           
                                 
                              								
                               [4] 
                              							
                            | 
                           
                                 
                              								
                               This work 
                              							
                            | 
                        
                        
                              | 
                                 
                              								
                               Technology 
                              							
                            | 
                           
                                 
                              								
                               28 nm CMOS 
                              							
                            | 
                           
                                 
                              								
                               90 nm CMOS 
                              							
                            | 
                           
                                 
                              								
                               65 nm CMOS 
                              							
                            | 
                        
                        
                              | 
                                 
                              								
                               Data rate (bit/s) 
                              							
                            | 
                           
                                 
                              								
                               0.5 - 6.6 G 
                              							
                            | 
                           
                                 
                              								
                               4 G 
                              							
                            | 
                           
                                 
                              								
                               2.65 - 6.4 G 
                              							
                            | 
                        
                        
                              | 
                                 
                              								
                               Supply (V) 
                              							
                            | 
                           
                                 
                              								
                               1 
                              							
                            | 
                           
                                 
                              								
                               - 
                              							
                            | 
                           
                                 
                              								
                               1.2 
                              							
                            | 
                        
                        
                              | Power (mW/ch) | 129 | 56 | 72 |
                              | Channel Loss (dB) | 22 | 18.2 | 17.8 |
                              | Tx Vertical eye opening (mV) | 180 | - | 221.6 (FR4) |
                              | Rx Horizontal eye opening (UI) | 0.25 (@10^-9) | 0.2 (@10^-9) | 0.25 (@10^-9) |
                              | Swing (mV) | - | 250 - 1000 | 577.8 |
                              | Single Tx/Rx Area (mm^2/ch) | 0.64 | 1.11 | 0.51 |
                     
                  
                
             
            
                  IV. CONCLUSIONS
               A design methodology for high-speed clock-triggered logic with scalable-speed operation
                  has been proposed and used to implement a complete 2-channel IO transceiver. The
                  hold-time violation (HTV) timing issue across data rates has been analyzed theoretically.
                  The IP demonstrates error-free data transmission over the speed range of 2.65 Gb/s to
                  6.4 Gb/s.
               
             
          
         
            
                  ACKNOWLEDGMENTS
               
                  This work was supported in part by the ATC+ (Advanced Technology Center plus)
                  Program through the Korea Evaluation Institute of Industrial Technology under Grant
                  20017980 and in part by the Research Grant of Kwangwoon University in 2022.
                  The EDA tools were supported by the IC Design Education Center (IDEC), Korea.
                  			
               
             
            
                  
                     References
                  
                     
                        
                        Frans, Y., Carey, D., Erett, M. et al.: ‘A 0.5-16.3 Gb/s Fully Adaptive Flexible-Reach
                           Transceiver for FPGA in 20 nm CMOS’, IEEE Journal of Solid-State Circuits, 2015, 50,
                           8, pp. 1932-1944, doi:10.1109/JSSC.2015.2413849.

 
                      
                     
                        
                        Abdollahi, R., Hadidi, K. and Khoei, A.: ‘A Simple and Reliable System to Detect and
                           Correct Setup/Hold Time Violations in Digital Circuits’, IEEE Transactions on Circuits
                           and Systems I: Regular Papers, 2016, 63, 10, pp. 1682-1689, doi:10.1109/TCSI.2016.2582239.

 
                      
                     
                        
                        Savoj, J., Hsieh, K.C.H., An, F.T. et al.: ‘A Low-Power 0.5-6.6 Gb/s Wireline Transceiver
                           Embedded in Low-Cost 28 nm FPGAs’, IEEE Journal of Solid-State Circuits, 2013, 48,
                           11, pp. 2582-2594, doi:10.1109/JSSC.2013.2274824.

 
                      
                     
                        
                        Faust, A.C., Narasimha, R.L., Bhatia, K. et al.: ‘FEC-based 4 Gb/s backplane transceiver
                           in 90 nm CMOS’, Proceedings of the IEEE 2012 Custom Integrated Circuits Conference,
                           San Jose, CA, USA, 9-12 Sept. 2012, doi:10.1109/CICC.2012.6330665.

 
                      
                   
                
             
            
            
               Goohyung Chung received the Bachelor of Science (B.S.) degree from the Department
               of Electronic Engineering, Kwangwoon University, Korea, in 2022. He is currently
               pursuing the Master of Science (M.S.) degree at Kwangwoon University, Korea. His
               current research focuses on the design of clock and data recovery (CDR) circuits,
               including high-speed IO circuits.
               		
            
            
            
               Kyoungub Cho received the Bachelor of Science (B.S.) degree from the Department of
               Electronic Engineering, Kwangwoon University, Korea, in 2022. He is currently pursuing
               the Master of Science (M.S.) degree at Kwangwoon University, Korea. His current research
               focuses on the design of clock generation circuits, including phase-locked loops
               (PLLs), and high-speed IO circuits.
               		
            
            
            
               Taehyoun Oh (S’05) received the Bachelor of Science (B.S.) and Master of Science
               (M.S.) degrees in Electrical Engineering from Seoul National University in 2005 and
               2007, respectively. He received his Ph.D. degree in Electrical Engineering from the
               University of Minnesota, Minneapolis, under the supervision of Dr. Ramesh Harjani.
               His doctoral research focused on high-speed I/O circuits and architectures. During
               the summer of 2010, he worked on I/O channel modeling at the AMD Boston Design Center,
               MA. In the fall semester of 2011, he conducted research on I/O architecture and link
               jitter budgeting at Intel Corp., CA. In fall 2012, he joined the IBM system technology
               group, NY, where he worked on performance verification of high-speed decision feedback
               equalizers for server processors. In spring 2013, he joined the Department of
               Electronic Engineering at Kwangwoon University, Seoul, Korea, as an assistant
               professor. His current research interest is clock generation IC design.