I. INTRODUCTION
               Recently, the demand for high-speed data transmission has increased rapidly with the
                  development of camera-sensor-based autonomous vehicles that rely on AI deep learning.
                  
               
               For high-speed data transmission, serial links are preferred over parallel links due
                  to low power consumption and cost. But high-speed data transmission in serial links
                  is limited by channel bandwidth, which is essentially a low-pass characteristic [1]. Therefore, for high-speed data transmission within a given channel bandwidth, a
                  pulse modulation scheme that increases the number of transmission bits per symbol
                  is used in serial links.
               
               The most common pulse modulation scheme in high-speed serial links is PAM-X. As
                  shown in Fig. 1(a) and (b), PAM-X increases the number of differential levels (X)
                  and thereby reduces the symbol rate by a factor of $\log _{2}X$ compared to binary signaling (PAM-2).
                  However, to satisfy a given SNR, the full output swing of the signal must be
                  increased, which increases the power consumption of the transmitter output
                  driver [2].
               
               Another pulse modulation scheme is PWM-X. As shown in Fig. 1(c), PWM-X increases the number of bits per symbol by increasing the number of falling
                  edges (X). Each PWM-X symbol contains one rising edge and one falling edge,
                  so, unlike PAM-X, the receiver can replace the clock and data recovery (CDR) circuit with a phase-locked
                  loop (PLL) [3]. In addition, PWM-X always uses two differential levels, which is why the power
                  consumption of a PWM-X transceiver is lower than that of PAM-X. Also, a PAM driver
                  is implemented with current mode logic (CML), whereas a PWM driver is implemented with
                  CMOS logic. Therefore, the power efficiency of PWM-X benefits more from technology scaling
                  than that of PAM-X. However, increasing the number of falling edges (X) decreases
                  the minimum pulse width, which increases the inter-symbol interference (ISI)
                  induced by channel loss.
               
               To improve power efficiency at a high data rate, the dual-mode PAM-10 scheme was
                  introduced, as shown in Fig. 1(d) [2]. This scheme can reduce the power consumption of the transmitter output driver by
                  decreasing the number of differential levels (X) through common-mode modulation. Also,
                  the dual-mode PAM-10 scheme achieves the same symbol rate as PAM-16. However, the
                  large number of differential levels (X=10) still requires a high supply voltage. Also,
                  the dual-mode PAM-10 scheme employs a static output driver. Therefore, its power efficiency
                  is harder to improve by technology scaling than that of PWM-X, which is built on CMOS logic.
               
               To reduce pin count while supporting high-speed data transmission, the conventional
                  PWAM scheme was introduced, as shown in Fig. 1(e) [4]. This scheme uses only 5 differential levels, fewer than the existing 4-bit/symbol
                  PAM-X schemes (i.e., dual-mode PAM-10 [2] and PAM-16 [13]), so it can reduce the power consumption of the transmitter
                  output driver. However, PWM-4 restricts the minimum pulse width to $\frac{8}{7}T_{b}$,
                  which is similar to PAM-2, thus limiting high-speed data transmission.
               
               In summary, PAM-X, PWM-X, dual-mode PAM-10 and conventional PWAM have restrictions
                  on high-speed data transmission or power efficiency improvement by technology scaling.
                  Therefore, we propose a novel PWAM signaling scheme as shown in Fig. 1(f) to achieve high data rate transmission and power efficiency improvements by technology
                  scaling simultaneously.
               
               This paper is organized as follows. In Section II, the proposed PWAM signaling scheme
                  is presented and compared to the conventional 4-bit/symbol pulse modulation schemes.
                  The transceiver implementation of the proposed scheme is also described in Section II.
                  Section III shows the simulation results of the 10-Gb/s transceiver designed in a
                  180 nm CMOS process for power efficiency verification, and Section IV concludes.
               
               
                     Fig. 1. Waveforms of various pulse modulations: (a) PAM-2; (b) PAM-4; (c) PWM-4; (d) dual-mode PAM-10; (e) conventional PWAM (PWM-4 and PAM-4); (f) proposed PWAM (dual-mode PAM-4 and PWM-2).
 
             
            
                  II. PROPOSED PWAM SCHEME
               
                     1. Proposed PWAM Signaling
                  The proposed PWAM signaling transmits 4 bits per symbol using a combination of dual-mode
                     PAM-4 and PWM-2. As shown in Fig. 2, the dual-mode PAM-4 uses both common-mode and differential-mode, unlike PAM-X, which
                     employs only differential-mode. Consequently, the dual-mode PAM-4 scheme can modulate
                     3-bit data to eight differential levels through three common levels $\left(V_{cm2},V_{cm1},V_{cm0}\right)$.
                     In other words, it has the same transmission capability as PAM-8. For that reason,
                     the proposed PWAM scheme can change PWM-4, employed in the conventional PWAM scheme,
                     to PWM-2, which increases the minimum pulse width of the proposed PWAM scheme.
                  
                  As shown in Fig. 3(a), since the differential levels of $V_{cm2}$ and $V_{cm0}$ overlap the differential
                     levels of $V_{cm1}$, the number of differential levels (X) of the proposed PWAM scheme
                     is 5, including a zero level for PWM-2. That is, the number of differential levels
                     (X) is smaller than in the other 4-bit/symbol pulse modulation schemes (i.e., dual-mode
                     PAM-10 [2] and PAM-16 [13]), thereby reducing power consumption. In addition, PWM-2 drivers based on CMOS logic
                     help further improve power efficiency by technology scaling.
                  
                  The important features for the proposed PWAM signaling and comparisons with 4-bit/symbol
                     pulse modulation schemes can be summarized as follows.
                  
                  1) The proposed PWAM scheme improves the minimum pulse width to $1.5T_{b}$ compared
                     to the conventional PWAM scheme, which reduces the inter-symbol interference (ISI) induced by
                     channel loss. This is due to the combination of dual-mode PAM-4 and PWM-2.
                     In this work, the falling edge of the proposed PWAM signal is synchronized to CLK-135
                     or CLK-225, so the minimum pulse width is three-eighths of the 1-unit
                     interval of the proposed PWAM signal. Assuming that the 1-unit interval of a 1-bit/symbol
                     PAM-2 is 1$T_{b}$, the 1-unit interval of the 4-bit/symbol proposed scheme is 4$T_{b}$.
                     Therefore, the minimum pulse width ($T_{p}$) of the proposed PWAM signal is calculated
                     as follows.

                     $$T_{p}=\frac{3}{8}\times 4T_{b}=1.5T_{b}$$
                  
                  
                  2) The proposed PWAM scheme has an increased SNR compared to dual-mode PAM-10 [2] and PAM-16 [13]. This is because it uses only 5 differential levels compared to the other 4-bit/symbol
                     pulse modulation schemes mentioned above.
                  
                  3) Compared to dual-mode PAM-10 [2] and PAM-16 [13], the power consumption of the transceiver can be reduced and the power efficiency
                     by technology scaling can be further improved. This is possible because the proposed
                     PWAM scheme has fewer differential levels (X=5) and a 1-bit PAM driver is replaced
                     by a 1-bit PWM driver compared to the other 4-bit/symbol pulse modulation schemes
                     mentioned above.
                  
                  4) Since PWM-2 has a rising edge and a falling edge in each symbol, the clock can
                     be recovered by a PLL instead of a CDR in the receiver, and an 8B10B encoder for the CDR
                     is not required in the transmitter. That is, PWM-2 simplifies the clock-recovery
                     circuits compared to PAM-X.
                  
                  5) Under a lossy channel environment, the differential-mode is a more dominant factor for
                     BER performance than the common-mode. The minimum pulse width of the common-mode is
                     $4T_{b}$, which is larger than that (=$1.5T_{b}$) of the differential-mode. This means that
                     when the voltage difference between adjacent levels in the differential-mode and the
                     voltage difference between adjacent levels in the common-mode are the same, the ISI
                     of the differential-mode is greater than the ISI of the common-mode. Therefore, under
                     a lossy channel environment, the BER is determined by the differential-mode.
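The pulse-width figures used in items 1) and 5) can be checked with a short calculation. The sketch below is illustrative only: it assumes an ideal PWM pulse that rises on CLK-0 and falls on CLK-135 or CLK-225, and a common level held for one full unit interval. The names (`UI_IN_TB`, `pulse_width_in_tb`) are introduced here and are not from the paper.

```python
from fractions import Fraction

BITS_PER_SYMBOL = 4                      # the proposed PWAM carries 4 bits per unit interval
UI_IN_TB = Fraction(BITS_PER_SYMBOL)     # 1 UI = 4 T_b
RISING_PHASE_DEG = 0                     # rising edge on CLK-0
FALLING_PHASES_DEG = (135, 225)          # falling edge on CLK-135 or CLK-225


def pulse_width_in_tb(falling_phase_deg):
    """Width of the ideal PWM pulse, expressed in units of T_b."""
    return Fraction(falling_phase_deg - RISING_PHASE_DEG, 360) * UI_IN_TB


widths = [pulse_width_in_tb(p) for p in FALLING_PHASES_DEG]
min_dm_width = min(widths)   # differential-mode minimum pulse width
min_cm_width = UI_IN_TB      # assumed: the common level only changes once per UI

print("pulse widths (T_b):", [float(w) for w in widths])
print("min differential-mode width (T_b):", float(min_dm_width))
print("min common-mode width (T_b):", float(min_cm_width))
```

Exact fractions are used so that the 3/8 factor from the text is reproduced without rounding.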
                  
                  Fig. 4 shows a block diagram of the proposed PWAM transceiver. In the transmitter, the Tx-PLL
                     generates the multi-phased Tx-CLKs required for the serial to parallel converter and the PWM driver
                     from an external reference clock (REF CLK). The serial to parallel converter converts
                     serial data into 4-bit parallel data (Tx-bit0, Tx-bit1, Tx-bit2 and Tx-bit3) using the
                     multi-phased Tx-CLKs. As shown in Fig. 4, only Tx-bit3 is modulated into a PWM signal (Tx-PWM) by the PWM driver, and the remaining
                     3-bit parallel data (Tx-bit0, Tx-bit1 and Tx-bit2) and Tx-PWM are processed by the
                     PAM encoder for dual-mode PAM operation. Then, the PAM driver generates the proposed
                     PWAM signal from the outputs of the PAM encoder. In the receiver, the reference clock
                     (Rx-REF CLK) is extracted from the proposed PWAM signal by the CLK sampler, and it is
                     cleaned up by the Rx-PLL, which generates the multi-phased Rx-CLKs. The flash ADC detects the
                     differential-mode PAM, common-mode PAM and PWM using the recovered clocks (Rx-CLKs)
                     and threshold voltages, and it determines the thermometer codes. Then, the thermometer
                     codes are converted to 4-bit parallel data (Rx-bit0, Rx-bit1, Rx-bit2,
                     and Rx-bit3) by the decoder.
                  
                  
                        Fig. 2. Single-ended waveform of dual-mode PAM-4: (a) 2-differential levels at $V_{cm2}$ case; (b) 4-differential levels at $V_{cm1}$ case; (c) 2-differential level at $V_{cm0}$ case.
 
                  
                        Fig. 3. The proposed PWAM (dual-mode PAM-4 and PWM-2) format: (a) differential-mode; (b) common-mode.
 
                  
                        Fig. 4. The proposed PWAM (dual-mode PAM-4 and PWM-2) transceiver block diagram.
 
                
               
                     2. Transmitter Architecture and Design
                  As shown in Fig. 4, the transmitter consists of Tx-PLL, serial to parallel converter, PWM driver, PAM
                     encoder, and PAM driver.
                  
                  As shown in Fig. 5, the Tx-PLL is based on a conventional charge pump phase-locked loop (CPPLL), and it
                     includes a phase frequency detector (PFD), a charge pump (CP), a low-pass filter (LPF),
                     a voltage-controlled oscillator (VCO), a duty cycle corrector (DCC), and a divider.
                     The DCC and a four-stage differential ring VCO are employed in the Tx-PLL to obtain accurate phases
                     for the eight multi-phased Tx-CLKs. If the 45-degree phase difference between the eight multi-phased
                     Tx-CLKs is not guaranteed, a bit error may occur in the serial to parallel converter
                     and in PWAM demodulation, and the minimum pulse width of 1.5$T_{b}$ cannot be guaranteed.
                     In this work, 10-Gb/s
                     serial data is transmitted, so the Tx-PLL must generate a 2.5 GHz clock from the
                     external reference clock (REF CLK).
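The clock plan implied by this paragraph can be sketched numerically. The data rate, the four bits per symbol and the eight phases come from the text, and the 250 MHz reference is the value quoted in Section III. The divide ratio computed below is an inference from those numbers, not a value stated in the paper.

```python
DATA_RATE_BPS = 10e9     # 10-Gb/s serial data
BITS_PER_SYMBOL = 4      # 4 bits per PWAM symbol
NUM_PHASES = 8           # a four-stage differential ring VCO gives 8 phases
REF_CLK_HZ = 250e6       # external reference clock quoted in Section III

symbol_rate_hz = DATA_RATE_BPS / BITS_PER_SYMBOL     # VCO frequency
ui_s = 1.0 / symbol_rate_hz                          # one unit interval
phase_step_deg = 360 / NUM_PHASES                    # spacing between Tx-CLKs
phase_step_s = ui_s / NUM_PHASES                     # same spacing in time
clock_phases_deg = [k * phase_step_deg for k in range(NUM_PHASES)]
divide_ratio = symbol_rate_hz / REF_CLK_HZ           # assumed feedback divider

print(f"VCO frequency : {symbol_rate_hz / 1e9:.2f} GHz")
print(f"phase spacing : {phase_step_deg:.0f} deg = {phase_step_s * 1e12:.0f} ps")
print(f"clock phases  : {clock_phases_deg}")
print(f"divide ratio  : {divide_ratio:.0f} (inferred)")
```

Under these assumptions one phase step corresponds to half a bit time, which is why a phase error directly erodes the 1.5$T_{b}$ minimum pulse width.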
                  
                  The serial to parallel converter is a circuit that converts serial data into 4-bit
                     parallel data (Tx-bit0, Tx-bit1, Tx-bit2 and Tx-bit3). If REF CLK is synchronized
                     with the serial data, the serial data can be converted into 4-bit parallel data by four
                     clock phases spaced 90 degrees apart. Fig. 6 shows the block diagram of the serial to parallel converter: the
                     first-stage flip-flops sample the serial data into parallel data using the four
                     clock phases spaced 90 degrees apart, and the parallel data is then synchronized
                     to CLK-0 at the second-stage flip-flops. In this work, an extended true single-phase
                     clock (E-TSPC) flip-flop was used for the serial to parallel converter. It features
                     high-speed operation, lower power consumption, and smaller area because it uses fewer
                     transistors than the conventional TSPC flip-flop [5].
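A behavioral sketch of the two-stage conversion may help. It models only the data movement, not the E-TSPC flip-flops or their timing. The mapping of the first serial bit to Tx-bit0 is an assumption, since the text does not state the bit order, and `serial_to_parallel` is a hypothetical helper name.

```python
def serial_to_parallel(bits, width=4):
    """Group a serial bit stream into `width`-bit words.

    Position i of each word plays the role of Tx-bit<i> (assumed ordering).
    Trailing bits that do not fill a complete word are dropped.
    """
    n_words = len(bits) // width
    # First stage: each of the four clock phases captures every 4th bit.
    lanes = [bits[phase::width][:n_words] for phase in range(width)]
    # Second stage: the four lanes are re-aligned to one clock (CLK-0 in the text).
    return [tuple(lane[k] for lane in lanes) for k in range(n_words)]


serial_stream = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
words = serial_to_parallel(serial_stream)
print(words)
```

Each tuple corresponds to one unit interval of the PWAM signal. The last two bits of the example stream are left over because they do not fill a word.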
                  
                  As shown in Fig. 7, the PWM driver consists of a phase selector and a phase combiner [4]. In the phase selector, the NMOS transistors on the left determine the rising edge of
                     Tx-PWM, the phase combiner holds Tx-PWM at ‘1’ for a while, and
                     then the NMOS transistors on the right of the phase selector determine the falling edge of
                     Tx-PWM. As shown in Fig. 8, in order for the Tx-PWM signal to have one rising edge and one of two possible falling
                     edges per unit interval, its rising edge is synchronized to CLK-0, and its falling
                     edge is determined by CLK-135 or CLK-225. In this work, if Tx-bit3 is ‘0’, the falling
                     edge of Tx-PWM is synchronized with CLK-135, and if Tx-bit3 is ‘1’, it is synchronized
                     with CLK-225. Thus, CLK-180 can be used as a threshold phase ($P_{th}$) for demodulating
                     the bit3 information in the receiver. Also, the phase difference of 1$T_{b}$ between CLK-135
                     and CLK-225 becomes the sampling time margin for demodulating Tx-bit3 in the receiver.
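The modulation rule and the threshold-phase demodulation can be illustrated with an idealized model. The sketch assumes perfectly square edges and no jitter. The function names are hypothetical; only the phase values come from the text.

```python
UI_IN_TB = 4                       # one unit interval spans 4 T_b
RISING_DEG = 0                     # rising edge on CLK-0
FALLING_DEG = {0: 135, 1: 225}     # falling edge selected by Tx-bit3
THRESHOLD_DEG = 180                # threshold phase P_th (CLK-180)


def tx_pwm_level(bit3, phase_deg):
    """Ideal Tx-PWM level (0 or 1) at a given phase inside one unit interval."""
    return int(RISING_DEG <= phase_deg % 360 < FALLING_DEG[bit3])


def demodulate_bit3(sample_at_threshold):
    """With ideal edges, the level sampled at P_th equals the transmitted bit."""
    return sample_at_threshold


recovered = [demodulate_bit3(tx_pwm_level(b, THRESHOLD_DEG)) for b in (0, 1)]
duty_cycles = {b: FALLING_DEG[b] / 360 for b in (0, 1)}
window_tb = (FALLING_DEG[1] - FALLING_DEG[0]) / 360 * UI_IN_TB

print("recovered bit3 values:", recovered)
print("ideal duty cycles    :", duty_cycles)
print("sampling window (T_b):", window_tb)
```

The ideal duty cycles of this model are 37.5% and 62.5%. A real driver deviates from these values because of finite edge rates and circuit delays.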
                  
                  The PAM encoder is a circuit that sets the minimum pulse width of the common-mode to $4T_{b}$,
                     as shown in Fig. 3(b), and its truth table is listed in Table 1. The PAM encoder is shown in Fig. 9, and it is designed with CMOS logic to improve power efficiency by technology scaling.
                     The overall behavior of the PAM encoder is as follows: 1) The common-mode decision
                     circuit determines $V_{cm}$<2:0> from Tx-bit<2:0> in order to select one common level
                     among the three common levels $(V_{cm2},V_{cm1},$ $V_{cm0})$. 2) The encoder generates
                     all differential-mode outputs of S<6:0> and Sb<5:0> for the case when Tx-PWM is '1', and all common-mode
                     outputs of S<6:0> and Sb<5:0> for the case when Tx-PWM is '0'. The outputs of each mode are listed
                     in Table 1. 3) In the 3 to 1 MUX array, the differential-mode outputs and common-mode outputs corresponding
                     to the selected common level are chosen among all outputs from the encoder. 4) In the flip-flop
                     array, the selected differential-mode outputs and common-mode outputs are sampled
                     by Tx-PWM. 5) In the 2 to 1 MUX array, when Tx-PWM is '1', S<6:0> and Sb<5:0> become
                     the selected differential-mode outputs, and when Tx-PWM is '0', S<6:0> and Sb<5:0>
                     become the selected common-mode outputs. This sustains the common level when
                     the differential level is at the zero level.
                  
                  The PAM driver is designed with current mode logic (CML) and employs a current steering
                     topology for stable current source operation [2,6]. As shown in Fig. 10, the PAM driver consists of left, center, and right current sources for the dual-mode
                     PAM operation. The left current sources drive 2I, so they are the drivers for $V_{cm2}$.
                     The center and the left current sources together drive 6I, so they are the drivers for
                     $V_{cm1}$, and NMOS transistors for S<6> are added for the current steering topology when
                     the common-mode is $V_{cm2}$. Lastly, the right current sources drive 10I together
                     with the left and the center current sources, so they are the drivers for $V_{cm0}$. In
                     addition, the current sources of the PAM driver are designed as cascode current
                     sources to keep the current stable when the common level changes.
                  
                  The differential output (OUTP - OUTN) and common output ([OUTP + OUTN]/2) produced by S<6:0>
                     and Sb<5:0> in the PAM driver are summarized in Table 1, which uses Gray-code mapping. This ensures that an error between
                     adjacent differential outputs causes only a one-bit error [7].
                  
                  
                        Table 1. Truth table for PAM encoder and PAM driver output
 
                  
                        Fig. 5. Tx-PLL based on a conventional charge pump phase-locked loop.
 
                  
                        Fig. 6. Serial to parallel converter.
 
                  
                        Fig. 7. PWM driver based on CMOS logic.
 
                  
                        Fig. 8. PWM signal (Tx-PWM) modulated by Tx-bit3.
 
                  
                  
                
               
                     3. Receiver Architecture and Design
                  The receiver consists of CLK sampler, Rx-PLL, flash ADC, and decoder including retimer,
                     as shown in Fig. 4.
                  
                  The CLK sampler is a circuit for extracting Rx-REF CLK from the proposed PWAM signal
                     and consists of CM blocking circuit, continuous time linear equalizer (CTLE), variable
                     gain amplifier (VGA), and PWM sampler, as shown in Fig. 11.
                  
                  Since the conventional differential amplifier cannot reject a
                     high-frequency common-mode voltage [8], a CM blocking circuit is required. For example, if the high-frequency common-mode
                     of the proposed PWAM signal is applied to the conventional differential amplifier, the
                     gate-source voltage ($V_{GS}$) of the NMOS differential pair cannot be fixed. In that
                     case, the drain current ($I_{D}$) of the NMOS differential pair becomes unstable and
                     causes a ripple in the common-mode voltage. Because this means that the bias
                     of the circuit is unstable, the RC-degenerated differential pair [9] and the PWM sampler, which are based on the conventional differential amplifier, cannot work
                     properly. However, when the CM blocking circuit based on the CTLE with negative resistance
                     and capacitance [10] is designed with $I_{SS1}<I_{SS2}$, its common-mode voltage is set by $I_{SS2}$,
                     which is driven by a DC bias, rather than by $I_{SS1}$, which is driven by the high-frequency common-mode. For
                     that reason, compared to the conventional differential amplifier, the ripple of the
                     common-mode voltage can be reduced, and the high-frequency common-mode of the proposed
                     PWAM signal can be blocked. Therefore, in order for the circuits based on the conventional
                     differential amplifier to work properly, the CM blocking circuit must be the first
                     stage of the CLK sampler.
                  
                  As shown in Fig. 11, the CTLE designed with $I_{SS1}>I_{SS2}$ is the second stage of the CLK sampler
                     and suppresses the ISI induced by channel loss, and a VGA follows it to compensate for
                     the signal amplitude reduced by the CM blocking circuit.
                  
                  The PWM sampler, the last stage of the CLK sampler, extracts the reference clock (Rx-REF
                     CLK) from the differential-mode of the proposed PWAM signal. As shown in Fig. 12, the PWM sampler uses only amplification and digital
                     operations without any feedback topology, and it operates as follows:
                     1) The differential amplifier with a cross-coupled PMOS load and a resistor load amplifies
                     the differential input so that one of the positive and negative signals is at a level
                     below the inverter logic threshold. 2) The amplified positive and negative signals
                     are inverted by the inverters. 3) An XOR operation on the inverted
                     positive signal and the negative signal extracts the reference clock (Rx-REF CLK)
                     from the proposed PWAM signal. Meanwhile, under a lossy channel environment, the reference
                     clock (Rx-REF CLK) may include data-dependent jitter, so the jitter should be filtered
                     by the Rx-PLL.
                  
                  As shown in Fig. 13, the Rx-PLL has a structure similar to that of the Tx-PLL. A four-stage differential
                     ring VCO and a DCC are employed in the Rx-PLL to demodulate the proposed PWAM signal
                     without bit errors. However, the divider is excluded to generate
                     a full-rate clock, and a variable delay circuit (VDC) is added to minimize the phase
                     difference between the rising edge of the proposed PWAM signal and the rising edge
                     of the recovered clock (Rx-CLK0). Assuming that the phase offset of the Rx-PLL is
                     zero, the phase difference is caused by the delay ($\Delta T$) of the CLK
                     sampler, as shown in Fig. 4. If it is not minimized, a bit error may occur during the demodulation of dual-mode
                     PAM-4 and PWM-2. Therefore, to minimize the phase difference, the recovered
                     clocks (Rx-CLKs) are delayed by $1\mathrm{UI}-\Delta T$. That
                     is, as shown in Fig. 13, the VDC should be designed to have a delay of $1\mathrm{UI}-\Delta T$. Additionally,
                     since a conventional CPPLL has low-pass characteristics with respect to the input
                     reference clock [11], the bandwidth of the Rx-PLL should be set narrow enough to filter the data-dependent jitter
                     of the reference clock (Rx-REF CLK), and a high-order low-pass filter (LPF) should be
                     considered.
                  
                  As shown in Fig. 14, the flash ADC determines the thermometer codes from the proposed PWAM signal to
                     recover 4-bit parallel data, and it consists of a differential-mode PAM demodulator,
                     a common-mode PAM demodulator and a PWM demodulator. The differential-mode PAM demodulator
                     detects the differential-mode level with three threshold voltages ($\mathrm{V}_{\mathrm{DM},\mathrm{th}0},0,\,\,\mathrm{V}_{\mathrm{DM},\mathrm{th}3}$)
                     shown in Fig. 3(a), and it determines the three thermometer codes ($\mathrm{T}_{\mathrm{DM}}$<2:0>).
                     Also, in order for the differential-mode PAM demodulator to operate in the PAM window,
                     it should be clocked by Rx-CLK90. The common-mode PAM demodulator detects the common-mode
                     level with two threshold voltages ($\mathrm{V}_{\mathrm{CM}.\mathrm{th}0},\,\,\mathrm{V}_{\mathrm{CM}.\mathrm{th}1}$)
                     shown in Fig. 3(b), and it decides the two thermometer codes ($\mathrm{T}_{\mathrm{CM}}$<1:0>). Also,
                     the common-mode PAM demodulator should be operated by Rx-CLK180 which is aligned at
                     the center timing of the common-mode signal. The PWM demodulator detects the PWM signal
                     with two threshold voltages ($\mathrm{V}_{\mathrm{DM},\mathrm{th}1},\,\,\mathrm{V}_{\mathrm{DM},\mathrm{th}2}$)
                     and a threshold phase ($\mathrm{P}_{\mathrm{th}}$) shown in Fig. 3(a), and it determines the two thermometer codes ($\mathrm{T}_{\text{PWMP}},\,\,\mathrm{T}_{\text{PWMN}}$).
                     In order to demodulate Rx-bit3 information, the PWM demodulator should be operated
                     by Rx-CLK180 which is the threshold phase ($\mathrm{P}_{\mathrm{th}}$). In addition,
                     the slicer employed in the flash ADC is the track and regenerate slicer [12], which can be operated at higher speeds than the strong-arm slicer.
                  
                  The decoder converts the output codes of the flash ADC ($\mathrm{T}_{\mathrm{DM}}$<2:0>,
                     $\mathrm{T}_{\mathrm{CM}}$<1:0>, $\mathrm{T}_{\text{PWMP}}$ and $\mathrm{T}_{\text{PWMN}}$)
                     into binary codes, and it is implemented with standard CMOS logic according to the truth table
                     in Table 2. Then, the four retimers retime the binary codes, and their outputs become the 4-bit
                     parallel data (Rx-bit0, Rx-bit1, Rx-bit2 and Rx-bit3).
                  
                  In this work, the threshold voltages required in the demodulators are generated by
                     a resistor ladder, and each threshold voltage level is as follows: three threshold
                     voltages for differential-mode PAM ($\mathrm{V}_{\mathrm{DM},\mathrm{th}0},0,\,\,\mathrm{V}_{\mathrm{DM},\mathrm{th}3}$)
                     are $-3I\cdot R_{L},0,+3I\cdot R_{L},$ two threshold voltages for common-mode PAM
                     ($\mathrm{V}_{\mathrm{CM}.\mathrm{th}0},\,\,\mathrm{V}_{\mathrm{CM}.\mathrm{th}1}$)
                     are $V_{DD}-2I\cdot R_{L},$ $V_{DD}-4I\cdot R_{L}$, and two threshold voltages for
                     PWM ($\mathrm{V}_{\mathrm{DM},\mathrm{th}1},\,\,\mathrm{V}_{\mathrm{DM},\mathrm{th}2}$)
                     are $-I\cdot R_{L},\,\,I\cdot R_{L}$.
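To make the threshold list concrete, the sketch below evaluates the expressions for one illustrative operating point and shows how a bank of comparators turns an input into a thermometer code. The value $I\cdot R_{L}=100$ mV is an assumption chosen only for illustration, the sample inputs are arbitrary, and the bit ordering of the real $\mathrm{T}_{\mathrm{DM}}$ and $\mathrm{T}_{\mathrm{CM}}$ buses is not taken from the paper.

```python
VDD = 1.8      # supply voltage in volts (stated in Section III)
I_RL = 0.1     # assumed value of I*R_L in volts, for illustration only

# Threshold expressions as listed in the text.
DM_PAM_THRESHOLDS = (-3 * I_RL, 0.0, 3 * I_RL)    # V_DM,th0, 0, V_DM,th3
PWM_THRESHOLDS = (-I_RL, I_RL)                    # V_DM,th1, V_DM,th2
CM_THRESHOLDS = (VDD - 2 * I_RL, VDD - 4 * I_RL)  # V_CM.th0, V_CM.th1


def thermometer(value, thresholds):
    """One comparator per threshold (ascending): 1 if the input exceeds it."""
    return tuple(int(value > t) for t in sorted(thresholds))


# Arbitrary sample inputs, not levels taken from Table 1.
print(thermometer(0.2, DM_PAM_THRESHOLDS))    # differential input of +0.2 V
print(thermometer(-0.4, DM_PAM_THRESHOLDS))   # differential input of -0.4 V
print(thermometer(1.5, CM_THRESHOLDS))        # common-mode input of 1.5 V
print(thermometer(0.0, PWM_THRESHOLDS))       # zero level between the PWM thresholds
```

The decoder in the paper maps such codes to bits according to Table 2, which this sketch does not attempt to reproduce.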
                  
                  
                        Table 2. Truth table for the decoder
 
                  
                        Fig. 11. CLK sampler: CM blocking circuit, CTLE, VGA, PWM sampler.
 
                  
                        Fig. 12. Timing diagram for PWM sampler
 
                  
                        Fig. 13. Rx-PLL based on a conventional charge pump phase-locked loop
 
                  
                        Fig. 14. Flash ADC: differential mode PAM demodulator, common mode PAM demodulator, PWM demodulator.
 
                
             
            
                  III. SIMULATION RESULTS
               To verify the power efficiency of the proposed PWAM signaling scheme, the 10-Gb/s
                  transceiver was designed in a 180 nm CMOS process. In addition, a 315 mm FR4 channel
                  was used for verification, and 10-Gb/s PRBS31 serial data and a 250 MHz external reference
                  clock (REF CLK) were applied to the transmitter inputs.
               
               Fig. 15 shows the simulated S21 of the channel used for verification. In this work, the proposed transceiver
                  is designed to target 10 Gb/s, so the differential-mode frequency of the proposed
                  PWAM signal is approximately 3.34 GHz, and the channel loss at that frequency is 6.08
                  dB. The common-mode frequency of the proposed PWAM signal is 1.25 GHz, and the
                  channel loss at that frequency is 2.72 dB. That is, since the minimum pulse width
                  ($=1.5T_{b}$) of the differential-mode is shorter than that ($=4T_{b}$) of the common-mode,
                  the channel loss of the differential-mode is larger than
                  that of the common-mode.
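The two frequencies follow from the minimum pulse widths if one assumes that the worst-case pattern alternates pulses of that width, so that its fundamental is $1/(2T_{p})$. That relation is an assumption made for this sketch rather than a formula given in the text.

```python
DATA_RATE_BPS = 10e9
T_B = 1.0 / DATA_RATE_BPS     # bit time of the 10-Gb/s serial data (100 ps)


def fundamental_hz(min_pulse_width_tb):
    """Fundamental of an alternating pattern built from pulses of this width."""
    return 1.0 / (2.0 * min_pulse_width_tb * T_B)


f_dm = fundamental_hz(1.5)    # differential-mode, T_p = 1.5 T_b
f_cm = fundamental_hz(4.0)    # common-mode, T_p = 4 T_b

print(f"differential-mode: {f_dm / 1e9:.2f} GHz")
print(f"common-mode      : {f_cm / 1e9:.2f} GHz")
```

The differential-mode result is about 3.33 GHz, which matches the approximate 3.34 GHz quoted in the text, and the common-mode result is 1.25 GHz.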
               
               Fig. 16 shows the simulated Tx-PWM signal eye-diagram. The duty cycle of the Tx-PWM signal
                  is 38.2% or 64%, which verifies that the Tx-PWM signal is modulated by Tx-bit3.
                  Also, the peak-to-peak jitter of Tx-PWM is 5.02 ps.
               
               Fig. 17 shows the simulated eye-diagram of the transmitter output. It shows that the differential-mode
                  and common-mode are generated by the PAM driver, and that the voltage difference (${\Delta}$V)
                  between adjacent levels is approximately 200 mV. In addition, Fig. 17(a) shows that the differential-mode is synchronized to the Tx-PWM signal. Meanwhile,
                  the glitch shown in Fig. 17(b) may occur due to the operation of the 2 to 1 MUX array that sets the minimum pulse width
                  of the common-mode to 4$T_{b}$. The glitch appearing in the common-mode causes the unstable
                  zero level shown in Fig. 17(a). However, since the glitch is a very high-frequency component of 30 GHz or higher,
                  it can be filtered by the channel. As shown in Fig. 18(b), the glitch is suppressed by channel loss. So, as shown in Fig. 18(a), the unstable zero level induced by the glitch rarely appears in the differential-mode
                  of the receiver input. That is, the unstable zero level does not affect the
                  middle eye and BER.
               
               Fig. 18 shows the simulated eye-diagram of the receiver input. Due to the channel loss of
                  6.08 dB at 3.34 GHz, the voltage difference (${\Delta}$V) in the PAM window is approximately
                  100 mV. The voltage difference (${\Delta}$V) in the common-mode is approximately
                  124 mV due to a channel loss of 2.72 dB at 1.25 GHz, which is larger than the
                  voltage difference (${\Delta}$V) of the differential-mode. Therefore, this analysis shows
                  that, under a lossy channel environment, differential-mode operation is more critical
                  for BER performance than common-mode operation.
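As a rough cross-check, the level spacing at the receiver can be estimated by scaling the 200 mV transmit spacing by the channel loss at a single frequency. This single-tone model is an assumption of the sketch. It ignores the rest of the signal spectrum, so it should not be expected to reproduce the simulated eye openings exactly.

```python
def attenuate(v_tx, loss_db):
    """Scale a voltage by a channel loss given in dB (amplitude ratio)."""
    return v_tx * 10 ** (-loss_db / 20)


DV_TX = 0.200                       # approximate level spacing at the transmitter (V)

dv_dm = attenuate(DV_TX, 6.08)      # differential-mode, loss at about 3.34 GHz
dv_cm = attenuate(DV_TX, 2.72)      # common-mode, loss at 1.25 GHz

print(f"differential-mode estimate: {dv_dm * 1e3:.1f} mV")
print(f"common-mode estimate      : {dv_cm * 1e3:.1f} mV")
```

The differential-mode estimate comes out at about 99 mV, close to the reported value of approximately 100 mV. The common-mode estimate is about 146 mV, higher than the simulated 124 mV, but it preserves the ordering on which the argument relies: the common-mode spacing stays larger than the differential-mode spacing.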
               
               Fig. 19 shows the simulated eye-diagram for common-mode voltage of CM blocking circuit. Because
                  of the CM blocking circuit, the high-frequency common-mode of the proposed PWAM signal
                  rarely appears in the output node of the CM Blocking circuit. In other words, it can
                  be blocked.
               
               Fig. 20 shows the simulated eye-diagram of Rx-REF CLK. It shows that Rx-REF CLK can be
                  extracted using only amplification and digital operations, without any feedback system.
                  The simulated peak-to-peak jitter of Rx-REF CLK is 51.82 ps.
               
               The simulated eye-diagram of the recovered clock (Rx-CLK0) is shown in Fig. 21, and the simulated peak-to-peak jitter of Rx-CLK0 is 12.53 ps. Since the PLL filters
                  the jitter of the input reference clock [11], Rx-CLK0 has a smaller jitter than Rx-REF CLK. Additionally,
                  Fig. 21 shows that the phase difference between the differential-mode PWAM signal and Rx-CLK0
                  is almost zero owing to the VDC, which has a delay of $1\mathrm{UI}-\Delta T$.
               
               Among the four-bit recovered data (Rx-bit0, Rx-bit1, Rx-bit2 and Rx-bit3), the eye-diagram
                  of Rx-bit0 is shown in Fig. 22. The simulated peak-to-peak jitter of the recovered data (Rx-bit0) is 11.52 ps.
               
               In this work, the supply voltage of the transceiver is 1.8 V, and equalization is
                  not applied for better power efficiency. However, for Rx-REF CLK extraction, a small
                  equalization block was inserted in the CLK sampler.
               
               The transmitter for 10-Gb/s serial data transmission consumes 134 mW in a 180 nm CMOS
                  process. The Tx-PLL, the serial-to-parallel converter, the PWM driver, the PAM encoder,
                  and the PAM driver consume 16.26 mW, 4.43 mW, 16.39 mW, 24.2 mW, and 72.72 mW, respectively.
                  The receiver consumes 95 mW. The CLK sampler, the Rx-PLL, the flash ADC, and the decoder
                  consume 32 mW, 34.29 mW, 14.4 mW, and 14.29 mW, respectively. The power consumption
                  of each sub-block in the transmitter and receiver is shown in Fig. 23.
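
               As a consistency check on these figures, the sub-block values can be summed and converted to energy per bit. This is a minimal sketch; the dictionary and function names are hypothetical, and the only inputs are the power values and the 10-Gb/s data rate quoted above.

```python
# Sub-block power in mW, as quoted in the text.
TX_MW = {
    "Tx-PLL": 16.26,
    "serial-to-parallel converter": 4.43,
    "PWM driver": 16.39,
    "PAM encoder": 24.2,
    "PAM driver": 72.72,
}
RX_MW = {
    "CLK sampler": 32.0,
    "Rx-PLL": 34.29,
    "flash ADC": 14.4,
    "decoder": 14.29,
}
DATA_RATE_GBPS = 10.0


def total_mw(blocks: dict) -> float:
    """Sum the power of all sub-blocks in mW."""
    return sum(blocks.values())


def pj_per_bit(power_mw: float, rate_gbps: float) -> float:
    """Energy per bit: mW / (Gb/s) = 1e-3 W / 1e9 bit/s = pJ/bit."""
    return power_mw / rate_gbps


tx_total = total_mw(TX_MW)  # about 134 mW
rx_total = total_mw(RX_MW)  # about 95 mW
print(round(pj_per_bit(tx_total, DATA_RATE_GBPS), 2))  # transmitter, pJ/bit
print(round(pj_per_bit(rx_total, DATA_RATE_GBPS), 2))  # receiver, pJ/bit
```

               The sums reproduce the stated 134 mW and (after rounding) 95 mW, which correspond to roughly 13.4 pJ/bit and 9.5 pJ/bit at 10 Gb/s.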
               
               Fig. 24 shows the normalized power consumption of the proposed 10-Gb/s transmitter designed
                  in a 180 nm CMOS process and in a 65 nm CMOS process; the transmitter was also designed in
                  the 65 nm process to verify the improvement in power efficiency. The power consumption of
                  the PAM driver is reduced by only 1.5 times, because only the supply voltage is reduced
                  while the static current is kept unchanged for a fixed output swing. However, the power
                  consumption of the other circuits, including the PWM driver, is reduced by more
                  than 4 times because they are designed with standard CMOS logic. This analysis shows
                  that standard CMOS logic benefits more from technology scaling in terms of power
                  consumption. It also suggests that the proposed PWAM scheme, which includes PWM-2, can
                  improve power efficiency through technology scaling more than the existing 4-bit pulse
                  modulation schemes (e.g., PAM-16, dual-mode PAM-10).
               
               The simulation results and performance of the transceiver employing the proposed PWAM
                  signaling scheme are summarized in Table 3, which also includes the performance of previously reported transceivers using the dual-mode PAM-10 [2], PWAM [4], PAM-16 [13], and PAM-4 [14-16] schemes.
               
               The power consumption of the 10-Gb/s transceiver employing the proposed scheme is
                  229 mW. Compared to dual-mode PAM-10 [2], the power consumption of the proposed PWAM transceiver with the same data rate and
                  the same 180 nm CMOS process was reduced by 1.86 times and the power efficiency was
                  improved by 1.86 times. This is because the proposed scheme has fewer differential
                  levels (X=5) than the dual-mode PAM-10 scheme.
               
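               To compare with other works [13-16] designed in different processes, the relative power efficiency of the proposed transceiver
                  ($\mathrm{RPE}$) is defined as
               
               $$\mathrm{RPE}=\mathrm{S}\times \mathrm{V}\times \left(\mathrm{T}\times \mathrm{PE}_{\mathrm{Tx}}+\mathrm{PE}_{\mathrm{Rx}}\right) \qquad (1)$$
               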
               where $\mathrm{S}$ is the relative speed rate, $\mathrm{V}$ is the relative supply
                  voltage, $\mathrm{T}$ is 1 if the transmitter driver of the other work is the same
                  current-mode logic (CML) type and 1/4 if it is a source-series-terminated
                  (SST) driver, $\mathrm{PE}_{\mathrm{Tx}}$ is the power efficiency of the proposed
                  transmitter, and $\mathrm{PE}_{\mathrm{Rx}}$ is the power efficiency of the proposed
                  receiver. For example, for the 64-Gb/s transceiver [14], S is 1/6.4, V is 1/2, T is 1/4, $\mathrm{PE}_{\mathrm{Tx}}$
                  is 13.4 pJ/bit, and $\mathrm{PE}_{\mathrm{Rx}}$ is 9.5 pJ/bit. To account for the
                  device performance difference between a FinFET process and a CMOS process, V is based
                  on the lower of the dual supply voltages of the 64-Gb/s transceiver [14]. Therefore, the relative power efficiency of the proposed transceiver ($\mathrm{RPE}$)
                  for the 64-Gb/s transceiver [14] is approximately 1 pJ/bit by Eq. (1), which is smaller than 2.96 pJ/bit, the power efficiency of the 64-Gb/s transceiver
                  [14], which is designed in the most advanced process among the other works [13-16]. In the same way, the relative power efficiencies of the proposed transceiver for
                  PAM-16 [13] and PAM-4 [15,16] are 2.23 pJ/bit, 1.98 pJ/bit, and 1.14 pJ/bit, respectively, by Eq. (1). They are smaller than the power efficiencies of 2.38 pJ/bit, 4.92 pJ/bit, and 2.29
                  pJ/bit of PAM-16 [13] and PAM-4 [15,16]. Therefore, it is suggested that the proposed scheme further improves power efficiency
                  by technology scaling.
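
               The worked example for the 64-Gb/s transceiver [14] can be reproduced numerically. The sketch below assumes that Eq. (1) has the form RPE = S x V x (T x PE_Tx + PE_Rx), which is consistent with the quoted result of approximately 1 pJ/bit; the function name is hypothetical, and only the values stated in the text for [14] are used.

```python
def relative_power_efficiency(s: float, v: float, t: float,
                              pe_tx: float, pe_rx: float) -> float:
    """Relative power efficiency in pJ/bit.

    Assumed form of Eq. (1): RPE = S * V * (T * PE_Tx + PE_Rx).
    s: relative speed rate, v: relative supply voltage,
    t: driver-type factor (1 for CML, 1/4 for SST),
    pe_tx / pe_rx: power efficiency of the proposed Tx / Rx in pJ/bit.
    """
    return s * v * (t * pe_tx + pe_rx)


# Values quoted in the text for the comparison with the 64-Gb/s work [14].
rpe_64g = relative_power_efficiency(s=1 / 6.4, v=1 / 2, t=1 / 4,
                                    pe_tx=13.4, pe_rx=9.5)
print(round(rpe_64g, 3))  # about 1.004 pJ/bit
```

               Under this assumed form, the result stays below the 2.96 pJ/bit reported for [14], in line with the comparison made in the text.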
               
               To check for bit errors in the modulation and demodulation process of the proposed transceiver,
                  an additional simulation was performed with a delay circuit and an XOR circuit under
                  a noisy power supply environment. If the Tx-bits are delayed by a delay circuit that matches
                  the propagation delay of the transceiver and channel, the delayed Tx-bits are
                  synchronized with the Rx-bits, so bit errors can be detected by XORing
                  the two. In the simulation, all four outputs of
                  the XOR circuits remained at '0'. Therefore, no bit error occurred during modulation
                  and demodulation of the transceiver.
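
               The delay-and-XOR check described above can be illustrated behaviorally. This is a minimal software sketch, not the circuit-level simulation used in this work; the function names, the example bit pattern, and the latency of three symbols are all hypothetical.

```python
from typing import List


def delay_bits(bits: List[int], delay: int) -> List[int]:
    """Delay a bit stream by `delay` symbols, padding the start with zeros."""
    kept = max(len(bits) - delay, 0)
    return [0] * delay + bits[:kept]


def xor_errors(tx_bits: List[int], rx_bits: List[int], delay: int) -> List[int]:
    """XOR the delayed Tx stream with the Rx stream.

    The first `delay` symbols are skipped because the link output is not
    yet valid there. A '1' in the result marks a bit error.
    """
    delayed = delay_bits(tx_bits, delay)
    return [a ^ b for a, b in zip(delayed[delay:], rx_bits[delay:])]


# Hypothetical example: an ideal link with a latency of 3 symbols.
tx = [1, 0, 1, 1, 0, 0, 1, 0]
latency = 3
rx_ok = [0] * latency + tx[:-latency]
print(xor_errors(tx, rx_ok, latency))  # all zeros: no bit error

# Flipping one received bit produces exactly one '1' at the XOR output.
rx_bad = list(rx_ok)
rx_bad[5] ^= 1
print(sum(xor_errors(tx, rx_bad, latency)))  # 1
```

               In the transceiver, one such comparison would be made for each of the four recovered bit streams (Rx-bit0 to Rx-bit3).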
               
               
                     Table 3. Performance summary and comparison
 
               
                     Fig. 15. Simulated S21 of the channel.
 
               
                     Fig. 16. Simulated eye-diagram for Tx-PWM signal.
 
               
                     Fig. 17. Simulated eye-diagram of the transmitter output: (a) differential-mode; (b) common-mode.
 
               
                     Fig. 18. Simulated eye-diagram of the receiver input: (a) differential-mode; (b) common-mode.
 
               
                     Fig. 19. Simulated eye-diagram for common-mode voltage of CM blocking circuit.
 
               
                     Fig. 20. Simulated eye-diagram of Rx-REF CLK.
 
               
                     Fig. 21. Simulated eye-diagram of differential-mode PWAM signal and recovered clock (Rx-CLK0).
 
               
                     Fig. 22. Simulated eye-diagram of the recovered data (Rx-bit0).
 
               
                     Fig. 23. The power consumption for each sub-block: (a) transmitter; (b) receiver.
 
               
                     Fig. 24. Normalized power consumption of the proposed 10-Gb/s transmitters designed in a 180 nm CMOS process and a 65 nm CMOS process.
 
             
            
                  IV. CONCLUSIONS
               This paper proposed a novel PWAM signaling scheme, which combines a dual-mode PAM-4
                  and a PWM-2. The proposed scheme improves the insufficient minimum pulse width of
                  the conventional PWAM to enable high-speed data transmission. In addition, since the
                  proposed 4-bit/symbol scheme uses only 5 differential levels, fewer than the existing
                  4-bit/symbol PAM schemes (e.g., PAM-16, dual-mode PAM-10), the power consumption of
                  the transceiver can be reduced. Also, due to PWM-2, the proposed scheme can further
                  improve power efficiency by technology scaling.
               
             
          
         
            
                  ACKNOWLEDGMENTS
               
                  				This research was supported by the National Research Foundation of Korea (NRF)
                  (No.2020R1F1A1077088), National R&D Program through the National Research Foundation
                  of Korea (NRF) funded by Ministry of Science and ICT (No. 2020M3H2A1076786), and the
                  MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology
                  Research Center) support program (IITP-2021-0-02052) supervised by the IITP (Institute
                  for Information & Communications Technology Planning & Evaluation). The authors also thank
                  the IDEC program for its hardware and software assistance with the design and simulation.
                  			
               
             
            
                  
                     References
                  
                     
                        
                        Granberg T., 2004, Handbook of Digital Techniques for High-Speed Design, Englewood
                           Cliffs, NJ: Prentice Hall PTR

 
                      
                     
                        
                        Song B., Kim K., Lee J., Burm J., Feb. 2013, A 0.18 ${\mu}$m CMOS 10-Gb/s Dual-Mode
                           10-PAM Serial Link Transceiver, Circuits and Systems I, IEEE Transactions on, Vol.
                           60, No. 2, pp. 457-468

 
                      
                     
                        
                        Chen W.-H., Dehng G.-K., Chen J.-W., Liu S.-I., Oct. 2001, A CMOS 400-Mb/s serial
                           link for AS-memory systems using a PWM scheme, Solid-State Circuits, IEEE Journal
                           of, Vol. 36, No. 10, pp. 1498-1505

 
                      
                     
                        
                        Yang C.-Y., Lee Y., May 2008, A PWM and PAM Signaling Hybrid Technology for Serial-Link
                           Transceivers, Instrumentation and Measurement, IEEE Transactions on, Vol. 57, No.
                           5, pp. 1058-1070

 
                      
                     
                        
                        Jung M., Fuhrmann J., Ferizi A., Fischer G., Weigel R., Ussmueller T., Dec. 2011,
                           Design of a 12 GHz Low-Power Extended True Single Phase Clock (E-TSPC) Prescaler in
                           0.13${\mu}$m CMOS technology, Microwave Conference 2011, 2011. APMC 2011. IEEE Asia-Pacific,
                           Vol. 5, No. 8, pp. 1238-1241

 
                      
                     
                        
                        Cheng H., Musa F. A., Carusone A. C., Aug. 2009, A 32/16-Gb/s Dual-Mode Pulsewidth
                           Modulation Pre-Emphasis (PWM-PE) Transmitter With 30-dB Loss Compensation Using a
                           High-Speed CML Design Methodology, Circuits and Systems I, IEEE Transactions on, Vol.
                           56, No. 8, pp. 1794-1806

 
                      
                     
                        
                        Farjad-Rad R., Yang C.-K. K., Horowitz M. A., Lee T. H., May 1999, A 0.4-${\mu}$m
                           CMOS 10-Gb/s 4-PAM pre-emphasis serial link transmitter, Solid-State Circuits, IEEE
                           Journal of, Vol. 34, No. 5, pp. 580-585

 
                      
                     
                        
                        Razavi B., 2001, Design of Analog CMOS Integrated Circuits, New York: McGraw-Hill

 
                      
                     
                        
                        Gondi S., Razavi B., Sep. 2007, Equalization and Clock and Data Recovery Techniques
                           for 10-Gb/s CMOS Serial-Link Receivers, Solid-State Circuits, IEEE Journal of, Vol.
                           42, No. 9, pp. 1999-2011

 
                      
                     
                        
                        Lim B., Yoo C., Nov. 2017, A 12-Gb/s Continuous-time Linear Equalizer with Offset
                           Canceller, Semiconductor Technology and Science, IEIE Journal of, Vol. 19, No. 2,
                           pp. 220-226

 
                      
                     
                        
                        Gardner F. M., 2005, Phaselock Techniques, 3$^{\mathrm{rd}}$ ed. Hoboken

 
                      
                     
                        
                        Chen K.-C., Kuo W. W.-T., Emami A., Mar. 2021, A 60-Gb/s PAM4 Wireline Receiver
                           With 2-Tap Direct Decision Feedback Equalization Employing Track-and-Regenerate Slicer
                           in 28-nm CMOS, Solid-State Circuits, IEEE Journal of, Vol. 56, No. 3, pp. 750-762

 
                      
                     
                        
                        Celik F., Akkaya A., Leblebici Y., Feb. 2021, A 32 Gb/s PAM-16 Tx and ADC-Based Rx
                           AFE with 2-tap embedded analog FFE in 28 nm FDSOI, Microelectronics Journal, Vol.
                           108, Article 104967

 
                      
                     
                        
                        Wang L., Fu Y., LaCroix M., Chong E., Carusone A. C., Mar. 2018, A 64Gb/s PAM-4 transceiver
                           utilizing an adaptive threshold ADC in 16nm FinFET, Solid-State Circuits, IEEE International
                           Conference on, pp. 110-111

 
                      
                     
                        
                        Depaoli E., et al., Jan. 2019, A 64 Gb/s Low-Power Transceiver for Short-Reach PAM-4
                           Electrical Links in 28-nm FDSOI CMOS, Solid-State Circuits, IEEE Journal of, Vol.
                           54, No. 1, pp. 6-17

 
                      
                     
                        
                        Ye B., et al., Feb. 2022, A 2.29pJ/b 112Gb/s Wireline Transceiver with RX 4-Tap FFE
                           for Medium-Reach Applications in 28nm CMOS, Solid-State Circuits, IEEE International
                           Conference on, pp. 118-119

 
                      
                   
                
             
            
            
               			HwanUng Kim received the B.S. degree in Electronic Engineering from Inha University,
               Incheon, South Korea, in 2021. He is currently pursuing the M.S. degree in Electrical
               and Computer Engineering with Inha University. His research interests include PLL,
               CDR, high-speed serial interface, and transceiver design for PAM/PWM signaling.
               		
            
            
            
               			Jin-Ku Kang received the Ph.D. degree in electrical and computer engineering from
               North Carolina State University, Raleigh, NC, USA. From 1983 to 1988, he was with
               Samsung Electronics, Inc., South Korea, where he was involved in memory and ASIC design.
               In 1988, he was with Texas Instruments, South Korea. From 1996 to 1997, he was with
               Intel Corp., Portland, OR, USA, as a Senior Design Engineer, where he was involved
               in high-speed I/O and timing circuits for microprocessors. Since 1997, he has been
               with Inha University, Incheon, South Korea, where he is currently a professor and
               leads the System IC Design Laboratory in the Department of Electronics Engineering.
               His research interests include high-speed/low-power mixed-mode circuit design for
               high-speed serial interfaces.