Kyunghwan Min¹, Sanggeun Lee¹, and Taehyoun Oh¹*
(Department of Electronic Engineering, Kwangwoon University, 615, Bima, 20, Gwangun-ro,
Nowon-gu, Seoul 139-701, Korea)
Copyright © The Institute of Electronics and Information Engineers(IEIE)
Index Terms
CDR, CMOS, data rate, HDMI, high-speed, integrated circuit, IO, loop, majority vote
I. INTRODUCTION
Driven by the demand for high per-pin data rates in chip-to-chip communication, CDR circuits generate clock timing optimally aligned to incoming random data with a small unit interval (UI). CDR loops have traditionally been designed using charge pump-based loop filters along with analog voltage-controlled oscillators (VCOs) (1-3). Without careful design of the supply node, the control voltage can couple to the supply, and this modulation is converted into phase noise at the VCO output. In digital-friendly circuits, the effects of process-voltage-temperature (PVT) variation and supply noise can be mitigated, and portability to newer processes is improved as well (4). The digital CDRs in (5-7), which operate with digital loop filters (DLFs), offer good portability. IO schemes for many applications, such as DisplayPort (DP), HDMI, and PCI Express, must support multi-channel timing recovery in receivers where the frequency offset between channels is zero. Implementing a phase interpolator (PI) for each channel while sharing one oscillator achieves efficient power and area (8,9). Each PI-based CDR must then track the instantaneous frequency offset between the incoming data and the recovered clock to keep the timing of its channel aligned. The integral path gain of a CDR with a bang-bang phase detector (PD) determines this frequency catch-up speed (10).

In this paper, we propose a 6 Gbit/s PI-based all-digital CDR that supports multi-channel implementation for the HDMI 2.0 standard. The proposed majority voting logic updates the digital loop filter without discarding deserialized edge information, so the PI control signal moves smoothly. As a result, low jitter has been measured when the loop is locked. Because the voted data are in two's complement format, counter blocks that only accumulate the votes are unnecessary. A binary-shifting gain extender at the end of the integration accumulators reduces the size of the adders and achieves a wide gain control range, whereas previous schemes accumulate the data using counters (11,12). At the circuit level, we introduce a new slew rate control approach for the PI to improve the DNL of the output clock phases. Better linearity between the digital control input of the PI and its output phase steps yields a stable loop gain. In addition, Gray mapping removes abrupt code transitions.
The rest of the paper is organized as follows. Section II reviews the overall architecture of the proposed PI-based digital CDR. Section III presents the majority voting logic used for the deserialized data; it also describes our DLF scheme, the gain range extending structure, the circuit diagram of our PI, and the slew rate control scheme. Section IV presents the measurement results of our prototype, and Section V concludes the paper.
II. ARCHITECTURE
Fig. 1(a) presents the architecture of the proposed 6 Gbit/s majority voting-assisted CDR. As shown in Fig. 1(b), three half-rate strong-arm latches triggered by 3 GHz clocks at 0°, 180°, and 90° sample the 3 Gbit/s ODD, EVEN, and EDGE data, respectively. The early/late information is updated from the EDGE data when there is a transition in the DATAIN signal. In that case, exactly one of the ODD and EVEN samples is 1, so the XOR of these two signals is 1. The 3:24 deserializers convert the 3×3 Gbit/s data into 24×375 Mbit/s parallel data (9 Gbit/s of samples in both cases), which reduces power consumption and eases the timing alignment that high-speed operation would otherwise require. The low-speed parallel CDR logic generates 8 EARLY and 8 LATE bits from the previously sampled and deserialized ODD, EVEN, and EDGE data. Integrating EARLY[7:0] and LATE[7:0] without losing any early/late information would require 8 independent DLFs, which would increase power and chip area considerably. Instead, the majority voter reduces the parallel early/late data to 1-bit early/late data, so that the parallel early/late decisions from the CDR logic are statistically reflected in the 1-bit early/late signal (11,12). In the DLF, the programmable gains KP and KI control the closed-loop bandwidth and jitter tolerance of the CDR. The 5 least significant bits (LSBs) of the 7-bit binary DLF output are thermometer-encoded in the binary-to-thermometer (B2T) block. The phase interpolators rotate the recovered clock phases from 0° to 360° for optimal timing alignment. In front of the PIs, slew control blocks set the slew rate of the clock signals coming from the oscillator, which significantly improves the linearity of the phase steps. For real-time measurement, a 7-bit DAC converts the DLF output into an analog signal for monitoring. The PRBS checker with a BER calculator confirms whether the received data match the transmitted data.
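The early/late extraction described above can be sketched in Python for one deserialized word. This is a minimal behavioral model, not the authors' logic: it assumes EDGE[k] is sampled midway between ODD[k] and EVEN[k] (the 90° clock lies between the 0° and 180° clocks), and it uses the common Alexander-type polarity, in which an edge sample that still equals the old bit means the clock is early. The function name `early_late` is hypothetical.

```python
from typing import List, Tuple


def early_late(odd: List[int], even: List[int],
               edge: List[int]) -> Tuple[List[int], List[int]]:
    """Derive EARLY[7:0] and LATE[7:0] from 8 ODD, 8 EVEN and 8 EDGE bits.

    Assumption: EDGE[k] is sampled between ODD[k] and EVEN[k], so it is only
    informative when those two bits differ (a data transition).
    """
    if not len(odd) == len(even) == len(edge) == 8:
        raise ValueError("expected one 8-bit word per sampler")
    early, late = [], []
    for d_old, d_new, e in zip(odd, even, edge):
        transition = d_old ^ d_new           # XOR gating: 1 only on a transition
        # Edge sample still equals the old bit -> sampled before the transition
        early.append(transition & (e ^ d_new))
        # Edge sample already equals the new bit -> sampled after the transition
        late.append(transition & (e ^ d_old))
    return early, late
```

For example, `early_late([0, 1, 0, 0, 1, 1, 0, 1], [1, 0, 0, 1, 1, 0, 1, 1], [0, 0, 0, 1, 1, 1, 1, 1])` returns EARLY = [1, 0, 0, 0, 0, 1, 0, 0] and LATE = [0, 1, 0, 1, 0, 0, 1, 0]; the three bit positions without a transition contribute to neither.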
Fig. 1. (a) Architecture of the proposed PI-based CDR, (b) Illustration of the waveforms
for 3 half-rate sampler latches at the front-end.
III. CIRCUIT DESCRIPTION
1. Voting Logics
Fig. 2(a) shows the logic circuit that decides the update sign, and Fig. 2(b) shows the majority voting logic. The EARLY[n] and LATE[n] signals are converted into a voting number (+1: UP / 0: HOLD / −1: DOWN) represented by the Sign[n] and Mag[n] signals, where n is an integer from 0 to 7. When EARLY[n] and LATE[n] are identical, which is the case when the clock and data are aligned, the output Sign[n]Mag[n] = 00 and no vote is added to the Sum[4:0] signal. In the majority voting logic, the eight voting numbers (+1 / 0 / −1) are summed, and the result spans −8 to +8. Because a 4-bit two's complement word covers only −8 to +7, the Sum signal requires a 5-bit two's complement representation. Sums of +1 to +8, 0, and −1 to −8 set {VEARLY, VLATE} to [10], [00], and [01], respectively. When the CDR loop is locked and Sum[4:0] stays at [00000], the input of the following DLF is not updated. The EARLY[7:0] and LATE[7:0] signals are obtained from the recovered deserialized data[15:0] in our half-rate scheme. The built-in PRBS checker can measure BER down to the 2⁻⁴⁰ ≈ 10⁻¹² level (13).
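A behavioral Python sketch of the voting path follows. It assumes EARLY[n] maps to +1 (UP) and LATE[n] to −1 (DOWN), which is consistent with a positive Sum producing {VEARLY, VLATE} = [10]; the Sign/Mag encoding of the nonzero votes (01 for +1, 11 for −1) is also an assumption, since the text only gives the 00 code. The function names are hypothetical.

```python
from typing import List, Tuple


def sign_mag(early: int, late: int) -> Tuple[int, int]:
    """Map EARLY[n]/LATE[n] to (Sign[n], Mag[n]); the +1/-1 encoding is assumed."""
    if early == late:
        return 0, 0                      # HOLD: no vote contributed
    return (0, 1) if early else (1, 1)   # +1 (UP) or -1 (DOWN)


def majority_vote(early: List[int], late: List[int]) -> Tuple[int, str, Tuple[int, int]]:
    """Sum eight votes and return (Sum, Sum[4:0] as bits, (VEARLY, VLATE))."""
    total = 0
    for e, l in zip(early, late):
        sign, mag = sign_mag(e, l)
        total += -mag if sign else mag   # running sum stays within -8..+8
    # 5-bit two's complement: 4 bits would only cover -8..+7
    sum_bits = format(total & 0x1F, "05b")
    if total > 0:
        v = (1, 0)                       # +1..+8 -> [10]
    elif total < 0:
        v = (0, 1)                       # -1..-8 -> [01]
    else:
        v = (0, 0)                       # 0 -> [00], DLF input not updated
    return total, sum_bits, v
```

With all eight EARLY bits set, the sketch returns `(8, "01000", (1, 0))`; with all eight LATE bits set, it returns `(-8, "11000", (0, 1))`. Because the vote is already a signed two's complement value, it can feed the DLF adders directly without a separate counter.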
Fig. 2. Majority voting logic (a) binary mapping, (b) logic diagram.
2. Digital Loop Filter Scheme
Fig. 3 presents our DLF circuit schematic. The VEARLY and VLATE bits, mapped into two's complement form, are scaled by the proportional gain (KP) and the integral gain (KI), whose values are set via binary shifting, as shown in Fig. 3. Expanding the gain ranges of KP and KI provides a wide choice of catch-up speeds for both phase and frequency offsets in the presence of a nonlinear bang-bang PD (14). Increasing KI enables the loop to catch a large frequency offset, but it also widens the closed-loop bandwidth of the CDR and weakens the suppression of input jitter. The gain extender, located at an appropriate point along the path, performs additional binary shifting to extend the KI range without concentrating large adders in the first accumulator alone. When the data and the recovered clock have a frequency offset, the shift amount in the gain extender also controls the tracking speed. In an environment with a small frequency offset and large input jitter, the loop bandwidth is reduced by decreasing KI. As in the schemes of (15,16), the PI control signal (PI_CODE[6:0]) is monitored via the 7-bit DAC, and the resulting control patterns are sent in analog form to the measurement equipment through an on-chip pad.
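The following Python sketch shows one way a second-order bang-bang DLF with shift-based gains could behave; it is not a transcription of Fig. 3. The parameter names `kp_shift`, `ki_shift`, `ext_shift`, and `frac_bits` are hypothetical: KP = 2^kp_shift and KI = 2^ki_shift are applied as left shifts, `ext_shift` stands in for the gain extender as an extra arithmetic right shift of the integral output, and the 7-bit PI_CODE is taken from the top of a phase accumulator that wraps at 360°.

```python
from dataclasses import dataclass


@dataclass
class DigitalLoopFilter:
    kp_shift: int        # hypothetical: proportional gain KP = 2**kp_shift
    ki_shift: int        # hypothetical: integral gain KI = 2**ki_shift
    ext_shift: int       # hypothetical gain extender: right shift of integral output
    frac_bits: int = 8   # hypothetical fractional bits below the 7-bit PI_CODE
    integ: int = 0       # integral (frequency) accumulator
    phase: int = 0       # phase accumulator; PI_CODE is its top 7 bits

    def step(self, v_early: int, v_late: int) -> int:
        """Apply one {VEARLY, VLATE} vote and return the new PI_CODE[6:0]."""
        vote = v_early - v_late                    # [10] -> +1, [01] -> -1, [00] -> 0
        self.integ += vote << self.ki_shift        # integral path (1st accumulator)
        # Gain extender: arithmetic right shift (floors negative values, like hardware)
        integ_term = self.integ >> self.ext_shift
        self.phase += (vote << self.kp_shift) + integ_term
        self.phase %= 128 << self.frac_bits        # PI phase wraps at 360 degrees
        return self.phase >> self.frac_bits        # PI_CODE[6:0]
```

With `kp_shift=8`, `ki_shift=4`, `ext_shift=0`, and the default `frac_bits=8`, two consecutive VEARLY votes move PI_CODE from 0 to 1 and then to 2. A following HOLD keeps PI_CODE at 2, while the nonzero integral term keeps advancing the fractional phase; this continued drift is how the integral path follows a frequency offset, and raising `ext_shift` slows it.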
Fig. 3. Digital loop filter (DLF) circuit diagram and illustration of gain control
procedure.
3. Phase Interpolator and Slew Rate Control
Fig. 4(a) shows the circuit schematic of our 7-bit PI, which recovers the required timing by interpolating between the quadrature input clocks from the 3 GHz oscillator. A current-steering PI can provide a highly linear phase shift (17,18). The PI resolution is set by the number of bits in PI_CODE[6:0], and a larger bit count increases circuit complexity. As illustrated in Fig. 4(b), our PI has a resolution of 2.8°/LSB, since 128 codes span 360° (360°/2⁷ ≈ 2.81°). To reduce glitches, QUAD[1:0] is generated from the 2 MSBs of PI_CODE[6:0] using Gray mapping. The remaining binary LSBs, PI_CODE[4:0], are converted into a thermometer code THERM[30:0] for fine DAC switching (IODD, IEVEN). Fig. 4(c) shows a block diagram of the overall PI circuits for the half-rate CDR. The oscillator generates and provides 3 GHz quadrature clock signals for phase interpolation. The slew control blocks improve linearity by increasing the slew rate, at the cost of power. The half-rate CDR uses CLK 0°/CLK 180° for ODD/EVEN data sampling and CLK 90° for EDGE data sampling. Since one PI can generate only two clock phases, either 0°/180° or 90°/270°, two PIs must operate concurrently to produce all the required recovered clocks (0°, 180°, and 90°). For initial CDR lock, CLK 0° and CLK 90° move simultaneously, and the 90° phase difference between 0°/180° and 90°/270° is maintained by adding 32 (one quarter of the 128-code range) in the PI mapping block at the front.
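The code mapping described above can be sketched in Python. It assumes the straightforward reading of the text: QUAD[1:0] is the 2-bit Gray code of PI_CODE[6:5], THERM[30:0] has PI_CODE[4:0] ones filled from the LSB up, and the 90°/270° PI receives PI_CODE + 32 modulo 128. Any reversal of the thermometer direction between quadrants that a real PI might use is not modeled, and the function names are hypothetical.

```python
from typing import Tuple


def gray2(b: int) -> int:
    """2-bit binary-to-Gray: 00->00, 01->01, 10->11, 11->10."""
    return b ^ (b >> 1)


def map_pi_code(pi_code: int) -> Tuple[int, int, float]:
    """Return (QUAD[1:0], THERM[30:0], ideal phase in degrees) for PI_CODE[6:0]."""
    code = pi_code & 0x7F
    quad = gray2(code >> 5)              # QUAD[1:0] from the 2 MSBs
    therm = (1 << (code & 0x1F)) - 1     # THERM[30:0]: PI_CODE[4:0] ones
    phase_deg = code * 360 / 128         # ideal 2.8125 deg/LSB
    return quad, therm, phase_deg


def edge_pi_code(pi_code: int) -> int:
    """Code for the 90/270-degree PI: +32 LSBs (a quarter turn), wrapped to 7 bits."""
    return (pi_code + 32) & 0x7F
```

For instance, `map_pi_code(67)` gives QUAD = 0b11, THERM = 0b111, and an ideal phase of 188.4375°, and `edge_pi_code(100)` wraps to 4, which is 90° ahead of code 100 (11.25° versus 281.25°). Because adjacent quadrants differ in only one QUAD bit, a quadrant crossing toggles a single select line.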
Fig. 4. (a) Circuit schematic of 7-bit phase interpolator, (b) Quadrature mapping
of phase interpolator, (c) Description of a scheme for 2 phase interpolators for 0°,
180° and 90° clock timing recovery.
Fig. 5(a) presents the architecture of the two slew rate control blocks for the CLK 0°/180° PI and the CLK 90°/270° PI. To give the slew rate control blocks equivalent loading and timing delays, each of the clock signals CLK 0°/90°/180°/270° drives an equally distributed gate load at the PI inputs. The PI control code and the shifted phase at the PI output have a nonlinear relation due to the nonlinear characteristics of the devices and signals. Maintaining good linearity over the entire control code range results in a constant loop gain and stabilizes the loop transfer function. As shown in Fig. 5(b), 2-bit slew rate control blocks have been designed and placed at the input of the PI block. The slew rate is controlled by turning the segment units on and off. When a segment is disabled, its current mirror is switched off by S1 and its loads are switched off by S2 and S3. As the number of enabled segments increases, the total gm grows, so the output clock transitions more sharply. The graph in Fig. 5(c) shows the simulated DNL of the shifted output phase, in LSB units, versus the mapped PI_CODE[6:0] = N for various slew options, where N ranges from 0 to 127. As SLEW_THERM[3:0] increases from 0001 to 1111, the standard deviation of the DNL improves to 0.95, 0.86, 0.75, and 0.54 LSB. A fast slew for the incoming quadrature clocks thus improves linearity. However, the power increases with the number of enabled segments: in our CDR, each segment unit consumes 0.285 mW. To achieve a DNL of 0.86 LSB and about 0.6 mW (2 × 0.285 mW = 0.57 mW) concurrently, we enable 2 segment units in the slew rate control blocks. Since the block input, SLEW_THERM[3:0], is thermometer coded, this corresponds to enabling 2 of the 4 units (SLEW_THERM[3:0] = 0011).
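The DNL metric and the segment choice above can be reproduced with a small Python sketch. DNL is computed here as the normalized step between consecutive codes minus one, with the step from code 127 back to code 0 wrapped through 360°; using the population standard deviation is an assumption. The DNL and power figures come from the text, while the function names are hypothetical.

```python
import statistics
from typing import List, Optional

LSB_DEG = 360 / 128          # ideal 7-bit PI step
SEGMENT_POWER_MW = 0.285     # per-segment power from the text
# Simulated DNL standard deviation (LSB) for each SLEW_THERM[3:0] setting
DNL_STD_LSB = {"0001": 0.95, "0011": 0.86, "0111": 0.75, "1111": 0.54}


def dnl_lsb(phases_deg: List[float]) -> List[float]:
    """DNL[N] = (phase[N+1] - phase[N]) / LSB - 1, wrapping 127 -> 0 through 360 deg."""
    n = len(phases_deg)
    return [((phases_deg[(k + 1) % n] - phases_deg[k]) % 360) / LSB_DEG - 1
            for k in range(n)]


def dnl_std(phases_deg: List[float]) -> float:
    """Population standard deviation of the DNL over all codes."""
    return statistics.pstdev(dnl_lsb(phases_deg))


def pick_slew(max_dnl_std: float, max_power_mw: float) -> Optional[str]:
    """Return the setting with the fewest enabled segments meeting both limits."""
    for therm in sorted(DNL_STD_LSB, key=lambda t: t.count("1")):
        power_mw = therm.count("1") * SEGMENT_POWER_MW
        if DNL_STD_LSB[therm] <= max_dnl_std and power_mw <= max_power_mw:
            return therm
    return None
```

With a 0.9 LSB DNL target and a 0.6 mW budget, `pick_slew(0.9, 0.6)` returns `"0011"`, matching the two-segment choice (0.57 mW); tightening the target to 0.6 LSB within the same budget returns `None`, because only the four-segment setting (1.14 mW) meets it.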
Fig. 5. (a) Circuit description of 2 slew rate control blocks and CLK 0°/180° PI,
CLK 90°/270° PI, (b) Schematic for slew rate control, (c) DNL simulation results of
phase shifting linearity.
IV. MEASUREMENT RESULTS
Fig. 1(a) shows the measurement nodes of the proposed CDR. To test the CDR, a 6 Gbit/s 2³¹−1 PRBS NRZ signal with 2.05 ps RMS jitter and a 1 V differential peak-to-peak swing is generated at the input by a Synthesis Research BERT 7500B, as shown in Fig. 6(a). Fig. 6(b) presents the recovered clock jitter with the loop locked, measured with a Tektronix TDS 6154C oscilloscope. The measured peak-to-peak and RMS jitter of the recovered clock (divided by 16) are 12.2 ps and 1.826 ps, respectively. The phase noise of the recovered clock, measured with an HP E4401B, is −114.72 dBc/Hz at a 1 MHz offset, as shown in Fig. 6(c). As shown in Fig. 6(d), the lock pattern of the PI digital control input is observed at an on-chip pad via the 7-bit DAC output when the loop is first turned on and searches for the lock position. The measured lock time of the loop is 54.5 ns. The built-in PRBS checker shows a BER below 10⁻¹² at 6 Gbit/s at the centre of the data eye. Fig. 7 shows the die photograph of the proposed PI-based all-digital CDR. The prototype has been fabricated in a 65 nm CMOS process and occupies 0.073 mm² of chip area (excluding the PRBS checker). Table 1 summarizes the measurement results of the proposed CDR and compares them with prior art. (3) presents a CDR with a charge pump-based loop filter that uses an analog control voltage; its chip area is comparatively large in order to reduce the ripple on the control voltage. (6) and (8) present digital-type CDRs with digital loop filters, and (8) generates its clock source from an analog charge pump PLL with a ring VCO. Our CDR is digital filter-based and, assisted by the majority voting logic, aligns the linearized PI output phase to the data timing with low jitter. The proposed CDR consumes 17.4 mW at 6 Gbit/s and achieves the best jitter performance among the compared works.
Fig. 6. (a) 6 Gbit/s input NRZ signal used for the measurement, (b) Recovered output
clock jitter during steady state, (c) Measured output phase noise of the recovered
clock, (d) Measured lock time of the CDR loop.
Fig. 7. Die photograph of the proposed CDR.
Table 1. Performance comparison table

| | (3) | (6) | (8) | This work |
|---|---|---|---|---|
| Technology | 65 nm CMOS | 130 nm CMOS | 180 nm CMOS | 65 nm CMOS |
| Architecture | Analog filter-based CDR | Digital filter-based CDR | Digital filter-based CDR | Digital filter-based CDR |
| Supply (V) | 1.0 | 1.2 | 1.4 | 1.0 |
| Data rate (Gbit/s) | 0.75-3.0 | 1.0-4.0 | 0.2-4.0 | 6.0 |
| Peak-to-peak jitter (ps) | 37.2 (@ 3.0 Gbit/s) | 29.2 (@ 3.0 Gbit/s) | 115.1 (@ 2.0 Gbit/s) | 12.2 (@ 6 Gbit/s) |
| RMS jitter (ps) | 5.69 (@ 3.0 Gbit/s) | 3.58 (@ 3.0 Gbit/s) | 28 (@ 2.0 Gbit/s) | 1.83 (@ 6 Gbit/s) |
| BER | < 10⁻¹² | < 10⁻¹⁴ | < 10⁻¹² | < 10⁻¹² |
| Power (mW) | 15.5 | 11.4 | 14 | 17.4 |
| FoM (mW/Gbit/s) | 5.1 | 3.8 | 7 | 2.9 |
| Area (mm²) | 0.35 | 0.074 | 0.8 | 0.073 |
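The FoM row of Table 1 is power divided by data rate. The minimal Python sketch below assumes the rate used for each prior work is the one at which its jitter is reported (3.0, 3.0, and 2.0 Gbit/s); that is our reading of the table rather than something stated in the text. The dictionary name `TABLE1` and the function name are hypothetical.

```python
from typing import Dict, Tuple

# (power in mW, data rate in Gbit/s, tabulated FoM) read from Table 1; the rate is
# assumed to be the one at which each work reports its jitter
TABLE1: Dict[str, Tuple[float, float, float]] = {
    "(3)": (15.5, 3.0, 5.1),
    "(6)": (11.4, 3.0, 3.8),
    "(8)": (14.0, 2.0, 7.0),
    "This work": (17.4, 6.0, 2.9),
}


def fom_mw_per_gbps(power_mw: float, rate_gbps: float) -> float:
    """Energy-efficiency FoM: power divided by data rate (mW/Gbit/s = pJ/bit)."""
    return power_mw / rate_gbps


for name, (power, rate, tabulated) in TABLE1.items():
    print(f"{name:>10}: {fom_mw_per_gbps(power, rate):.2f} (table: {tabulated})")
```

Under this assumption the computed ratios (about 5.17, 3.80, 7.00, and 2.90) agree with the tabulated FoM values to within 0.1 mW/Gbit/s.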
V. CONCLUSIONS
A half-rate PI-based all-digital CDR has been proposed. The segmented slew rate scheme improves the linearity of the phase steps. Because the proposed CDR uses an all-digital scheme, it can be ported to other processes with reduced design effort. The CDR consumes 17.4 mW from a 1.0 V supply at 6 Gbit/s. The prototype has been fabricated in a 65 nm CMOS process and occupies 0.073 mm² of chip area.
ACKNOWLEDGMENTS
The work reported in this paper was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (NRF-2020R1F1A1057497), and the present research was conducted under the Excellent Researcher Support Project of Kwangwoon University in 2021. The EDA tool was supported by the IC Design Education Center.
REFERENCES
(1) Shivnaraine R., Jul 2014, An 8-11 Gb/s Reference-Less Bang-Bang CDR Enabled by "Phase Reset", IEEE Trans. on Circuits and Systems-I, Vol. 61, pp. 2129-2138
(2) Kiaei A., Sep 2009, A 10 Gb/s NRZ receiver with feedforward equalizer and glitch-free phase-frequency detector, Proceedings of European Solid-State Circuits Conference, pp. 372-375
(3) Jin J., Oct 2018, A 0.75-3.0-Gb/s Dual-Mode Temperature-Tolerant Referenceless CDR With a Deadzone-Compensated Frequency Detector, IEEE J. Solid-State Circuits, Vol. 53, pp. 2994-3003
(4) Elshazly A., A 0.4-to-3 GHz Digital PLL With PVT Insensitive Supply Noise Cancellation Using Deterministic Background Calibration, IEEE J. Solid-State Circuits, Vol. 46, No. 12, pp. 2759-2771
(5) Shu G., Feb 2014, A 4-to-10.5 Gb/s 2.2 mW/Gb/s continuous rate digital CDR with automatic frequency acquisition in 65nm CMOS, IEEE Int. Solid-State Circuits Conf. Tech. Dig., San Francisco, CA, pp. 150-151
(6) Song H., Oct 2010, A 1.0-4.0-Gb/s all-digital CDR with 1.0-ps resolution DCO and adaptive proportional gain control, IEEE J. Solid-State Circuits, Vol. 46, pp. 424-434
(7) Sonntag J. L., Stonick J., Jul 2006, A Digital Clock and Data Recovery Architecture for Multi-Gigabit/s Binary Links, IEEE J. Solid-State Circuits, Vol. 41, pp. 1867-1875
(8) Hanumolu P. K., Wei G., Moon U., Jan 2008, A Wide-Tracking Range Clock and Data Recovery Circuit, IEEE J. Solid-State Circuits, Vol. 43, pp. 425-439
(9) Kromer C., Nov 2006, A 25-Gb/s CDR in 90-nm CMOS for High-Density Interconnects, IEEE J. Solid-State Circuits, Vol. 41, pp. 2921-2929
(10) Yin W., A 0.7-to-3.5 GHz 0.6-to-2.8 mW Highly Digital Phase-Locked Loop With Bandwidth Tracking, IEEE J. Solid-State Circuits, Vol. 46, pp. 1870-1880
(11) Bueren R., Holzer D., Schmatz M., Nov 2008, 5.75 to 44 Gb/s quarter rate CDR with data rate selection in 90nm bulk CMOS, European Solid-State Circuits Conference, Edinburgh, pp. 166-169
(12) Chen M., Dec 2011, A Fully-Integrated 40-Gb/s Transceiver in 65-nm CMOS Technology, IEEE J. Solid-State Circuits, Vol. 47, pp. 627-640
(13) Piplani S., Nov 2017, Test and Debug Strategy for High Speed JESD204B Rx PHY, IEEE 26th Asian Test Symp., Taipei, pp. 184-188
(14) Yin W., Sep 2010, A 1.6mW 1.6ps-rms-jitter 2.5GHz digital PLL with 0.7-to-3.5GHz frequency range in 90nm CMOS, IEEE Custom Integrated Circuits Conference 2010, San Francisco
(15) Tokonami K., Kohira K., Ishikuro H., Aug 2015, Wave monitor for glitch detection and skew adjusting in high-speed DAC, IEEE Int. Symp. on Radio Frequency Integration Technology, Sendai, pp. 175-177
(16) Huang S., Cao J., Green M. M., Feb 2014, An 8.2-to-10.3 Gb/s Full-Rate Linear Reference-less CDR Without Frequency Detector in 0.18 μm CMOS, IEEE Int. Solid-State Circuits Conf. Tech. Dig., San Francisco, CA, pp. 152-153
(17) Francese P. A., Aug 2014, A 16 Gb/s 3.7 mW/Gb/s 8-Tap DFE Receiver and Baud-Rate CDR With 31 kppm Tracking Bandwidth, IEEE J. Solid-State Circuits, Vol. 49, pp. 2490-2502
(18) Gangasani G. R., Jul 2012, A 16-Gb/s Backplane Transceiver With 12-Tap Current Integrating DFE and Dynamic Adaptation of Voltage Offset and Timing Drifts in 45-nm SOI CMOS Technology, IEEE J. Solid-State Circuits, Vol. 47, pp. 1828-1841
Author
Kyunghwan Min received the Bachelor of Science (B.S.) degree in 2017. His research focuses on high-speed wireline interface circuit design. During his M.S. studies, he has been researching clock generation circuits such as phase-locked loops, clock and data recovery schemes, and IO interface transceivers related to the HDMI standard.
Sanggeun Lee received the B.S. degree from the Department of Electronic Engineering, Kwangwoon University, Korea, in 2020. He is currently pursuing the M.S. degree at Kwangwoon University, Korea. His research interests include PLLs, clock recovery, and high-speed IO circuits.
Taehyoun Oh (S’05) received the Bachelor of Science (B.S.) and Master of Science (M.S.) degrees in Electrical Engineering from Seoul National University in 2005 and 2007, respectively. He received his Ph.D. degree in Electrical Engineering from the University of Minnesota, Minneapolis, under the supervision of Dr. Ramesh Harjani. His doctoral research focused on high-speed I/O circuits and architectures. During the summer of 2010, he worked on I/O channel modeling at the AMD Boston Design Center, MA. In the fall semester of 2011, he researched I/O architecture and jitter budgeting of the link at Intel Corp., CA. In fall 2012, he joined the IBM system technology group, NY, and worked on performance verification of high-speed decision feedback equalizers for server processors. Since spring 2013, he has been with the Department of Electronic Engineering at Kwangwoon University in Seoul, Korea, as an assistant professor. His current research interests focus on clock generation and high-speed interface IC design.