Han Jiho¹, Shin Changyong²
¹ (Department of Electronics Engineering)
² (Department of Information and Communications Engineering,
Sun Moon University, 70, Sunmoon-ro 221 beon-gil, Tangjeong-myeon, Asan, Chungnam
31460, Korea)
Copyright © The Institute of Electronics and Information Engineers(IEIE)
Index Terms
Clock synchronization, white rabbit, mixed-mode clock manager, synchronous ethernet, FPGA, 1000BASE-T
I. INTRODUCTION
Clock synchronization has been extensively studied to meet the timing requirements
of various networked or distributed systems in applications such as telecommunications,
measurement and control, and high energy physics. To achieve higher accuracy, IEEE 1588-2008
(1) was standardized in the past decade and introduced the Precision Time Protocol
(PTP) to replace the Network Time Protocol (NTP) and to provide sub-microsecond accuracy.
In 2009, the European Organization for Nuclear Research (CERN) proposed White Rabbit
(WR) (2) to provide sub-nanosecond accuracy for particle accelerator equipment synchronization.
Since then, various applications (3-5) including accelerators, synchrotrons, neutrino
detectors, cosmic ray detectors, and national time laboratories have sought to exploit
the performance of WR.
A number of studies on WR implementation have been reported in recent years. Some
notable works are summarized as follows. The National Institute of Standards
and Technology (NIST) evaluated the use of WR-based time and frequency transfer within
its own campus and verified the calibration procedure (6), with the aim of improving
the accuracy and precision of a real-time realization of Coordinated Universal Time
(UTC). Universidad de Granada evaluated the influence of the system frequency on the
overall clock synchronization accuracy (7). The authors modified the WR PTP core (WRPC)
and the Ethernet physical layer to implement a programmable frequency solution on a WR-LEN
board. The paper shows that the higher system frequency of 250 MHz provides a slightly
better synchronization accuracy of 97.619 ps (peak-to-peak clock skew) and
double the link bandwidth. National Instruments presented an improvement of IEEE 1588
synchronization accuracy in 1000BASE-T systems (8). Some of the clock-domain crossing
errors could be compensated by using a digital dual-mixer time difference (DDMTD) circuit,
based on a better understanding of the clock relationships in physical layer transceivers. Note
that this is the only attempt reported so far to achieve sub-nanosecond accuracy
over copper media. Although the test results show a high accuracy of 460 ps peak-to-peak,
the authors did not implement frequency synchronization, which is one of the main ideas of WR.
Sun Moon University presented a nanosecond-accuracy clock synchronization circuit for
IEEE 1588 using tapped delay lines (9). Experimental results show that the two nodes
share synchronized timing with an error between -0.74 ns and 0.89 ns.
This paper presents a White Rabbit implementation over copper media (1000BASE-T)
using mixed-mode clock managers (MMCMs) and a frequency transfer technique based on
Synchronous Ethernet (SyncE) (10). Frequency synchronization is achieved by
using clock signals generated by the phase-locked loop (PLL) in the physical layer
transceiver. The clock synthesis circuit implemented in a Xilinx 7 series FPGA replaces
the voltage-controlled crystal oscillator (VCXO) and digital-to-analog converter (DAC)
components outside the FPGA, which makes the system simpler and lower in cost.
Measurement results show that a clock synchronization accuracy of less than 100 ps has been
achieved.
The rest of this paper is organized as follows. Section II presents the architecture
and functions of the proposed clock synchronization circuit for 1000BASE-T White Rabbit.
In Section III, a clock synthesis circuit using adders and an MMCM implemented in a
Xilinx FPGA is described in detail. Section IV explains clock relationships among
master and slave devices and 1000BASE-T transceivers to transfer the reference frequency.
Section V shows the test setup and remarkable measurement results. Section VI concludes
the paper.
II. ARCHITECTURE OF THE PROPOSED 1000BASE-T WHITE RABBIT CIRCUIT
White Rabbit is one of the most advanced clock synchronization technologies, providing
large distributed systems with sub-nanosecond accuracy. As mentioned earlier, CERN
developed WR in 2009 in order to synchronize thousands of devices in sub-atomic
particle acceleration facilities distributed over 20 km with a very small time error.
WR uses SyncE to lock the clock frequencies at distant nodes and the PTP of IEEE 1588
to share timing information. A two-way exchange of PTP messages allows precise
adjustment of clock phase and offset. The link delay is known precisely via accurate
hardware timestamps and the calculation of delay asymmetry. A DDMTD using two digital
mixers with the same offset clock measures the frequency and phase of the Ethernet
media-dependent interface (MDI) clock in relation to the PTP clock domain.
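The two-way exchange can be illustrated with the standard PTP delay request-response arithmetic. The following Python sketch is not taken from the WRPC source: the function names, the timestamp labels t1 to t4, and the `asymmetry` parameter (master-to-slave delay minus slave-to-master delay) are assumptions made for illustration, and all times are integers in picoseconds.

```python
from dataclasses import dataclass


@dataclass
class PtpTimestamps:
    """Timestamps of one two-way exchange, in picoseconds (illustrative)."""
    t1: int  # master sends SYNC (master clock)
    t2: int  # slave receives SYNC (slave clock)
    t3: int  # slave sends DELAY_REQ (slave clock)
    t4: int  # master receives DELAY_REQ (master clock)


def round_trip_delay(ts: PtpTimestamps) -> int:
    """Total link delay in both directions; the slave's offset cancels out."""
    return (ts.t4 - ts.t1) - (ts.t3 - ts.t2)


def one_way_delay(ts: PtpTimestamps, asymmetry: int = 0) -> float:
    """Master-to-slave delay. asymmetry = delay_ms - delay_sm (assumed known)."""
    return (round_trip_delay(ts) + asymmetry) / 2


def clock_offset(ts: PtpTimestamps, asymmetry: int = 0) -> float:
    """Slave clock minus master clock at the time SYNC is received."""
    return ts.t2 - ts.t1 - one_way_delay(ts, asymmetry)


# Symmetric 500 ns link, slave running 1234 ps ahead of the master.
example = PtpTimestamps(t1=0, t2=501_234, t3=600_000, t4=1_098_766)
print(one_way_delay(example), clock_offset(example))
```

With a symmetric link the `asymmetry` term is zero and the offset follows from the four timestamps alone; on a real link the asymmetry has to be calibrated separately, which is why accurate hardware timestamps and the asymmetry calculation both matter.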
Fig. 1. Overall architecture of the proposed circuit using MMCMs and SyncE for White
Rabbit over 1000BASE-T.
Fig. 1 shows the overall architecture of the proposed clock synchronization circuit,
which uses a modified WRPC. Clock synthesis circuits and a Synchronous Ethernet frequency
transfer unit (FTU) are integrated in the FPGA to run WR over copper media. The former
synthesize a main 125 MHz reference clock (frequency-synchronized to the receive clock)
and an offset (helper) frequency for the DDMTD phase detectors. A clock synthesis
circuit implemented inside the FPGA replaces an external VCXO tuned by a
DAC. In many WR designs reported so far, such as SPEC, SVEC, and SPEXI, the soft PLL
requires two external VCXOs controlled by DACs. Section III describes the clock synthesis
process and circuit structure in detail. The SyncE FTU converts the WR message manager
interface into RGMII using a fast clock of 250 MHz for double data rate (DDR) transfer. Section
IV describes the proposed frequency transfer strategy and the detailed operation of
the SyncE FTU.
The WR message manager forms Ethernet frames from the packets that the WRPC sends out for
low-level communication. It also decodes the data stream received from the PHY into
high-level packets. The DMA engine implements direct memory access (DMA) by
pushing PTP packets directly to the WR message manager and fetching received messages
from the arbiter. It takes transmission requests from the soft-core processor and
signals when a new packet has been received from the Ethernet media access control
(MAC). The Ethernet frame arbiter allows both the DMA engine and a generic MAC to access the
data interface of the WR message manager. Since the interface is a Wishbone
bus operating in pipelined mode, the module is a very simple Wishbone interconnect.
LatticeMico32 (LM32) is a 32-bit, big-endian, Harvard architecture soft-core processor
optimized for FPGA chips. The original LM32 written in Verilog has been configured
to control the synchronization and operation of all modules inside the WR PTP core.
The processor can pass bytes for sending, get received characters, and configure a
baud rate using a Wishbone interface. The system information from the WR PTP daemon
can be output to the user console. The soft phase-locked loop synchronizes the frequency
of the local reference clock (125 MHz) to the receive clock recovered from the data
stream. It comprises two PLLs, for the helper and main loops, but in hardware it consists
only of the measurement circuits (DDMTD phase detectors), which provide the parameters
needed by the software algorithm through Wishbone. The dual-port RAM (DPRAM) serves as data
and instruction memory for the LM32 and as packet data memory for the DMA engine at the same
time. The LM32 has two Wishbone master interfaces, one for the instruction memory and one
for the data memory. The modules communicate with one another via a pipelined Wishbone interface.
The proposed circuit uses Ethernet over copper media (1000BASE-T) in order to provide
home and office networks with sub-nanosecond accuracy. Note that WR uses Ethernet
over optical media (1000BASE-X) since it was developed to synchronize thousands
of accelerator devices distributed over long distances of tens of km. The work of Greenstreet
et al. (8) is the only attempt reported so far to achieve sub-nanosecond accuracy using copper
media. That work is not, however, a WR implementation but an IEEE 1588 improvement
that uses DDMTD and reduces the number of clock domain crossings.
III. CLOCK SYNTHESIZER BASED ON MIXED-MODE CLOCK MANAGER
Most WR implementations use two external VCXOs controlled by DACs for the soft PLL.
One generates a reference clock (frequency-synchronized to the physical layer clock),
while the other outputs an offset frequency ($f_{PLL}=\frac{N}{N+1}f_{clkA}$) for the
DDMTD phase detectors. These components, however, make the overall system expensive and
complex, since all the other components are integrated inside a single
FPGA. In particular, a very stable external VCXO that provides frequency control
within several ppm is costly.
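The consequence of choosing the offset frequency as $\frac{N}{N+1}f_{clkA}$ can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, and N = 16384 is an assumed value that the text does not specify. Sampling a clock of frequency $f_{clkA}$ at $f_{PLL}$ gives a beat signal at $f_{clkA}-f_{PLL}=f_{clkA}/(N+1)$, so one offset-clock cycle in the beat domain corresponds to $1/(N f_{clkA})$ of input-clock time.

```python
from fractions import Fraction


def helper_frequency(f_clk_hz: int, n: int) -> Fraction:
    """Offset (helper) frequency f_PLL = N / (N + 1) * f_clk."""
    return Fraction(n, n + 1) * f_clk_hz


def beat_frequency(f_clk_hz: int, n: int) -> Fraction:
    """Beat frequency seen after sampling: f_clk - f_PLL = f_clk / (N + 1)."""
    return f_clk_hz - helper_frequency(f_clk_hz, n)


def time_resolution_s(f_clk_hz: int, n: int) -> Fraction:
    """Input-clock time represented by one helper-clock cycle of the beat signal.

    The beat signal stretches time by f_clk / f_beat = N + 1, and it is
    counted in helper-clock periods of (N + 1) / (N * f_clk) seconds.
    """
    helper_period = 1 / helper_frequency(f_clk_hz, n)
    return helper_period / (n + 1)


if __name__ == "__main__":
    f_clk = 125_000_000  # 125 MHz reference clock, as in the text
    n = 16384            # assumed value of N, for illustration only
    print(float(helper_frequency(f_clk, n)))       # helper frequency in Hz
    print(float(beat_frequency(f_clk, n)))         # beat frequency in Hz
    print(float(time_resolution_s(f_clk, n)) * 1e12, "ps per count")
```

A larger N gives a finer time resolution at the cost of a slower beat signal, and therefore fewer phase measurements per second.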
A clock synthesis circuit implemented in the FPGA to replace the external VCXO and
DAC considerably reduces the complexity and cost of the overall system. Although digital
clock synthesizer circuits have been studied for decades (11,12), most
papers describe clock synthesizers as a small part of a PLL or time-to-digital converter
(TDC). Moreover, synthesizable clock synthesizers are very rare since very accurate
frequency control requires full or semi-custom design. A fully synthesizable clock
generator with adequate performance must therefore be designed in a hardware description
language and implemented in the FPGA.
Fig. 2. Digital dual-mixer time difference (DDMTD) with a phase-locked loop (PLL),
two DFFs, a deglitcher, and counters.
Fig. 2 shows the structure of the digital dual-mixer time difference circuit. It consists of
a phase-locked loop (PLL), two D flip-flops (DFFs), a deglitcher and pulse shaping circuit,
and counters for phase difference averaging. Note that the DDMTD presented is fully synthesizable
since the entire circuit is written in Verilog. It computes the phase difference between
clkA and clkB by sampling them at the offset frequency $f_{PLL}$. The deglitcher
eliminates glitches around the signal transitions in the DFF outputs caused by jittery
clock inputs. The counters then determine whether the signal stays constantly high or low for
a configured number of clock cycles.
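The deglitching idea can be modeled in software as a run-length filter. The following Python sketch is a behavioral illustration rather than the Verilog circuit described here: the names `deglitch` and `first_rising_edge`, and the `threshold` parameter standing in for the configured number of clock cycles, are assumptions.

```python
from typing import List, Optional


def deglitch(samples: List[int], threshold: int) -> List[int]:
    """Accept a level change only after `threshold` consecutive new-level samples.

    `samples` models the DFF output, one value (0 or 1) per helper-clock cycle.
    """
    if threshold < 1:
        raise ValueError("threshold must be at least 1")
    state = samples[0] if samples else 0
    run = 0  # consecutive samples that disagree with the current state
    filtered = []
    for sample in samples:
        if sample == state:
            run = 0
        else:
            run += 1
            if run >= threshold:
                state = sample
                run = 0
        filtered.append(state)
    return filtered


def first_rising_edge(filtered: List[int]) -> Optional[int]:
    """Index of the first 0-to-1 transition, or None if there is none."""
    for i in range(1, len(filtered)):
        if filtered[i - 1] == 0 and filtered[i] == 1:
            return i
    return None


# A glitch at index 2 and another at index 8 are both suppressed.
raw = [0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1]
clean = deglitch(raw, threshold=3)
print(clean, first_rising_edge(clean))
```

In a phase detector of this kind, the difference between the edge positions found on the two filtered beat signals, counted in helper-clock cycles, is the quantity that would then be averaged.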
Fig. 3. Block diagram of mixed-mode clock manager (MMCM) in a Xilinx Kintex 7 FPGA
for clock synthesis.
Fig. 3 shows a detailed view of the MMCM, a clock synthesizing resource of Xilinx 7 series
FPGAs. Input multiplexers select the reference and feedback clocks from either the
dedicated clock buffer outputs or interconnect. Each clock input has a programmable
counter divider (D). The phase-frequency detector (PFD) compares both the phase and frequency
of the rising edges of the two clocks. The PFD produces an up or down signal that drives the
charge pump (CP) and loop filter (LF), which generate a reference voltage for the
voltage-controlled oscillator (VCO) and so determine whether the VCO should operate at a higher
or lower frequency.
The VCO produces eight output phases and one variable phase for fine-phase shifting.
A special counter (M) is also provided for fractional divide operation. M controls
the feedback clock of the MMCM, allowing a wide range of frequency synthesis.
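The role of the counters can be made concrete with the usual MMCM frequency relations, $f_{VCO}=f_{in}\,M/D$ and $f_{out}=f_{VCO}/O$, where O is an output divider. The Python sketch below is a planning aid under stated assumptions: the function name is hypothetical, and the VCO limits of 600 MHz to 1200 MHz are assumed values for a slow speed grade that should be checked against the device data sheet. It does not reproduce the adder-based synthesis circuit of this paper.

```python
from fractions import Fraction
from typing import Tuple

# Assumed VCO operating range; the actual limits depend on the speed grade.
VCO_MIN_HZ = 600_000_000
VCO_MAX_HZ = 1_200_000_000


def mmcm_frequencies(f_in_hz: int, m, d: int, o) -> Tuple[Fraction, Fraction]:
    """Return (f_vco, f_out) for feedback multiplier M, input divider D, output divider O.

    M and O may be fractional (for example 8.125); Fraction keeps the result exact
    for values that are multiples of 1/8.
    """
    f_vco = Fraction(f_in_hz) * Fraction(m) / d
    if not VCO_MIN_HZ <= f_vco <= VCO_MAX_HZ:
        raise ValueError(f"VCO frequency {float(f_vco):.0f} Hz is outside the assumed range")
    return f_vco, f_vco / Fraction(o)


# 125 MHz in, M = 8, D = 1: the VCO runs at 1 GHz.
print(mmcm_frequencies(125_000_000, 8, 1, 8))  # 125 MHz output
print(mmcm_frequencies(125_000_000, 8, 1, 4))  # 250 MHz output
```

The two calls correspond to the 125 MHz reference clock and the 250 MHz DDR clock mentioned in Section II; the particular M, D, and O values are illustrative choices, not the settings used in the design.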
IV. FREQUENCY TRANSFER USING SYNCHRONOUS ETHERNET TECHNOLOGY
The sub-nanosecond accuracy of WR is based on frequency synchronization using Synchronous
Ethernet, an ITU-T standard for computer networking that facilitates the
transfer of clock signals over the Ethernet physical layer. Note that most
WR applications are in scientific fields and span very long distances, which is
why they use optical media. We could not find any reported WR implementation using copper
media.
Fig. 4. Frequency transfer through 1000BASE-T physical layer based on Synchronous
Ethernet.
Fig. 4 shows the proposed frequency transfer scheme between SyncE master and slave nodes.
The difference from conventional PTP is that the clocks are syntonized by
the SyncE FTUs to improve the synchronization accuracy. First, the two 1000BASE-T transceivers
are forced to be link master and slave by writing proper values into the PHY internal
registers instead of letting the auto-negotiation procedure decide. The SyncE master generates
a reference clock from a local oscillator clock signal using an internal PLL. The SyncE
slave then recovers the master’s reference clock from the incoming SYNC signals. The
transceiver sends the RGMII clock to the SyncE FTU for the clock synthesis circuit. The recovered
clock is syntonized but not synchronized to the clock in the master node, which means
that the frequencies are the same but the phases are different.
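The distinction between syntonization and synchronization can be expressed as a simple time-error model. In the Python sketch below, which is an illustration with hypothetical names, the time error of the slave grows linearly with the fractional frequency offset: 1 ppm of offset accumulates 1,000,000 ps of error per second, whereas a syntonized clock (0 ppm) keeps a constant phase offset that PTP still has to measure and remove.

```python
PS_DRIFT_PER_PPM_SECOND = 1_000_000  # 1 ppm of 1 s is 1 microsecond, i.e. 10**6 ps


def time_error_ps(elapsed_s: int, initial_offset_ps: int, freq_offset_ppm: int) -> int:
    """Slave time error after `elapsed_s` seconds for a constant frequency offset."""
    return initial_offset_ps + freq_offset_ppm * elapsed_s * PS_DRIFT_PER_PPM_SECOND


# Syntonized: the 500 ps phase offset stays constant over time.
print(time_error_ps(elapsed_s=10, initial_offset_ps=500, freq_offset_ppm=0))
# Free-running with a 1 ppm offset: the error grows by 1 microsecond every second.
print(time_error_ps(elapsed_s=10, initial_offset_ps=500, freq_offset_ppm=1))
```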
NetFPGA-1G-CML is one of the most popular prototyping boards for computer network
devices. It provides a Xilinx Kintex-7 FPGA and four 1 Gb/s Ethernet interfaces, which
enable the development of a single-port Gigabit Ethernet device or a 4-port Ethernet
switch using a cutting-edge FPGA. It is worth noting that NetFPGA-1G-CML
is much more cost-effective than the SPEC board developed by CERN. The FPGA-based system
allows users to develop designs that are able to process packets at line rate, a capability
generally not afforded by software-based approaches.
Four Realtek RTL8211E Ethernet transceivers are provided to interface network connections
via on-board RJ-45 connectors. Note that the RTL8211E provides a PLL clock output, which
is not commonly supported by commercial Ethernet transceivers. The PLL clock output
is used in the SyncE FTU for clock synthesis. IEEE 802.3 is going to define the generation
of the RX and TX timestamp triggers in the generic reconciliation sublayer to support
synchronization accuracy improvement. The standard committee is also defining transceiver
delay measurements to report the latencies between the MDI and GMII in both directions.
Fig. 5. Measurement environment for the clock synchronization accuracy and precision.
The SyncE FTU connects the Ethernet interface of the WRPC and the RTL8211E transceiver. In packet
transmission, the message data in the 8-bit Ten Bit Interface (TBI) is decoded to detect
special characters, then converted into double data rate (DDR) 4-bit RGMII with
half the pin count. In packet reception, the message data in 4-bit RGMII is converted
to single data rate (SDR), then encoded using the 8B/10B scheme to make the 8-bit TBI. To
meet the specific timing constraint, the skew among data and control signals has been
minimized and controlled to have a phase difference of 90 degrees from RXC, the reference
clock.
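The SDR-to-DDR width conversion can be sketched at the data level. The Python example below models only the nibble ordering commonly used by RGMII, with the low nibble (bits 3:0) on the rising clock edge and the high nibble (bits 7:4) on the falling edge; that ordering and the function names are assumptions for illustration, and the 8B/10B handling and clock skew control of the SyncE FTU are not modeled.

```python
from typing import Iterable, List, Tuple


def byte_to_rgmii_nibbles(byte: int) -> Tuple[int, int]:
    """Split one byte into (rising-edge nibble, falling-edge nibble)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte out of range")
    return byte & 0x0F, byte >> 4


def rgmii_nibbles_to_byte(rising: int, falling: int) -> int:
    """Reassemble a byte from the two nibbles captured on one clock cycle."""
    if not (0 <= rising <= 0xF and 0 <= falling <= 0xF):
        raise ValueError("nibble out of range")
    return (falling << 4) | rising


def serialize(frame: Iterable[int]) -> List[int]:
    """Flatten a byte stream into the nibble sequence seen on the 4-bit bus."""
    nibbles: List[int] = []
    for byte in frame:
        nibbles.extend(byte_to_rgmii_nibbles(byte))
    return nibbles


def deserialize(nibbles: List[int]) -> List[int]:
    """Inverse of serialize(); expects an even number of nibbles."""
    if len(nibbles) % 2:
        raise ValueError("odd number of nibbles")
    return [rgmii_nibbles_to_byte(nibbles[i], nibbles[i + 1])
            for i in range(0, len(nibbles), 2)]


preamble = [0x55, 0x55, 0xD5]
print(serialize(preamble))
print(deserialize(serialize(preamble)) == preamble)
```

Transferring two nibbles per clock cycle is what allows the 4-bit bus to carry the same byte rate as an 8-bit SDR interface with half the data pins.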
V. MEASUREMENT RESULTS
The synchronization accuracy and precision of the 1000BASE-T WR circuit have been evaluated
in the measurement environment shown in Fig. 5. Two NetFPGA-1G-CML boards exchange
WR messages via a direct Gigabit Ethernet
connection. The unshielded twisted pair category-5 cable between them has the maximum
length of 100 m. The master takes an accurate and stable reference clock from
a function generator and broadcasts its own timing information in SYNC messages. On receiving
the messages, the slave estimates the phase and frequency offsets to synchronize
its local clock to the master’s timing. A digital phosphor oscilloscope with a sampling
rate of 20 GS/s records the clock skew between the pulse per second (PPS) output signals
of the two nodes for 12 days.
Fig. 6. Oscilloscope screenshot that records the clock skew distribution of the slave
from the master’s reference clock.
As depicted in Fig. 6, the oscilloscope screenshot indicates that the rising edges of
PPS$_{\mathrm{SLAVE}}$ are distributed over the range from -50.27 ps to 47.83 ps relative to
the rising edge of PPS$_{\mathrm{MASTER}}$. As a final result, the synchronization accuracy
(peak-to-peak clock skew) is 98.10 ps and the precision (standard deviation) is 10.78 ps. It is
notable that the histogram is built from more than a million PPS samples, which
is many more than in most of the previous works. A mathematical model for the precision
could be obtained by modifying the offset equation of WR over optical fiber (13).
An analysis of copper transceiver latencies and cable delays, and a comparison with
the measurement results, are left for further research.
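The reported figures can be cross-checked with a few lines of arithmetic. The Python sketch below uses only the values stated in this section (the two skew extremes and the 12-day duration); the function names are hypothetical, and the use of the population standard deviation is an assumption, since the estimator used for the 10.78 ps figure is not stated.

```python
import statistics
from typing import Dict, Sequence

SECONDS_PER_DAY = 86_400


def pps_sample_count(days: int) -> int:
    """Number of PPS edges in `days` days, at one pulse per second."""
    return days * SECONDS_PER_DAY


def skew_summary(skews_ps: Sequence[float]) -> Dict[str, float]:
    """Accuracy (peak-to-peak) and precision (standard deviation) of skew samples."""
    return {
        "peak_to_peak_ps": max(skews_ps) - min(skews_ps),
        "std_dev_ps": statistics.pstdev(skews_ps),  # population estimator, assumed
    }


print(pps_sample_count(12))                                  # samples in 12 days
print(skew_summary([-50.27, 47.83])["peak_to_peak_ps"])      # reported extremes only
```

Twelve days at one pulse per second give 1,036,800 samples, consistent with the statement that the histogram holds more than a million samples, and the two extremes reproduce the 98.10 ps peak-to-peak skew. The standard deviation cannot be reproduced without the raw samples.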
VI. CONCLUSIONS
A circuit implementation of White Rabbit over a 1000BASE-T network has been
presented, with a clock synthesis circuit implemented in a commercial FPGA device and
a frequency transfer strategy based on Synchronous Ethernet. Implemented
on a state-of-the-art FPGA, the proposed circuit was verified to achieve a peak-to-peak
clock skew of less than 100 ps. Moreover, this technique uses Gigabit Ethernet connections
over existing copper media without any additional hardware resource or modification.
This makes it a promising solution for cost-effective yet high-accuracy
clock synchronization in future applications.
ACKNOWLEDGMENTS
This work was supported by the National Research Foundation of Korea (NRF) grant funded
by the Korea government (MSIT) (NRF-2017R1C1B5018418).
REFERENCES
(1) IEEE Std. 1588-2008, 2008, IEEE Standard for a Precision Clock Synchronization Protocol
for Networked Measurement and Control Systems
(2) Moreira P., Oct 2009, White rabbit: Sub-nanosecond timing distribution over Ethernet,
Precision Clock Synchronization for Measurement, Control, and Communication, 2009,
ISPCS 2009, 3rd IEEE International Sym. on, 12-16, pp. 1-5
(3) Jiménez-López M., Feb 2019, A Fully Programmable White-Rabbit Node for the SKA Telescope
PPS Distribution System, Instrumentation and Measurement, IEEE Trans. on, Vol. 68,
No. 2, pp. 632-641
(4) Ramos F., Gutiérrez-Rivas J., López-Jiménez J., Caracuel B., Díaz J., May 2018, Accurate
Timing Networks for Dependable Smart Grid Applications, Industrial Informatics, IEEE
Transactions on, Vol. 14, No. 5, pp. 2076-2084
(5) de la Morena C., Jan 2018, Fully Digital and White Rabbit-Synchronized Low-Level RF
System for LIPAc, Nuclear Science, IEEE Trans. on, Vol. 65, No. 1, pp. 514-522
(6) Savory J., Sherman J., Romisch S., May 2018, White Rabbit-Based Time Distribution
at NIST, Frequency Control Symposium, 2018. IFCS 2018, IEEE International, 21-24,
pp. 1-5
(7) Girela-López F., Torres-González F., Díaz J., Ultra-accurate Ethernet time-transfer
with programmable carrier-frequency based on White Rabbit solution, Precision Clock
Synchronization for Measurement, Control, and Communication, 2017, ISPCS 2017, 11th
IEEE International Symposium on, pp. 36-41
(8) Greenstreet R., Zepeda A., Improving IEEE 1588 synchronization accuracy in 1000BASE-T
systems, Precision Clock Synchronization for Measurement, Control, and Communication,
2015, ISPCS 2015, 9th IEEE International Sym., pp. 1-6
(9) Han J., Shin C., Dec 2016, A nanosecond-accuracy clock synchronization circuit for
IEEE 1588-2008 using tapped delay, Electronics Express, IEICE, Vol. 13, No. 23, pp.
1-6
(10) ITU-T Recommendation G.8261, 2019, Timing and Synchronization Aspects in Packet Networks
(11) Yuan C., Shekhar S., Sep 2019, A Supply-Noise-Insensitive Digitally-Controlled Oscillator,
Circuits and Systems I: Regular Papers, IEEE Transactions on, Vol. 66, No. 9, pp.
3414-3422
(12) Cadeddu S., Aug 2017, A Time-to-Digital Converter Based on a Digitally Controlled
Oscillator, Nuclear Science, IEEE Transactions on, Vol. 64, No. 8, pp. 2441-2448
(13) Lipinski M., Włostowski T., Serrano J., Alvarez P., Sep 2011, White Rabbit: a PTP
Application for Robust Sub-nanosecond Synchronization, Precision Clock Synchronization
for Measurement, Control, and Communication, 2011, ISPCS 2011, 5th IEEE Int. Sym.
on, pp. 25-30
Author
Han Jiho received the B.S., M.S., and Ph.D. degrees in Electrical Engineering and Computer Science
from Seoul National Univ. in 2002, 2004, and 2009, respectively.
He has been an assistant professor at Sun Moon Univ. since 2014. His research interests
include clock synchronization and carrier-grade Ethernet.
Shin Changyong received the B.S. and M.S. degrees from Yonsei Univ. in 1993 and 1995, respectively,
and the Ph.D. degree from the Univ. of Texas at Austin in 2006, all in Electrical
Engineering.
From 1995 to 2001 and from 2007 to 2014, he was with LG Electronics and with SAIT,
respectively. Since 2014, he has been with the Department of Information and Communications
Engineering at Sun Moon Univ.
His research interests include wireless communications and signal processing for communications.