I. INTRODUCTION
As process technology scales down, the density and design complexity of integrated circuits (ICs) have increased rapidly. Recently, 3D-IC design technologies have drawn much attention due to the physical limits of process scaling.
In a synchronous system, clock tree synthesis (CTS) is performed between placement and routing in physical design. Since the clock signal connects all the clock sinks on multiple layers or dies, each CTS design step poses challenging tasks. One of the goals of CTS is skew optimization; the clock skew is normally defined as the maximum difference among the clock latencies from the clock source to all sink nodes. In multi-layered circuits, the clock tree generated by traditional clock skew optimization methods [1-3] may suffer from heat problems, especially in the inner layers. In particular, thermal variation becomes a crucial factor affecting reliability and performance due to the structural characteristics of 3D-ICs [4]. The 3D clock tree elements such as the clock source, sinks, buffers, and through-silicon vias (TSVs) are the major heat sources on the circuit.
Machine learning (ML) algorithms have been increasingly adopted to solve traditional design problems at various steps of the circuit design process. In back-end design, the works in [5] and [6] proposed routing methods using support vector machines (SVMs) and generative adversarial networks (GANs). Recently, approaches applying ML algorithms to 3D-CTS have emerged; however, they have potential problems in building training data sets and predicting results. In addition, 3D-CTS on advanced nodes may require considering thermal variations for better reliability and accuracy. Our main contribution is building a framework of thermal effect prediction for constructing 3D clock trees with the following features: 1) a thermal prediction method based on random forest; 2) reduction of the thermal profiling time; 3) definition and extraction of features related to thermal variation for better accuracy; 4) consideration of the thermal effect in 3D-CTS while maintaining the design targets.
In this work, random forest based thermal prediction for 3D-CTS is proposed through the utilization of thermal information. Prior to the CTS step, the thermal information is trained to construct thermal weight matrices. Then, the clock tree is constructed by reflecting thermal variations. The remainder of the paper is organized as follows. Section II explains existing methods of thermal-aware CTS and ML based 3D-CTS. The proposed method based on ML with thermal prediction for 3D-CTS is described in Section III. Experimental results and discussions are presented in Section IV. Finally, conclusions are given in Section V.
II. BACKGROUND & MOTIVATION
The design factors of 3D-CTS such as clock sinks, TSVs, and buffers affect the overall wire length, the power consumption, and the clock skew. In particular, TSV insertion is an important step when constructing a 3D clock network, in contrast with conventional 2D-CTS. A low-power and reliable clock network design is proposed in [7]. The method with TSV allocation in [8] is a multi-layer tree generation algorithm based on a 2D tree topology. In 3D-IC design, the major issue is the thermal problem caused by several physical structures. The works in [9] and [10] proposed methods of skew balancing between the uniform and the worst thermal profiles. An approach based on the deferred merging embedding algorithm is proposed in [11]; it can handle skew variations caused by thermal gradients. The thermal-aware 3D symmetrical buffered CTS in [1] performs the overall CTS with thermal consideration to reduce the skew. The delay estimations were calculated by applying the Elmore delay model [12]. Also, a thermal-aware delay estimation model [10] is generated based on [12].
Recently, there has been much research applying ML algorithms to reduce analysis time and improve performance at various design stages. The method in [13] predicts each parameter and derives the CTS results using SVM. Several studies [6] and [14] use GANs to utilize images for clock tree construction and routing. The authors in [15] proposed a deep reinforcement learning approach for global routing that uses the Q-learning algorithm.
The motivation of this work is to predict the thermal effect of 3D-CTS accurately in a reasonable time. The thermal variation influences the clock skew, while TSV and buffer insertion greatly impact the thermal distribution. An example of the effect of thermal variation on clock skew is illustrated in Fig. 1. Without considering the thermal effect, the clock skew is 9 ps; when the thermal effect is considered, it increases to 26 ps, as shown in Fig. 1(a) and (b). CTS consists of steps such as tree topology generation, element insertion, and clock skew adjustment. Thermal-aware CTS requires an additional step, so-called thermal profiling. For accurate analysis, this step requires extra implementation time and up-to-date state information of the circuit. The approaches in [1] and [9] perform thermal profiling after CTS and iterate if necessary, as in Fig. 2(a). Sink clustering, tree topologies, and buffer/TSV locations may vary with the thermal distribution. Therefore, for accuracy, the thermal data needs to be updated and applied at each CTS step, as described in Fig. 2(b). However, obtaining precise measurements is not only expensive but also difficult due to a lack of circuit information such as the power distribution, the locations of resources, and their connectivity.
The thermal information takes the form of diffusion around the heating sources. In general, the temperature range is limited by the physical characteristics of the materials. The thermal effects can be calculated from material constants and physical characteristic parameters with formulations such as thermal diffusivity, impedance, conductivity, and Joule heating in [4] and [16]. For predicting thermal data, a regression algorithm is suitable for training the data with reasonable accuracy. Random forest [17], one of the supervised learning algorithms, is well suited to prediction with a small number of uncertain parameters. The proposed random forest based thermal prediction for 3D-CTS, which includes profiling and training, is illustrated in Fig. 2(c). To verify the performance of our method, this work adopts [7] and [11], which are well-known CTS algorithms, as examples. The following section describes the detailed steps of our method.
Fig. 1. Thermal variation on clock skew: (a) Clock skew without considering thermal variation; (b) Clock skew considering thermal variation.
Fig. 2. Thermal-aware CTS flows: (a) Thermal profiling iteration; (b) Thermal profiling at each step; (c) ML based thermal profiling.
III. RANDOM FOREST BASED THERMAL PREDICTION FOR 3D-CTS
First, the thermal information is trained using random forest. In the training step, the primary inputs are the set of sinks and the profiled thermal data. The output is the set of thermal weight matrices that represent the degree of thermal diffusion. Second, the inferred thermal data are applied to generate the clock tree. The overall flow of our method is described in Fig. 3.
Fig. 3. The overall flow of our thermal-aware CTS with random forest algorithm.
1. Training Thermal Information using Random Forest
1) Feature Extraction and Building Training Data
The features in our work are defined as the measured circuit data related to heating.
The training data generation consists of three steps. First, the structural elements
of CTS are classified by their element types. Second, the area of heat effect is calculated
by geometrical information including locations of clock tree elements and distances
among them. Third, the thermal data are applied to the calculated area of each CTS
element. The notations in the formulation are summarized in Table 1.
The set of all CTS elements U is classified into element sets by type. An example of building the training set is described in Fig. 4. For simplicity, circuit elements such as sinks, TSVs, and buffers are represented as S, T, and B at their corresponding locations. The set of thermal data Q is defined as geometric
and temperature information on the placed circuit. In order to reflect the thermal information, the circuit area is divided into small grids, as shown in Fig. 4(a). Each grid q$_{n}$ has a temperature value calculated by thermal simulation of the power distribution network.
The heat effect is normally represented in contour form, and the temperature varies with the distance from the heat source. In addition, the physical design steps related to CTS, such as placement, power distribution network construction, and wire routing, are performed on a grid with rectangular regions. The heat-effective area is defined as the region affected by thermal diffusion. A training data set E$_{i}$ includes the thermal data q$_{n}$ and the corresponding region of each type. Also, E$_{i}$ consists of a set of CTS elements and their associated parameters such as the thermal resistivity, conductance, and capacitance. Each element e$_{ij}$ contains both the coordinates of e$_{ij}$ and its thermal matrix. The heat-effective area with the thermal data is represented as the square matrix e$_{ij}$, which captures the degree of thermal diffusion. Each type of element E$_{i}$ has a unique diffusivity and a suitable size M$_{i}$ that represents the magnitude of the heat-effective area. Thus, the thermal effect is represented in M$_{i}$${\times}$M$_{i}$ matrix form. Computing M$_{i}$ is based on the nearest neighbor graph (NNG). We utilized the Euclidean distance for reflecting thermal diffusion and constructing the NNG. Then, all nodes are clustered with their nearest neighbor node, and all edges are sorted by
the cost of distance. Let D$_{i}$ = {d$_{1}$, d$_{2}$, …, d$_{x}$} be the set of distance costs at E$_{i}$; then M$_{i}$ is $\left\lceil d_{y}\right\rceil$, where d$_{y}$ is the median cost of D$_{i}$. If the number of distance costs in D$_{i}$ is even, then d$_{y}$ is the average of the two median costs. An example of the M$_{i}$ calculation is described in Fig. 5. If five sinks are placed on the grid, the result of applying the matrix form is described in Fig. 5(a). The NNG of all sinks is constructed based on the cost of distance and sorted in ascending order. The edges (B, C) and (C, E) are the median values in the list of costs, 4.00 and 4.24, whose average is 4.12. M$_{i}$ is set to 5 by rounding up this average, as described in Fig. 5(b) and (c).
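The matrix-size rule above can be sketched as follows; this is a simplified reading in which each element contributes the Euclidean distance to its nearest neighbor, and the median of those sorted costs is rounded up (the exact NNG edge set used by the authors may differ):

```python
import math

def matrix_size(points):
    """Estimate the heat-effective matrix size M_i for one element type
    from the median nearest-neighbor distance (illustrative sketch)."""
    # Nearest-neighbor (Euclidean) distance for each element.
    dists = []
    for i, (x1, y1) in enumerate(points):
        nearest = min(math.hypot(x1 - x2, y1 - y2)
                      for j, (x2, y2) in enumerate(points) if i != j)
        dists.append(nearest)
    dists.sort()
    n = len(dists)
    # Median of the sorted distance costs; average the two middle
    # values when the count is even, then round up.
    if n % 2:
        median = dists[n // 2]
    else:
        median = (dists[n // 2 - 1] + dists[n // 2]) / 2
    return math.ceil(median)
```

With the five sinks of Fig. 5, the two middle costs 4.00 and 4.24 average to 4.12, so the rule yields M$_{i}$ = 5.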
Table 1. Notations in the formulation

U : a finite set of all CTS elements ( = {E$_{1}$, E$_{2}$, …, E$_{i}$} )
E$_{i}$ : a set of CTS elements of the same type ( = {e$_{i1}$, e$_{i2}$, …, e$_{ij}$} )
e$_{ij}$ : the information structure of an element ( ∈ E$_{i}$ )
e$_{ij}$$^{M{\times}M}$ : an M-by-M matrix of thermal data on e$_{ij}$
Q : a set of thermal data on the circuit ( = {q$_{1}$, q$_{2}$, …, q$_{n}$} )
q$_{n}$ : the temperature of a grid within the circuit ( ∈ Q )
q$_{init}$ : the initial temperature
t$_{max}$ : the maximum temperature bound of the training set
t$_{min}$ : the minimum temperature bound of the training set
t$_{init}$ : the normalized initial temperature
D$_{i}$ : a set of distance costs at element set E$_{i}$ ( = {d$_{1}$, d$_{2}$, …, d$_{x}$} )
d$_{y}$ : a distance cost in D$_{i}$ ( ∈ D$_{i}$ )
M$_{i}$ : the size of the matrix on e$_{i}$
Fig. 4. Building training set on circuit with classification of CTS elements: (a) The grid of thermal data after profiling; (b) An example of the training set and shaded overlapped grids.
Fig. 5. Computing matrix size: (a) Unique size of matrix on sinks; (b) NNG of sinks; (c) M$_{i}$ calculation with median cost.
Algorithm 1: Building training set
The profiled thermal data can be inconsistent due to the peripheral temperature. This may lead to unexpected or inaccurate training results. To maintain consistent training sets, the temperature values are normalized to reflect the amount of variation. Thus, the initial temperatures are unified across all training sets, since each temperature is measured at a different initial temperature. The symbols q and t denote the profiled temperature and the normalized value, respectively. The matrix e$_{ij}$ denotes the j$^{th}$ element of type i. The value in the center grid of e$_{ij}$ is denoted as the initial temperature q$^{ij}$$_{init}$, and it is normalized into t$^{ij}$$_{init}$. The l$^{th}$ temperature and its normalized value in e$_{ij}$ are denoted as q$^{ij}$$_{l}$ and t$^{ij}$$_{l}$. The normalization is performed using the ratio of q$^{ij}$$_{init}$ to t$^{ij}$$_{init}$ within the bounds t$_{min}$ and t$_{max}$. The values in e$_{ij}$ represent the degree of thermal variation in the neighboring area.
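A minimal sketch of this normalization, assuming the center grid holds the initial temperature and a simple linear scaling clipped to [t$_{min}$, t$_{max}$] (the authors' exact scaling rule is not fully specified, so these choices are assumptions):

```python
def normalize_matrix(q, t_init, t_min, t_max):
    """Normalize one profiled thermal matrix q (list of lists, odd size)
    so that every training sample shares the same initial temperature
    t_init; values are clipped to the bounds [t_min, t_max]."""
    m = len(q)
    q_init = q[m // 2][m // 2]        # center grid = profiled initial temp
    ratio = t_init / q_init          # normalizing ratio (delta = 1 / ratio)
    return [[min(max(v * ratio, t_min), t_max) for v in row] for row in q]
```

Applying the same ratio to every grid preserves the relative thermal variation around the element while unifying the initial temperature across samples.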
An example of the final training set is illustrated in Fig. 4(b). The overlapped grids (OGs), which are shaded in the example, have a greater heat influence.
The pseudo-code for building the training set is summarized in Algorithm 1. The primary inputs are the CTS elements on all layers with the corresponding thermal data, and the thermal information matrices are generated as the outputs of our algorithm. The time complexity of the outer loop, the matrix construction, is O(M), where M is the number of element types. The inner loop, the thermal data normalization, takes O(T), where T is the number of grids in a matrix. The overall time complexity of the proposed process of building the training data is therefore O(M·T).
2) Training Thermal Effect with Random Forest
Prior to applying ML, the training sets are classified into matrices of the corresponding types. The thermal tendency of the training set is computed by aggregating the matrices of the same type with random forest. The weight matrices, which represent the degree of thermal diffusion for each type of element, are obtained as the result of the training.
Since the next operating conditions are unpredictable, we assume uniform circuit operating conditions, focusing on the thermal profiling steps and the thermal-related characteristic parameters. The training parameters of the thermal effect are calculated by models of thermal distribution. The thermal effect matrix Q$_{M}$ with matrix size M is computed by applying the matrix form in [4] as:
where G$_{M}$, T$_{M}$, and Q$_{M}$ are M-by-M matrices that represent the thermal resistivity, the temperature nodes, and the heat sources, respectively. Q is calculated by both the thermal diffusivity equation and thermal profiling with the power distribution network. The coefficient of thermal diffusion [16], called thermal diffusivity, is adopted in the training data, and it is calculated from the heating parameters. In the learning step, the thermal data are represented by the thermal variation and the thermal diffusivity. The thermal diffusivity is derived from Fourier’s law. Then, the thermal diffusivity in terms of conductivity for analyzing thermal diffusion is given by Eq. (2), and the thermal diffusivity equation is transformed into Eq. (2a).
where ${\alpha}$$_{diff}$, ${\rho}$, and c$_{p}$ represent the thermal diffusivity, the material density, and the specific heat capacity. q$_{v}$ is the volumetric heat source (W/m$^{3}$), and ${\Delta}$T is the variation of temperature. ${\Delta}$T$_{norm}$ is defined as the normalized thermal variation. It is calculated as the difference between the initial value t$^{ij}$$_{init}$ and the normalized value t$^{ij}$$_{l}$. The calculation is modified by substituting ${\Delta}$T$_{norm}$ for ${\Delta}$T in Eq. (2). ${\delta}$ is defined as the multiplicative inverse of the normalizing ratio. The coefficient ${\alpha}$$_{diff}$ should then be adjusted to ${\beta}$$_{diff}$, calculated as ${\delta}$ · ${\alpha}$$_{diff}$. Then, each ${\Delta}$T$_{norm}$ on all matrices in E$_{i}$ is converted into thermal effect vectors by the matrix form in Eq. (1). The calculation of the thermal effect is described in Eq. (3). The temperature T$_{k}$$^{i}$ is computed to model the thermal effect of neighboring elements. In the i$^{th}$ grid of clock tree elements, the data T$_{k}$$^{i}$, which includes the thermal variation at grid k, is calculated as below:
where l is a neighbor grid of k, and g denotes the thermal impedance of the elements or the thermal impedance between grids k and l. Q is defined as the generated heat, C is the matrix of heat-absorbing capability for each layer, and s is the coefficient of the domain. The term sC$_{k}$$^{i}$ is added to represent the different layers in the 3D-IC. The thermal effect on each layer matrix Q can be calculated using Eq. (1). Also, the sets of thermal diffusivities obtained from Eq. (2a) on each grid are the input vectors that represent the mutual heating effect. The correlation between thermal variation and diffusivity is trained by random forest.
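The equation bodies of Eqs. (1) and (2) are not reproduced in this text. For reference, the standard textbook forms they plausibly correspond to, inferred from the symbol definitions above (an assumption, not the authors' exact formulation; k denotes the thermal conductivity, a symbol not defined in the paper), are:

```latex
% Thermal network form (Eq. (1)): heat sources drive node temperatures
% through the thermal resistivity/conductance matrix.
G_{M}\, T_{M} = Q_{M}
% Thermal diffusivity from Fourier's law (Eq. (2)):
\alpha_{diff} = \frac{k}{\rho\, c_{p}}
% Heat diffusion with a volumetric source, relating \Delta T to q_v (Eq. (2a)):
\frac{\partial T}{\partial t} = \alpha_{diff}\, \nabla^{2} T + \frac{q_{v}}{\rho\, c_{p}}
```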
The training process in this work utilizes the bootstrap aggregating technique, which divides and combines trees. To construct the forest, we apply bootstrap aggregation as one of the ensemble models [17] specialized in regression and classification. Let us assume that N is the number of training sets; then each training set becomes the root of a decision tree. To maintain accuracy, the optimal N is determined by integer linear programming (ILP) before the learning steps begin. The training process consists of three steps: generation of N base learners through bootstrap, training by decision trees, and computing the training results from all decision trees.
First, the N base learners are generated with bootstrap. The forest specification is determined by the forest parameters, namely the number of decision trees and the size of the forest as the tree depth. Second, the N base learners are trained by random decision trees. The random node optimization is performed by the objective function and the split function according to the width and depth of the forest. The nodes of each tree have split functions that consist of the thermal effect parameters in Eq. (1). The probability distribution of Q is generated to maintain randomness based on the expected thermal variation of the target, avoiding overestimation. All the trees produced by the above steps are combined to form the random forest. Finally, the learning results are obtained by taking the average of the output values of all decision trees. The thermal weight matrix, defined as Y$_{i}$, represents the correlation of the thermal effect at E$_{i}$ by the distance from the center of the matrix. The process is iterated to fill Y$_{i}$ with the learning results. A larger weight indicates a greater thermal impact. The weights can be added to other weights as in Eq. (3) when the matrices overlap at the same grid. An example of learning the thermal effect is shown in Fig. 6. In Y$_{i}$, the grids (e$_{\mathrm{(3,4)}}$, e$_{\mathrm{(4,3)}}$), which are at the same distance from the target element e$_{i}$, have different weights due to the randomness.
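The three training steps above (bootstrap sampling, per-tree fitting, averaging) can be sketched with a toy bagged regressor; the depth-1 trees, single scalar feature, and parameter values are simplifications for illustration, not the authors' implementation:

```python
import random

def train_stump(xs, ys):
    """Fit a depth-1 regression tree (stump) on one feature by
    minimizing the squared error over candidate split thresholds."""
    best = None
    for thr in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= thr]
        right = [y for x, y in zip(xs, ys) if x > thr]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = (sum((y - lm) ** 2 for y in left)
               + sum((y - rm) ** 2 for y in right))
        if best is None or err < best[0]:
            best = (err, thr, lm, rm)
    if best is None:                      # no valid split: predict the mean
        mean = sum(ys) / len(ys)
        return lambda x: mean
    _, thr, lm, rm = best
    return lambda x: lm if x <= thr else rm

def bagged_forest(xs, ys, n_trees=25, seed=0):
    """Bootstrap-aggregate stumps; the prediction is the tree average."""
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(xs)) for _ in xs]   # sample with replacement
        trees.append(train_stump([xs[i] for i in idx],
                                 [ys[i] for i in idx]))
    return lambda x: sum(t(x) for t in trees) / len(trees)
```

Averaging the outputs of many bootstrapped trees is what fills each cell of the thermal weight matrix, which is also why equidistant grids can receive slightly different weights.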
Fig. 6. Learning process to obtain thermal weight matrix with random forest.
2. Application of Thermal Weight Matrix in 3D-CTS
The Y$_{i}$ generated in the previous step represents the thermal sensitivity of each clock tree element. The thermal variations are updated immediately using Y$_{i}$, with no additional measurement or profiling at each tree construction step. In each step, Y$_{i}$ is mapped onto the circuit at the target clock tree resources. The overall flow of our application process is described in Fig. 7.
1) Clock Tree Topology Generation and Elements Insertion with Thermal Weight Matrix
The geometrical information of each sink is paired with its corresponding Y$_{i}$. An example of clock tree synthesis with the trained thermal data is presented in Fig. 8. The sinks and matrices are placed as shown in Fig. 8(a). As the initial step, clock sink clustering is generally performed to create the clock tree topology in CTS. The clock skew can be reduced by avoiding hotspot regions. For this reason, the heating degree of the elements is calculated from the average values of Y$_{i}$ on each clock tree element. Fig. 8(b) shows the clock tree topology without and with updated thermal data. C and D are paired without thermal consideration. However, with thermal data in clock sink pairing, D and E are paired, as described on the right of Fig. 8(b). In this case, the sinks are clustered to avoid the hotspot regions where the Y$_{i}$ overlap.
The clock tree elements are inserted after the topology generation step. Each element that may cause a heating effect on the clock tree is placed using Y$_{i}$. In this step, we assume that each element is placed at the midpoint between two nodes. In general, the geometrical coordinates of the elements are determined in the next step; however, the thermal distribution can be predicted in advance by Y$_{i}$. Fig. 9 describes the process of determining the TSV location between T$_{a}$ and T$_{b}$. The location is selected by the number of OGs and hotspot regions among the candidates. The hotspot region and thermal distribution are estimated by the OGs and the weight matrices placed on the candidate locations. The Y$_{i}$ are applied to the candidates, and then the hotspot regions are predicted. When the TSV is inserted at T$_{a}$, the number of OGs in the left subtree of T$_{a}$ is 18, and the right subtree has 15 OGs. Thus, the total number of OGs becomes 33, and the number of hotspot regions with the most OGs is 1. On the other hand, when the TSV is inserted at T$_{b}$, the left subtree of T$_{b}$ has 12 OGs, and the right subtree has 18 OGs. Thus, the total number of OGs becomes 30, and the number of hotspot regions with the most OGs is 3. Even though inserting the TSV at T$_{a}$ has an advantage in skew balancing, inserting the TSV at T$_{b}$ performs better in power consumption. In this case, the element locations are determined by the design constraints in CTS. Thus, the weight functions for determining the TSV location are generated from the constraints. The weight functions consist of the maximum degree of the heating regions and the thermal gap between the left and right subtrees.
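A sketch of the candidate comparison above; representing each weight matrix simply as the set of grid cells it covers is a simplifying assumption:

```python
from collections import Counter

def og_statistics(matrices):
    """Count overlapped grids (OG, covered by >= 2 matrices) and hotspot
    grids (those at the maximum overlap level) for one TSV candidate.
    Each matrix is modeled as a set of (x, y) grid cells it covers."""
    level = Counter()
    for cells in matrices:
        for cell in cells:
            level[cell] += 1
    ogs = sum(1 for n in level.values() if n >= 2)
    max_level = max(level.values(), default=0)
    hotspots = sum(1 for n in level.values() if n == max_level and n >= 2)
    return ogs, hotspots
```

The two statistics feed the weight functions that trade off skew balancing against power consumption when choosing between candidate TSV locations such as T$_{a}$ and T$_{b}$.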
Fig. 7. The proposed thermal weight matrix application flow in 3D-CTS.
Fig. 8. Application of thermal weight matrix at each CTS step.
Fig. 9. Determining TSV location with thermal weight matrices and predicting the thermal effect with weight matrix.
First, the maximum weight of OGs for an element type a in Y$_{a}$ is represented by Max(Y$_{a}$), and it is calculated over the multiple layers. More than half of the matrices have OGs because the matrix size is obtained from the median cost of distance. Thus, the overlapped level (OL) can be classified into three types, as described in Eq. (4).
Here, i is the overlapped level in the function OL. OL(Y$_{a}$,i) is the function that classifies the weights by overlapped level in Y$_{a}$, and it returns the sum of the weights. L$_{max,a}$ denotes the maximum level of OGs in Y$_{a}$. In Eq. (4), the function Max(Y$_{a}$) calculates the maximum value; it consists of three conditions according to the number of OGs in Y$_{a}$. Due to the matrix size, the hotspot region has fewer grids than the lower overlapped levels. Also, the number of OGs decreases as the overlapped level increases. When new elements are inserted, a threshold level is set to prevent the hot region from being underestimated. Thus, the maximum level is restricted by the threshold level. Fig. 10 describes examples of the overlapped conditions in Max(Y). If there are no OGs (L$_{max,a}$ = 0), Max(Y$_{a}$) is set to the default value 0, as represented in Fig. 10(a). If L$_{max,a}$ is 2, Max(Y$_{a}$) becomes the summation of the weights on the OGs, as in Fig. 10(b). If L$_{max,a}$ is greater than 2, Max(Y$_{a}$) becomes the summation of the weights on the OGs, as shown in Fig. 10(c) and (d). When more than two elements are inserted, the difference in L$_{max}$ among the elements can lead to a faulty prediction due to the number of OGs. For example, as shown in Fig. 10(b) and (c), the maximum overlapped levels generated by the candidates Y$_{b}$ and Y$_{c}$ are 2 and 3, respectively. The max operations on Y$_{b}$ and Y$_{c}$ then calculate the thermal weight by summing over 24 and 8 grids, respectively. The sum of the weights is influenced by the number of grids, so this calculation can be insufficient to account for the hotspot region. Thus, L$_{max}$ needs to be fixed as the maximum overlapped level among all candidates for insertion. In this example, if L$_{max}$ is set to 3, Y$_{b}$ has no grids at the maximum level and Y$_{c}$ has 8 grids at maximum level 3. For the other case, the two candidates Y$_{c}$ and Y$_{d}$ are constructed as in Fig. 10(c) and (d). The matrix Y$_{c}$ includes 8 grids of overlapped level 3, while the matrix Y$_{d}$ contains 4 grids of overlapped level 3 and 4 grids of overlapped level 4. In this case, as mentioned above, L$_{max}$ is determined as 4. However, when the threshold level L$_{max}$ is set to 2, the maximum operations of Fig. 10(c) and (d) are performed on 8 grids.
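A compact sketch of the Max(Y) rule with a level threshold; the flat weight/level lists and the default threshold value are illustrative assumptions about data layout, not the paper's exact Eq. (4):

```python
def max_weight(weights, levels, threshold=3):
    """Max(Y): sum of the weights on the grids at the maximum overlapped
    level, capped by a threshold level; returns 0 when no grid overlaps.
    weights[i] and levels[i] describe the i-th grid of the matrix Y."""
    l_max = max(levels, default=0)
    if l_max < 2:                  # no overlapped grids (an OG needs level >= 2)
        return 0
    l_max = min(l_max, threshold)  # restrict by the threshold level
    return sum(w for w, l in zip(weights, levels) if l >= l_max)
```

Lowering the threshold widens the set of grids counted, which is how the Fig. 10(c) and (d) candidates can be compared over the same 8 grids.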
Fig. 10. Overlapped conditions in Max(Y): (a) Max(Y$_{a}$) = 0; (b) Max(Y$_{b}$) = 0; (c) Max(Y$_{c}$) = 3; (d) Max(Y$_{d}$) = 4.
Fig. 11. The extraction of quadrant in weight matrix using angle of left/right subtree edge: (a) An obtuse angle of subtree edge; (b) An acute angle of subtree edges.
Second, Eq. (5) considers the clock skew caused by the unbalanced thermal variation at the edges of the subtrees. The thermal gap function Gap(Y$_{a}$) between the left and right subtrees represents the degree of skew balance, that is, the difference in the thermal distribution at the child nodes, as shown in Eq. (5).
where v is defined as a weight located within the quadrants including the subtree and its neighbors. When a node is connected to other nodes by edges, the irrelevant area is excluded by the angle of the subtree edges for accuracy and efficiency. The matrix Y$_{a}$ can be divided into quadrants, and the quadrants corresponding to the left and right subtrees are obtained as described in Fig. 11. If one of the subtree edges belongs to a quadrant, it has two adjacent quadrants that can affect its child nodes. For example, when the angle of the subtree edges is obtuse, as shown in Fig. 11(a), the left subtree edge belongs to the 3$^{\mathrm{rd}}$ quadrant, and its adjacent quadrants are the 2$^{\mathrm{nd}}$ and the 4$^{\mathrm{th}}$ quadrants. Also, the right subtree edge belongs to the 1$^{\mathrm{st}}$ quadrant, and its adjacent quadrants are the 2$^{\mathrm{nd}}$ and the 4$^{\mathrm{th}}$ quadrants. On the contrary, when the angle of the subtree edges is acute, as shown in Fig. 11(b), both the left and right subtree edges belong to the 3$^{\mathrm{rd}}$ quadrant, and their adjacent quadrants are likewise the 2$^{\mathrm{nd}}$ and the 4$^{\mathrm{th}}$ quadrants. Thus, the 1$^{\mathrm{st}}$ quadrant is excluded from the weight calculation because it includes little information related to the subtree edges (the irrelevant area). Therefore, the degree of influence at each subtree is represented as the summation of the weights in its own and adjacent quadrants, and Gap(Y$_{a}$) becomes the thermal gap between the two subtrees. Finally, the weight function W(Y$_{a}$) based on Eqs. (4) and (5) is formulated for element insertion as follows:
where w$_{max}$ and w$_{gap}$ are the weights of Max(Y$_{a}$) and Gap(Y$_{a}$), and ${\gamma}$ is a constant. W(Y) can be utilized to consider the thermal effect in advance, before the elements are inserted. Also, Eq. (6) can be reused in the element insertion step for selecting the detailed location of a candidate.
2) Clock Skew Adjustment with Thermal Weight Matrix
The clock arrival time is measured by delay estimation on the clock tree topology. The difference in clock arrival times needs to be reduced to improve the circuit performance. The clock can also be skewed intentionally to resolve violations; this is called useful clock skew. When the heating effect is considered, the thermal weight matrix is adopted by the various clock skew adjustment techniques such as TSV/buffer insertion and wire routing strategies. In general, TSV/buffer insertion is performed to determine the geometrical locations of TSVs/buffers while considering skew balancing and power consumption through each path delay. Then, wire routing is performed under various constraints such as obstacle avoidance, skew balancing, and thermal distribution. For example, TSV positions are reallocated and buffers are inserted to control the skew. Hotspot regions increase the resistance and consequently may cause delay degradation, as shown in Fig. 8(c). Thus, the path delay is measured with the resistance changed by applying the thermal weight matrix Y. Here, the multiplicative inverse of the normalizing ratio ${\delta}$ is used for both the absolute temperature scale (K) and the change in temperature. The temperature effect per unit length of interconnect is represented by the Joule heating equation as follows:
where R is the interconnect resistance and I is the root-mean-square current caused by the insertion. The heating effects can lead to an increased current. The term I$^{2}$·R represents the power dissipation on the interconnects, and R$_{\theta}$ is the thermal impedance of the interconnect line to the chip. The resistance R per unit wire length can be expressed as a linear relation with the wire width and height using Eq. (2) and the transformed Eq. (7) as follows:
where ${\rho}$$_{0}$ is the unit-length resistance at the reference temperature, w$_{m}$ and t$_{m}$ are the width and height of the wire, and ${\beta}$ is the temperature-dependence coefficient of the resistance (1/$^{\mathrm{o}}$C). y$_{rf}$ is the average value of the thermal weight matrix belonging to a path segment for the delay calculation. Also, the degree of thermal change from the initial temperature is represented by y$_{rf}$. The updated temperature is calculated by multiplying T by y$_{rf}$.
With R transformed by Eq. (8), the path delay can be estimated by the delay formulation. After the delay estimation, clock skew optimization such as TSV/buffer insertion is performed with Eq. (6). In wire routing, the clock skew is sensitive to the wire length and the routing path due to thermal variation, as shown in Fig. 8(d). In general, routing algorithms are driven by an objective function. When wire routing is performed, the grids belonging to a wire require an additional weight to account for the thermal effect. Our method calculates the weight for each CTS element using the Y$_{i}$ obtained by random forest in advance. In addition, the proposed method can be applied to all CTS steps through the weight function.
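The linear resistance model described around Eq. (8) can be sketched as follows; the functional form is the standard first-order temperature dependence, and the parameter values used in the comments are illustrative, not taken from the paper:

```python
def unit_resistance(rho0, w_m, t_m, beta, t_ref, t_now):
    """Unit-length wire resistance with first-order temperature
    dependence: R = rho0 / (w_m * t_m) * (1 + beta * (T - T_ref)).
    rho0: sheet resistivity at T_ref; w_m, t_m: wire width and height;
    beta: temperature coefficient of resistance (1/degC)."""
    return rho0 / (w_m * t_m) * (1.0 + beta * (t_now - t_ref))
```

For example, with an assumed beta of 0.004 /°C, a 100 °C rise in a path segment raises its unit resistance by 40%, which then feeds directly into the path-delay estimation.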
IV. EXPERIMENTAL RESULTS
The verification of our method consists of two parts: the performance of the learning process and the guarantee of CTS performance. Our method has been verified on 45 nm process technology parameters based on a predictive technology model (PTM) [19]. For the implementation, C/C++, Python, HSPICE, and MATLAB on a quad-core 3.1 GHz Linux machine were utilized. The nominal supply voltage and the initial temperature are set to 1.1 V and a random value from -25$^{\circ}$C to 125$^{\circ}$C, respectively. A chip with a size of 6 cm$^{2}$ is divided into a uniform grid to obtain the distributed temperature map using the HotSpot 6.0 thermal simulator [22]. The technology parameters of the wire are a unit resistance r$_{0}$ = 0.1 ${\Omega}$/${\mu}$m and a unit capacitance c$_{0}$ = 0.2 fF/${\mu}$m. The TSV parameters are a unit resistance r$_{v}$ = 0.035 ${\Omega}$ and a capacitance c$_{v}$ = 15.48 fF. The parameters related to the buffer are an input capacitance c$_{b}$ = 9 fF, an intrinsic delay t$_{d}$ = 15 ps, and an output driving resistance r$_{b}$ = 66 ${\Omega}$. For a fair comparison, the parameter for NN-3D [7] is set to ${\alpha}$ = 0.1 as in [1], and the via bound is set to the minimum value. The indicators of the error rate are the mean square error (MSE) and the mean absolute error (MAE), which measure the amount of error in statistical prediction models, as given in Eqs. (9) and (10):
where n is the number of test cases, $\hat{Y}_{i}$ and Y$_{i}$ are the vectors of predicted and observed values, and y$_{i}$ and x$_{i}$ are the predicted and actual values.
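The two error metrics of Eqs. (9) and (10) are standard and can be computed directly:

```python
def mse(predicted, actual):
    """Mean square error, as in Eq. (9)."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)

def mae(predicted, actual):
    """Mean absolute error, as in Eq. (10)."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)
```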
For the detailed implementation of the random forest, we set the parameters of the proposed method, such as the thermal variation parameters and the tree width and depth. The tree width and depth are determined by ILP over a width range of 10 to 500 and a depth range of 3 to 100. We selected a width of 100 and a depth of 20, which yielded the lowest MAE. The MAE decreases as the width and depth grow toward 300 and 50, respectively, but shows almost no further decrease beyond 500 and 100.
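The width/depth selection can be illustrated as a sweep that keeps the pair with the lowest MAE; the placeholder MAE model below (saturating past 100/20) is an assumption for illustration, not the paper's measured data or its ILP formulation.

```python
# Hypothetical sketch of the forest-shape selection: sweep candidate
# (width, depth) pairs and keep the one with the lowest validation MAE.
def select_forest_shape(candidates, mae_of):
    """Return the (width, depth) pair minimizing the given MAE function."""
    return min(candidates, key=mae_of)

# Candidate grid spanning the ranges searched in the text.
candidates = [(w, d) for w in (10, 50, 100, 300, 500)
                     for d in (3, 10, 20, 50, 100)]

# Placeholder MAE model: error shrinks with forest size but saturates past
# width 100 / depth 20, so larger forests buy no further accuracy
# (an illustrative assumption, not measured data).
toy_mae = lambda wd: 1.0 / min(wd[0], 100) + 1.0 / min(wd[1], 20)

best = select_forest_shape(candidates, toy_mae)
```

With a saturating error model, ties past the knee are broken toward the smallest forest, which matches the intuition of stopping at 100/20.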
The effectiveness of the proposed learning method is compared with other learning algorithms in terms of accuracy and runtime. The comparison group consists of regression and boosting algorithms. The regression algorithms are support vector machine (SVM), least absolute shrinkage and selection operator (Lasso), ridge regression, and multivariate adaptive regression splines (MARS). The boosting algorithms are extreme gradient boosting (XGBoost), adaptive boosting (AdaBoost), and light gradient boosting machine (LightGBM). For accuracy, each learning algorithm is evaluated by MSE and MAE according to the number of learning instances, as shown in Table 2. At 50 instances, MARS has the highest MSE and Ridge has the highest MAE. At 1000 instances, AdaBoost has both the highest MSE and the highest MAE. Our method keeps both MSE and MAE below 1 across all test cases. For the ratio, the average of SVM, one of the regression methods, is set to 1 as the reference point for comparing our method with the other algorithms in accuracy.
Table 3 shows the runtime of our method compared with the seven other models. For a fair comparison, two conditions are set for this experiment. First, we increase the number of training cases until the MAE approaches 1. Second, we run 1000 iterations for all models to average out unexpected test values and report the mean measurements. Constructing a training set takes about 0.31 seconds, and the number of training samples affects both the runtime of each model and the preparation time of the training set. The Ridge algorithm is run with up to 700 training sets because its MAE only converges to about 2 as the number of test cases increases. Our method needs only 50 training sets for a precise prediction of the thermal effect. In runtime, all models except XGBoost finish within 1 second. Lasso regression has the fastest execution time at 0.039 seconds, but it has the third highest MSE and MAE. The proposed ensemble-based method finishes within 0.381 seconds. Overall, our method improves prediction accuracy while maintaining an outstanding runtime.
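The comparison protocol above (repeated runs, averaged MSE/MAE and runtime per model) can be mirrored with a small harness; the stand-in model and data below are illustrative assumptions, not the paper's implementations or measurements.

```python
import time

# Hypothetical harness mirroring the comparison protocol: run each model
# repeatedly and average its MSE, MAE, and wall-clock runtime. Models are
# plain callables (fit-and-predict in one step) for illustration.
def evaluate(model, X_train, y_train, X_test, y_test, repeats=5):
    mses, maes, times = [], [], []
    for _ in range(repeats):
        t0 = time.perf_counter()
        preds = model(X_train, y_train, X_test)
        times.append(time.perf_counter() - t0)
        n = len(y_test)
        mses.append(sum((a - p) ** 2 for a, p in zip(y_test, preds)) / n)
        maes.append(sum(abs(a - p) for a, p in zip(y_test, preds)) / n)
    k = float(repeats)
    return sum(mses) / k, sum(maes) / k, sum(times) / k

# Trivial stand-in "model": always predicts the training mean.
mean_model = lambda Xtr, ytr, Xte: [sum(ytr) / len(ytr)] * len(Xte)
mse_avg, mae_avg, t_avg = evaluate(mean_model, [[0]] * 4, [1, 2, 3, 4],
                                   [[0]] * 2, [2, 3])
```

Averaging over repeated runs, as in the experiment's 1000 iterations, damps timing noise and outlier test values before models are ranked.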
Table 2. The accuracy comparison with other ML algorithms (each cell: MSE / MAE)

# of instances | SVM | Lasso | Ridge | MARS | XGBoost | AdaBoost | Our
50 | 11.779 / 2.213 | 8.405 / 2.328 | 16.286 / 3.137 | 20.411 / 3.060 | 8.756 / 2.584 | 3.177 / 1.196 | 0.004 / 0.059
100 | 9.891 / 2.550 | 6.096 / 1.618 | 21.318 / 3.077 | 1.204 / 0.820 | 7.272 / 1.958 | 3.043 / 1.472 | 0.023 / 0.084
300 | 10.060 / 1.572 | 7.481 / 2.039 | 10.351 / 2.382 | 10.110 / 2.140 | 9.542 / 1.546 | 9.940 / 1.996 | 0.013 / 0.070
500 | 4.739 / 1.029 | 7.208 / 2.171 | 16.026 / 2.535 | 6.269 / 1.679 | 4.944 / 1.540 | 11.193 / 2.594 | 0.011 / 0.071
700 | 16.192 / 2.086 | 8.513 / 2.154 | 8.433 / 2.182 | 10.211 / 2.287 | 7.423 / 1.713 | 7.034 / 2.036 | 0.016 / 0.080
1000 | 8.114 / 1.415 | 9.864 / 2.233 | 10.925 / 2.295 | 7.929 / 1.973 | 7.248 / 1.868 | 16.511 / 3.200 | 0.010 / 0.072
Avg. | 10.129 / 1.811 | 7.928 / 2.091 | 13.890 / 2.601 | 9.356 / 1.993 | 7.531 / 1.868 | 8.483 / 2.083 | 0.015 / 0.073
Ratio | 1.000 / 1.000 | 0.783 / 1.155 | 1.371 / 1.437 | 0.924 / 1.101 | 0.743 / 1.032 | 0.837 / 1.150 | 0.001 / 0.040
Table 3. The runtime comparison with other ML algorithms (verified over 1000 runs with a target MAE of 1)

Model type | # of training set | MSE | MAE | Runtime (s)
SVM | 500 | 3.398 | 1.021 | 0.568
Lasso | 100 | 5.138 | 1.858 | 0.039
Ridge | 700 | 10.707 | 2.035 | 0.092
MARS | 100 | 1.623 | 1.042 | 0.241
XGBoost | 500 | 8.076 | 1.653 | 1.168
AdaBoost | 100 | 5.299 | 1.832 | 0.308
LightGBM | 500 | 9.888 | 1.884 | 0.640
Our | 50 | 0.065 | 0.197 | 0.381
Table 4. The skew comparison with other CTS algorithms

Benchmark | # of sinks | Skew (ps): w/o thermal [7][11] | Skew (ps): with thermal [18] | Skew (ps): thermal with ML (our) | Skew reduction vs. [7][11] (%) | Skew reduction vs. [18] (%)
r1 | 267 | 93.2470 | 75.3259 | 79.1123 | 15.16 | -5.03
r2 | 598 | 111.1972 | 81.5505 | 77.9883 | 29.86 | 4.37
f11 | 121 | 46.3255 | 34.0915 | 33.5218 | 27.64 | 1.67
f21 | 117 | 50.8296 | 34.1723 | 35.2710 | 30.61 | -3.22
f31 | 273 | 65.3832 | 50.7124 | 47.5306 | 27.30 | 6.27
f32 | 190 | 52.2513 | 38.6632 | 38.7558 | 25.83 | -0.24
Avg. | - | 69.8723 | 52.4193 | 52.0300 | 25.54 | 0.74
Ratio | - | 1 | 0.7502 | 0.74464 | - | -
Next, we construct clock trees by applying existing methods and then measure the clock skew and the construction time of each clock tree. The proposed method is evaluated on four ISPD'09 benchmark circuits, f11, f21, f31, and f32 [20], and the IBM benchmark circuits r1 and r2 [21]. To emulate a 3D-IC, the 2D-IC ISPD/IBM benchmarks are transformed into a two-layered 3D structure, with the clock sinks randomly partitioned between the two layers. The results in Tables 4 and 5 are average values over 1000 runs. The clock skew of the proposed method is compared with that of other CTS algorithms in Table 4.
For skew, the clock tree without thermal consideration is constructed by combining [7] and [11]; for a fair comparison, our method adds thermal consideration on top of [7] and [11]. Our method reduces the clock skew by about 25.54% compared to the method without thermal consideration, while [18] shows a skew reduction of 24.98%. The average skew reduction of the proposed method is thus similar to that of [18]; however, our method performs better as the number of sinks increases. Also, the degree of skew reduction of our method depends on the underlying CTS algorithm.
Table 5. The runtime comparison with other CTS algorithms

Benchmark | # of sinks | Runtime (s): w/o thermal [7][11] | Runtime (s): thermal profiling iteration w/o ML | Runtime (s): thermal profiling at each stage w/o ML | Runtime (s): thermal with ML (our) | Runtime reduction vs. [7][11] (%) | Runtime reduction vs. iteration (%) | Runtime reduction vs. each stage (%)
r1 | 267 | 48.0532 | 222.7577 | 481.4068 | 52.1281 | -8.48 | 76.60 | 89.17
r2 | 598 | 826.4700 | 6098.4006 | 3317.2875 | 863.8535 | -4.52 | 85.83 | 73.96
f11 | 121 | 5.1906 | 27.2515 | 278.7895 | 7.2308 | -39.31 | 73.47 | 97.41
f21 | 117 | 4.4604 | 24.2528 | 272.6368 | 6.4590 | -44.81 | 73.37 | 97.63
f31 | 273 | 65.4941 | 298.0960 | 501.0753 | 70.4884 | -7.63 | 76.35 | 85.93
f32 | 190 | 16.8741 | 79.3434 | 374.4031 | 19.4463 | -15.24 | 75.49 | 94.81
Avg. | - | 161.0904 | 1125.0170 | 870.9332 | 169.9344 | -5.49 | 84.89 | 80.49
Ratio | - | 1.00 | 6.98 | 5.41 | 1.05 | - | - | -
To characterize the runtime of the proposed method, we compare it with CTS methods that consider the thermal effect without ML. One comparison target iterates thermal information profiling until an optimal solution is obtained; the other profiles thermal data at each CTS stage. We construct the clock trees by combining [7] and [11], as described in Table 5. Clock tree construction is performed until the skew is within 3% of our method. The proposed ML-based method adds only about 5% runtime on average over the method without thermal consideration. The runtime of every method increases with the number of clock sinks. In particular, on r2, the runtime of the iteration method grows steeply with the number of clock sinks. The CTS with thermal profiling at each stage grows proportionally with the number of clock sinks, but still has a large execution time due to the profiling steps. Our method requires, on average, about 80% less runtime than the other methods that consider the thermal effect.
In summary, the proposed method improves accuracy and reduces runtime. Our random forest-based thermal effect prediction for 3D-CTS achieves nearly the same skew as the existing thermal-aware method when applied to an existing CTS algorithm. In addition, our method can be applied to other CTS algorithms with little additional runtime.