Eunjeong Choi, Jeongtae Kim
(Department of Electronic and Electrical Engineering, Ewha Womans University, Seoul, Korea; {eunjeong_choi, jtkim}@ewha.ac.kr)
Copyright © The Institute of Electronics and Information Engineers(IEIE)
1. Introduction
The semiconductor industry has been rapidly growing, and semiconductors are widely
used in various products, such as mobile phones, computers, tablets, and automobiles
[1,2]. In the semiconductor fabrication process, wafers are the core material, and integrated
circuit (IC) dies are arranged on the wafer in a grid pattern [1]. To ensure the quality of semiconductor products, it is essential to inspect the
dies of the wafer during the manufacturing process [2,3].
Typical wafer die inspection methods rely on a machine vision-based golden
template approach, which exploits the repetitive pattern of the dies
on the wafer [2-5]. These methods first generate a golden die image by combining multiple die images
in various ways [2-5]. Then, they calculate the pixel-wise difference between the golden die image and
the test die image. Finally, they detect defects by applying various post-processing
techniques to the difference image, such as image thresholding or morphology operation
[2-4]. These inspection methods are called die-to-die inspection.
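The three steps above (golden image, pixel-wise difference, thresholding) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the pixel-wise median is one of many possible ways to combine die images, the dies are assumed to be already registered, and the function names, toy images, and threshold value are hypothetical rather than taken from the cited methods.

```python
from statistics import median

def golden_image(dies):
    """Pixel-wise median of several aligned die images (lists of rows)."""
    rows, cols = len(dies[0]), len(dies[0][0])
    return [[median(d[r][c] for d in dies) for c in range(cols)]
            for r in range(rows)]

def defect_mask(golden, test, threshold):
    """1 where |golden - test| exceeds the threshold, else 0."""
    return [[1 if abs(g - t) > threshold else 0
             for g, t in zip(g_row, t_row)]
            for g_row, t_row in zip(golden, test)]

# Three reference dies with mild intensity variation (toy 3x3 images).
dies = [
    [[10, 10, 10], [10, 50, 10], [10, 10, 10]],
    [[11, 10, 9], [10, 52, 10], [10, 11, 10]],
    [[10, 9, 10], [11, 50, 10], [10, 10, 12]],
]
# Test die with one bright defect at row 0, column 2.
test_die = [[10, 10, 90], [10, 51, 10], [10, 10, 10]]

golden = golden_image(dies)
mask = defect_mask(golden, test_die, threshold=20)  # hypothetical threshold
```

Because the difference is taken pixel by pixel, any shift between the golden and test images or any global intensity change shows up directly in the difference image.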
However, machine vision-based die-to-die inspection methods are vulnerable to
registration error and intensity-level variation between die images. In addition,
they rely heavily on an expert to extract hand-crafted features [6] and tune various parameters for an inspection system. Several deep learning-based
wafer inspection methods have been investigated to automatically extract optimal features
for wafer inspection and provide better performance [6-10]. Most deep learning-based methods use convolutional neural networks (CNNs) to classify
defect types [8-10] or detect defect locations [6,7]. However, these are not die-to-die inspection methods, because they inspect the wafers
using only the test image, without information from the golden image.
We believe that die-to-die inspection, which exploits the repetitive
die patterns on the wafer, can improve the performance of deep learning-based wafer inspection methods.
To the best of our knowledge, there are no studies on deep learning-based die-to-die
inspection. Therefore, we propose a deep learning-based comparison system for wafer
die-to-die inspection. The proposed method is based on a twin network (Siamese network)
composed of a convolutional encoder-decoder [11]. It takes golden and test die images as input and compares them, reporting the
areas where the two input images differ as defects. To alleviate the performance degradation
problem caused by registration error, we use only one die image as the golden die
image instead of combining multiple die images. In addition, the proposed method detects
defects using optimal features automatically extracted for die-to-die inspection in
the training stage instead of directly calculating the difference between the golden
die and test die images.
Fig. 1. A conventional wafer inspection system.
Furthermore, we improve the performance of the twin network by applying a Bayesian
learning technique. Recently, many studies on Bayesian learning have been conducted
in segmentation and classification fields [12-14]. These studies showed that the performance of the networks with Bayesian learning
is better than that of conventional networks.
The idea of Bayesian learning is to interpret network weights as random variables
and compute the posterior distribution of network weights given training data [15]. The posterior distribution allows us to calculate the distribution of prediction
by marginalizing over network weights [15]. Through marginalization, Bayesian learning may prevent overfitting and improve the
performance of the network [15]. However, it is impractical to compute the posterior distribution of network weights
exactly, so techniques to approximate the posterior distribution are often used, such
as variational inference [16].
Recently, a study developed a theoretical framework that casts dropout in machine
learning as approximate Bayesian learning [15]. The study showed that a neural network trained with dropout is an approximate Bayesian
neural network [15,17]. Moreover, some studies showed that a Bayesian neural network may measure the uncertainty
of a trained model and reduce the overconfidence problem in classification tasks [15,18,19].
To verify the usefulness of the proposed method for die-to-die inspection, we
compare its performance with that of a conventional wafer inspection method [7]. The conventional method is based on an encoder-decoder network with a single input
image [7], so for a fair comparison, we modified it so that the encoder-decoder network uses
golden and test images as input, as shown in Fig. 1. We call this modified method the conventional method in this paper. To the best
of our knowledge, this is the first attempt to apply a twin network and Bayesian learning
for wafer die-to-die inspection.
The remainder of this paper is organized as follows. In Section 2, we explain
two important aspects of the proposed method in detail: the twin network and the Bayesian
twin network. We present a detailed description of our dataset in Section 3. Experimental
results and conclusions are presented in Sections 4 and 5.
2. Proposed Method
2.1 Twin Network
A twin network is a neural network that contains two or more sub-networks [11]. The network takes multiple images as input and extracts feature vectors by processing each image
separately through its own sub-network. Then, it computes the distance between the extracted feature
vectors to measure the similarity between the input images [11]. The main advantage of a twin network is that the sub-networks share weights [11]. Because the weights are shared, a twin network can identify whether similar features
exist, so it can measure the similarity between multiple images. Recently, twin networks
composed of a convolutional encoder-decoder structure have been applied to various
tasks, such as video object segmentation [20] and change detection [21-23]. These methods also share the weights between encoders [20-23].
As shown in Fig. 2, the proposed network consists of two encoders and one decoder, and the weights in
the two encoders are shared. This network takes the golden and test die images, extracts
the feature maps from the sub-encoders, and concatenates the extracted feature maps. The merged
feature maps are used as the decoder input, and the network finally outputs the areas that differ
between the golden and test die images as defects. To detect small defects accurately,
we also apply a difference skip-connection [24], which computes the absolute difference of the feature maps between encoder-convolution
layers and then transfers the difference values to the decoder, as shown in Fig. 2.
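The role of weight sharing and of the difference skip-connection can be illustrated with a toy example. The sketch below is not the proposed network: it replaces the convolutional encoder with a single hypothetical 1-D filter followed by a ReLU, and the kernel values and signals are made up. It only shows that, when both inputs pass through the same weights, the absolute feature difference is exactly zero for identical inputs and non-zero around a local change.

```python
def conv1d_relu(signal, kernel):
    """Valid 1-D correlation followed by ReLU (a stand-in for an encoder)."""
    k = len(kernel)
    return [max(0.0, sum(signal[i + j] * kernel[j] for j in range(k)))
            for i in range(len(signal) - k + 1)]

def twin_features(golden, test, kernel):
    """Encode both inputs with the SAME kernel, i.e. shared weights."""
    f_golden = conv1d_relu(golden, kernel)
    f_test = conv1d_relu(test, kernel)
    concat = f_golden + f_test  # concatenated features for the decoder
    diff_skip = [abs(a - b) for a, b in zip(f_golden, f_test)]
    return concat, diff_skip

kernel = [0.5, 1.0, 0.5]  # hypothetical shared encoder weights
golden = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
defective = [1.0, 1.0, 4.0, 1.0, 1.0, 1.0]  # one altered sample

_, diff_same = twin_features(golden, golden, kernel)
_, diff_defect = twin_features(golden, defective, kernel)
```

If the two branches used different weights, identical inputs would generally produce different feature maps, and the difference signal would no longer be zero by construction.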
Fig. 2. Twin network-based wafer die-to-die inspection system.
2.2 Bayesian Twin Network
A Bayesian neural network models network weights as random variables instead
of fixed values and computes the posterior distribution of network weights given training
data. Using the posterior distribution, the network predicts a target value for test
data by marginalizing over the network weights $\mathbf{W}$ given training data $\mathbf{X}$
and the corresponding label set $\mathbf{Y}$, as follows [15]:
$$p\left(\boldsymbol{y}^{\boldsymbol{*}}|\boldsymbol{x}^{\boldsymbol{*}},\mathbf{X},\mathbf{Y}\right)=\int p\left(\boldsymbol{y}^{\boldsymbol{*}}|\boldsymbol{x}^{\boldsymbol{*}},\mathbf{W}\right)p\left(\mathbf{W}|\mathbf{X},\mathbf{Y}\right)d\mathbf{W} \tag{1}$$
where $\boldsymbol{x}^{\boldsymbol{*}}$ is test data, $\boldsymbol{y}^{\boldsymbol{*}}$
is the prediction for the test data, and $p\left(\mathbf{W}|\mathbf{X},\mathbf{Y}\right)$
represents the posterior distribution of network weights. The network weights $\mathbf{W}$
are composed of $L$ layers $\left\{\mathbf{W}_{i}\right\}_{i=1}^{L}$, where $\mathbf{W}_{i}$
is a matrix of dimensions $K_{i}\times K_{i-1}$, and $K_{i}$ is the number of units
for each layer $i$.
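In the prediction stage, the integral above is intractable, so it is usually
approximated by Monte Carlo integration using samples drawn from the posterior distribution.
The posterior distribution itself is also impractical to compute exactly, so it is
approximated using variational inference techniques
[16]. These techniques approximate the posterior distribution with a variational distribution
$q\left(\mathbf{W}\right)$, from which samples can be easily drawn. This is done by
minimizing the Kullback-Leibler (KL) divergence between $q\left(\mathbf{W}\right)$
and the posterior distribution, which can be written as follows [15]:
$$\mathrm{KL}\left(q\left(\mathbf{W}\right)\|p\left(\mathbf{W}|\mathbf{X},\mathbf{Y}\right)\right)=-\int q\left(\mathbf{W}\right)\log p\left(\mathbf{Y}|\mathbf{X},\mathbf{W}\right)d\mathbf{W}+\mathrm{KL}\left(q\left(\mathbf{W}\right)\|p\left(\mathbf{W}\right)\right)+\log p\left(\mathbf{Y}|\mathbf{X}\right) \tag{2}$$
where $p\left(\mathbf{W}\right)$ is the prior distribution of the network weights. The last term does not depend on $q\left(\mathbf{W}\right)$, so only the first two terms need to be minimized.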
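A previous study showed that a model trained with dropout is an approximate Bayesian
network by defining the variational distribution $q\left(\mathbf{W}_{i}\right)$ for
every layer $i$ with units $j$ as follows [15]:
$$\mathbf{W}_{i}=\mathbf{M}_{i}\cdot \mathrm{diag}\left(\left[\mathrm{b}_{i,j}\right]_{j=1}^{K_{i-1}}\right),\quad \mathrm{b}_{i,j}\sim \mathrm{Bernoulli}\left(p_{i}\right),\quad i=1,\ldots ,L \tag{3}$$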
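where $\mathrm{b}_{i,j}$ is a Bernoulli-distributed random variable with probability
$p_{i}$, and $\mathbf{M}_{i}$ is a variational parameter to be optimized. The integral
in Eq. (2) can be approximated by summing over Monte Carlo samples drawn from the
variational distribution $q\left(\mathbf{W}\right)$, which gives the following objective [15]:
$$-\frac{1}{T}\sum_{n=1}^{N}\sum_{t=1}^{T}\log p\left(\boldsymbol{y}_{n}|\boldsymbol{x}_{n},\hat{\mathbf{W}}_{n,t}\right)+\mathrm{KL}\left(q\left(\mathbf{W}\right)\|p\left(\mathbf{W}\right)\right) \tag{4}$$
where $N$ is the number of training samples, $\left(\boldsymbol{x}_{n},\boldsymbol{y}_{n}\right)$ is the $n$-th training input and its label, $T$ is the number of Monte Carlo samples,
$\hat{\mathbf{W}}_{n,t}$ is a Monte Carlo sample from the variational distribution
$q\left(\mathbf{W}\right)$, and the second term is the KL divergence between $q\left(\mathbf{W}\right)$ and the prior $p\left(\mathbf{W}\right)$.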
The first term approximated with a single Monte Carlo sample is identical to
the cross-entropy loss of a model trained with dropout [15]. In addition, the second term can be approximated as $L2$ regularization [15]. Therefore, minimizing a loss function composed of the cross-entropy loss and $L2$
regularization is approximately equivalent to minimizing KL divergence [15].
We trained the twin network with dropout for the Bayesian twin network. After
training, we sampled the network weights 50 times from the approximate posterior distribution
using dropout. We used the mean of these samples as our prediction and used variance
to measure model uncertainty.
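The test-time procedure described above can be sketched with a toy model. The example below is a minimal illustration, not the Bayesian twin network: it uses a single logistic unit instead of a convolutional network, and the weights, input, and seed are hypothetical. Only the dropout rate (0.1) and the number of Monte Carlo samples (50) follow the text. Dropout stays active at prediction time, so every forward pass uses a different Bernoulli mask.

```python
import math
import random

def predict_with_dropout(x, weights, bias, rate, rng):
    """One stochastic forward pass: drop each input unit with prob `rate`."""
    keep = 1.0 - rate
    z = bias
    for xi, wi in zip(x, weights):
        if rng.random() < keep:        # Bernoulli keep mask
            z += wi * xi / keep        # inverted-dropout scaling
    return 1.0 / (1.0 + math.exp(-z))  # toy defect probability

def mc_dropout(x, weights, bias, rate=0.1, samples=50, seed=0):
    """Mean and variance of `samples` stochastic predictions."""
    rng = random.Random(seed)
    preds = [predict_with_dropout(x, weights, bias, rate, rng)
             for _ in range(samples)]
    mean = sum(preds) / samples
    var = sum((p - mean) ** 2 for p in preds) / samples
    return mean, var

x, w, b = [0.8, 0.1, 0.5], [2.0, -1.0, 1.5], -0.5  # hypothetical values
mean, var = mc_dropout(x, w, b)                # prediction and uncertainty
mean_det, var_det = mc_dropout(x, w, b, rate=0.0)  # dropout disabled
```

With the dropout rate set to zero, all passes coincide and the variance vanishes, which is the behavior of an ordinary deterministic network. The cost of the stochastic version grows linearly with the number of samples.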
3. The Dataset
We used patterned wafer data acquired from the Vega facility of the ATI company
for this study. We added synthetic defects to the patterned wafer data. The generated
synthetic defect images contain defects of various shapes, sizes, and intensity values.
Fig. 3(a) shows the patterned wafer data, Fig. 3(b) shows one of the die images in the patterned wafer data, and Fig. 3(c) shows example images of synthetic defects with red boxes indicating the synthetic
defects.
Fig. 3. Experimental dataset.
Fig. 4. Pair-set examples.
Since the size of the die image in Fig. 3(b) is 10,000 $\times $ 10,000, we cropped it into 200 $\times $ 200 sub-die images. We
extracted the golden and test sub-die images as follows. First, we selected two different
die images (10,000 $\times $ 10,000) as the golden and test die images and obtained
test sub-die images (200 $\times $ 200) by tiling the test die image. Then, we extracted
the golden sub-die images through a template-matching technique, using the test sub-die
images as templates.
We set the search area for matching using the location information used for cropping
the test die image. This prior information can be used due to the characteristics
of the repetitive die patterns on the wafer image, as shown in Fig. 3(a). We measured the similarity between the golden and test sub-die images using the
mean absolute error (MAE) and performed template matching in sub-pixel units. We refer
to each golden and test sub-die set as a pair-set.
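The pair-set extraction can be sketched as follows. This is a simplified illustration under stated assumptions: matching is done at integer-pixel positions (the procedure described above works in sub-pixel units), the search radius is a hypothetical parameter, and the 8 x 8 image is synthetic. The function names are placeholders.

```python
def tile_origins(image_size, tile_size):
    """Top-left corners of non-overlapping tiles covering a square image."""
    steps = range(0, image_size - tile_size + 1, tile_size)
    return [(r, c) for r in steps for c in steps]

def crop(image, r, c, size):
    """Square patch of side `size` with top-left corner (r, c)."""
    return [row[c:c + size] for row in image[r:r + size]]

def mae(a, b):
    """Mean absolute error between two equally sized patches."""
    total = sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return total / (len(a) * len(a[0]))

def match_template(golden, template, origin, radius):
    """Search a window around `origin` for the position with the lowest MAE."""
    size = len(template)
    best = None
    for r in range(origin[0] - radius, origin[0] + radius + 1):
        for c in range(origin[1] - radius, origin[1] + radius + 1):
            if (r < 0 or c < 0 or r + size > len(golden)
                    or c + size > len(golden[0])):
                continue  # candidate patch would leave the image
            score = mae(crop(golden, r, c, size), template)
            if best is None or score < best[0]:
                best = (score, (r, c))
    return best

# A 10,000 x 10,000 die tiled into 200 x 200 sub-dies gives 50 x 50 tiles.
n_tiles = len(tile_origins(10000, 200))

# Synthetic golden die; the template truly sits at (3, 2), but the
# tiling position suggests (2, 2), i.e. a one-pixel registration error.
golden = [[3 * r * r + 5 * c * c + r * c for c in range(8)] for r in range(8)]
template = crop(golden, 3, 2, 2)
best = match_template(golden, template, origin=(2, 2), radius=1)
```

Restricting the search to a small window around the tiling position keeps the cost low, which is possible only because the dies repeat on a regular grid.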
Fig. 4 shows two pair-set examples with red boxes indicating the defects. Fig. 4(a) shows golden sub-die images, Fig. 4(b) shows test sub-die images, and Fig. 4(c) shows the ground truth images indicating the different areas between the two sub-die
images. We collected 26,379 pair-set images (15,376 defective images and 11,003 defect-free
images) for training and 21,168 pair-set images (12,283 defective images and 8,885
defect-free images) for testing. The total number of defects in the test set was 14,554.
To verify the comparison performance of the twin network, we also collected 21,168
identical pair-set images composed of the same sub-die images, as shown in Fig. 5. The golden and test sub-die images are identical, so networks that compare the input
images should detect no defects for this case, as shown in Fig. 5(c).
Fig. 5. Identical pair-set examples.
4. Experiments
We investigated two methods: twin network and Bayesian twin network methods. We
verified the usefulness of the twin network for die-to-die inspection by implementing
the conventional wafer inspection method, as shown in Fig. 1. We compared the performance of the twin network with that of the conventional method.
In addition, we compared the performance of the Bayesian twin network and the twin
network to verify that Bayesian learning could improve the network performance. We
conducted experiments using the patterned wafer data and attempted to make the number
of parameters of each method similar.
4.1 Training and Inference
The twin network had two encoders with shared weights and one decoder, as shown
in Fig. 2, while the conventional method had one encoder and a corresponding decoder. The encoder
networks of all methods had four convolutional layers and max-pooling layers. The
decoder networks of all methods had four transposed convolutional layers and five
convolutional layers.
We used batch normalization and activation layers after every convolutional layer
and transposed convolutional layer. All models used the same number of filters and
filter sizes. The numbers of filters in each encoder-convolutional layer were 16,
32, 64, and 128, and those in each decoder-convolutional layer were 128, 64, 32, 16,
and 2. The filter size was $3\times 3$.
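As a rough check on how the shared encoder affects model size, the number of encoder-convolution parameters can be computed from the filter counts above. The sketch assumes single-channel die images, bias terms in every convolution, and ignores batch normalization; none of these details are stated in the text, so the totals are illustrative only.

```python
def conv_params(c_in, c_out, k=3):
    """Weights plus biases of one k x k convolutional layer."""
    return k * k * c_in * c_out + c_out

def encoder_params(in_channels, filters=(16, 32, 64, 128)):
    """Total parameters of a stack of 3 x 3 convolutions."""
    total, c_in = 0, in_channels
    for c_out in filters:
        total += conv_params(c_in, c_out)
        c_in = c_out
    return total

# Twin network: both branches reuse one set of weights, so the encoder
# is counted once (assumed 1 input channel per branch).
twin_encoder = encoder_params(in_channels=1)
# Conventional method: golden and test images stacked as 2 channels.
conventional_encoder = encoder_params(in_channels=2)
```

Under these assumptions the two encoders differ only in the first layer, by 144 parameters, because the second twin branch adds computation but no new weights.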
We implemented all methods using the PyTorch library [25] and ran them on an NVIDIA GeForce GTX 1070 GPU (Nvidia Corporation, USA). All models
were trained using the Adam optimizer with an initial learning rate of $1\times 10^{-3}$,
a batch size of 64, and a weight decay of $1\times 10^{-5}$. We also randomly selected
20% of the training data as validation data in every epoch to monitor the performance
of the models and used the validation data for early stopping. For the Bayesian twin
network, we inserted dropout after every max-pooling layer of the encoders except
for the first layer and after every convolutional layer of the decoder except for
the last two layers. We applied dropout with a rate of 0.1 and obtained 50 Monte
Carlo samples at the test stage.
4.2 Performance Measurement
We used precision and recall to measure the performance of the models and computed
them in terms of the number of connected objects in the prediction images obtained
by thresholding softmax scores. We define precision as follows:
$$\mathrm{Precision}=\frac{TP}{TP+FP} \tag{5}$$
where true positive ($TP$) indicates the number of connected objects correctly inspected
as defects by comparing the input sub-die images. $FP$ is the number of connected
objects incorrectly inspected as defects (false positives). We define recall as follows:
$$\mathrm{Recall}=\frac{TP}{TP+FN} \tag{6}$$
where $FN$ denotes the number of actual defects not detected (false negatives). Higher
precision and recall indicate better performance.
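Object-level counting can be sketched as below. The example thresholds a toy score map, counts connected objects with a breadth-first search, and evaluates precision and recall from object counts. The 8-connectivity, the 0.5 threshold, and the toy scores are assumptions, and the rule for matching predicted objects to ground-truth defects is not specified in the text, so it is not implemented here. The final lines simply re-evaluate the two formulas with the counts reported for the conventional method in Table 1.

```python
from collections import deque

def count_objects(mask):
    """Number of 8-connected foreground objects in a binary mask."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                count += 1
                seen[r][c] = True
                queue = deque([(r, c)])
                while queue:  # flood-fill the current object
                    y, x = queue.popleft()
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < rows and 0 <= nx < cols
                                    and mask[ny][nx] and not seen[ny][nx]):
                                seen[ny][nx] = True
                                queue.append((ny, nx))
    return count

def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

# Toy softmax scores for the defect class, thresholded at 0.5.
scores = [[0.1, 0.9, 0.8, 0.0],
          [0.0, 0.0, 0.0, 0.0],
          [0.7, 0.0, 0.0, 0.6]]
mask = [[1 if s > 0.5 else 0 for s in row] for row in scores]
n_objects = count_objects(mask)

# Counts from Table 1 (conventional method): 14,390 true positives,
# 14,938 detected objects, 14,554 actual defects.
p = 100 * precision(14390, 14938 - 14390)
r = 100 * recall(14390, 14554 - 14390)
```

Counting objects rather than pixels means that a large defect and a small defect contribute equally to both metrics.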
4.3 Experimental Results
We evaluated the performance of the conventional method and the proposed method
using the patterned wafer data. To ensure a fair comparison, we set the threshold
individually for each method. The recall and precision of each network are reported
in Tables 1 and 2.
Table 1 summarizes the precision of each method when the recall of each method is the same.
The twin network outperforms the conventional method in the sense
that it achieves higher precision at the same recall. The precision of the
twin network was about 0.39 percentage points higher than that of the conventional
method.
Table 1. Performance of all methods (the recall of each method is the same).
Methods | Recall | Precision
The conventional method | 98.87 % (14,390/14,554) | 96.33 % (14,390/14,938)
Twin network | 98.87 % (14,390/14,554) | 96.72 % (14,390/14,878)
Bayesian twin network | 98.87 % (14,390/14,554) | 98.57 % (14,390/14,599)
We also compared the performance of the Bayesian twin network and the twin network
to confirm whether Bayesian learning could improve the performance of the twin network.
The Bayesian twin network achieved the best performance, and its precision was about
1.85 percentage points higher than that of the twin network.
Table 2 summarizes the recall of each method when the precision of each method is very similar.
As shown in the table, the recall of the twin network was about 0.47 percentage points
higher than that of the conventional method. In addition, the recall of the Bayesian
twin network was about 0.33 percentage points higher than that of the twin network.
Table 2. Performance of all methods (the precision of each method is very similar).
Methods | Recall | Precision
The conventional method | 98.07 % (14,273/14,554) | 98.56 % (14,273/14,481)
Twin network | 98.54 % (14,341/14,554) | 98.57 % (14,341/14,549)
Bayesian twin network | 98.87 % (14,390/14,554) | 98.57 % (14,390/14,599)
Tables 1 and 2 confirm that the twin network performs better than the conventional method.
We attribute this to the twin network focusing more on image comparison than
the conventional method. The conventional method merges the two sub-die images and
treats them as different color channels. In contrast, the twin network processes
the two sub-die images separately through encoders with shared weights and then
merges the two branches. We think that the shared weights allow the twin network to
identify more effectively whether similar features exist in the two input images.
We also confirmed that Bayesian learning could improve the performance of the
twin network. We suspect that this is because Bayesian learning may prevent overfitting
through marginalization over the network weights. Moreover, Bayesian learning can
measure the uncertainty of a trained model, as shown in Fig. 6. We think that providing uncertainty information for the model predictions can
help increase trust in the twin network-based wafer die-to-die inspection
system.
Fig. 6. Bayesian twin network prediction results.
We also evaluated the performance of all methods using identical pair-set images
to further verify that the structure of the twin network focuses more on image comparison.
Table 3 shows the test results of each network using identical pair-set images. Identical
pair-set images are composed of the same sub-die images, so comparison networks should
detect no defects.
Table 3. Performance of three methods using identical pair-set images.
Methods | Number of false positives
The conventional method | 1,722
Twin network | 0
Bayesian twin network | 0
As shown in Table 3, the twin network and the Bayesian twin network correctly inspected all identical
pair-set images. On the other hand, the number of false positives of the conventional
method was 1,722. Fig. 7 shows the prediction results of each method using identical pair-set images. Table 3 and Fig. 7 confirm that the structure of the twin network is more suitable for die-to-die inspection.
Fig. 7. Prediction results of the conventional method, twin network, and Bayesian
twin network using identical pair-set images.
Table 4 shows the inference time of each method, measured on the hardware
described in Section 4.1. The inference time of the conventional method was about
8.04 milliseconds shorter than that of the twin network. We attribute this to the
twin network having two encoders, whereas the
conventional method has only one. As expected, the Bayesian twin
network was the slowest, taking roughly 14 times as long as the twin network, because it requires multiple inferences at test time. Although
the Bayesian twin network requires more computation than the other methods, we think it
is useful because it improves the performance of the twin network and measures
model uncertainty. Reducing the amount of computation
while maintaining the advantages of Bayesian learning is a subject for future
study.
Table 4. Inference time (msec).
Methods | Inference Time (msec)
The conventional method | 19.22
Twin network | 27.26
Bayesian twin network | 389.44
5. Conclusion
We have proposed an encoder-decoder-based twin network for die-to-die wafer inspection.
The proposed twin network is composed of two sub-encoder networks that share weights
and one decoder network. In contrast to conventional deep learning-based wafer inspection
methods, the proposed method takes the golden sub-die image and the test sub-die image
as input and compares them, reporting the areas where the two input images differ as defects.
Furthermore, we applied Bayesian learning to improve the performance of the twin network.
We verified the usefulness of the proposed method in experiments using patterned wafer
data with synthetic defects.
ACKNOWLEDGMENTS
The authors are grateful to ATI Co., Ltd. in Incheon, Korea, for providing the
patterned wafer data. This work was supported by a grant from the ATI company and
by the Ewha Womans University scholarship of 2019.
REFERENCES
[1] Stan Stokowski, Vaez-Iravani Mehdi, 1998, Wafer inspection technology challenges for ULSI manufacturing, American Institute of Physics, Vol. 449, No. 1
[2] Zhang Jiun-Ming, Lin Ruey-Ming, Wang Mao-Jiun J, 1999, The development of an automatic post-sawing inspection system using computer vision techniques, Computers in Industry, Vol. 40, No. 1, pp. 51-60
[3] Chou Paul B., et al., 1997, Automatic defect classification for semiconductor manufacturing, Machine Vision and Applications, Vol. 9, No. 4, pp. 201-214
[4] Sheng-Uei Guan, Pin Xie, Hong Li, 2003, A golden-block-based self-refining scheme for repetitive patterned wafer inspections, Machine Vision and Applications, Vol. 13, No. 5, pp. 314-321
[5] Liu Hongxia, et al., 2010, Defect detection of IC wafer based on two-dimension wavelet transform, Microelectronics Journal, Vol. 41, No. 2-3, pp. 171-177
[6] Chen Ssu-Han, Kang Chih-Hsiang, Perng Der-Baau, 2020, Detecting and Measuring Defects in Wafer Die Using GAN and YOLOv3, Applied Sciences, Vol. 10, No. 23
[7] Nakazawa Takeshi, Kulkarni Deepak V., 2019, Anomaly detection and segmentation for wafer defect patterns using deep convolutional encoder-decoder neural network architectures in semiconductor manufacturing, IEEE Transactions on Semiconductor Manufacturing, Vol. 32, No. 2, pp. 250-256
[8] Cheon Sejune, et al., 2019, Convolutional neural network for wafer surface defect classification and the detection of unknown defect class, IEEE Transactions on Semiconductor Manufacturing, Vol. 32, No. 2, pp. 163-170
[9] Lin Hui, et al., 2019, Automated defect inspection of LED chip using deep convolutional neural network, Journal of Intelligent Manufacturing, Vol. 30, No. 6, pp. 2525-2534
[10] Chen Xiaoyan, et al., 2020, A Light-Weighted CNN Model for Wafer Structural Defect Detection, IEEE Access, Vol. 8, pp. 24006-24018
[11] Koch Gregory, Richard Zemel, Ruslan Salakhutdinov, 2015, Siamese neural networks for one-shot image recognition, ICML Deep Learning Workshop, Vol. 2
[12] Alex Kendall, Vijay Badrinarayanan, Roberto Cipolla, 2015, Bayesian SegNet: Model uncertainty in deep convolutional encoder-decoder architectures for scene understanding, arXiv preprint arXiv:1511.02680
[13] Isobe Shuya, Arai Shuichi, 2017, Deep convolutional encoder-decoder network with model uncertainty for semantic segmentation, 2017 IEEE International Conference on INnovations in Intelligent SysTems and Applications (INISTA), IEEE
[14] Nair Tanya, et al., 2020, Exploring uncertainty measures in deep networks for multiple sclerosis lesion detection and segmentation, Medical Image Analysis, Vol. 59
[15] Gal Yarin, Ghahramani Zoubin, 2016, Dropout as a Bayesian approximation: Representing model uncertainty in deep learning, International Conference on Machine Learning, PMLR
[16] Graves Alex, 2011, Practical variational inference for neural networks, Advances in Neural Information Processing Systems
[17] Srivastava Nitish, et al., 2014, Dropout: a simple way to prevent neural networks from overfitting, The Journal of Machine Learning Research, Vol. 15, No. 1, pp. 1929-1958
[18] Kristiadi Agustinus, Matthias Hein, Philipp Hennig, 2020, Being Bayesian, even just a bit, fixes overconfidence in ReLU networks, International Conference on Machine Learning, PMLR
[19] Shen Yichen, et al., 2021, Real-Time Uncertainty Estimation in Computer Vision via Uncertainty-Aware Distribution Distillation, Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision
[20] Lu Xiankai, et al., 2019, See more, know more: Unsupervised video object segmentation with co-attention siamese networks, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
[21] Varghese Ashley, et al., 2018, ChangeNet: A deep learning architecture for visual change detection, Proceedings of the European Conference on Computer Vision (ECCV) Workshops
[22] Dong Hongwen, et al., 2019, PGA-Net: Pyramid feature fusion and global context attention network for automated surface defect detection, IEEE Transactions on Industrial Informatics, Vol. 16, No. 12, pp. 7448-7458
[23] Chen Jie, et al., 2020, DASNet: Dual attentive fully convolutional siamese networks for change detection of high resolution satellite images, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
[24] Daudt Rodrigo Caye, Le Saux Bertrand, Boulch Alexandre, 2018, Fully convolutional siamese networks for change detection, 2018 25th IEEE International Conference on Image Processing (ICIP), IEEE
[25] Paszke Adam, et al., 2019, PyTorch: An imperative style, high-performance deep learning library, arXiv preprint arXiv:1912.01703
Authors
Eunjeong Choi received her B.S. and M.S. degrees in Electronic and Electrical Engineering
from Ewha Womans University in Seoul, Korea, in 2016 and 2018, respectively. She
is currently pursuing a Ph.D. in electronics engineering at Ewha Womans University
in Seoul, Korea. Her research interests include digital signal processing and machine learning
for machine vision.
Jeongtae Kim received his B.S. and M.S. degrees in Control and Instrumentation
Engineering from Seoul National University, Seoul, Korea, in 1989 and 1991, respectively.
From 1991 to 1998, he worked for Samsung Electronics in Korea, where he was
engaged in the development of digital camcorders and digital TVs. He received his Ph.D.
degree in Electrical Engineering and Computer Science from the University of Michigan,
Ann Arbor, in 2004. Since 2004, he has been with the Department of Electronic and Electrical
Engineering at Ewha Womans University in Seoul, Korea, where he is currently a professor.
His research interests include statistical signal processing, image restoration, image
reconstruction, and machine learning.