
  1. (Machine Intelligence and Data Science (MINDS) Lab., Incheon National University (INU), South Korea)
  2. (Department of Electronics Engineering, Incheon National University, Korea)



deep learning, artificial intelligence, big data, quality control, digital, semiconductor, smart manufacturing, image recognition

I. INTRODUCTION

Artificial Intelligence (AI) has transformed numerous industries by enabling computers to perform complex tasks such as data analysis, language processing, decision-making, and task execution---functions traditionally requiring human intelligence. Among its many applications, AI has significant potential in the manufacturing sector [1], where the adoption of machine learning-driven solutions is essential for maintaining competitiveness in a globalized economy. Smart manufacturing [2,3], leveraging AI capabilities, plays a critical role in meeting consumer demands for high-quality products, efficiency, and adaptability in production processes.

Manufacturing has long relied on strategies such as Total Quality Management (TQM) [4], Six Sigma, Lean Manufacturing, and Zero-Defect Manufacturing to improve efficiency, enhance product quality, and reduce costs. Despite these advancements, achieving consistent product quality remains a critical challenge, especially in high-precision sectors like sensor board manufacturing. Traditional quality inspection methods, which involve examining, testing, or measuring components against predefined specifications, are labor-intensive and prone to human error [5].

Research [6,7] has explored hand-crafting optimal feature representations for quality control problems. While these methods often deliver satisfactory results for specific problems, they lack generalizability to new challenges due to the unique characteristics of each problem, which may require distinct feature extraction techniques. Furthermore, as Harris [8] observed, the accuracy of manual inspections decreases as product complexity increases. A study by Sandia National Laboratories [9] reported that human operators correctly rejected defective precision-manufactured parts with an accuracy of 85%, while the industry average was only 80%, which is well below acceptable thresholds.

To address these challenges, machine learning-driven technologies, particularly computer vision and deep learning, are increasingly being adopted. By leveraging advanced algorithms and the availability of affordable data and computational resources, AI enables manufacturers to automate quality inspection processes. These solutions enhance accuracy and efficiency, while also providing real-time feedback to identify defective components early in the production cycle, thus reducing waste. Researchers have proposed various image-based defect detection methods. For instance, [10] utilized CNN for quality inspection at industrial sites, [11] applied a K-means clustering algorithm to detect casting surface defects, and [12] employed a sliding window CNN approach to analyze X-ray images for fault detection in casting products. Despite the rapid advancement of ML in manufacturing, several challenges remain. The effectiveness of models is heavily influenced by the quality, diversity, and availability of training data. Poor-quality or insufficient datasets can result in biased or underperforming models, limiting their ability to generalize to real-world scenarios.

To overcome these limitations and automate manufacturing quality control while minimizing human fatigue, errors, and labor, computer vision plays a central role. In this paper, we propose a novel approach using a CNN-based model to inspect sensor boards during manufacturing. The algorithm extracts features from raw sensor board data and automatically classifies boards as defective or non-defective with high accuracy and reliability, requiring minimal human intervention. This work investigates the design of CNN architectures to develop a robust and generalizable method for sensor board quality control. Additionally, it examines the critical role of data quality---particularly volume and class diversity---in overcoming challenges such as data bias and insufficient representation.

By effectively addressing these issues, this research provides actionable insights for integrating ML [13] into quality control systems. The findings not only advance ML applications in manufacturing but also serve as a valuable resource for researchers and practitioners aiming to optimize production processes and achieve zero-defect manufacturing.

The paper is organized as follows. Section II reviews the related work. Section III proposes the method for quality inspection, detailing the database, model architecture, and training process. Section IV presents the result analysis, which includes the metrics used, model component analysis, and the impact of data quality on model performance. Finally, Section V concludes the paper.

II. RELATED WORK

The manufacturing industry has traditionally relied on manual methods for defect detection, but these approaches often suffer from limitations in accuracy and efficiency. For example, inspectors may overlook defects due to fatigue or prolonged observation, leading to economic losses from undetected low-quality products [14] and diminished reliability in quality assurance [15]. Although manual methods can be effective in some contexts, the growing demand for more precise and efficient systems highlights their shortcomings.

The integration of technology, particularly through Internet of Things (IoT) sensors, has shown great promise in enhancing defect detection capabilities [16]. This shift towards automation is crucial, as it not only boosts operational efficiency but also ensures higher product quality, thereby reducing the economic impact of manufacturing defects. Advances in machine learning, especially Convolutional Neural Networks (CNNs), have become central to automating defect detection. CNNs have demonstrated exceptional performance in analyzing images for defect identification. For instance, [17] reports a significant improvement in accuracy and speed when detecting micro-defects on screws using CNNs. Similarly, [18] highlights the effectiveness of CNNs in identifying micro-defects on fabric surfaces, outperforming traditional image processing methods. Additionally, [19] demonstrates the use of saliency-based methods in fabric defect detection, leveraging high-level semantic information that traditional approaches often miss.

Advanced analytical tools also play a crucial role in processing large-scale data, enabling manufacturers to extract valuable insights for optimizing decision-making and operations. These technologies have been successfully applied to detect surface defects on steel sheets [20], inspect fabric [21], and improve quality assurance in semiconductor manufacturing [22]. Such advancements have significantly transformed automated quality assurance processes, enhancing overall product performance.

The benefits of deep learning extend beyond manufacturing to other industries. For instance, Yingjie Qiao's work on oracle image classification [23] illustrates how deep learning improves image recognition accuracy. Similarly, advanced vision-based systems enhance vehicle detection and tracking [24], while deep learning applied to MEMS sensor data advances human gait analysis [25], showcasing its potential in healthcare and biometrics.

While manual defect detection methods have historically played a vital role, the integration of IoT and machine learning---particularly CNNs---represents a transformative shift. These advancements not only improve accuracy and efficiency but also pave the way for adaptive, sophisticated quality control mechanisms capable of meeting the complex demands of modern manufacturing.

III. METHODOLOGY

In this section, we first define the problem statement for extreme classification, then provide details of our proposed model, and finally discuss the objective function and training process.

1. Problem Statement

The proposed deep neural network (DNN) model takes sensor image data as input and outputs a result categorized as either ``good'' or ``defective''. Let $X \in \{x_{1}, x_{2}, \dots, x_n\}$ and $Y \in \{y_{1}, y_{2}, \dots, y_n\}$ denote the $n$ sensor images and their corresponding image categories. The primary goal of this study is to develop a prediction model $f$ that maps an input $X_i$ to its category $Y_i$. The model $f$ can be defined as follows:

(1)
$ Y_i=f\left(X_i,\ \theta \right). $

2. Model Architecture

We propose a convolutional neural network-based architecture for extreme sensor image classification, named XCNet. The model comprises three main components: (i) a feature extraction network, which employs convolution and pooling layers to extract features from the input image while progressively reducing its spatial resolution, (ii) a multi-layer perceptron (MLP) network, which transforms features into higher-level abstract representations, and (iii) a classification head, which maps these representations to the final output classes. A schematic representation of the proposed model is illustrated in Fig. 1.

The feature extraction network takes an image and extracts sensor-image features using a series of stacked convolutional blocks, where each block varies in the number of feature maps and their resolution. Each convolutional block is designed with three components: a convolutional layer, an activation layer, and a pooling layer. Let there be $L$ convolutional blocks. The output of the $j$th channel of the $l$th convolutional block, $O^j_l$, can be expressed as

(2)
$ O^j_l=pool\left(\sigma \left(\sum^{c_{l-1}}_{k=1}{\left(W^j_l x^k_l+b^j_l\right)}\right)\right), \quad j \in \left[1,~ c_l\right], $

where $W^j_l$ and $b^j_l$ are the weights and biases of the $j$th filter of the $l$th convolutional block, $j$ is the channel index, $c_l$ denotes the number of convolutional filters in block $l$, $\sigma$ represents the activation function, and $pool$ denotes the pooling operation applied to the activation output.
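
As an illustration, a single convolutional block of this form can be written in a few lines of TensorFlow/Keras, the framework used in our experiments; the kernel and pooling sizes follow the implementation details given later in Section IV.3, and the snippet should be read as a sketch rather than the exact code.

import tensorflow as tf

def conv_block(x, filters):
    # Convolution W_l * x + b_l (3x3 kernels, zero padding preserves the spatial size)
    x = tf.keras.layers.Conv2D(filters, kernel_size=3, padding="same")(x)
    # Activation sigma(.)
    x = tf.keras.layers.ReLU()(x)
    # Pooling halves the feature-map resolution along height and width
    return tf.keras.layers.MaxPooling2D(pool_size=2)(x)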

The output of the feature extraction network is concatenated into a dense vector and then passed through a Multi-Layer Perceptron (MLP). To reduce the computational overhead, we use global average pooling (GAP) layers to extract a single representative feature from each convolutional channel, as described in Eq. (3). This approach is more efficient than traditional flattening layers, as it significantly reduces the size of the feature vector without compromising model performance.

(3)
$ O^{gap}_L=GAP([O^1_L,~O^2_L,~\dots ,~O^{c_L}_L]). $

The MLP comprises a sequence of three fully connected layers, with each layer followed by a non-linear activation function. The output of the MLP, ${\hat{y}}_{mlp}$, can be expressed as shown in Eq. (4).

(4)
$ {\hat{y}}_{mlp}=\sigma \left({\hat{y}}_{d_2}W_{d_3}+b_{d_3}\right). $

Here, ${\hat{y}}_{mlp}$ represents the output of the MLP module, ${\hat{y}}_{d_{2}}$ denotes the activation from the previous (second) layer, $W_{d_{3}}$ represents the weight matrix of the final (third) layer in the MLP module, and $b_{d_{3}}$ denotes the bias of the same layer.

Finally, the output of the MLP module is passed to the classifier head to compute the probability distribution for each class. This is achieved using a single fully connected layer followed by a softmax activation function. The overall output of the network can be expressed as

(5)
$ y=softmax\left({\hat{y}}_{mlp}W_o+b_o\right). $

Here, $W_{o}$ and $b_o$ represent the weight matrix and bias vector of the output layer, respectively, while $softmax$ denotes the softmax activation function.
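
Combining Eqs. (3)-(5), and writing $d_1$, $d_2$, $d_3$ for the three MLP layers, the complete mapping from the final feature maps to class probabilities can be summarized as the following restatement of the equations above:

$ y=softmax\left(\sigma \left(\sigma \left(\sigma \left(O^{gap}_L W_{d_1}+b_{d_1}\right)W_{d_2}+b_{d_2}\right)W_{d_3}+b_{d_3}\right)W_o+b_o\right). $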

Fig. 1. Overview of the proposed XCNet framework. The model comprises three main components: A feature extraction network, a multi-layer perceptron (MLP), and a classifier head. Different colors in the figure indicate the type of operations performed at each stage.

../../Resources/ieie/JSTS.2025.25.3.245/image1.png

3. Model Training

In this section, we first explain the objective function used for training and then present the pseudo-algorithm for the overall model training process.

3.1 Objective Function

We used binary cross-entropy loss to train our proposed architecture, mathematically defined as follows:

(6)
$ \mathcal{L}(y,y^{'})=-\sum^c_{i=1}{y_i(\log(y^{'}_i))}, $

where $y$ represents the ground truth label, $y^{'}$ represents the predicted value, and $c$ denotes the number of categories, which in this case is $c=2$. This loss function measures the discrepancy between the predicted probabilities and the actual labels, penalizing predictions that deviate significantly from the ground truth.
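
For instance, with illustrative numbers, a defective board encoded as the one-hot label $y=[0,1]$ and predicted as $y^{'}=[0.1,0.9]$ yields

$ \mathcal{L}(y,y^{'})=-\left(0\cdot \log 0.1+1\cdot \log 0.9\right)\approx 0.105, $

whereas a confidently wrong prediction $y^{'}=[0.9,0.1]$ yields $\mathcal{L}\approx 2.303$, illustrating how larger deviations from the ground truth are penalized more heavily.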

3.2 Training Process of XCNet

Algorithm 1 outlines the training process of XCNet. Initially, the minority-class sensor images are augmented using horizontal and vertical flip methods to create a balanced dataset. This augmentation step ensures that the dataset is suitable for effective model training. Next, training batches are constructed by creating sequences of input sensor images and corresponding labels for all samples, as described in lines 3 to 8. These batches are stored in a set $S$.

During the training phase, the model parameters $\theta $ are initialized, as mentioned in line 9. The training process then involves randomly selecting a batch of instances $S_b$ from $S$ and updating $\theta $ by minimizing the objective function $\mathcal{L}\left(\theta \right)$, using a gradient descent-based optimization algorithm like Adam, as in lines 10 to 13. This process is repeated iteratively until the predefined stopping criteria, such as a maximum number of epochs or convergence of the loss function, are met. At the end of the training process, the learned model $f$, represented by the optimized parameter set $\theta $, is produced as the output, as described in line 14.
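
A minimal sketch of this training loop in TensorFlow is shown below, assuming a Keras model `model` (e.g., XCNet as built in Section IV.3) and a shuffled `tf.data` pipeline `dataset` yielding image/label batches; the variable names are illustrative placeholders, not the exact code used.

import tensorflow as tf

optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)   # optimizer and learning rate from Table 1
loss_fn = tf.keras.losses.CategoricalCrossentropy()        # Eq. (6) with one-hot labels
num_epochs = 80                                            # stopping criterion from Table 1

for epoch in range(num_epochs):
    for x_batch, y_batch in dataset:                       # randomly drawn batches S_b from S
        with tf.GradientTape() as tape:
            y_pred = model(x_batch, training=True)         # forward pass f(X; theta)
            loss = loss_fn(y_batch, y_pred)                # objective L(theta)
        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))  # update theta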

IV. EXPERIMENTS

In this section, we first describe the experimental setup and evaluation metrics, then detail the XCNet implementation, and finally analyze the results.

1. Experimental Setup

We conducted our experiments on a custom sensor image dataset characterized by extreme class imbalance, where one class is heavily underrepresented. Samples of sensor board images are shown in Fig. 2. To evaluate the effectiveness of our proposed method, we conducted a series of ablation studies to understand the effect of each individual component and to arrive at the optimal architecture. All models were implemented using the TensorFlow deep learning library and trained on a single 12 GB NVIDIA TITAN Xp GPU.

Our sensor dataset consists of two classes: 998 good images and 35 defective images, with the defective class as the minority. This imbalance reflects real-world manufacturing scenarios, where machines predominantly produce good sensors and rarely generate defective ones. However, deep learning models are often biased toward the majority class, resulting in poor predictions for the minority class. Therefore, our experiments aimed to mitigate this bias and improve model performance and generalization. To address the class imbalance, we designed two cases:

⦁ Case I: The original, highly imbalanced dataset was used without any modification to perform extreme class classification.

⦁ Case II: Data augmentation techniques, including horizontal and vertical flips, were applied to the defective class to increase its sample size. This approach partially bridged the gap between the two classes and allowed us to explore the impact of balancing the dataset.

For both cases, the dataset was split into approximately 80% for training and 20% for testing. In Case I, 800 good images and 28 defective images were allocated for training, while the remaining 198 good images and 7 defective images were used for testing. In Case II, the defective class was augmented, increasing its training set to 80 defective images and its testing set to 35 defective images, while the good images remained the same as in Case I.

To address class imbalance, we selected horizontal and vertical flips as augmentation techniques because they preserve defect characteristics while increasing diversity in the minority class. Since defects on sensor boards are often orientation-invariant, these transformations expose the model to variations without introducing unrealistic distortions. Unlike geometric transformations such as rotation or color alterations, flipping ensures that defect patterns remain realistic while expanding the dataset.
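
A minimal sketch of this augmentation step is given below, assuming the minority-class (defective) images are stored in a NumPy array `defective` of shape (N, H, W, 3); the array and function names are placeholders for illustration.

import numpy as np

def augment_minority(defective):
    h_flip = defective[:, :, ::-1, :]   # horizontal flip: mirror along the width axis
    v_flip = defective[:, ::-1, :, :]   # vertical flip: mirror along the height axis
    # Keeping the originals plus both flipped copies triples the defective-class sample count
    return np.concatenate([defective, h_flip, v_flip], axis=0)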

All models were trained using the Adaptive Moment Estimation (Adam) optimizer with a learning rate of $1\times 10^{-3}$ and a batch size of 32, employing the binary cross-entropy loss function.

Table 1. Hyperparameters.

Parameter | Values
Model | XCNet
Dataset | Sensor Board Images; Good (G), Defective (D); Train : Test = 80 : 20
Case I (Original) | Total: G 998, D 35; Train: G 800, D 28; Test: G 198, D 7
Case II (Augmented) | Total: G 998, D 105; Train: G 800, D 83; Test: G 198, D 22
Loss | Binary Cross-entropy
Optimizer | Adam
Learning Rate | 0.001
Batch Size | 32
Epochs | 80

Fig. 2. Sensor Board Dataset. (a)-(b) Good sensor images, (c)-(d) Defective sensor board images, (e)-(f) vertical and horizontal flip transformations of defective sensor board image (c).

../../Resources/ieie/JSTS.2025.25.3.245/image2.png

2. Evaluation Metrics

This paper addresses the challenge of classifying highly imbalanced datasets with a significantly underrepresented minority class. To evaluate model performance, we employ threshold-based metrics, including accuracy, recall, precision, and F1-score. Here, minority class recall corresponds to the true positive rate (TPR), while majority class recall corresponds to the true negative rate (TNR); further details can be found in Table 2.

In defect detection, the choice of evaluation metrics is critical for real-world manufacturing decisions. Although accuracy measures overall correctness across all classes, it can be misleading in imbalanced scenarios since a model that predicts only the majority class may still achieve high accuracy but miss most defects. Recall, on the other hand, is vital because it measures how many actual defects are correctly identified; missing a single defect can have severe cost, safety, or reliability implications. Precision complements recall by indicating how many predicted defects are genuinely defective, thus minimizing false positives that could disrupt manufacturing or re-inspection efforts. Finally, the F1-score harmonizes recall and precision, balancing the need to catch every defect with the need to avoid excessive false alarms. Focusing on these metrics---especially recall and F1-score---ensures the model robustly identifies defects without overwhelming production lines with unnecessary rechecks, ultimately supporting a more reliable and efficient defect detection process. The mathematical definitions of these metrics are as follows:

(7)
$ Recall=\frac{TP}{TP+FN}, $
(8)
$ Precision=\frac{TP}{TP+FP}, $
(9)
$ Accuracy=\frac{TP +TN}{TP+FN+FP+TN}, $
(10)
$ F1- score =2\times \frac{Precision \times Recall}{Precision + Recall}. $
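
As a concrete check, these metrics can be computed directly from the confusion-matrix entries [TP, FN, FP, TN] reported in Tables 3 and 4; the helper below is an illustrative sketch, not the exact evaluation code.

def classification_metrics(tp, fn, fp, tn):
    recall = tp / (tp + fn)                                 # Eq. (7)
    precision = tp / (tp + fp)                              # Eq. (8)
    accuracy = (tp + tn) / (tp + fn + fp + tn)              # Eq. (9)
    f1 = 2 * precision * recall / (precision + recall)      # Eq. (10)
    return accuracy, precision, recall, f1

# Example with [TP, FN, FP, TN] = [21, 1, 0, 198]: accuracy 0.995, precision 1.0, recall 0.954, F1 0.977
print(classification_metrics(21, 1, 0, 198))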

Table 2. The confusion matrix.

True Class | Predicted: Minority | Predicted: Majority
Minority | True Positive (TP) | False Negative (FN)
Majority | False Positive (FP) | True Negative (TN)

3. XCNet Implementation

The overall framework of XCNet is illustrated in Fig. 1. The model processes sensor input with a resolution of $H\times W$ and $C$ channels, where $H=W=224$ and $C=3$. The input image passes through a feature extraction network comprising five convolutional blocks, each consisting of a 2D convolutional layer with a $3 \times 3$ filter, a dilation rate of $1 \times 1$, and zero padding to preserve spatial dimensions. Each convolutional layer is followed by a ReLU [26] activation function and a MaxPooling2D layer with a $2 \times 2$ filter size, which reduces the spatial resolution of the feature map by half along both the height and width axes, while the number of channels doubles from one block to the next. After feature extraction, the output feature map has dimensions $\frac{H}{32}\times \frac{W}{32}\times 512$.

A GlobalAveragePooling2D (GAP) layer is then applied to aggregate spatial information into a single value per channel, resulting in a 512-dimensional feature vector. This layer enables the network to focus on the presence of features rather than their spatial location, reduces computational cost, and mitigates overfitting compared to flattening layers. The GAP output is passed to a Multi-Layer Perceptron block comprising three fully connected layers with output sizes of 256, 128, and 10, respectively. Each layer is followed by a ReLU activation and a Dropout layer with a rate of 20% to regularize the model by reducing overfitting. Finally, the MLP block output is passed through a classifier layer, a fully connected layer with an output size of 2 (one for each class), which employs a softmax activation function to generate a probability distribution. The class with the highest probability is selected as the predicted output for the given image.
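
Based on the description above, the overall XCNet architecture can be sketched in TensorFlow/Keras as follows; this is a minimal reconstruction for illustration, assuming the [32, 64, 128, 256, 512] filter configuration analyzed in the ablation study, and is not the exact implementation used in our experiments.

import tensorflow as tf

def build_xcnet(input_shape=(224, 224, 3), num_classes=2):
    inputs = tf.keras.Input(shape=input_shape)
    x = inputs
    # Feature extraction network: five convolutional blocks (3x3 conv, ReLU, 2x2 max pooling)
    for filters in (32, 64, 128, 256, 512):
        x = tf.keras.layers.Conv2D(filters, kernel_size=3, padding="same", activation="relu")(x)
        x = tf.keras.layers.MaxPooling2D(pool_size=2)(x)    # halves H and W at every block
    # Global average pooling: one value per channel -> 512-dimensional feature vector
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    # MLP block: three fully connected layers with ReLU and 20% dropout
    for units in (256, 128, 10):
        x = tf.keras.layers.Dense(units, activation="relu")(x)
        x = tf.keras.layers.Dropout(0.2)(x)
    # Classifier head: softmax over the two classes (good / defective)
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs, name="XCNet")

model = build_xcnet()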

4. Result Analysis on XCNet

In this section, we first investigate the impact of each component of the XCNet model on classification performance, followed by an analysis of how the quality of training data influences model effectiveness.

4.1 Ablation Study

The ablation study presented in Table 3 investigates the impact of varying the number of convolutional blocks and filter configurations on classification performance under different levels of class imbalance. Additionally, it provides a comprehensive analysis of computational efficiency, including model complexity (parameter count and GFLOPs) and inference speed, critical factors for real-time defect detection in industrial settings. The dataset consists of two training ratios, 1:10 and 1:50, representing different levels of minority class underrepresentation.

Four architectures were explored with convolutional blocks configured as [32, 64, 128], [32, 64, 128, 256], [16, 32, 64, 128, 256], and [32, 64, 128, 256, 512]. These configurations progressively increase network depth and filter sizes, enhancing feature extraction and enabling finer defect detection. Larger filters improve receptive fields, capturing structural variations in sensor board defects while maintaining computational efficiency.

For the 1:10 ratio, deeper architectures, such as A.N. 3 ([16, 32, 64, 128, 256]) and A.N. 4 ([32, 64, 128, 256, 512]), significantly outperformed the shallower configurations. These models achieved high accuracy (99.09% and 99.55%, respectively) and precision (100%), while also improving recall for the minority class (90.90% and 95.45%). The F1-scores of these configurations (95.23% and 97.67%) highlight their ability to balance sensitivity and precision effectively. Conversely, shallower architectures, such as A.N. 1 ([32, 64, 128]), show a significant drop in recall, achieving only 40.90% and an F1-score of 58.05%, despite maintaining high overall accuracy (94.09%).

For the 1:50 ratio scenario, the performance gap between shallow and deep architectures became even more pronounced due to the severe class imbalance. A.N. 5 ([32, 64, 128]) struggled, with a minority-class recall of 21.05% and an F1-score of 33.33%, demonstrating its limitations in handling extreme imbalance. In contrast, the A.N. 8 ([32, 64, 128, 256, 512]) architecture achieved the best performance, with an accuracy of 98.61%, perfect precision (100%), and an F1-score of 91.42%. This configuration also showed a substantial improvement in recall (84.21%), indicating its effectiveness in capturing minority-class instances even in highly imbalanced scenarios.

Computational efficiency and scalability. Shallower models (A.N. 1, A.N. 5) have fewer parameters (0.2 M) and lower FLOPs (1.18 G) but struggle to capture minority-class instances effectively. Deeper architectures generally require more resources but substantially improve recall. For example, A.N. 3 retains the same 1.0 M parameters as A.N. 2 but reduces FLOPs (0.59 G vs. 1.75 G) and inference time (3.75 ms vs. 4.65 ms), while achieving better performance due to enhanced feature extraction. Building on this, A.N. 4, which employs larger convolutional filters, has 4.0 M parameters and 2.31 G FLOPs, yet maintains an inference time of 5.63 ms, making it a viable option for real-time applications while delivering the highest recall and F1-score.

These findings confirm that increasing the network depth and filter size not only improves performance in heavily imbalanced scenarios but also offers competitive inference speeds, making deeper architectures such as A.N. 4 and A.N. 8 suitable for real-world manufacturing environments where real-time defect detection is critical.

Table 3. Ablation study on number of convolution blocks and filter configuration. Here, ``AN'' denotes the analysis number, ``Param (M)'' represents the model parameters in millions, and ``Inf. (ms)'' indicates the inference speed in milliseconds. ``Acc.'' stands for accuracy, while ``Pre.'' refers to precision.

AN | Convolution Blocks & Filters | Param (M) | FLOP (G) | Inf. (ms) | Acc. | Pre. | Recall | F1-Score | [TP, FN, FP, TN]

Data: Train (80, 800), Ratio 1:10, Test (22, 198)
1 | [32, 64, 128] | 0.2 | 1.18 | 4.03 | 94.09 | 100 | 40.90 | 58.05 | [9, 13, 0, 198]
2 | [32, 64, 128, 256] | 1.0 | 1.75 | 4.65 | 95.90 | 100 | 59.09 | 74.28 | [13, 9, 0, 198]
3 | [16, 32, 64, 128, 256] | 1.0 | 0.59 | 3.75 | 99.09 | 100 | 90.90 | 95.23 | [20, 2, 0, 198]
4 | [32, 64, 128, 256, 512] | 4.0 | 2.31 | 5.63 | 99.55 | 100 | 95.45 | 97.67 | [21, 1, 0, 198]

Data: Train (16, 800), Ratio 1:50, Test (19, 198)
5 | [32, 64, 128] | 0.2 | 1.18 | 6.71 | 92.62 | 80 | 21.05 | 33.33 | [4, 15, 1, 197]
6 | [32, 64, 128, 256] | 1.0 | 1.75 | 4.58 | 94.93 | 90 | 47.36 | 62.06 | [9, 10, 1, 197]
7 | [16, 32, 64, 128, 256] | 1.0 | 0.59 | 3.82 | 97.69 | 93 | 78.95 | 85.70 | [15, 4, 1, 197]
8 | [32, 64, 128, 256, 512] | 4.0 | 2.31 | 5.49 | 98.61 | 100 | 84.21 | 91.42 | [16, 3, 0, 198]

4.2 Analysis of Data Distribution on Model Performance

This section provides an in-depth analysis of how variations in the composition of the training data affect the model's performance metrics in two cases, Case I and Case II. In Case I, the model is trained with the original data distribution, while in Case II, the minority-class samples are augmented. Details of the dataset are given in the experimental setup (Subsection 1 of this section).

Fig. 3 shows four graphs, each illustrating one of the performance metrics (accuracy, precision, recall, and F1-score) for both cases, plotted against the fraction of the total good-class training images. The x-axis represents the fraction of the 800 good-class images used for training, while the number of minority-class images is fixed at 28 for Case I and 80 for Case II. The testing dataset remains constant in both cases. This comparative evaluation demonstrates the significant impact of data distribution, particularly the size of the minority class, on model performance.
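
This sweep can be reproduced with a simple subsampling loop; the sketch below assumes arrays `good_train` and `defect_train` holding the majority- and minority-class training images and a helper `train_and_evaluate` returning the four metrics on the fixed test set, all of which are illustrative placeholders (including the 10% step size).

import numpy as np

fractions = np.arange(0.1, 1.01, 0.1)            # fraction of the 800 good-class training images
results = {}
for frac in fractions:
    n_good = int(frac * len(good_train))
    idx = np.random.choice(len(good_train), size=n_good, replace=False)
    # defect_train stays fixed: 28 images in Case I, 80 in Case II
    x_train = np.concatenate([good_train[idx], defect_train], axis=0)
    y_train = np.concatenate([np.zeros(n_good), np.ones(len(defect_train))])
    results[frac] = train_and_evaluate(x_train, y_train)   # accuracy, precision, recall, F1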

Fig. 3(a) illustrates the accuracy metrics used to analyze the impact of data distribution on model performance. In both cases, the model's accuracy progressively improves as the fraction of good-class training data increases. However, Case II starts with a higher initial accuracy (approximately 80%) and reaches near-perfect accuracy (100%) much faster than Case I. This rapid improvement can be attributed to the larger minority-class size in Case II, which mitigates class imbalance and allows the model to effectively learn the dominant class, even with limited good-class training data. In contrast, the improvement in Case I is slower, likely due to its smaller minority class, which creates a greater imbalance and requires additional good-class training data to achieve comparable accuracy. Overall, Case II demonstrates a clear advantage in accuracy across all training levels, highlighting the critical role of managing class distributions to optimize model performance.

As shown in Fig. 3(b), precision follows a similar trend, with Case II achieving near 100% precision early, whereas Case I exhibits a more gradual increase. The superior performance of Case II can be attributed to its larger minority-class size, which helps reduce overall class imbalance. This improved balance enables the model to effectively minimize false positives, even with limited good-class training data. In contrast, Case I, with its smaller minority class, struggles to achieve comparable precision under similar conditions.

Fig. 3(c) highlights model performance in terms of recall stability and sensitivity under different data distributions. Unlike accuracy and precision, which show stable and gradual improvement as training data increases, recall behaves differently. In Case I, recall fluctuates at lower training fractions, with noticeable dips indicating inconsistent sensitivity in detecting the minority class. This instability likely arises from the smaller minority-class size in the training dataset. In contrast, recall in Case II remains consistently high across all training fractions, demonstrating the model's robustness in detecting the minority class. The larger minority class in Case II mitigates class imbalance, ensuring that the minority class receives adequate representation during training.

Fig. 3(d) illustrates the F1-score, which combines precision and recall into a single metric to provide a holistic measure of the model's classification performance. Case II consistently outperforms Case I, with a rapid increase in F1-score that stabilizes near 100%, even at lower fractions of good-class training data. This superior performance can be attributed to Case II's larger minority-class size, which enhances the model's ability to distinguish between classes and reduces the trade-off between precision and recall. In contrast, Case I shows a slower, more gradual improvement in F1-score, remaining consistently lower across all training levels. This slower progression reflects challenges in balancing precision and recall caused by the smaller minority class, which intensifies class imbalance and limits the model's ability to optimize performance with less training data.

The key observations and implications are as follows: One significant finding is the impact of minority-class size on model performance. Case II, with a larger minority-class size, consistently outperforms Case I across all metrics, demonstrating that a more balanced class distribution improves the model's learning and generalization. Another important insight is the sufficiency of training data. Case II achieves near-optimal performance with fewer training images, highlighting the efficiency of balanced datasets in achieving high accuracy and other key metrics with reduced data. Both cases show performance improvements as training data increases. However, Case II excels in accuracy, precision, recall, and F1-score, even with smaller training fractions. This stability underscores the importance of balanced class distributions for reliable and consistent model performance.

Fig. 3. Data quality study. The plots show the effect of varying the proportion of good-class training images on (a) Accuracy, (b) Precision, (c) Recall, and (d) F1-Score for two cases: Case I (original data) and Case II (augmented data).

../../Resources/ieie/JSTS.2025.25.3.245/image3.png

4.3 Result Analysis

Table 4 compares XCNet with other state-of-the-art (SOTA) models, including VGG16 [27], ResNet34 [28], ResNet50 [28], ViT-Tiny [29], and ViT-Base [29], all pretrained on ImageNet1k. While these models perform well on large-scale datasets, they struggle to adapt to the small, domain-specific defect dataset, particularly in cases of severe class imbalance. ViT-Base, for instance, with 86.90 million parameters and 17.58 GFLOPs, fails to surpass XCNet's performance, likely due to the limited training data ($<1000$ images) and the domain shift from natural images to industrial defect images.

In contrast, XCNet delivers higher recall and F1-scores under both 1:10 and 1:50 imbalances, attaining 99.45% and 98.62% accuracy, respectively, while also maintaining perfect precision (100%). From a complexity standpoint, XCNet requires substantially fewer parameters (4.0~M) and GFLOPs (2.31~G) than the larger SOTA models---VGG16 with 138.36~M parameters and 15.47~GFLOPs or ViT-Base with 86.90~M parameters and 17.58~GFLOPs. This efficient combination of strong performance and moderate computational requirements underscores XCNet's suitability for real-world manufacturing scenarios, where data are scarce and real-time defect detection is critical.

Table 4. Result comparison with other state-of-the-art models. Here, Inference (ms) refers to the model inference time in milliseconds. The best results are shown in bold.

Model | Param (M) | FLOP (G) | Inference (ms) | Accuracy | Precision | Recall | F1-score | [TP, FN, FP, TN]

Data: Train (80, 800), Ratio 1:10, Test (22, 198)
VGG16 [27] | 138.36 | 15.47 | 11.60 | 98.18 | 95 | 86.36 | 90.47 | [19, 3, 1, 197]
ResNet34 [28] | 22.10 | 3.67 | 8.16 | 98.64 | 95.23 | 90.90 | 93.01 | [20, 2, 1, 197]
ResNet50 [28] | 25.90 | 4.11 | 9.34 | 99.45 | 100 | 95.45 | 97.67 | [21, 1, 0, 198]
ViT-Tiny [29] | 5.80 | 1.26 | 5.45 | 99.09 | 100 | 90.90 | 95.23 | [20, 2, 0, 198]
ViT-Base [29] | 86.90 | 17.58 | 12.02 | 99.45 | 100 | 95.45 | 97.67 | [21, 1, 0, 198]
XCNet | 4.00 | 2.31 | 5.63 | 99.45 | 100 | 95.45 | 97.67 | [21, 1, 0, 198]

Data: Train (16, 800), Ratio 1:50, Test (19, 198)
VGG16 [27] | 138.36 | 15.47 | 11.23 | 94.47 | 70.59 | 63.16 | 66.67 | [12, 7, 5, 193]
ResNet34 [28] | 22.10 | 3.67 | 8.35 | 95.39 | 73.68 | 73.68 | 73.68 | [14, 5, 5, 193]
ResNet50 [28] | 25.90 | 4.11 | 9.29 | 96.31 | 82.35 | 73.68 | 77.77 | [14, 5, 3, 195]
ViT-Tiny [29] | 5.80 | 1.26 | 5.37 | 95.85 | 77.78 | 73.68 | 75.67 | [14, 5, 4, 194]
ViT-Base [29] | 86.90 | 17.58 | 11.72 | 98.16 | 100 | 78.94 | 88.23 | [15, 4, 0, 198]
XCNet | 4.00 | 2.31 | 5.49 | 98.62 | 100 | 84.21 | 91.43 | [16, 3, 0, 198]

4.4 Analysis on Overfitting to Synthetic Patterns

Figs. 4 and 5 analyze the potential risk of overfitting, specifically whether the model learns artificial patterns from data augmentation instead of genuine defect features. To investigate this, we applied augmentation only to the minority class in both the training and test datasets. Each augmented set includes both original and modified images, ensuring that synthetic images do not dominate the evaluation.

In Fig. 4, the training loss (shown for the original training dataset; curves for the other settings follow similar trends and are omitted for clarity) remains steady and stable, while the test losses across various augmentations (e.g., flipped images) show no abrupt spikes, indicating that the model learns robust features rather than memorizing synthetic patterns. Fig. 5 further underscores this robustness by showing steady improvements in recall and F1-scores for each augmented scenario. The fact that performance consistently increases as more augmentations are introduced suggests that XCNet acquires generalizable defect features rather than relying on artificially introduced cues. These combined findings strongly support the effectiveness of our augmentation strategy in helping XCNet learn meaningful defect characteristics without overfitting to synthetic patterns.

Fig. 4. Training and test loss under various augmentation scenarios.

../../Resources/ieie/JSTS.2025.25.3.245/image4.png

Fig. 5. Recall and F1-Scores comparisons across different augmentation scenarios.

../../Resources/ieie/JSTS.2025.25.3.245/image5.png

4.5 Adaptability to Other Semiconductor Products

Although our work focuses on sensor board defect detection, XCNet can easily be adapted to other semiconductor components and industrial products. Numerous studies confirm the versatility of CNN-based architectures for various defect detection tasks, from wafer map analysis [30] to surface flaw identification [31,32]. Transfer learning has further demonstrated CNN adaptability to different data distributions [33,34]. Building on these findings, XCNet's emphasis on efficient feature extraction and robust classification requires only minimal adjustments---such as domain-specific data augmentation or slight architectural tweaks---to detect defects across diverse industrial contexts. This adaptability highlights XCNet's potential to make a broader impact on semiconductor inspection and defect detection in a wide range of manufacturing scenarios.

V. CONCLUSION

In this work, we introduced XCNet, a convolutional neural network-based solution for automated defect detection in sensor boards. XCNet addresses the limitations of traditional manual inspection methods by significantly improving inspection accuracy and efficiency while minimizing human intervention, making it a robust tool for modern manufacturing environments.

We conducted a comprehensive ablation study to evaluate the impact of XCNet's architectural components on model performance. By experimenting with different numbers of convolutional blocks and filter configurations, we demonstrated that deeper architectures consistently outperformed shallower ones, particularly in handling class imbalance. These configurations achieved high accuracy while showing substantial improvements in minority class recall and F1-scores.

Our analysis further highlighted the critical role of data quality and class balance in determining model performance. The study on data augmentation for the minority class showed significant advantages, achieving near-perfect accuracy and precision with fewer training samples. These findings underscore the importance of both architectural design and data preprocessing in improving model performance for imbalanced classification tasks. By automating defect detection, XCNet reduces costs, enhances efficiency, and ensures product reliability. This work contributes to the advancement of intelligent manufacturing systems, providing valuable insights for addressing class imbalance and optimizing deep learning models for quality control.

References

1 
J. Serey, M. Alfaro, G. Fuertes, M. Varhas, C. Durán, R. Ternero, R. Rivera, and J. Sabattin, ``Pattern recognition and deep learning technologies, enablers of Industry 4.0, and their role in engineering research,'' Symmetry, vol. 15, no. 2, 535, 2023.DOI
2 
J. Wang, Y. Ma, L. Zhang, R. X. Gao, and D. Wu, ``Deep learning for smart manufacturing: Methods and applications,'' Journal of Manufacturing Systems, vol. 48, pp. 144-156, 2018.DOI
3 
S. Sundaram and A. Zeid, ``Artificial intelligence-based smart quality inspection for manufacturing,'' Micromachines, vol. 14, no. 2, 570, 2023.DOI
4 
E. Baran and T. K. Polat, ``Classification of Industry 4.0 for total quality management: A review,'' Sustainability, vol. 14, no. 6, 3329, 2022.DOI
5 
T.P. Nguyen, S. Choi, S.J. Park, and J. Yoon, ``Inspecting method for defective casting products with convolutional neural network (CNN),'' International Journal of Precision Engineering and Manufacturing-Green Technology, vol. 8, pp. 583-594, 2021.DOI
6 
F. Pernkopf and P. O'Leary, ``Visual inspection of machined metallic high-precision surfaces,'' EURASIP Journal on Advances in Signal Processing, vol. 2002, pp. 1-12, 2002.DOI
7 
X. Jiang, P. Scott, and D. Whitehouse, ``Wavelets and their applications for surface metrology,'' CIRP Annals, vol. 57, no. 1, pp. 555-558, 2008.DOI
8 
D. H. Harris, ``The nature of industrial inspection,'' Human Factors, vol. 11, no. 2, pp. 139-148, 1969.DOI
9 
J. E. See, ``Visual inspection reliability for precision manufactured parts,'' Human Factors, vol. 57, no. 8, pp. 1427-1442, 2015.DOI
10 
D. Weimer, B. Scholz-Reiter, and M. Shpitalni, ``Design of deep convolutional neural network architectures for automated feature extraction in industrial inspection,'' CIRP Annals, vol. 65, no. 1, pp. 417-420, 2016.DOI
11 
F. Riaz, K. Kamal, T. Zafar, and R. Qayyum, ``An inspection approach for casting defects detection using image segmentation,'' Proc. of 2017 International Conference on Mechanical, System and Control Engineering (ICMSC), pp. 101-105, 2017.DOI
12 
M. Ferguson, R. Ak, Y.-T. T. Lee, and K. H. Law, ``Automatic localization of casting defects with convolutional neural networks,'' Proc. of 2017 IEEE International Conference on Big Data (Big Data), pp. 1726-1735, December 2017.DOI
13 
M. I. Jordan and T. M. Mitchell, ``Machine learning: Trends, perspectives, and prospects,'' Science, vol. 349, no. 6245, pp. 255-260, 2015.DOI
14 
H. Xie and Z. Wu, ``A robust fabric defect detection method based on improved RefineDet,'' Sensors, vol. 15, no. 15, 2020.DOI
15 
P. Murray, E. Yakushina, S. Marshall, and W. Lon, ``Automated microstructural analysis of titanium alloys using digital image processing,'' IOP Conference Series: Materials Science and Engineering, vol. 179, 012011, 2017.DOI
16 
M. M. Islam, A. A. Mintoo, and A. S. M. Saimon, ``Enhancing textile quality control with IoT sensors: A case study of automated defect detection,'' Global Mainstream Journal, vol. 1, no. 1, pp. 19-30, 2024.DOI
17 
J. Breitenbach, I. Eckert, V. Mahal, H. Baumgartl, and R. Buettner, ``Automated defect detection of screws in the manufacturing industry using convolutional neural networks,'' Proc. of the 55th Hawaii International Conference on System Sciences, 2022.DOI
18 
L. Song, X. Li, Y. Yang, X. Zhu, Q. Guo, and H. Yang, ``Detection of micro-defects on metal screw surfaces based on deep convolutional neural networks,'' Sensors, vol. 18, no. 11, 2018.DOI
19 
Z. Liu, B. Tian, X. Li, C. Li, and Y. Dong, ``Saliency-based fabric defect detection network with feature pyramid learning and refinement,'' Proc. of Fourteenth International Conference on Graphics and Image Processing (ICGIP 2022), vol. 12705, 127050N, 2023.DOI
20 
S. Zhou, Y. Chen, D. Zhang, J. Xie, and Y. Zhou, ``Classification of surface defects on steel sheet using convolutional neural networks,'' Materiali in Tehnologije/Materials and Technology, vol. 51, no. 1, pp. 123-131, 2017.DOI
21 
A. Şeker, K. A. Peker, A. G. Yüksek, and E. Delibas, ``Fabric defect detection using deep learning,'' Proc. of 2016 24th Signal Processing and Communication Application Conference (SIU), IEEE, pp. 1437-1440, 2016.DOI
22 
S.-H. Huang and Y.-C Pan, ``Automated visual inspection in the semiconductor industry: A survey,'' Computers in Industry, vol. 66, pp. 1-10, 2015.DOI
23 
Y. Qiao and L. Xing, ``Automatic classification method for Oracle images based on deep learning,'' IEIE Transactions on Smart Processing and Computing, vol. 12, no. 2, pp. 87-96, April 2023.DOI
24 
S. P. Yadav, ``Vision-based detection, tracking, and classification of vehicles,'' IEIE Transactions on Smart Processing and Computing, vol. 9, no. 6, pp. 427-434, December 2020.DOI
25 
M. N. Nguyen and T. Nguyen, ``Deep learning approaches to human gait pattern classification based on MEMS sensors,'' IEIE Transactions on Smart Processing and Computing, vol. 9, no. 4, pp. 184-292, August 2020.DOI
26 
J. He, L. Li, J. Xu, and C. Zheng, ``ReLU deep neural networks and linear finite elements,'' Journal of Computational Mathematics, vol. 38, no. 3, pp. 502-527, July 2018.DOI
27 
K. Simonyan and A. Zisserman, ``Very deep convolutional networks for large-scale image recognition,'' arXiv preprint arXiv:1409.1556, 2014.DOI
28 
K. He, X. Zhang, S. Ren, and J. Sun, ``Deep residual learning for image recognition,'' Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770-778, 2016.DOI
29 
A. Dosovitskiy, ``An image is worth 16x16 words: Transformers for image recognition at scale,'' arXiv preprint arXiv:2010.11929, 2020.DOI
30 
Y. F. Yang and M. Sun, ``Semiconductor defect detection by hybrid classical-quantum deep learning,'' Proc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2323-2332, 2022.DOI
31 
D. Ujalambkar, C. Kulkarni, V. Navale, and N. P. Sable, ``Industrial product surface defect detection using CNN: A deep learning approach,'' Panamerican Mathematical Journal, vol. 34, no. 3, 2024DOI
32 
S. Arikan, K. Varanasi, and D. Stricker, ``Surface defect classification in real-time using convolutional neural networks,'' arXiv preprint arXiv:1904.04671, 2019.DOI
33 
B. Devika and N. George, ``Convolutional neural network for semiconductor wafer defect detection,'' Proc. of 2019 10th International Conference on Computing, Communication and Networking Technologies (ICCCNT), pp. 1-6, 2019.DOI
34 
J. Yang, S. Li, Z. Wang, H. Dong, J. Wang, and S. Tang, ``Using deep learning to detect defects in manufacturing: A comprehensive survey and current challenges,'' Materials, vol. 13, no. 24, 5755, 2020.DOI
Sachin Ranjan
../../Resources/ieie/JSTS.2025.25.3.245/author1.png

Sachin Ranjan received his diploma from Tribhuvan University, Nepal, in 2015 and his B.E. degree from Uttarakhand Technical University, India, in 2019. He is currently pursuing an M.S. degree in electronics engineering at Incheon National University (INU), South Korea, where he is working as a Research Assistant at the Machine Intelligence and Data Science (MINDS) Lab. His research interests include image processing, machine learning, computer vision, robotics, 6G mobile communication systems, and the Internet of Things (IoT).

Hoon Kim
../../Resources/ieie/JSTS.2025.25.3.245/author2.png

Hoon Kim received his B.S. degree in electrical engineering from Korea Advanced Institute of Science and Technology (KAIST), Korea in 1998, and his M.S. and Ph.D. degrees in engineering from Information and Communication University (ICU), Korea in 1999 and 2004, respectively. He had been working with Samsung Advanced Institute of Technology (SAIT) during 2004 to 2005, while serving as a Senior Engineer in Communications and Networks Laboratory Division joining the project of design and performance analysis of radio transmission technology for beyond 3G and 4G mobile communication systems. He also had been working with Ministry of Information and Communications (MIC) from 2005 to 2007 as a deputy director in Broadband Communications Division in charge of promotion policies on broadband communications industry such as WiMAX. He joined Stanford University as a visiting scholar and a visiting professor during 2007 to 2008 and 2014 to 2015, respectively, and worked on developing radio resource management algorithms and cross layer optimization schemes for 4/5G mobile communications systems. He is currently a Professor of the Department of Electronics Engineering at Incheon National University where he has been working with the same department since 2008. His research interests include 6G mobile communication systems, internet of things, artificial intelligence, and big data. He is a Member of KICS, IEIE, IEEE, and IEICE.