(Department of Electronics and Communication Engineering, SRM Institute of Science
and Technology / Kattankulathur-Chennai, Tamilnadu-603203, India)
Casting quality inspection, CNN, Industry 4.0, Machine learning
1. Introduction
Visual inspection of castings is essential to improving productivity in
today's industry. Finding defects in castings is time-consuming, and visual
assessments are performed by experienced human inspectors.
However, quality control by conventional visual techniques is tiring, error-prone,
inefficient, and costly [9]. Hence, the traditional inspection process needs to be replaced by an automated
one in order to deliver zero-defect products that meet high quality standards to customers.
Quality control is a focal point of any modern manufacturing line. This intricate
process should be completed with a high level of accuracy and care [12]. Industry today requires advanced solutions to monitor automated production
and to identify defective materials. Smart inspection is a useful capability for
industrial systems and machines, and is a necessary step towards
automated production [1]. Machine learning (ML) algorithms are well-known techniques for handling the complex
task of visual quality inspection of a casting [10,11]. More manufacturers now prefer ML and deep learning (DL) techniques to automate
product quality assurance in order to reduce costs and time [2].
Data acquired with vision equipment are used to identify and report defective
items, to determine the causes of defects, and to permit quick and effective
interventions in modern industries [3]. The convolutional neural network (CNN) is the best technique for image classification
with rapid processing speed [7]; it extracts the features of items for industrial quality control
and, after training on a dataset, separates faulty items from sound ones [5]. A CNN with X-ray images, called multi-optical image fusion (MOIF), is a promising
technology for improving performance, but training a model with this technique is
a complex and expensive process [4,13]. Similarly, surface-defect detection is done with three-dimensional data aided by
stereoscopic vision, laser scanners, and spotted light measurement methods with a
CNN [14]. For detecting internal cracks from the surface, the BoDoC methodology or AdaBoost
with a support vector machine (SVM) algorithm provides good precision, but it takes
too long to compute the output [6,15]. Hence, to identify faults in casting products, a CNN provides the best solution
in automation.
The approach proposed in this paper improves the accuracy of defect detection in casting
products and is done through a CNN. The image dataset used in this work is an open-source
online dataset for defect detection in casting products. These datasets include two
classes of testing and training data: DEF and OK. Some sample images from the datasets
are in Fig. 1.
Fig. 1. Sample Casting Images.
2. Related Work
In this section, we briefly discuss the results of various approaches used to identify
defects in casting.
The following subsections describe the methods that have been combined with a CNN
to detect defects on the surface of casting products.
2.1 MVGG-19
Multipath VGG19 (MVGG19) builds on the conventional hierarchical design of VGG19
from the Visual Geometry Group (VGG), and connects early and late convolutional blocks
by adding extra paths for feature-map processing. In this approach, the outputs of three
BD blocks (batch normalization, dropout, global average pooling) are concatenated and
fed to a neural network of 2500 neurons.
Fig. 2 illustrates the architecture of MVGG19. The idea behind this modification is to detach
early and late extracted features from the sequential design and connect them
directly to the classifier at the top of the network.
In this way, the total number of extracted features is increased, and each path is
responsible for passing on features from the early or the late stages of image processing.
The BD blocks do not add any further convolution operation; they only
transfer, normalize, and reduce the features [1].
Fig. 2. Multipath VGG19[1].
Table 1. Summary of various defect detection approaches.
Approach | Parameters measured | Result achieved
Visual motif discovery | Casting plate defects | The achieved accuracy for quality is 97.14%
CNN with a Visual Geometry Group multipath network (MVGG19) | Defect detection and object recognition in various industries (castings, tools, metal surfaces, magnetic tiles, solar cells, bridge cracks) | Defect detection accuracy achieved for casting is 97.88%
Photometric stereo algorithm with a custom segmentation network | Material surface defects (nickel) | Accuracy of 95.60% for Ni surface defect detection
- | Casting products | Validated with 400 products; achieved 98% accuracy
EfficientNet-B0 trained in a CNN for classification of products, and a Decision Tree algorithm for the prediction process | Casting plate defects | Classification achieved 96.88% accuracy
Data augmentation with Wasserstein Generative Adversarial Nets (WGANs); feature extraction through a CNN; multimodel ensemble to show the classified result | Cast austenitic stainless steel (CASS) | Classification accuracy from MLP is 98.54% at 400°C/10 kh, which is high compared to other classification methods like KNN and the SVM
2.2 Motif Discovery Approach
This approach is combined with a CNN using the ResNet34 architecture. Fig. 3 shows the process flow of the motif approach.
This structure has two steps.
· Images are converted into a time series.
Each image is converted to a time series by using its histogram. The histogram captures the
intensity distribution of the image, where the X-axis represents the intensity value, and
the Y-axis represents the number of pixels that have that intensity value.
· The time series is used to determine image motifs.
In this approach, a triple sequence table is prepared with the help of the variable
Euc. It stores the row, the column, and the Euclidean distance, followed by Scount, where
Scount[i] is the number of image time series whose Euclidean distance Si satisfies
Si<r [2].
Fig. 3. The Motif Discovery Approach[2].
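The two steps above can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the implementation of [2]: the function names, the use of plain nested lists for grayscale images, and the way matches are counted against a radius r are all hypothetical.

```python
import math


def intensity_histogram(image, levels=256):
    """Return the histogram of a grayscale image as a series.

    Index = intensity value (X-axis), value = number of pixels (Y-axis).
    """
    counts = [0] * levels
    for row in image:
        for pixel in row:
            counts[pixel] += 1
    return counts


def euclidean_distance(series_a, series_b):
    """Euclidean distance between two series of equal length."""
    if len(series_a) != len(series_b):
        raise ValueError("series must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(series_a, series_b)))


def count_matches(series_list, index, radius):
    """Count the other series whose distance to series_list[index] is below radius.

    This mirrors the role we assume Scount[i] plays; it is an assumption.
    """
    reference = series_list[index]
    return sum(
        1
        for j, other in enumerate(series_list)
        if j != index and euclidean_distance(reference, other) < radius
    )
```

A series with many close neighbours is a candidate motif, while an image whose histogram is far from all others stands out as unusual.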
2.3 EfficientNet-B0
· This approach helps to increase accuracy at minimal computational cost. It
relies on a scaling process for neural networks called compound scaling. This
process is applied at the preprocessing stage. Then, the images from this stage are fed
into a simple convolutional neural network.
· The EfficientNet-B0 baseline network is the mobile-size baseline network used to evaluate
the scaling method in convolutional networks. Table 2 shows the EfficientNet-B0 architecture [3].
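Table 2. The EfficientNet-B0 baseline network [3].
Stage (i) | Operator (Fi) | Resolution (Hi x Wi) | Channels (Ci) | Layers (Li)
1 | Conv 3x3 | 224 x 224 | 32 | 1
2 | MBConv1, k3x3 | 112 x 112 | 16 | 1
3 | MBConv6, k3x3 | 112 x 112 | 24 | 2
4 | MBConv6, k5x5 | 56 x 56 | 40 | 2
5 | MBConv6, k3x3 | 28 x 28 | 80 | 3
6 | MBConv6, k5x5 | 14 x 14 | 112 | 3
7 | MBConv6, k5x5 | 14 x 14 | 192 | 4
8 | MBConv6, k3x3 | 7 x 7 | 320 | 1
9 | Conv 1x1 & pooling & FC | 7 x 7 | 1280 | 1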
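Compound scaling can be illustrated with a short calculation. The sketch below assumes the coefficients reported in the original EfficientNet paper (depth 1.2, width 1.1, resolution 1.15) and a 224-pixel base resolution; the function names are ours, and the numbers only illustrate the scaling rule rather than the exact configuration used in [3].

```python
# Scaling bases reported for EfficientNet; treated here as assumptions.
ALPHA = 1.2   # depth (number of layers)
BETA = 1.1    # width (number of channels)
GAMMA = 1.15  # input resolution


def compound_scale(phi, base_resolution=224):
    """Scale depth, width, and resolution together with one coefficient phi."""
    depth_multiplier = ALPHA ** phi
    width_multiplier = BETA ** phi
    resolution = round(base_resolution * GAMMA ** phi)
    return depth_multiplier, width_multiplier, resolution


def flops_growth(phi):
    """Approximate growth in computation: (alpha * beta^2 * gamma^2) ** phi."""
    return (ALPHA * BETA ** 2 * GAMMA ** 2) ** phi
```

With phi = 0 the baseline network is returned unchanged; each unit increase of phi roughly doubles the computation, because alpha * beta^2 * gamma^2 is close to 2.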
2.4 Photometric Stereo Approach
The photometric stereo approach first recovers surface orientations and
can be combined with an integration method to compute a height or depth map.
Even without a subsequent integration step, the surface orientations can be used,
for instance, to determine the curvature parameters of object surfaces. To obtain photometric
stereo images, the object is consecutively illuminated by several light sources, as illustrated
in Fig. 4.
To inspect a component with this approach, four light sources are placed at different
angles. Five photometric images are gathered: a curvature image, a texture image,
gradient images (X and Y), and a range image. These images are then processed
by the CNN for image classification. The processing time to detect defects in a product's
surface was 707 ms with a CPU.
Fig. 4. Illustration of the Photometric Stereo Approach.
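How surface orientation is recovered from several illuminations can be sketched for a single pixel. The example assumes an ideal Lambertian surface, where the observed intensity equals albedo times the dot product of the light direction and the surface normal, and solves the resulting least-squares problem. The Lambertian model, the function names, and the four light directions used in the usage note are our assumptions, not details taken from the cited work.

```python
import math


def _det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve_3x3(a, b):
    """Solve the 3x3 linear system a x = b with Cramer's rule."""
    d = _det3(a)
    if abs(d) < 1e-12:
        raise ValueError("light directions are degenerate")
    solution = []
    for col in range(3):
        replaced = [row[:] for row in a]
        for r in range(3):
            replaced[r][col] = b[r]
        solution.append(_det3(replaced) / d)
    return solution


def estimate_normal(light_dirs, intensities):
    """Least-squares estimate of albedo and unit surface normal for one pixel.

    Solves the normal equations (L^T L) g = L^T i, where g = albedo * normal.
    """
    ltl = [[sum(l[r] * l[c] for l in light_dirs) for c in range(3)]
           for r in range(3)]
    lti = [sum(l[r] * i for l, i in zip(light_dirs, intensities))
           for r in range(3)]
    g = solve_3x3(ltl, lti)
    albedo = math.sqrt(sum(v * v for v in g))
    if albedo == 0.0:
        raise ValueError("pixel is completely dark; normal is undefined")
    return albedo, [v / albedo for v in g]
```

Given four unit light directions and the four intensities observed at a pixel, `estimate_normal` returns the albedo and the unit normal; gradient and curvature images can then be derived from the normals of neighbouring pixels.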
2.5 Research Gap Identified
A literature review showed the capability of a CNN for automated defect detection.
In addition, the importance of image quality, which depends strongly
on an adequate acquisition setup, is determined by the characteristics of the surface to
be inspected. In particular, for specular surfaces, topographic data are highly relevant,
and hence are a significant source of information for detection. Moreover, processing
time is critical when inspecting parts in high-production-rate settings.
The following are the findings from the existing methods.
· Refinement is required when preparing a trained model to improve accuracy.
· Optimization is needed to improve the processing time of a system.
· Precision in defect identification can be improved with real-time augmentation.
· Robustness is required when training the model to reduce the false rejection rate from
the input datasets.
To provide the best solution for the findings mentioned above, we propose a CNN with
a DenseNet architecture for defect detection in casting products. We demonstrate the
inspection system with large input image datasets to classify defective products.
Through this, manufacturing industries can ensure high quality in their products.
3. The Proposed System
The proposed system uses a CNN for model preparation, with DenseNet to classify the
images and detect defects using feature extraction. Predictions of defective and
non-defective casting products were made by comparison with the best-trained CNN model.
The best-trained model was prepared with 6633 training images and 695 testing images.
Here, the CNN uses a two-stage sequential process for model creation. The CNN
was compiled with the Adam optimizer. Casting image
datasets were imported into Python from the local directory using TensorFlow’s Keras.
The training and testing data were drawn from the imported image datasets, with each image
rescaled from 512${\times}$512 pixels to 256${\times}$256 pixels, which reduces the processing
time for identifying faults in the images.
The trained model is used to predict the classes of images that were not previously
included in the training and validation process. The classifier outputs a
probability score between 0 and 1, and a threshold of 0.5 is used to separate
the classes. A probability score equal to, or greater than, this threshold is labeled
as defective; all other cases are OK.
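The decision rule can be written as a one-line comparison. The sketch below is only an illustration; the function name is hypothetical, and the labels follow the DEF and OK class names of the dataset.

```python
def classify(score, threshold=0.5):
    """Map a probability score in [0, 1] to a class label.

    Scores equal to or above the threshold are labeled defective (DEF);
    all other scores are labeled OK.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie between 0 and 1")
    return "DEF" if score >= threshold else "OK"
```

Raising the threshold above 0.5 would flag fewer products as defective, which reduces false rejections at the risk of missing real defects.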
3.1 System Architecture
The overall process of the system architecture for casting quality visual inspection
using CNN with DenseNet is shown in Fig. 5.
The process flow of the architecture is as follows.
1. Casting image datasets are imported from the local directory using the image data
generator function to preprocess the images.
2. The model is prepared with the help of a CNN architecture with two stages.
3. Then, the CNN will have a database from the best-trained model to find defects
in an image object.
4. Prediction of defective and non-defective datasets can be made through the linear
regression process of the system.
5. Results are produced by importing the sample image data for testing from the corresponding
Fig. 5. Overall system architecture for casting quality visual inspection with the CNN.
3.2 Data Preprocessing
The first step in this work is data preparation: pixel values (between 0 and 255)
are normalized into the range from 0 to 1 by passing the rescale
argument to the Keras ImageDataGenerator for both training and testing sets.
From the data, 20% is held out for validation by using the argument validation_split=0.2 in the
training and testing data generator, together with the flow_from_directory() function
during preprocessing. Fig. 6 shows the distribution ratio of the datasets.
Fig. 6. Distribution of the Datasets.
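The effect of the two preprocessing arguments can be illustrated without Keras. The sketch below is a plain-Python stand-in under our own assumptions: the helper names are hypothetical, and the choice of which files are held out is an assumption of the sketch rather than a description of how the generator selects them.

```python
def rescale(image, factor=1.0 / 255):
    """Multiply every pixel by factor, mapping 0..255 into 0..1."""
    return [[pixel * factor for pixel in row] for row in image]


def split_for_validation(filenames, validation_split=0.2):
    """Hold out a fraction of the files for validation.

    Returns (training_files, validation_files). Here the leading
    fraction of the list is held out; this choice is an assumption.
    """
    n_validation = int(len(filenames) * validation_split)
    return filenames[n_validation:], filenames[:n_validation]
```

With ten files and a split of 0.2, eight files remain for training and two are used for validation.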
3.3 Preparation of the Trained Model
This section describes the processes involved in preparing the
best-trained model using a CNN.
2D Convolution: This is used for filtering during image processing. In the first stage,
the convolution uses 32 filters and outputs a 150${\times}$150 feature map; in the second stage,
it takes the 75${\times}$75 pooled map as input and outputs a 38${\times}$38 feature map with
64 channels (Table 3). The following equation is used for 2D convolution to provide a feature map
from the preprocessed image:
$$(l * m)[a,b] = \sum_{p}\sum_{q} m[p,q]\, l[a-p,\, b-q] \qquad (1)$$
where $l$ is the input image; $m$ is the kernel; $a$ and $b$ are the row and column indexes
of the output matrix; and $p$ and $q$ are the indexes of the kernel in the convolution.
Max_pooling 2D: This function is used to reduce the dimensions of a feature map. There
are two such layers in this architecture: one reduces the 150${\times}$150 feature map
to 75${\times}$75, and the other reduces the 38${\times}$38 feature map
to 19${\times}$19.
Flatten: Used to generate one-dimensional data from 2D data.
DenseNet: Used to load the pretrained data images for defect identification.
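The convolution and pooling operations described above can be sketched in plain Python. The code follows the symbols of the text (image l, kernel m, output indexes a and b, kernel indexes p and q), but the function names, the choice of a "valid" output region, and the 2x2 pooling window are our assumptions; it illustrates the operations and is not the layer implementation used in this work.

```python
def convolve2d(image, kernel):
    """'Valid' 2D convolution of image l with kernel m.

    Each output value is sum_p sum_q m[p][q] * l[a - p][b - q], with the
    indexes shifted so that every term stays inside the image.
    """
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    output = []
    for a in range(ih - kh + 1):
        row = []
        for b in range(iw - kw + 1):
            total = 0
            for p in range(kh):
                for q in range(kw):
                    total += kernel[p][q] * image[a + kh - 1 - p][b + kw - 1 - q]
            row.append(total)
        output.append(row)
    return output


def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling.

    Trailing rows or columns that do not fill a whole window are dropped.
    """
    h, w = len(feature_map), len(feature_map[0])
    return [
        [
            max(feature_map[r + dr][c + dc]
                for dr in range(size) for dc in range(size))
            for c in range(0, w - size + 1, size)
        ]
        for r in range(0, h - size + 1, size)
    ]
```

Applying `max_pool2d` with a 2x2 window halves each spatial dimension, which is the reduction from 150 to 75 and from 38 to 19 described above.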
Table 3 and Fig. 7 give detailed descriptions of each layer's output shape.
As mentioned above, the CNN comprises various layers: a series of convolutional
layers (with activation), pooling layers, and one final, fully connected layer
that produces a set of class scores for a given image. The convolutional layers
of the CNN act as feature extractors; they extract shape and color patterns
from the pixels of the training images.
Fig. 7. The CNN with a DenseNet Architecture.
Table 3. The sequential model.
Layer (type) | Output Shape | No. of Param.
conv2d (Conv2D) | (None, 150, 150, 32) |
max_pooling2d (MaxPooling2D) | (None, 75, 75, 32) |
conv2d_1 (Conv2D) | (None, 38, 38, 64) |
max_pooling2d_1 (MaxPooling2D) | (None, 19, 19, 64) |
flatten (Flatten) | (None, 23104) |
dense (Dense) | (None, 128) |
dense_2 (Dense) | (None, 1) |
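The output shapes in Table 3 can be checked with simple arithmetic. The sketch below assumes a 300x300 input, convolutions with stride 2 and "same" padding, and 2x2 pooling. None of these settings is stated in the text (which mentions rescaling to 256x256); they are assumptions chosen only because they reproduce the shapes in the table.

```python
import math


def conv_same(size, stride):
    """Spatial size after a convolution with 'same' padding."""
    return math.ceil(size / stride)


def pool(size, window=2):
    """Spatial size after non-overlapping pooling."""
    return size // window


def trace_shapes(input_size=300, stride=2):
    """Return (layer name, output shape) pairs for the two-stage model."""
    shapes = []
    s = conv_same(input_size, stride)
    shapes.append(("conv2d", (s, s, 32)))
    s = pool(s)
    shapes.append(("max_pooling2d", (s, s, 32)))
    s = conv_same(s, stride)
    shapes.append(("conv2d_1", (s, s, 64)))
    s = pool(s)
    shapes.append(("max_pooling2d_1", (s, s, 64)))
    shapes.append(("flatten", (s * s * 64,)))
    return shapes
```

Under these assumptions the flattened vector has 19 x 19 x 64 = 23104 elements, which matches the Flatten row of the table.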
4. Results and Discussion
This section describes the various research gaps in other approaches. Refinement of
the CNN architecture with MVGG19 is needed to achieve higher accuracy in identifying
casting defects [1]. The photometric stereo algorithm achieves lower accuracy due to a lack of real-time
data acquisition [12]. Optimization is required for the motif discovery approach with a CNN to improve
accuracy [2]. EfficientNet-B0 with the CNN-based model training approach can provide good accuracy
(99%) for predictions from the model, but it needs more computation time to complete
the process of finding defects [3]. Real-time augmentation for defect detection is simpler than multi-optical image
fusion [4,8]. Modification of model creation for the CNN is needed to improve system accuracy
in fault identification [5]. The proposed system uses the casting product dataset available online from the Kaggle
website. Fig. 8 shows the performance of our proposed system compared with other approaches.
Fig. 8. Performance of the Proposed Approach Compared with Other Approaches.
4.1 Simulation Results
Fig. 9 shows that the accuracy of the model generally increases, while the loss
decreases over time. Likewise, the training and validation curves are closely aligned,
indicating that the model does not overfit, and may perform well when classifying
images from the testing dataset.
Epoch 20 provided the best performance with the following results:
99.64% training accuracy
99.40% validation accuracy
1.65% training loss
2.56% validation loss
The datasets used in this study are publicly available from Kaggle.com and have been used
to train models that classify OK and DEF product images in approaches by various researchers,
such as a CNN with MVGG, a CNN with the Motif Discovery approach, a CNN with EfficientNet-B0,
and a CNN with the photometric approach. In this work, we propose a CNN with DenseNet
to classify the images. Hence, researchers can replicate this work and explore
variations of it in the future.
Fig. 9. Simulation results from Accuracies Achieved.
Fig. 10. Confusion Matrix to Generate the Classification Report.
4.2 Evaluation Criteria
The confusion matrix represents the basic prediction results from the system, which
produces the outcome from binary classification. The data instances are predicted
as either positive or negative. The following predictions can be made through this
confusion matrix.
1. True Positive (TP): Correct Positive Prediction
2. True Negative (TN): Correct Negative Prediction
3. False Positive (FP): Incorrect Positive Prediction
4. False Negative (FN): Incorrect Negative Prediction
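Accuracy: The fraction of all images that are classified correctly. It is obtained
from Eq. (2):
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (2)$$
Precision: The ability of the model not to mark a negative sample as positive. It is obtained
from Eq. (3):
$$\mathrm{Precision} = \frac{TP}{TP + FP} \qquad (3)$$
Recall: The ability of the model to find all positive samples in the given datasets. It is obtained
from Eq. (4):
$$\mathrm{Recall} = \frac{TP}{TP + FN} \qquad (4)$$
F1-score: The harmonic mean of precision and recall is called the F1-score. It is obtained from
Eq. (5):
$$\mathrm{F1} = 2 \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \qquad (5)$$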
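The four measures can be computed directly from the confusion-matrix counts. The sketch below uses the standard definitions of these measures; the function name is ours, and the counts in the usage note are illustrative values, not results from this work.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall, and F1-score from the counts.

    A measure whose denominator is zero is reported as 0.0.
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

For example, with the illustrative counts TP = 90, TN = 80, FP = 10, and FN = 20, the accuracy is 0.85 and the precision is 0.90.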
Table 4 shows a classification report from the proposed method. The mathematical expressions
used to find Precision, Recall, and F1-score are given in Eqs. (3)-(5).
Table 4. Classification report from the proposed system.
Macro Avg.
Weighted Avg.
4.3 Performance Measures
Beyond the accuracy of the trained model, another critical point to investigate
in the system is the processing time during an inspection. Measuring the time spent by
the system supports continuous improvement in production. Our system
inspects a casting in around 454 ms with a CPU, which is less than the 707 ms reported for the photometric stereo approach.
Table 5 compares Precision, Recall and F1-score from existing methods for casting defect
detection against our system.
Table 5. Comparison of the proposed system versus existing method parameters.
MVGG19 Approach (Apostolopoulos and Tzani, 2022)
Motif Discovery (Bhatia, A.S et al. 2022)
EfficientNet-B0 (Benbarrad et al. 2021)
Photometric stereo approach (Saiz et al. 2022)
The proposed system
5. Conclusion
This work mainly focused on replacing the traditional visual inspection method used
in industry. Here, an optimized CNN architecture improves the accuracy in identifying
defects in casting products. The images used for our work are in RGB format. It is
essential to convert them to 2D for prediction of defective products. Then, the image
datasets are preprocessed by using TensorFlow’s Keras in Python. These preprocessed
datasets provide very high accuracy in image classification.
One limitation of this system is that it uses preprocessed 2D images for fault identification,
but 2D images are not sufficient to detect defects in the casting surface when the
defect is very small. Hence, defective regions could be inspected in 3D
to achieve high precision in identifying small defects. Depth analysis of an image
also needs to be improved in order to find the dimensions of defects in industrial
casting products. Future work therefore needs to improve small-defect detection in
the manufacturing of casting products.
The authors would like to thank [https://www.kaggle.com/ravirajsinh45/real-life-industrial-dataset-of-castingproduct]
for access to its publicly available casting datasets used in this work.
Vijayakumar Ponnusamy received his Ph.D. from SRM IST in 2018. He obtained his
Masters in Applied Electronics from the College of Engineering, Guindy, in 2006. In
2000, he received his B.Eng. in Electronics and Communication Engineering from Madras
University. He is currently a Professor in the ECE Department, SRM IST, Chennai, Tamil
Nadu, India. He is a Certified IoT Specialist and Data Scientist. He is also a recipient
of the NI India Academic Award for Excellence in Research (2015). His current interests
are in machine learning and deep learning, IoT-based intelligent system design, blockchain
technology, and cognitive radio networks. He is a senior member of IEEE.
E. Dilliraj received a B.Eng. in Electronics and Communication Engineering and
a Master’s degree in Embedded System Technologies from Anna University. He was an
Assistant Professor in Electronics and Communication Engineering at Prathyusha Engineering
College. His current interests are image processing, machine learning algorithms,
deep learning, artificial intelligence, computer vision, the IoT, and embedded systems.
He is currently a research scholar at the SRM Institute of Science and Technology.