Seung-Pil W. Coleman (Dept. of Computer Science and Engineering, Sun Moon University, Korea, spil3141@naver.com)
Young-Sup Hwang (Dept. of Computer Science and Engineering, Sun Moon University, Korea, young@sunmoon.ac.kr)
Copyright © The Institute of Electronics and Information Engineers (IEIE)
Keywords
Android malware detection, Code item, Convolutional neural network, Grayscale image, Static analysis
1. Introduction
Smartphones have become a daily necessity. Android is the most widely used smartphone
operating system (OS), and its market share is still expanding. This popularity also
makes Android a target for developers with malicious intentions. Android is more vulnerable
than other platforms because it allows applications to be installed from multiple third-party
markets, most of which have no anti-malware screening, so the chance of downloading a
malicious application is high. Developers of malicious applications keep devising attacks
that are difficult to detect, and mobile malware continues to grow and to find new ways
to avoid detection [8]. There is therefore a vital need for Android malware security.
The methods used in malware analysis can be divided into two groups: static and dynamic.
Static analysis, widely used by researchers and industry, captures information by scanning
disassembled code without executing the application. The file is disassembled to obtain
both syntactic and semantic information from API calls, permission lists, and opcodes.
In contrast, dynamic analysis monitors the behavior of applications at runtime.
This study proposes a technique to detect Android malware effectively by converting
malware binaries into images and then applying a machine learning technique to them.
Other methods convert the entire data section of the classes.dex file and use it as
features, whereas our technique converts only a part of the data section, the code item,
into an image. The code item section, shown in Fig. 1, is the target of our APK file
pre-processing step; its choice was inspired by previous work [3].
The rest of this paper is organized as follows. Section 2 discusses previous work.
Our methodology is explained in Section 3. Experiments and results are presented in
Section 4. Finally, Section 5 concludes the paper.
Fig. 1. The DEX file structure.
2. Related Work
Many methods detect Android malware by converting application packages into images
and classifying them with a convolutional neural network (CNN) [1, 3, 5, 12-18]. These
methods can be organized by how they analyze and engineer the data. The best-known
approaches combine static or dynamic analysis with generic, machine learning, or deep
learning methods.
2.1 Machine Learning and Deep Learning
Some methods that engineer features using dynamic analysis are TaintDroid [9], DroidRanger [10], and DroidScope [11]. TaintDroid provides real-time analysis by leveraging Android's virtualized execution
environment to detect malicious behavior of third-party Android applications. DroidScope
uses virtualization-based malware analysis to reconstruct both OS-level and Java-level
semantics. A few related works take a more static route and focus on generating color
images from the bytecode of the whole DEX file. Gamut converts DEX files into images
with a user-controlled level of semantics [3]. R2-D2 decompresses Android application packages to retrieve classes.dex (the DEX file)
and maps the application bytes to an RGB color image using a pre-defined rule [1].
Malware detection approaches that do not take advantage of machine learning or deep
learning pattern recognition have noticeable caveats. For example, although dynamic
analysis is effective at identifying malicious activities at runtime, it introduces
considerable overhead. Static analysis without machine learning or deep learning works
well but can easily be evaded by malware developers who trick the disassembler into
producing incorrect code. Machine-learning-based malware detection has been introduced
to mitigate these limitations, and our research expands on these approaches by focusing
on a sub-section of the DEX file, the code item, and applying deep learning. Machine
learning has several advantages:
· It can handle malware variants
· It can detect unknown or packed malware
· It does not require an Android emulator environment
· It can achieve high code coverage
Similar research that used the data section of the DEX file achieved an average storage
reduction of 17.5%. Our results show a lower performance overhead. Our goal was to
identify and use the sections that best represent the APK file, and this work shows
that using only the code item section offers a greater reduction in memory while
maintaining acceptable generalization performance.
3. Methods
Our method converts APK files to grayscale images and then trains a deep learning
model for classification on the generated images. First, APK files were processed
with Androguard [2] to analyze each APK and gain access to its Dalvik executable (DEX/classes.dex)
files. The exact classes used for obtaining the code item byte data are APK and DalvikVMFormat.
The code item bytes cannot be extracted directly with the default API, so the original
open-source Androguard code, specifically the built-in functionality for parsing the
Dalvik object bytes, was modified. In the end, the hexadecimal string representation
of the code item bytes was obtained; the bytecode in classes.dex is represented as
hexadecimal.
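As a rough illustration of this step, the sketch below gathers an approximation of the code item bytes using Androguard's APK and DalvikVMFormat classes. It is a minimal sketch, not the modified parser described above: we assume that concatenating the raw bytes returned by each method's get_code().get_raw() is an acceptable stand-in for the code item section.

```python
# Sketch: approximating the code item bytes with Androguard 3.x.
# Assumption: concatenating each method's serialized code_item (get_code().get_raw())
# stands in for the code item section; the paper's pipeline instead modified
# Androguard's parsing internals to dump these bytes directly.
from androguard.core.bytecodes.apk import APK
from androguard.core.bytecodes.dvm import DalvikVMFormat

def extract_code_item_bytes(apk_path: str) -> bytes:
    apk = APK(apk_path)                    # parse the APK container
    dex = DalvikVMFormat(apk.get_dex())    # parse classes.dex
    chunks = []
    for method in dex.get_methods():       # EncodedMethod objects
        code = method.get_code()           # None for abstract/native methods
        if code is not None:
            chunks.append(bytes(code.get_raw()))  # assumed: serialized code_item bytes
    return b"".join(chunks)

if __name__ == "__main__":
    data = extract_code_item_bytes("sample.apk")
    print(len(data), "bytes;", data[:16].hex())   # hexadecimal view of the first bytes
```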
In the image creation stage, a 2D grayscale image was generated from the parsed
hexadecimal string representation of the code item binaries. This hexadecimal string
was converted into a byte array, a mutable sequence with elements in the range
0 ≤ x < 256. The result is a one-dimensional array of bytes, so an algorithm was needed
to create a two-dimensional image from this one-dimensional vector. The work by Jordy
Gennissen [3] was very helpful and describes continuous fractal space-filling curve
algorithms. The first algorithm is linear plotting, which plots a one-dimensional array
of elements row by row, jumping to a new line after a predetermined width.
The other algorithm is a Hilbert curve technique that creates a space-filling curve
from a one-dimensional vector by visiting every point in a square grid whose side is
a power of two (2×2, 4×4, etc.). The resulting figure is a square image. Testing showed
little difference in performance between the two techniques (about a 1% difference in
accuracy), so we decided to use linear plotting for our conversion algorithm.
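For reference, the Hilbert-curve plotting can be realized with the standard iterative index-to-coordinate mapping (often called d2xy), as sketched below; the authors' exact implementation may differ, and leaving unused grid cells at zero is our assumption.

```python
# Sketch: plotting bytes along a Hilbert space-filling curve instead of row by row.
# d2xy is the standard iterative Hilbert index-to-coordinate mapping; the grid side
# is the smallest power of two whose square holds all bytes (unused cells stay 0).
import numpy as np

def d2xy(n: int, d: int):
    """Convert distance d along the Hilbert curve to (x, y) on an n x n grid."""
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:                          # rotate the quadrant
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def hilbert_plot(byte_data: bytes) -> np.ndarray:
    n = 1
    while n * n < len(byte_data):            # smallest power-of-two side length
        n *= 2
    img = np.zeros((n, n), dtype=np.uint8)
    for d, b in enumerate(byte_data):
        x, y = d2xy(n, d)
        img[y, x] = b
    return img
```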
The generated images had diverse resolutions depending on the size of their bytecode.
Therefore, after converting all the samples into images, we resized them to a fixed
resolution. More information concerning this decision is given in Section 3.1.
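To make the chosen conversion concrete, the sketch below shows one way to implement linear plotting followed by resizing to a fixed resolution. The row width of 256 and the use of Pillow for resizing are our own assumptions; the paper does not specify these details.

```python
# Sketch of linear plotting: lay a 1D byte vector out row by row at a fixed width,
# pad the last row with zeros, then resize to the CNN input resolution.
# The row width (256) and the use of Pillow are assumptions for illustration.
import numpy as np
from PIL import Image

def linear_plot(byte_data: bytes, width: int = 256) -> np.ndarray:
    arr = np.frombuffer(byte_data, dtype=np.uint8)
    height = int(np.ceil(len(arr) / width))
    padded = np.zeros(height * width, dtype=np.uint8)
    padded[:len(arr)] = arr                       # pad the final row with zeros
    return padded.reshape(height, width)          # 2D grayscale image

def to_fixed_resolution(img: np.ndarray, size=(100, 100)) -> np.ndarray:
    return np.array(Image.fromarray(img, mode="L").resize(size))

# Example, reusing the hypothetical extractor from the earlier sketch:
# image = to_fixed_resolution(linear_plot(extract_code_item_bytes("sample.apk")))
# Image.fromarray(image, mode="L").save("sample.png")
```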
The last stage is classification, which involves a CNN architecture. We tested many
popular CNN architectures on our dataset, including InceptionV3 [7], ResNet50, ResNet101, DenseNet121, NASNet, and InceptionResNetV2 [4]. We ultimately chose the InceptionResNetV2 model, which showed the best results.
3.1 Image Resolution and Experiment Environment
Finding the input resolution of our targeted CNN model required various image
resolutions to be tested. Resolutions of 100${\times}$100, 128${\times}$128, 150${\times}$150,
and 256${\times}$256 were tested. Our goal was to determine the resolution that has
the least information loss after resizing. But in the end, we realized that there
was little change in the performance of the models. Therefore, for our experiment,
we chose 100x100.
Another reason for choosing 100×100 was our experiment environment. Table 1 shows the
hardware and software libraries used in our experiments. The GPU of our system had
limited VRAM, so an input size of 100×100 reduced the memory used for training at some
cost in performance. The initial model training was conducted on a system with an AMD
Ryzen 7 2700X 8-core processor, 32 gigabytes of DDR4 RAM, and 3.6 terabytes of storage
to hold our dataset in comma-separated value (CSV) format.
Table 1. Experiment parameters.
Label | Information
CPU | AMD Ryzen 7 2700X Eight-Core Processor
Memory | 32 GB
GPU | NVIDIA GeForce GTX 1080 Ti (11 GB VRAM)
HDD | 3.6 TB
Library | TensorFlow 2.x, Matplotlib, NumPy, etc.
CUDA | v10.0
cuDNN | v7.6.5
3.2 The Architecture of Our Methodology
Fig. 2 shows our technique. There are three main stages: APK file processing, image creation,
and classification.
· Stage 1: AndroGuard is used to reverse engineer the APK files and retrieve the classes.dex information as bytecode.
· Stage 2: Grayscale images are generated from the bytecode using linear plotting.
· Stage 3: InceptionResNetV2 is trained on the acquired datasets.
4. Performance Evaluation
4.1 Dataset
We used two types of APK files: a malicious set and a benign set. The APK files were
sourced from Google Play, Amazon, APKpure, AMD, and Drebin [6] and divided into 10,000 malware APKs and 10,000 benign APKs. The benign
APKs were obtained from Google Play, Amazon, and APKpure, while the malware APKs came
from AMD and Drebin. The samples were cleaned to create a balanced distribution of
malware and benign data, and corrupted or damaged APK files were discarded. In the end,
20,000 samples were used, separated into 18,000 samples for training, 1,000 for
validation, and 1,000 for testing.
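A minimal sketch of this split is shown below; the directory layout, file names, and random seed are hypothetical.

```python
# Sketch: shuffle the 20,000 generated images and split them 18,000/1,000/1,000
# into training, validation, and test sets. Paths and labels are hypothetical.
import random
from pathlib import Path

samples = [(p, 1) for p in Path("images/malware").glob("*.png")] + \
          [(p, 0) for p in Path("images/benign").glob("*.png")]   # label 1 = malware

random.seed(42)                 # fixed seed so the split is reproducible
random.shuffle(samples)

train, val, test = samples[:18000], samples[18000:19000], samples[19000:20000]
print(len(train), len(val), len(test))
```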
4.2 Images
After reverse engineering the APK file and retrieving the code item binary as a
one-dimensional vector (an array of parsed bytes), we had to convert the 1D vector
into two dimensions to form a grayscale image. This can be done using various plotting
algorithms, as explained in Section 3. Fig. 4 shows examples of images generated with the linear plotting algorithm after they have been resized to a fixed 2D resolution.
The generated images cannot be distinguished with the naked eye, which is why we used
a deep learning classification model.
Fig. 4. Generated images.
4.3 Model Performance
Tables 2-4 show the results of the evaluations. Images were generated from 20,000 APK files,
with resolutions depending on the size of the code item binary. Afterward, all the
images were resized to a fixed resolution corresponding to the dimensions of the CNN
input layer (100×100). Using the 20,000 generated images (10,000 malware and 10,000
benign), an InceptionResNetV2 CNN model was trained with a stochastic gradient descent
(SGD) optimizer and a vanilla hyperparameter setup (learning rate of 0.01, 10 epochs,
batch size of 100, etc.).
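The sketch below shows a training setup consistent with this description. The binary classification head and the replication of the grayscale channel to three channels are our assumptions; the paper does not describe those details.

```python
# Sketch of the training setup described above: InceptionResNetV2 trained with SGD
# (learning rate 0.01, 10 epochs, batch size 100). The classification head and the
# grayscale-to-3-channel replication are assumptions for illustration.
import numpy as np
import tensorflow as tf

def build_model(input_shape=(100, 100, 3)) -> tf.keras.Model:
    base = tf.keras.applications.InceptionResNetV2(
        include_top=False, weights=None, input_shape=input_shape)
    x = tf.keras.layers.GlobalAveragePooling2D()(base.output)
    out = tf.keras.layers.Dense(1, activation="sigmoid")(x)   # malware vs. benign
    return tf.keras.Model(base.input, out)

model = build_model()
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),
              loss="binary_crossentropy", metrics=["accuracy"])

# x_train: (18000, 100, 100, 1) grayscale images scaled to [0, 1]; y_train: 0/1 labels.
# model.fit(np.repeat(x_train, 3, axis=-1), y_train,
#           validation_data=(np.repeat(x_val, 3, axis=-1), y_val),
#           batch_size=100, epochs=10)
```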
Table 2. Experiment results.
Evaluation | DEX Image (100×100) | Code Item Image (100×100)
Training accuracy | 98% | 98%
Validation accuracy | 94% | 89%
Test accuracy | 94% | 90%
F1 score | 0.94 | 0.90
Table 3. DEX file confusion matrix.
Actual Class | Predicted Positive | Predicted Negative
Positive | 469 | 31
Negative | 12 | 488
Table 4. Code item confusion matrix.
Actual Class | Predicted Positive | Predicted Negative
Positive | 464 | 36
Negative | 44 | 456
4.4 Memory Comparison
This study compared 2,000 APKs to calculate the minimum, maximum, and average size of
the code item section relative to the whole DEX file. Of the 2,000 APKs, 1,000 were
malware and the other 1,000 were benign. The experiments led to the observations in
Tables 5 and 6.
The size comparison tables show that the code item section occupies approximately
44.6% of the DEX file on average, which implies that memory usage can be reduced by
about 55.4% when only the code item section is used for Android malware detection.
Table 5. Size ratios.
Dataset | Min. | Max. | Avg.
Benign | 3.4% | 48.8% | 45.56%
Malware | 15.87% | 47.26% | 43.68%
Table 6. Size comparison.
Dataset | Min. | Max. | Avg.
Size of benign DEX(es) | 6.19 KB | 10120.7 KB | 3320.7 KB
Size of malicious DEX(es) | 2.7 KB | 6098.5 KB | 1418.9 KB
Size of code item in benign DEX(es) | 0.21 KB | 4898.5 KB | 1513.0 KB
Size of code item in malicious DEX(es) | 0.429 KB | 2882.4 KB | 619.8 KB
4.5 Execution Time of Conversion
In this experiment, a single sample with a size of 1.6 GB was selected from the dataset
as our target; a single APK file serves as a reasonable representative for determining
the execution time of our image generation approach. The algorithms for converting the
code item binaries and the whole DEX binaries to images were examined, and the time
each algorithm (code item-to-image and DEX-to-image) took to complete the image
generation process was measured. The results show that the code item conversion took
about 1.92 seconds, while the DEX file conversion took 2.27 seconds.
This means that the code item conversion algorithm was about 15% faster.
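A simple way to reproduce this kind of measurement is sketched below; convert_code_item_to_image and convert_dex_to_image are hypothetical wrappers around the conversion steps sketched earlier, not the authors' exact implementation.

```python
# Sketch: timing the image generation step for one APK, comparing the code item
# conversion against the whole-DEX conversion. The two convert_* functions are
# hypothetical wrappers around the earlier extraction and plotting sketches.
import time

def timed(fn, *args) -> float:
    start = time.perf_counter()
    fn(*args)
    return time.perf_counter() - start

# t_code = timed(convert_code_item_to_image, "sample.apk")
# t_dex  = timed(convert_dex_to_image, "sample.apk")
# print(f"code item: {t_code:.2f} s, DEX: {t_dex:.2f} s, "
#       f"speed-up: {(1 - t_code / t_dex) * 100:.1f}%")
```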
5. Conclusion
This research adopted deep learning to construct an Android malware detection technique
that converts Android APK binaries into images for classification. Our experimental
results indicate that a faster overall execution time can be achieved when generating
images from the code item section rather than from the whole DEX file or the data
section; the image generation time decreased by about 15%. In future work, higher
performance will be our objective. The reduction in byte size when using the code item
section leaves room for a hybrid system that combines the code item with other
representative features while still keeping the data size low.
REFERENCES
[1] T. H. Huang and H. Y. Kao, "R2-D2: ColoR-inspired Convolutional NeuRal Network (CNN)-based AndroiD Malware Detections," IEEE BigData 2018, Dec. 2018, pp. 2633-2642.
[2] A. Desnos, Androguard Documentation, Release 3.4.0, 2019.
[3] J. Gennissen and J. Blasco, "Gamut: Sifting through Images to Detect Android Malware," June 25, 2017.
[4] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. A. Alemi, "Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning," Thirty-First AAAI Conference on Artificial Intelligence, 2017.
[5] L. Nataraj, S. Karthikeyan, and G. Jacob, "Malware Images: Visualization and Automatic Classification," Proceedings of the 8th International Symposium on Visualization for Cyber Security, ACM, 2011, p. 4.
[6] D. Arp, M. Spreitzenbarth, M. Hubner, H. Gascon, K. Rieck, and C. E. Siemens, "Drebin: Effective and Explainable Detection of Android Malware in Your Pocket," NDSS, Vol. 14, 2014, pp. 23-26.
[7] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, "Rethinking the Inception Architecture for Computer Vision," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 2818-2826.
[8] McAfee Mobile Threat Report, Q1 2020.
[9] W. Enck, P. Gilbert, B.-G. Chun, L. P. Cox, J. Jung, P. McDaniel, and A. Sheth, "TaintDroid: An Information-Flow Tracking System for Realtime Privacy Monitoring on Smartphones," Proc. of USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2010, pp. 393-407.
[10] Y. Zhou, Z. Wang, W. Zhou, and X. Jiang, "Hey, You, Get Off of My Market: Detecting Malicious Apps in Official and Alternative Android Markets," Proc. of Network and Distributed System Security Symposium (NDSS), 2012.
[11] L.-K. Yan and H. Yin, "DroidScope: Seamlessly Reconstructing OS and Dalvik Semantic Views for Dynamic Android Malware Analysis," Proc. of USENIX Security Symposium, 2012.
[12] T. Vidas and N. Christin, "Evading Android Runtime Analysis via Sandbox Detection," Proceedings of the 9th ACM Symposium on Information, Computer and Communications Security (ASIA CCS '14), Kyoto, Japan, June 2014.
[13] C. Yang, Z. Xu, G. Gu, V. Yegneswaran, and P. Porras, "DroidMiner: Automated Mining and Characterization of Fine-grained Malicious Behaviors in Android Applications," Proceedings of the 19th European Symposium on Research in Computer Security (ESORICS '14), Wroclaw, Poland, September 2014.
[14] W. Hardy, L. Chen, S. Hou, and Y. Ye, "DL4MD: A Deep Learning Framework for Intelligent Malware Detection," International Conference on Data Mining (DMIN), 2016.
[15] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks," Advances in Neural Information Processing Systems 25 (NIPS 2012), Lake Tahoe, 2012, pp. 1097-1105.
[16] K. Simonyan and A. Zisserman, "Very Deep Convolutional Networks for Large-Scale Image Recognition," International Conference on Learning Representations (ICLR 2015), San Diego, CA, 2015.
[17] J. Saxe and K. Berlin, "Deep Neural Network Based Malware Detection Using Two Dimensional Binary Program Features," 2015 10th International Conference on Malicious and Unwanted Software (MALWARE), Fajardo, 2015.
[18] Z. Yuan, Y. Lu, Z. Wang, and Y. Xue, "Droid-Sec: Deep Learning in Android Malware Detection," Proceedings of the 2014 ACM Conference on SIGCOMM, Chicago, Illinois, USA, 2014.
Author
Seung-Pil W. Coleman received his B.S. degree in Computer Engineering and Electronics
from Sun Moon University, Korea, in 2018. He is currently a graduate student at Sun
Moon University, Korea.
Young-Sup Hwang received his Ph.D. degree from the Department of Computer Science
and Engineering, POSTECH, Korea, in 1997. He is currently a Professor in the Division
of Computer Science and Engineering, Sun Moon University, Korea. His research interests
include pattern recognition, machine learning, and neural networks.