Garvit Chugh¹, Nitesh Singh Bhati², Puneet Kumar³, Vishal Bharti³
¹ Department of Computer Science, Indian Institute of Technology, Jodhpur (chugh.2@iitj.ac.in)
² Department of Computer Science, Delhi Technical Campus, Greater Noida (GGSIPU) (niteshbhati07@gmail.com)
³ Department of Computer Science, Chandigarh University, Mohali, India (professor.pkumar@gmail.com, mevishalbharti@yahoo.com)
Copyright © The Institute of Electronics and Information Engineers (IEIE)
Keywords
MapReduce, Clustering, K-means, K-harmonic means, Parallel execution
1. Introduction
By 2022, the industrial world had accumulated billions of raw data points related to
policies, implementations, sales, and other product-related information.
Not all of these data are useful, however. Some are irrelevant to the company
and hence a burden and a waste of storage. The raw data must be examined to verify
its relevance, but its size is huge, and it is difficult for a human to
describe and make meaning of such huge volumes of data. Hence, various techniques have been
introduced that perform this task on behalf of humans. Such techniques follow the
methodology of clustering, an unsupervised technique of forming groups of
similar data to satisfy the objective of understanding the underlying structure in
the available raw data. This technique has evolved and is used in various tools to
make statistical analyses based on the output provided by clustering.
With the help of clustering, the expectation is that similar examples/data points
in the dataset should be part of the same cluster. Similarly, the dissimilar examples
are part of different clusters. Thus, the dataset is broken down into smaller datasets,
making mathematical operations easier [1].
Finding an exact clustering solution is NP-hard: it cannot be computed
in polynomial time, even with only two clusters [2]. Hence, a more elegant solution for analysis was proposed based on parallelism, i.e.,
parallel processing of the data, which yields faster computation and efficient distribution
of the tasks. This solution is based on MapReduce, a programming model
for processing and generating huge data sets [3]. The MapReduce model can handle a large amount of raw data and produce meaningful
clusters. The model is simple, flexible, fault-tolerant, and highly scalable.
Hence, MapReduce is used in big data clustering [4].
The user specifies a Map function that processes a key/value pair and generates many
intermediate key/value pairs, known as the set of intermediate pairs. A Reduce function
then merges all intermediate values associated with the same intermediate key. This
two-phase model is known as MapReduce [5]. Parallelism alone is insufficient, however: the exponential
job creation time and the time required to shuffle big data must also be managed. These time-management
issues can be addressed by removing the many iterations on which clustering
algorithms depend. For that purpose, different clustering algorithms have been used
in different studies, based on different starting points and criteria, leading to different
outcomes and taxonomies of clustering algorithms. Some include K-Means, K-Harmonic
Means, Fuzzy C-Means, and Hierarchical Clustering. Table 6 summarizes the advantages, disadvantages, and other properties of the aforementioned algorithms.
1.1 MapReduce
MapReduce is simple, flexible, fault-tolerant, and highly scalable, which is why it
is used in big data clustering. Typical applications include indexing documents, analyzing
Web access logs, and machine learning. The most widely used implementation of the MapReduce
framework is Hadoop, part of the open-source community [6]. As Fig. 1 shows, the system relies on a distributed file system (DFS) to form partitions
across multiple machines. The partitions are produced in an initial phase.
Fig. 1. MapReduce Functionality.
The word count problem is the classic example for explaining the functionality
of MapReduce. Given a collection of words as input, the goal is to count the occurrences
of each distinct word. The mapper class, in this case, tokenizes the input strings
and emits each token with a count, forming key-value pairs. The pairs are then sorted
and grouped by key. The reducer class takes this grouped list as input and produces,
for each key, the total count of its occurrences. This example can be understood
visually using Fig. 2; a minimal code sketch follows the figure.
Fig. 2. MapReduce Example.
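To make the phases concrete, the following is a minimal, single-machine Python sketch of the word count flow. It simulates the map, shuffle, and reduce phases in memory; the function names are illustrative and are not part of the Hadoop API.

```python
from collections import defaultdict

def wc_map(document):
    """Map phase: tokenize the input and emit a (word, 1) pair per token."""
    for word in document.split():
        yield (word.lower(), 1)

def wc_shuffle(pairs):
    """Shuffle phase: group all intermediate values by their key (the word)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def wc_reduce(key, values):
    """Reduce phase: sum the counts collected for one word."""
    return key, sum(values)

documents = ["deer bear river", "car car river", "deer car bear"]
mapped = [pair for doc in documents for pair in wc_map(doc)]
counts = dict(wc_reduce(k, v) for k, v in wc_shuffle(mapped).items())
print(counts)  # {'deer': 2, 'bear': 2, 'river': 2, 'car': 3}
```

In a real MapReduce job, the mappers and reducers run on different machines over DFS partitions, and the framework performs the shuffle; the logic per function is the same.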
1.2 Clustering Algorithms
Clustering is an unsupervised technique of forming groups of
similar data to satisfy the objective of understanding the underlying structure in
the available raw data. This technique has evolved and is used as a tool by
analysts for statistical analyses based on the output provided by clustering.
With the help of clustering, the expectation is that similar examples/data points
in the dataset should be part of the same cluster. Similarly, the dissimilar examples
are part of different clusters, as shown in Fig. 3. Thus, the dataset is broken down into smaller datasets, on which performing mathematical
operations is easier. The different parallel clustering algorithms that are compatible
with MapReduce are discussed further and are listed in Fig. 4.
Fig. 3. Clustering Procedure.
Fig. 4. Clustering Algorithms compatible for Parallelism.
1.2.1 K-means
The K-Means algorithm is a simple, computationally fast, and storage-efficient algorithm.
The algorithm is very popular in terms of partitioning algorithms. It is straightforward,
as it begins with initializing ‘k’ clusters centers, which are made randomly by sampling.
After initialization, the similarity index is calculated, and similar data points
are clustered together. The similarity index, in this case, is the distance. The data
point nearest to a specific center is considered the most similar to that cluster
generalization; hence, it is added to that cluster. If some data points are left un-clustered,
new cluster centers are formed with the same remaining data points, and a loop is
produced. This process continues until no centroid changes its location or no new
centroid is produced. Owing to its simplicity, it cannot be used with highly complex
datasets and is actually an inferior performer when the shapes of the dataset become
larger, out of control, or unknown [7].
The algorithm of K-means can be found in Table 1.
Table 1. K-means Algorithm.
Step # | Steps
1. | Classify: assign each observation to the closest cluster center: $z_{i}\leftarrow \underset{j}{\arg \min }\left\| \mu _{j}-x_{i}\right\| _{2}^{2}$
2. | Map: for each data point, given $(\{\mu _{j}\},x_{i})$, emit $(z_{i},x_{i})$
3. | Reduce: average over all points in cluster $j$ (those with $z_{i}=j$): $\mu _{j}=\frac{1}{n_{j}}\sum _{i:z_{i}=j}x_{i}$
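As an illustration of Table 1, the following is a single-machine Python sketch of one K-Means map/reduce round using NumPy. In a real MapReduce job the map and reduce steps would run over distributed partitions; for simplicity, the sketch assumes no cluster becomes empty.

```python
import numpy as np

def kmeans_map(points, centers):
    """Classify + Map: z_i = argmin_j ||mu_j - x_i||^2 for every point."""
    dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

def kmeans_reduce(points, z, k):
    """Reduce: new center mu_j is the average of the points with z_i = j."""
    return np.array([points[z == j].mean(axis=0) for j in range(k)])

rng = np.random.default_rng(0)
points = rng.normal(size=(200, 2))
centers = points[rng.choice(len(points), 3, replace=False)]  # random init
for _ in range(10):  # in practice, iterate until the centers stop moving
    z = kmeans_map(points, centers)
    centers = kmeans_reduce(points, z, 3)
```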
1.2.2 K-harmonic Means
The K-Harmonic Means algorithm is similar to the K-Means algorithm in that it is also
a center-based algorithm. Its approach differs, however: it
calculates harmonic averages of the distances between the data points and the centers,
which increases the clustering quality. It converges to a good clustering more reliably
than K-Means, but it still requires a considerable number of iterations to reach convergence [7].
The K-Harmonic Means algorithm can be found in Table 2.
Table 2. K Harmonic Means Algorithm.
Step # | Steps
1. | KHM starts with random centers.
2. | The distance between each data point and each cluster center is calculated: $\varnothing \left(d_{j},C_{l}\right)=\left\| d_{j}-C_{l}\right\| $; $1\leq j\leq n$, $1\leq l\leq k$
3. | Each data point receives a weight based on the harmonic sum of its distances to the centers: $\alpha _{j}=\frac{1}{\left(\sum _{l=1}^{k}\frac{1}{\varnothing \left(d_{j},C_{l}\right)^{a}}\right)^{2}}$
4. | New cluster centers are calculated from the weighted data points; steps 2 and 3 are repeated until convergence.
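As a sketch of the harmonic weighting in Table 2, the following Python function computes the weight $\alpha_j$ for each data point. The exponent $a$ and the small $\varepsilon$ guard against division by zero are illustrative choices, not values prescribed by the cited work.

```python
import numpy as np

def khm_weights(points, centers, a=2.0, eps=1e-8):
    """Table 2, step 3: alpha_j = 1 / (sum_l 1 / phi(d_j, C_l)^a)^2."""
    # phi(d_j, C_l) = ||d_j - C_l||; eps avoids division by zero at a center
    phi = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2) + eps
    return 1.0 / ((1.0 / phi ** a).sum(axis=1)) ** 2

rng = np.random.default_rng(1)
pts = rng.normal(size=(50, 2))
print(khm_weights(pts, pts[:4]).shape)  # one harmonic weight per point: (50,)
```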
1.2.3 Fuzzy C-means
The Fuzzy C-Means algorithm is the lenient version of its K-Means counterpart. Whereas
traditional clustering techniques require different data points to be mutually
exclusive when clustered, known as the hard way, Fuzzy C-Means allows data points
to be added to more than one cluster simultaneously.
This technique is more natural, resembling a real-world scenario, because the data
points clinging to the boundaries cannot be attached to any one cluster. Instead,
each point carries a combination of partial memberships to the clusters. The objective
is to classify the data set into ‘c’ clusters, assuming c is known beforehand. The
condition for a fuzzy partition is given in [8].
The Fuzzy C-Means algorithm can be found in Table 3 below.
Table 3. Fuzzy C Means Algorithm.
Step # | Steps
1. | Let $X=\{x_{1},x_{2},x_{3},\ldots ,x_{n}\}$ be the set of data points and $V=\{v_{1},v_{2},v_{3},\ldots ,v_{c}\}$ be the set of centers.
2. | Randomly select ‘c’ cluster centers.
3. | Calculate the fuzzy membership as $u_{ij}=1/\sum _{k=1}^{c}\left(d_{ij}/d_{ik}\right)^{2/\left(m-1\right)}$
4. | Compute the fuzzy centers using $v_{j}=\sum _{i=1}^{n}\left(u_{ij}\right)^{m}x_{i}/\sum _{i=1}^{n}\left(u_{ij}\right)^{m}$; $1\leq j\leq c$
5. | Repeat steps 3 and 4 until the minimum value of the objective function J is achieved.
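The membership and center updates of Table 3 can be sketched in a few lines of NumPy. This single-machine illustration performs one iteration (steps 3 and 4); the small $\varepsilon$ added to the distances is an illustrative guard against division by zero when a point coincides with a center.

```python
import numpy as np

def fcm_step(points, centers, m=2.0, eps=1e-8):
    """One FCM iteration following Table 3: memberships u_ij, then centers v_j."""
    d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2) + eps
    # Step 3: u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1))
    u = 1.0 / ((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))).sum(axis=2)
    # Step 4: v_j = sum_i u_ij^m x_i / sum_i u_ij^m
    um = u ** m
    new_centers = (um.T @ points) / um.sum(axis=0)[:, None]
    return u, new_centers
```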
1.2.4 Hierarchical Clustering
The Hierarchical Clustering algorithm, unlike the center-based algorithms above,
generates a hierarchy of partitions by agglomerative and divisive methods. The
agglomerative method produces a cluster sequence by repeatedly merging the two closest
clusters of the previous level into one; the divisive method works the other way
around [8]. The resulting hierarchy is commonly visualized as a dendrogram, making this a straightforward,
progressive clustering technique. The method is highly scalable, but the time complexity
is very high because the concept of tree building is introduced. The algorithm of Hierarchical
Clustering can be found in Table 4.
Table 4. Hierarchical Clustering Algorithm.
Step # | Steps
1 | Start with the data point set $X=\{x_{1},x_{2},x_{3},\ldots ,x_{n}\}$, each point in its own cluster.
2 | Initialize the disjoint clustering with level $L(0)=0$ and sequence number $m=0$.
3 | Find the least-distance pair of clusters: $d[(r),(s)]=\min d[(i),(j)]$
4 | Set $m=m+1$ and $L(m)=d[(r),(s)]$; merge clusters $(r)$ and $(s)$.
5 | Update the distances to the merged cluster: $d[(k),(r,s)]=\min \left(d[(k),(r)],d[(k),(s)]\right)$
6 | If one cluster is left, stop; else repeat from step 3.
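A naive single-machine Python sketch of the agglomerative procedure in Table 4 follows. Table 4 does not fix a linking criterion, so single linkage is assumed here; the comments map each block back to the table's steps.

```python
import numpy as np

def single_linkage(points, target=1):
    """Naive agglomerative clustering following Table 4 (single linkage)."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)          # steps 1-2: each point is its own cluster
    active, merges = list(range(len(points))), []
    while len(active) > target:
        sub = d[np.ix_(active, active)]  # step 3: least-distance pair
        i, j = np.unravel_index(sub.argmin(), sub.shape)
        r, s = active[i], active[j]
        merges.append((r, s, d[r, s]))   # step 4: L(m) = d[(r),(s)]
        new = np.minimum(d[r, :], d[s, :])  # step 5: distances to merged cluster
        new[r] = np.inf                  # keep the diagonal at infinity
        d[r, :], d[:, r] = new, new
        active.remove(s)                 # step 6: repeat until one cluster is left
    return merges
```

The quadratic distance matrix and the linear scan per merge are what give the method its high time complexity, as noted above.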
1.2.5 The Present Contributions
Different clustering algorithms have been used in different studies, based on
different starting points and criteria, leading to different outcomes and taxonomies
of clustering algorithms. Their open challenges, advantages, and
usefulness in parallel settings, however, are yet to be discussed. The algorithms
mentioned in this study have seen various parallel implementations. This study
examines the parallel behavior of these algorithms with MapReduce.
The remainder of this paper is arranged as follows: Section 2 covers the related
work, including recent advancements in the field. Section 3 presents a comparison
in which the various clustering-based MapReduce algorithms are analyzed.
Finally, the conclusion is given in Section 4.
2. Related Work
2.1 K-means
Lv et al. [9] implemented an experiment based on a sequential K-Means algorithm using C++ and
another based on a parallel K-Means algorithm using Hadoop running on Java.
They aimed to analyze a large number of remote sensing (RS) images; the sequential
approach reached its limit when hardware resources were constrained, and the time
required became a bottleneck for processing a large number of RS images. This was overcome
by using parallel execution-based algorithms.
As mentioned before, K-Means fails to work efficiently with large datasets and with
datasets of unknown shape, so Li et al. [10] proposed an improved K-Means based on the ensemble learning method of bagging. Their
experiment helped overcome the inefficiency issue and the sensitivity to outliers.
The results of their experiments show that their approach is acceptable for
a scalable model.
Shahrivari and Jalili [11] performed an experiment using the mrk-means algorithm, which helps overcome the
iterative nature of the traditional MapReduce implementation of K-Means. The
I/O overhead increases tremendously when K-Means is implemented with MapReduce because
the whole dataset is read during each iteration, and this operation is repeated with
every iteration. With mrk-means, the process is reduced to a single-pass solution that
uses a re-clustering technique, i.e., the dataset is read only once. Hence, it is
much faster than the traditional setup of MapReduce.
2.2 K-harmonic Means
Zhang et al. [12] conducted an experiment comparing the traditional K-Means algorithm, the
Expectation Maximization (EM) algorithm, and the K-Harmonic Means algorithm to
determine which of the three produces the better output.
According to their findings, traditional K-Means and EM share the same flaw:
a strong dependency on, and sensitivity to, the initialization of the centers.
The K-Harmonic Means algorithm overcame this issue, improving the clustering
quality and providing a faster and better implementation with MapReduce.
Bagde and Tripathi [13] proposed a hybrid combination of the traditional K-Means and the K-Harmonic Means
algorithms to overcome the sensitivity to initial points, the trapping in local optima,
and the excess iterations that occur because the algorithm runs k times when this is not necessary.
K-Harmonic Means is highly scalable and insensitive to the placement of the initial points.
Their experiment provided acceptable results.
Guo and Peng [14] proposed a further hybrid, combining the K-Harmonic Means (KHM) algorithm
with dimensionality reduction. This combination improved the clustering
results with less computational time and more efficient iterations. The experiment
was motivated by the incompatibility of clustering algorithms with high-dimensional
data: KHM is already suitable for large data sets but struggles in higher dimensions. Hence,
the most natural strategy for this problem, dimensionality reduction, was applied,
using Principal Component Analysis as its basis.
2.3 Fuzzy C-means
All the above-mentioned algorithms belong to hard clustering, i.e., each data point
is strictly part of only one cluster, but that is not the natural case. In the real
world, the data points on a cluster boundary cannot strictly be considered part
of that cluster. Hence, a soft clustering technique was required, one that allows a
data point to be part of more than one cluster at a time. Ludwig [15] presented such an experiment using the Fuzzy C-Means algorithm on MapReduce and investigated
its scalability and accuracy under parallel execution in such a soft clustering
scenario. The results obtained show that Fuzzy C-Means is more accurate and realistic,
but it is unable to scale well.
Dai et al. [16] presented an experiment combining the canopy clustering concept, which quickly
pre-analyzes the provided dataset, with Fuzzy C-Means (FCM) to solve its scalability issue.
With the help of the rapid acquisition property of the canopy clustering algorithm
[17], the convergence rate of FCM was accelerated. The results of their experiment
show that this combination provides better clustering quality and higher operation
speed.
Similar to Lv et al. [9], Li et al. [18] proposed a highly scalable version of Fuzzy C-Means for their research on underwater
image segmentation. Because they had to deal with a rapid increase in data, they
required parallel execution of the tasks, for which MapReduce was convenient. Their
experiment used a two-layer distribution model to distribute the data transfer
and relieve the bandwidth. It yielded results with better efficiency and
worked well at high scale.
2.4 Hierarchical Clustering
MapReduce was further improved for data mining, which requires faster computation
and involves heavy memory usage; the existing clustering techniques were not
efficient in these respects. For that purpose, Sun et al. [19] proposed a hierarchical clustering method with batch updating and co-occurrence-based
feature selection. These methods reduce the computational time
and eliminate noisy features. Their experiment showed that the I/O overhead was
reduced along with the communication overhead, cutting the total execution time
to 1/15 of the original.
Building on the above work, Gao et al. [20] combined the hierarchical clustering algorithm with the neuron initialization method
of a vector-pressing self-organizing model. Their experiment divided a
large text database into data blocks and then distributed these blocks
so that they were processed in parallel. The experiment
yielded higher efficiency and better performance in text clustering and mining.
3. Analysis
3.1 Open Challenges
3.1.1 K-means
Recent publications have extended the K-Means algorithm to include extra background
information. This background information takes the form of instance-level constraints
of two types: must-link and cannot-link. Specifying additional background information
as such constraints is very straightforward to implement. On the other hand, finding
an assignment that is feasible under all the constraints at once is NP-complete;
feasibility results exist for clustering under each type of constraint individually,
as well as for many types combined. What is required is an iterative algorithm that
minimizes the constrained vector quantization error without attempting to meet all
constraints at each iteration; a feasibility check of the kind such an algorithm
needs is sketched below.
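A minimal sketch of such an instance-level feasibility check, in the style of COP-KMeans, follows. The helper name and the data layout are illustrative and are not taken from any cited work.

```python
def violates(idx, cluster, assigned, must_link, cannot_link):
    """Check whether placing point `idx` into `cluster` breaks a constraint.
    `assigned` maps already-placed point indices to their clusters (a sketch)."""
    for a, b in must_link:
        other = b if a == idx else a if b == idx else None
        if other in assigned and assigned[other] != cluster:
            return True   # a must-link partner already sits in another cluster
    for a, b in cannot_link:
        other = b if a == idx else a if b == idx else None
        if other in assigned and assigned[other] == cluster:
            return True   # a cannot-link partner already sits in this cluster
    return False

# Usage inside an assignment step: try clusters in order of increasing distance
# and take the nearest one for which violates(...) is False.
```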
3.1.2 K-harmonic Means
The initialization issue can be resolved using K-Harmonic Means, up to a certain level.
On the other hand, the issue of ``local minima trapping'' still exists. Several studies
have been conducted to find solutions, but the fundamental problem underlying the
proposed algorithms remains unsolved. Future studies could apply heuristics that produce
non-local moves for the cluster centers; such techniques could also
choose the optimal hyperparameters for the optimal solution.
3.1.3 Fuzzy C-means
Several studies have pointed out disadvantages associated with FCM: (1) only
point-based membership; (2) loss of information; (3) slower convergence. Nevertheless,
FCM is very useful because it blends well with other algorithms, producing
numerous hybrid approaches. This integration of techniques and interdisciplinary
study may yield novel solutions to the FCM issues.
3.1.4 Hierarchical Clustering
Hierarchical clustering requires that the distance metric, as well as
the linkage criterion, be specified explicitly, and rarely is there a solid theoretical
foundation for such decisions. Consequently, the applicability of the technique to
modern research is questionable.
Identifying how to calculate a distance matrix when there are numerous data types
is difficult. There is no simple method for calculating distance when the variables
are both quantitative and qualitative. How would one calculate the distance between
a 45-year-old man, a 10-year-old girl, and a 46-year-old woman, for example? There
are formulae, but they involve arbitrary decisions; one such heuristic is sketched below.
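For illustration, a Gower-style distance over mixed attributes might look as follows in Python. The equal weighting of attributes and the range normalization are exactly the kind of arbitrary decisions the formulae involve.

```python
def gower_distance(a, b, ranges, is_categorical):
    """A Gower-style mixed-type distance (an illustrative heuristic)."""
    total = 0.0
    for x, y, r, cat in zip(a, b, ranges, is_categorical):
        if cat:
            total += 0.0 if x == y else 1.0  # categorical: simple mismatch
        else:
            total += abs(x - y) / r          # numeric: range-normalized gap
    return total / len(a)

# (age, sex) records: a 45-year-old man, a 10-year-old girl, a 46-year-old woman
people = [(45, "M"), (10, "F"), (46, "F")]
ranges = [46 - 10, None]  # observed age range; None is unused for categoricals
print(gower_distance(people[0], people[2], ranges, [False, True]))  # ~0.514
```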
3.2 Findings
This literature review examined the applications of clustering in various fields
through the lens of MapReduce. In most of the recent research work done in the area,
factors such as precision, effectiveness, and other properties are measured on
different datasets. On this basis, specific published papers were selected for this
review. Table 5 presents a detailed summary of all of them in tabular form for the reader's convenience.
Table 5. Summary of Related Work.
# | Author(s) | Technique | Application Area
1 | Lv et al. [9] | K-Means | Analysis
2 | Li et al. [10] | K-Means | Analysis
3 | Shahrivari & Jalili [11] | K-Means | Data control
4 | Zhang et al. [12] | K-Harmonic Means | Network
5 | Bagde & Tripathi [13] | K-Harmonic Means | Comparative analysis
6 | Guo and Peng [14] | K-Harmonic Means | Higher dimensions
7 | Ludwig [15] | Fuzzy C-Means | Analysis
8 | Dai et al. [16] | Fuzzy C-Means | Analysis
9 | Li et al. [18] | Fuzzy C-Means | Scalability of FCM
10 | Sun et al. [19] | Hierarchical Clustering | Data mining
11 | Gao et al. [20] | Hierarchical Clustering | Data mining
None of the researchers used a common benchmark dataset to analyze the outcomes, which
makes it difficult to compare the results of the various algorithms mentioned above.
When the requirement is scalability, the other disadvantages of hierarchical
clustering are overlooked, and it is preferred over the variations of K-Means. In
cases where a more realistic outlook is required and scalability is not an issue,
Fuzzy C-Means is preferred, its higher accuracy making it more desirable. The problems
associated with hierarchical cluster analysis can also be addressed using more contemporary
techniques, such as latent class analysis.
Table 6 compares all algorithms based on the Principle Used, Similarity Function, Advantage,
Disadvantage, and Time Complexity.
Table 6. Algorithms Comparison.
Parameters | K-Means | K-Harmonic Means | Fuzzy C-Means | Hierarchical Clustering
Principle Used | Iterative hill climbing | Harmonic averaging | Optimization of an objective function | Cluster tree building
Similarity Function | Euclidean | Squared Euclidean | Euclidean | Euclidean
Advantage | Fast, simple, scalable | Robust | More accurate and realistic | High scalability
Disadvantage | Sensitive to outliers | Trapped by convergence to a local optimum | Low scalability | High time complexity
Time Complexity | $O(nKd)$ | $O(nKd)$ | $O(n)$ | $O(n^{3})$
4. Conclusion
Current trends in the digital market have caused a surge in the volume of digital
information. In addition, the need for insightful analysis of this colossal amount
of raw digital data, so that only the relevant data are kept, is growing. Relevance
can be decided by clustering, and MapReduce can process this huge data
efficiently because of its parallel distribution technique. The
use of clustering in MapReduce builds on different techniques that have followed from
technological advancements, and these techniques have been studied in this review paper.
Based on an examination of K-Means and its variations, i.e., K-Harmonic Means, Fuzzy C-Means,
and Hierarchical Clustering, it can be concluded that the traditional clustering algorithm
alone is insufficient for data clustering. The traditional clustering algorithm is also
insufficient in integrated approaches, where amalgamation with multiple other algorithms
is required. More realistic clusters and reduced execution times are expected from hybrid
models. Crossovers of these clustering algorithms with MapReduce can provide more exact
outcomes because they can deal with considerable unstructured, dispersed information
in a parallel fashion. Future researchers may look to the hybrid approach to bring
out better results than the traditional counterparts.
REFERENCES
[1] Ajin V. W., Kumar L. D., 2016, Big data and clustering algorithms, In 2016 International Conference on Research Advances in Integrated Navigation Systems (RAINS), IEEE, pp. 1-5.
[2] Drineas P., Frieze A., Kannan R., et al., 2004, Clustering large graphs via the singular value decomposition, Machine Learning, Vol. 56, No. 1-3, pp. 9-33.
[3] Dean J., Ghemawat S., 2008, MapReduce: simplified data processing on large clusters, Communications of the ACM, Vol. 51, No. 1, pp. 107-113.
[4] Ekanayake J., Pallickara S., Fox G., 2008, MapReduce for data intensive scientific analyses, In IEEE Fourth International Conference on eScience (eScience '08), IEEE, pp. 277-284.
[5] Vattani A., 2011, K-means requires exponentially many iterations even in the plane, Discrete & Computational Geometry, Vol. 45, No. 4, pp. 596-616.
[6] Vernica R., Carey M. J., Li C., 2010, Efficient parallel set-similarity joins using MapReduce, In SIGMOD.
[7] Kurasova O., Marcinkevicius V., Medvedev V., Rapecka A., Stefanovic P., 2014, Strategies for big data clustering, In 2014 IEEE 26th International Conference on Tools with Artificial Intelligence, pp. 740-747.
[8] Ludwig S. A., 2015, MapReduce-based fuzzy c-means clustering algorithm: implementation and scalability, International Journal of Machine Learning and Cybernetics, Vol. 6, No. 6, pp. 923-934.
[9] Lv Z., Hu Y., Zhong H., Wu J., Li B., Zhao H., 2010, Parallel k-means clustering of remote sensing images based on MapReduce, In International Conference on Web Information Systems and Mining, Springer, Berlin, Heidelberg, pp. 162-170.
[10] Li H. G., Wu G. Q., Hu X. G., Zhang J., Li L., Wu X., 2011, K-means clustering with bagging and MapReduce, In 2011 44th Hawaii International Conference on System Sciences, IEEE, pp. 1-8.
[11] Shahrivari S., Jalili S., 2016, Single-pass and linear-time k-means clustering based on MapReduce, Information Systems, Vol. 60, pp. 1-12.
[12] Zhang B., Hsu M., Dayal U., 1999, K-harmonic means - a data clustering algorithm, Hewlett-Packard Labs Technical Report HPL-1999-124.
[13] Bagde U., Tripathi P., 2018, An analytic survey on MapReduce based k-means and its hybrid clustering algorithms, In 2018 Second International Conference on Computing Methodologies and Communication (ICCMC), IEEE, pp. 32-36.
[14] Guo C., Peng L., 2008, A hybrid clustering algorithm based on dimensional reduction and k-harmonic means, In 2008 4th International Conference on Wireless Communications, Networking and Mobile Computing, IEEE, pp. 1-4.
[15] Ludwig S. A., 2015, MapReduce-based fuzzy c-means clustering algorithm: implementation and scalability, International Journal of Machine Learning and Cybernetics, Vol. 6, No. 6, pp. 923-934.
[16] Dai W., Yu C., Jiang Z., 2016, An improved hybrid Canopy-Fuzzy C-means clustering algorithm based on MapReduce model, Journal of Computing Science and Engineering, Vol. 10, No. 1, pp. 1-8.
[17] Kumar A., Ingle Y. S., Pande A., Dhule P., 2014, Canopy clustering: a review on pre-clustering approach to K-means clustering, International Journal of Innovations & Advancement in Computer Science (IJIACS), Vol. 3, No. 5, pp. 22-29.
[18] Li X., Song J., Zhang F., Ouyang X., Khan S. U., 2016, MapReduce-based fast fuzzy c-means algorithm for large-scale underwater image segmentation, Future Generation Computer Systems, Vol. 65, pp. 90-101.
[19] Sun T., Shu C., Li F., Yu H., Ma L., Fang Y., 2009, An efficient hierarchical clustering method for large datasets with MapReduce, In 2009 International Conference on Parallel and Distributed Computing, Applications and Technologies, IEEE, pp. 494-499.
[20] Gao H., Jiang J., She L., Fu Y., 2010, A new agglomerative hierarchical clustering algorithm implementation based on the MapReduce framework, International Journal of Digital Content Technology and its Applications, Vol. 4, No. 3, pp. 95-100.
Author
Garvit Chugh received the bachelor's degree in computer engineering from Guru Gobind
Singh Indraprastha University in 2020 and the master's degree in computer engineering
from the Indian Institute of Technology, Jodhpur in 2022, and is currently pursuing
a Ph.D. in Mobile and Pervasive Computing in Computer Engineering in a joint program
of the Indian Institutes of Technology Jodhpur and Kharagpur. He is experienced as
an Android developer. His research areas include mobile and wireless computing, IoT,
telecom networks, software design, and operating systems. He has published various
research papers in international journals and conferences and serves as a reviewer
for many highly respected journals. He considers himself a ‘forever student,’ eager
both to build on his academic foundations in computers and engineering and to stay
in tune with the latest technical strategies through continued coursework and
professional development. Garvit has worked in various IT sectors of the Indian
Government, such as the Centre for Railway Information Systems and the Airports
Authority of India, as an Android developer, and led the teams in every project
that was assigned.
Nitesh Singh Bhati received his doctorate from GGSIP University, Delhi. He obtained
his M.Tech in CSE and B.Tech in Computer Science and Engineering from G.G.S.I.P.
University, Delhi. He is working as an Assistant Professor in the Department of Computer
Science and Engineering of Delhi Technical Campus, UP. He has published various research
papers in international journals and conferences. His current research area is information
security.
Puneet Kumar is a Professor in the Department of Computer Science & Engineering
at University Institute of Engineering, Chandigarh University, Mohali, India. He has
completed his Master’s and Ph.D. in Computer Science, and believes in the philosophy
of interdisciplinary research. He has also completed a certificate course on intellectual
property rights from WIPO Academy, Geneva. He has more than 18 years of teaching,
research, and industrial experience. His major research interests are Machine Learning,
Data Science, and e-government. He has published various research papers and articles
in national and international journals, and his papers are widely cited by various
stakeholders across the world. He is the recipient of several software copyrights
from the Ministry of Human Resource Development, Government of India. He has also
published books on e-governance titled “E-Governance in India: Problems, Prototypes
and Prospects”, “Stances of e-Government: Policies, Processes and Technology”, and
“Artificial Intelligence and Global Society: Impact and Practices”, published by CRC
Press, Taylor and Francis Group. Dr. Kumar is also an active member of the professional
societies Computer Society of India (CSI) and the Association for Computing Machinery
(ACM).
Vishal Bharti is working as Professor and Additional Director in the Department of
CSE at Chandigarh University, Mohali, Punjab. He completed his Ph.D. in 2016 in the
area of information security. He did his M.Tech. and B.E. at the Birla Institute of
Technology, Mesra, Ranchi. He also holds a Doctorate in Management Studies, in addition
to an MBA in IT and an E-MBA in S&M. His areas of specialization are cyber security,
network security, and distributed computing. He has a mixed bag of 16+ years of
experience in both academia and the IT industry. He has seven filed patents,
twenty-eight copyrights, 13 government grants (SERB, DST, NSTEDB, EDI, etc.), and two
seed money grants to his credit. He has published seventy-plus research papers at both
the national and international level (IEEE, Springer, Taylor & Francis) and was awarded
the Best Faculty Award 2010, Academic Leadership Award 2019, Emerging Leader in Higher
Education 2019, Best Young HoD of the Year 2019, Best Computer Teacher Award 2019,
Research Excellence Award in 2020, Outstanding Contribution in Promoting Education in
Rural Areas Award in 2021, and Best Young Director of the Year in 2021.