Mobile QR Code QR CODE

2024

Acceptance Ratio

21%


  1. ( Department of Orthodontics, Saveetha Dental College and Hospitals, Saveetha Institute of Medical and Technical Sciences (SIMATS), Saveetha University, Chennai, India kumaravelk.sdc@saveetha.com)
  2. ( Department of ECE, Dr. Ambedkar Institute of Technology, Outer Ring Road, Near Jnana Bharathi Campus, Mallathahalli, Bengaluru, Karnataka, India meenakshilr.ec@drait.edu.in)
  3. ( Department of Electrical and Electronics Engineering, Sri Sivasubramaniya Nadar College of Engineering, Rajiv Gandhi Salai (Omr), Kalavakkam, Tamilnadu, India leor@ssn.edu.in)
  4. ( Department of Electrical and Electronics Engineering, K.Ramakrishnan College of Technology, Trichy, Tamilnadu, India \qquad mahilj.eee@krct.ac.in)



Leader nodes, Community detection social network, Big data, Map reduce

1. Introduction

The data analysis of social network and the social network are noticed as more and more in recent study in Data Mining, Data Analysis, Machine Learning, Graph Mining and Tweets Classification [1]. The community identification [2] describes at group of nodes concerning to the set of communities to make as strongly linked sub- graphs from the graph [3]. The networks have modeled as graphs to detect the communities is called graph partition issue in modern graph theory [4], also the graph clustering or dense sub graph findings issue [5] in the graph mining region.

2. Related work

V. E. Lee, et al. [6] have suggested an overlapping clique group that, according to the community assumption, supported the Clique Percolation Method (CPM) [7,8]. By searching for neighborhood cliques, CPM finds the community's structures and finds all size $k$ cliques [9,10]. On the basis of link clustering for community recognition, J. M. Kumpula et al. [11]. To utilize the label [12] broadcast method to determine the overlapping community structures [13,14,15] by enabling the node to display the various labels, such as COPRA and SLPA [16,17]. The label distribution shows some label clustering in addition to rapidly revealing an overlapping community in the non-determinacy approach of the network [18]. Remaining paper is structured as: Architecture of the proposed approach is delineated in Section III, Find the every pair shortest pair utilizing Giraph API is portrayed in Section IV, Computing the degree of impact is presented in Section V, Leader node recognition is depicted in Section VI, Community based generation is displayed in Section VII, and the results with discussions are shown in Section VIII. Section IX gives the conclusion of this work.

3. Proposed Architecture

Fig. 1 depicts the structural design of the proposed technique. It has 6 steps: Step 1: Find each pair on the shortest path for the particular graph of social network applying a Giraph API step 2: Applying HADOOP Map Reduce to find the degree, betweenness, and proximity centralities; step 3: Combining all of the centralities identified in step 2 to calculate the degree of impact (DI). Step 4: Determine the leader nodes by using the structural centrality calculation; step 5: Utilizing leader nodes to generate node vectors; step 6. By using the Parallel Fuzzy C-Means Clustering (FCM) Approach, the soft and hard community clusters are developed.

Fig. 1. Structural design of proposed approach.

../../Resources/ieie/IEIESPC.2025.14.1.1/image1.png

4. Find the Every Shortest Pair Utilizing Giraph

The social network folder is uploaded to Giraph's task input in CSV format Runner.run (arg1; arg2) approach called up to begin the process. For instance, the technique uses the SHORTEST PATH action to start the Giraph process from a one node. Image, Fig. 2 portrays Giraph's implementation pattern. Fig. 2 portrays that Giraph's implementation method.

Fig. 2. Giraph's implementation method.

../../Resources/ieie/IEIESPC.2025.14.1.1/image2.png

5. Degree of Influence (DI) computation utilizing Map-Reduce Method

This paper utilizes the DI based on the graph's fundamental centrality measurement to display a node's relevance inside the specified social network. This part outlines the Map-Reduce based centrality computations that we presented, and how these centralities are finally combined to determine the DI.

5.1 Finding Betweenness centrality utilizing Map Reduce

The whole node acts as a bridge as betweenness centrality develops over the shortest path connecting the two nodes. The following analysis examines betweenness vertex v at the graph $G= (V,E)$ including $V$ vertices:

1. Determine which node pair's shortest paths are $(s,t)$.

2. Calculate the percentage of the shortest path that leads from vertex $v$ to each of the node pairs $(s, t)$.

3. Total vertex pairs $(s,t)$ add up to this ratio.

The betweenness centrality indicated as follows:

(1)
$ CB_{v=S\neq v\neq t\in V}={\rho }_{st}(v)/{\rho }_{st} , $

where $\rho_{st}$ implies entire shortest paths through `$s$' node $t$ and $\rho_{st}(v)$ represents overall paths to pass from $v$

Algorithm 1: Map of betweeness centrality job

../../Resources/ieie/IEIESPC.2025.14.1.1/al1.png

Algorithm 2: Betweeness centrality minimize job

../../Resources/ieie/IEIESPC.2025.14.1.1/al2.png

5.2 Map reduce dependent nearness centrality finding

A node's closeness to all other nodes increases with its center position, this is labeled in equation (2).

(2)
$ C(v )= \frac{N-1}{\sum_u dist(v,u) }. $

The less path distance amid the $v$ and $u$ vertices is expressed as $dist(v,u)$, here $N$ refers total nodes in the network that is shown.

Algorithm 3: Closeness centrality map job

../../Resources/ieie/IEIESPC.2025.14.1.1/al3.png

Algorithm 4: Closeness centrality decrease job

../../Resources/ieie/IEIESPC.2025.14.1.1/al4.png

5.3 Map reduce dependent centrality degree finding

A simple method of calculating centrality that counts a node's neighbors is called degree centrality. This centrality can be mathematically stated in (3)

(3)
$ Cd(v)=\frac{\mathrm{deg}(v)}{n-1}, $

where deg($v$) implies the quantity of node real connection node $v$, and $n-1$ is the count of predictable link of every vertex.

Algorithm 5: The degree of centrality on a map job

../../Resources/ieie/IEIESPC.2025.14.1.1/al5.png

Algorithm 6: Decrease the centrality degree for the job

../../Resources/ieie/IEIESPC.2025.14.1.1/al6.png

5.4 Computing Level of Influence

Equations (4), (5), and (6) define the centrality ratios. The aggregation which refers to as the DI and is normalized among 0 and 1 that is exhibited in equation (7). Consider $\alpha$, $\beta=0.4$, and $\gamma =0.2$.

(4)
$ {{\mathrm{deg}}_{\mathrm{ratio}} (vi)}=\frac{\mathrm{deg}\mathrm{(vi)}}{\sum_{\mathrm{vi}{\epsilon }\mathrm{V}}{\mathrm{deg} \mathrm{(vi)}}},\\ $
(5)
$ {{\mathrm{between}}_{\mathrm{ratio}} \left(vi\right)}=\frac{{\mathrm{between} \left(\mathrm{vi}\right)\ }}{\sum_{\mathrm{vi}{\epsilon }\mathrm{V}}{\mathrm{between(vi)}}} ,\\ $
(6)
$ {{\mathrm{closeness}}_{\mathrm{ratio}} (vi)}=\frac{\mathrm{closeness}\mathrm{(vi)}}{\sum_{\mathrm{vi}{\epsilon }\mathrm{V}}{\mathrm{closeness(vi)}}},\\ $
(7)
$ DI\left(vi\right)=\alpha\mathrm{*deg}\mathrm{ratio}\left(\mathrm{vi}\right)+{\beta }*\mathrm{closenessratio}\left(\mathrm{vi}\right)\nonumber\\ \hskip 3pc +{\gamma }*\mathrm{betweenratio} $

• DI(vi) - DI for node vi.

• $\alpha$, $\beta$, $\gamma$ - weight variables $\beta \ge \gamma > \lambda$ with $\alpha +\beta + \gamma =1$.

6. Leader nodes identification

6.1 Relative Distance

Node vi's relative distance, or $\rho$I, is defined as the shorter distance between nodes v${}_{i}$ with high DI values is depicted in equation (8),

(8)
$ \rho_i =\min_{j:DI_{j}>DI_i} (d_{ij}). $

Typically with huge density for the node that take $\rho_{i} = \max_{j}(d_{ij})$.

Structural Centrality

The structural centers can be identified by their greater DI value for node vi is labelled in eqn (9).

(9)
$ Sc_i=D. $

The metric essentially ignores this situation when adjacent nodes along the better level of effect are recognized as structural centers.

Algorithm 7: Identify the leader nodes

../../Resources/ieie/IEIESPC.2025.14.1.1/al7-1.png

../../Resources/ieie/IEIESPC.2025.14.1.1/al7-2.png

7. Community creation

7.1 Node Vector Set Generation

The leader nodes found in the previous step were used to create the node vectors for each node on the chosen network. Fig. 3 shows the node vectors for a certain nodes in the Karathe network.

Fig. 3. Node vector including seed nodes for karathe network.

../../Resources/ieie/IEIESPC.2025.14.1.1/image3.png

7.2 Generation of Soft Community Clusters

Algorithm 8: Map_Reduce_FCM

../../Resources/ieie/IEIESPC.2025.14.1.1/al8-1.png

../../Resources/ieie/IEIESPC.2025.14.1.1/al8-2.png

Algorithm 9: Cluster_Centre_Mapper

../../Resources/ieie/IEIESPC.2025.14.1.1/al9.png

Algorithm 10: Cluster_Center_Reducer

../../Resources/ieie/IEIESPC.2025.14.1.1/al10.png

Algorithm 11: Membership_Mapper

../../Resources/ieie/IEIESPC.2025.14.1.1/al11-1.png

../../Resources/ieie/IEIESPC.2025.14.1.1/al11-2.png

Algorithm 12: Membership_Reducer

../../Resources/ieie/IEIESPC.2025.14.1.1/al12.png

7.3 Allocating leader nodes towards the communities

The community cluster whose centroid has the greatest similarity to the leader node receives the allocation of the leader node. This is labeled in equation (10),

(10)
$ \text{Cluster (V${}_{leader} )= \max_{ci\in C} \{$Cosine(ci,Vleader)$\}$}, $

V${}_{leader}$ - Leader node,

C${}_{i}$ - Centroid for $i$th cluster.

7.3.1 Hard community clusters Creation

The soft clusters are generated by parallel FCM approach. The matrix of community membership was produced by this FCM. The node v${}_{i}$ can be assigned to two communities c${}_{i}$ and c${}_{j}$ if node v${}_{i}$ having the community membership score difference on two communities c${}_{i}$ and c${}_{j}$ not more than $\lambda$. i.emem${}_{vi} ($c${}_{\rm max} )$-mem${}_{vi} ($c${}_{i} )$&mem${}_{vi} ($c${}_{\rm max} )$-mem${}_{vi} ($c${}_{j} ) \le \lambda$, and both c${}_{i}$ and c${}_{j}$ must be top two scorer community for node v${}_{i}$. Here we are setting threshold $\lambda=0.00001$. Fig. 4(a) illustrates the Communities of Karathe Network; Fig. 4(b) shows the Communities of Dolphins Network; Fig. 4(c) denotes Communities of Jazz Network and Fig. 4(d) shows communities of E-Mail Network.

Fig. 4. (a) Communities of Karathe Network. (b) Communities of Dolphins Network. (c) Communities of Jazz Network. (d) Communities of E-Mail Network.

../../Resources/ieie/IEIESPC.2025.14.1.1/image4a.png(a)

../../Resources/ieie/IEIESPC.2025.14.1.1/image4b.png(b)

../../Resources/ieie/IEIESPC.2025.14.1.1/image4c.png(c)

../../Resources/ieie/IEIESPC.2025.14.1.1/image4d.png(d)

8. Results with discussions

8.1 Description of the Dataset

The datasets for our experiments include Dolphins, Football, Karate, Political Blogs, Power, Yeast, Political Book Network, Netscience, Jazz, and Email networks. Detailed information is provided in Table 1, indicating the number of vertices (|V|), links (|E|), average degree (Davg), clustering coefficient (C), and average path length (PLavg).

Table 1. Graph features of various databases.

Networks

∣V∣

∣E∣

Davg

C-coef

PLavg

Football

115

613

10.661

0.403

2.508

Dolphin

62

159

5.129

0.303

3.357

Karathe

34

78

4.588

0.285

1.274

Jazz

198

2742

27.7

0.52

2.21

Pollblocks

1490

19025

25.537

0.172

3.39

Power

4941

6594

2.669

0.107

18.989

Yeast

2375

11693

9.847

0.153

4.14

Pol books

105

441

8.40

0.4875

2.341

E-mail

1133

5451

9.62

0.166

3.65

Net science

1589

2742

3.45

0.123

2.123

Word

112

425

7.59

0.1431

3.214

8.2 Outcomes based upon datasets

This section examines the proposed and existing CORPA, LBCD, LC, SLPA, LFM, and CPM methods under 10 real-world databases. The results produced by these techniques are tabulated in Table 2. Fig. 5 displays the membership matrix for the karate network. Fig. 6 reveals nodes with a membership value of 1 for one cluster, identifying them as leader nodes.

Table 2. EQ values comparison in real world datasets.

Data sets

EQ (Extended Modularity score) / number of communities

CPM

LC

SLPA

LFM

GCE

COPRA

LBCD

Karathe

0.2708 / 3

0.2251 / 2

0.5192/2

0.4852/2

0.3474/2

0.2610/3

0.5295 /2

Dolphin

0.4195 / 4

0.216/10

0.4997/5

0.5183/5

0.4706/5

0.4640/4

0.5486/2

Football

0.4897 /15

0.1729/15

0.4712/11

0.4816/14

0.5907/12

0.3729/18

0.6212/10

Jazz

0.2396 /8

0.0381/6

0.4944/6

0.5198/4

0.3206/2

0.4560/4

0.4816/5

Polbooks

0.5008/5

0.0244/3

0.4959/3

0.4550/3

0.4865/3

0.5115/3

0.4996/3

Pollblocks

0.0106 /28

0.0118/27

0.4328/6

0.4425/8

0.2901/2

0.4290/9

0.4815/4

Power

0.1567/301

0.2149/322

0.559/661

0.5649/167

0.468/300

0.435/427

0.5105/290

Yeast

0.1153/17

0.6388/47

0.6268/53

0.6623/27

0.3544/40

0.2742/52

0.6667/28

Email

0.1163/4

0.0469/21

0.3634/16

0.4272/13

0.4273/35

0.00001/1

0.5619/7

Netscience

0.5745/169

0.8718/334

0.8683/325

0.8816/331

0.7250/126

0.6853/703

0.8980/310

Word

0.1003/4

0.0472/5

0.0910/3

0.0983/4

0.00001/1

0.00001/1

0.1386/7

The relation among EQ and $\beta$ is displayed in Fig. 6. The value of EQ is raised with $\beta$ variable then it attains higher value0.4 to 0.45 in small datasets as shown in Fig. 6(a). and it reaches the greater value 0.35 to 0.4 in large datasets as shown in Fig. 6(b).

Table 3 tabulates the implementation time of proposed approach. This is represented in Fig. 7.

Table 3. Implementation period of four real-world databases.

Database

Time consume at secs

Processor amount (p)

|c|=1

|c|=2

|c|=4

|c|=8

|c|=16

Power

15,500

9850

4550

2560

1752

Yeast

8150

4456

2386

1326

786

Net Science

5256

3001

1656

932

532

Poll Block

4932

2621

1412

802

486

Fig. 5. Membership Matrix of karathe Network.}

../../Resources/ieie/IEIESPC.2025.14.1.1/image5.png

Fig. 6. (a) EQ value on small data sets against ${\beta}$ values. (b) EQ value on large data sets against ${\beta}$ values.

../../Resources/ieie/IEIESPC.2025.14.1.1/image6a.png(a)

../../Resources/ieie/IEIESPC.2025.14.1.1/image6b.png(b)

Fig. 7. Implementation period of four real-world datasets.

../../Resources/ieie/IEIESPC.2025.14.1.1/image7.png

9. Conclusion with future work

This section discusses structural centrality as a method for determining the leader nodes in networks. In order to identify the overlapping communities inside the supplied network, the Fuzzy C-Means Clustering approach, which is based on Parallel Map-Reduce, is dependent on these leader nodes. Advantages in decidable tests are characterized by two elements by LBCD. First, the locating leader (seed) nodes compute the community structures count for a network. Second, in contrast to other approaches, the expansion strategy goes around to the leader nodes and removes the need for non-linear seed selection and human parameter adjustments, which accelerates convergence to the best results and makes the most stable algorithm possible. In the feature, the relationship between any two nodes in the provided social network graph may be predicted by applying this leader-based approach moving feature.

REFERENCES

1 
S. Fortunato, ``Community detection in graphs,'' Physics Reports, vol. 486, no. 3-5, pp. 75-174 2010.DOI
2 
L .Tang and H. Liu, ``Community detection and mining in social media,'' Synthesis Lectures on Data Mining and Knowledge Discovery, vol. 2, no. 1, pp. 1-137, 2010.DOI
3 
R Diestel, Random Graphs in Graph Theory, Springer, Berlin, Heidelberg, pp. 323-345, 2017.URL
4 
B. Bollobás, Modern Graph Theory, vol. 184 of Graduate Texts in Mathematics, Springer, New York, NY, p. 4, 1998.DOI
5 
C. C Aggarwal and H. Wang, Managing and Mining Graph Data, Springer, New York, vol. 40, 2010.URL
6 
V. E. Lee, et al., ``A survey of algorithms for dense subgraph discovery,'' Managing and Mining Graph Data, Springer, Boston, MA, pp. 303-336, 2010.DOI
7 
A Clauset, et al., ``Finding community structure in very large networks,'' Physical Review E, vol. 70, no. 6, 066111, 2004.DOI
8 
D. Gibson, et al., ``Discovering large dense subgraphs in massive graphs,'' Proc. of the 31st International Conference on Very Large Data Bases, pp. 721-732, August 2005.URL
9 
M. E. Newman, ``Modularity and community structure in networks,'' Proceedings of the National Academy of Sciences, vol. 103, no. 23, pp. 8577-8582, 2006.DOI
10 
U. N. Raghavan, et al., ``Near linear time algorithm to detect community structures in large-scale networks,'' Physical Review E, vol. 76, no. 3, 036106, 2007.DOI
11 
J. M, Kumpula, et al., ``Sequential algorithm for fast clique percolation,'' Physical Review E, vol. 78, no. 2, 026109, 2008.DOI
12 
F. Zhao and A. K. Tung, ``Large scale cohesive subgraphs discovery for social network visual analysis,'' Proceedings of the VLDB Endowment, vol. 6, no. 2, pp. 85-96, 2012.DOI
13 
J. Saad, ``Dense subgraph extraction with application to community detection,'' IEEE Transactions on Knowledge and Data Engineering, vol. 24, no. 7, pp. 1216-1230, 2010.DOI
14 
J. Huang, et al., ``Revealing density-based clustering structure from the core-connected tree of a network,'' IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 8, pp. 1876-1889, 2012.DOI
15 
M. C. Nascimento and A. C. de Carvalho, ``Spectral methods for graph clustering - A survey,'' European Journal of Operational Research, vol. 211, no. 2, pp. 221-231, 2011.DOI
16 
L. C. Freeman, et al., ``Centrality in valued graphs: A measure of betweenness based on network flow,'' Social Networks, vol. 13, no. 2, pp. 141-154, 1991.DOI
17 
M. M. Iqbal and K. Latha, ``An effective community-based link prediction model for improving accuracy in social networks,'' Journal of Intelligent & Fuzzy Systems, (Preprint), pp. 1-17, 2022.DOI
18 
A. Rusinowska, R. Berghammer, H. D. Swart, and M. Grabisch, ``Social networks: Prestige, centrality, and influence,'' Proc. of International Conference on Relational and Algebraic Methods in Computer Science, Springer, Berlin, Heidelberg, pp. 22-39, May 2011.DOI
Kumaravel Kaliaperumal
../../Resources/ieie/IEIESPC.2025.14.1.1/author1.png

Kumaravel Kaliaperumal is currently working an Assistant Professor with the Department of Orthodontics, Saveetha Dental College and Hospitals, Saveetha Institite of Medical and Technical Sciences (SIMATS), Saveetha University, Chennai, India. His research interests include soft computing application in power system Engineering.

Meenakshi L. Rathod
../../Resources/ieie/IEIESPC.2025.14.1.1/author2.png

Meenakshi L. Rathod is currently working an Assistant professor with the Department of ECE, Dr.Ambedkar Institute of Technology, Bengaluru, Karnataka. She obtained her B.E. in electronics & communication engineering from P.D.A. College of Engineering, Gulbarga, in Feb.2002, an M.Tech. degree in VLSI design and embedded system from UTL Technological Ltd. VTU extension center under Visvesvaraya Technological University in 2009, and a Ph.D. degree under Visvesvaraya Technological University, Belagavi. She has published more than 20 researchpapers in the journals of international repute and 4 Patents filed. Her research interests include VLSI and Communications.

Leo Raju
../../Resources/ieie/IEIESPC.2025.14.1.1/author3.png

Leo Raju is an Associate Professor in the Department of Electrical and Electronics Engineering with 24 years of teaching and 4 years of industrial experience. He received his B.E. (EEE) degree from Government College of Technology, Coimbatore, Bharathiyar University in 1994, an M.E. degree in computer science and engineering from Anna University, Chennai in 2005, and a Ph.D. degree from Anna University, Chennai. He is a recognized supervisor at Anna University. His research interests include Smart Grid, Micro-grid Energy Management, Multi-Agent Systems, IoT, Big Data Analytics, Soft Computing, Machine Learning, and Blockchain.

J. Mahil
../../Resources/ieie/IEIESPC.2025.14.1.1/author4.png

J. Mahil is currently working as Professor in the Department of EEE at K.Ramakrishnan College of Technology, Trichy, Tamilnadu, India. She has completed her B.E. degree in electrical and electronics engineering, an M.E degree in process control and instrumentation, and a Ph.D degree in electrical engineering, from the Manonmaniam Sundaranar University, Annamalai University and Noorul Islam University, India, in 2001, 2004, and 2013, respectively. Her researches focus on Neural Networks, Evolutionary Optimization, Biomedical, Computational Intelligence and Bio-Signal Processing. She has 20 years of experience in teaching and research and published many research papers in papers in leading peer reviewed International Journals.