
  1. (Department of Police General Education, Zhengzhou Police University, Zhengzhou 450000, China Yanfei_Gao72@outlook.com )



OC-SVM, Open Pose model, Behavioral anomaly monitoring, Smart community, Security system

1. Introduction

Police security is key to building a harmonious community environment. In modern community management, an efficient, IT-based public security system can improve police efficiency and strengthen the overall sense of security in the community. Unfortunately, many property management companies and grassroots police units make only limited use of information technology and still rely on manual surveillance, which is inefficient and prone to security omissions caused by distraction. Jonathan et al. highlighted the impact of rising community crime rates on residents’ quality of life and property values [1], which underscores the importance of effective community policing and security systems. Prior research has evolved from basic manual surveillance to more sophisticated automated systems. Collins et al. [2] proposed automated surveillance systems for video streaming data, while Socha et al. [3] used video surveillance to improve the safety of public spaces and examined how artificial intelligence can be used to build smart communities. These early efforts laid the groundwork for integrating machine learning into smart community systems [4,5]. Yao et al. analyzed video surveillance [6] and sent prominent alerts to users and the public security system when abnormal situations were recognized. Unfortunately, that system has a minimum delay of 7.3 seconds for abnormal behaviors. Shehzed et al. applied video surveillance to human contours [7] and used unsupervised clustering to determine abnormal events. More recently, Bhati’s studies on intrusion detection using Coarse Gaussian SVM [8,9] highlighted the evolution of machine learning techniques in security systems. In addition, Tiwari et al. [10] analyzed the prediction of COVID-19 in India using an ensemble regression approach, demonstrating the application of machine learning in diverse fields, including public health.

Two situations are most likely to cause safety issues in a community: hazards to pedestrians from objects falling from buildings [11,12] and threats to the lives of elderly residents caused by falls [13]. The global population has been aging at an accelerating rate, and some statistics show that more than 80% of the life-threatening accidents among the elderly are caused by falls [14,15]. If a fall by an elderly person can be detected in time and treated quickly [16], the extent of health damage can be reduced by more than 85% [17]. Many researchers have conducted in-depth studies on fall monitoring [18], and the field has advanced significantly in recent years. For example, Mirmahboub proposed a fall monitoring algorithm based on the outer frame of the target [19], which used background differencing to find the external contour of the human body in the video and then used SVMs to differentiate between fall and non-fall events. Ma proposed an approach that combines extreme learning machine classification with shape features to determine whether a fall has occurred [20]. Harrou proposed a multivariate exponentially weighted moving average monitoring algorithm [21] to sharpen the distinction between normal and abnormal behavior. These studies have contributed to detecting abnormal behaviors, particularly falls, which are a major concern in community safety.

This paper proposes a novel monitoring system architecture to monitor neighborhood safety conditions more efficiently and in real time. The system combines the lightweight convolutional neural network (CNN) MobileNetV3 with multiple discriminative features to improve the adaptability, speed, and accuracy of abnormal behavior detection. By replacing some of the structures in the OpenPose model, the algorithm in this paper reduces the computational effort and simplifies the model structure. In addition, the coordinates of the human body key points are processed to generate vectors, and the angles between these vectors and the ground, together with the aspect ratio of the human body’s calibration frame, serve as the discriminative features of a fall. By incorporating the OC-SVM algorithm, the proposed system improves the anomaly detection accuracy and minimizes false positives, a common issue in traditional surveillance systems. The OC-SVM algorithm can handle unbalanced and unlabeled data and is particularly effective in distinguishing between normal and anomalous behaviors, enhancing the overall reliability of the system. This study also designed a smart community policing security system that collects data through various channels, such as IoT sensors and video cameras, and adopts virtualization technology to allocate computation, storage, and network resources dynamically through logical abstraction. This improves the availability and scalability of the system and ultimately enables real-time abnormal behavior detection in multiple scenarios. Finally, the effectiveness and versatility of the algorithm are demonstrated through experimental analysis of the Multiple Cameras Fall dataset and a comparison with traditional methods.

2. Abnormal Behavior Monitoring Algorithms

The abnormal behavior algorithm proposed in this paper consists of two steps: feature extraction and behavioral discrimination.

2.1 MobileNetV3-based OpenPose Feature Extraction

In this study, the OpenPose model was modified to improve the accuracy and real-time performance of anomalous behavior detection. The original OpenPose model consists of the first 10 layers of the VGG-19 network followed by a two-branch, multi-stage CNN that predicts the key points and part affinity fields [22]. However, its computational cost limits its use in real-time scenarios. Inspired by Lightweight OpenPose [23], this study replaced VGG-19 with MobileNetV3 [24], a lightweight neural network known for its efficiency in processing video data. MobileNetV3 significantly reduces the number of parameters and the computational complexity using techniques such as depth-wise separable convolutions and average pooling, which makes it particularly suitable for applications that require real-time processing and analysis of video data.
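The parameter saving from depth-wise separable convolutions can be illustrated with a simple weight count. The sketch below is only an illustration: the layer size (3×3 kernel, 128 input and output channels) is an assumed example, not a layer taken from the paper, and biases are ignored.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes the channels
    return depthwise + pointwise


# Hypothetical layer: 3 x 3 kernel, 128 input channels, 128 output channels.
std = standard_conv_params(3, 128, 128)
sep = depthwise_separable_params(3, 128, 128)
print(std, sep, round(std / sep, 2))
```

For this assumed layer, the separable version needs roughly one eighth of the weights of the standard convolution.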

Fig. 1. Structural diagram of the improved Open Pose feature extraction.

As shown in Fig. 1, to capture small changes in body pose in greater detail, the second stage of the OpenPose architecture in this paper replaces the original 7${\times}$7 convolution kernel with a combination of smaller kernels: one 1${\times}$1 and two 3${\times}$3 convolution kernels, followed by a ResNet-style residual structure. This design reduces the number of parameters and the computational cost while increasing the learning and expressive capabilities of the network and helping to prevent overfitting; the residual connections accommodate the increased depth brought about by these changes. The feature extraction structure of the improved OpenPose model processes the video sequences through MobileNetV3. After the feature maps are obtained, two branches predict the key points and the part affinity fields, outputting the prediction results quickly and accurately.
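As a rough check of the claimed saving, the weight counts can be compared under the simplifying assumption (not stated in the paper) that every layer has $C$ input and $C$ output channels and that biases are ignored:

$$ P_{7\times 7}=7\cdot 7\cdot C^{2}=49C^{2},\qquad P_{1\times 1+2(3\times 3)}=\left(1+9+9\right)C^{2}=19C^{2},\qquad \frac{P_{1\times 1+2(3\times 3)}}{P_{7\times 7}}=\frac{19}{49}\approx 0.39 $$

Under this assumption, the replacement block uses about 39% of the weights of a single 7${\times}$7 layer.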

When an image or video sequence is input into the model, the upper branch generates a part confidence map for the key points of the human body in each image frame:

(1)
$$ \mathrm{C}_{\mathrm{n}, \mathrm{m}}^*(\mathbf{Q})=\exp \left(-\frac{\left\|\mathbf{Q}-\mathrm{X}_{\mathrm{n}, \mathrm{m}}\right\|_2^2}{\sigma_{\mathrm{n}, \mathrm{m}}^2}\right) $$

where m is the index of the person; n is the index of the human body key point; Q is any point in the confidence map $\mathrm{C}_{\mathrm{n},\mathrm{m}}$; $\mathrm{X}_{\mathrm{n},\mathrm{m}}$ is the true location of key point n of person m; $\sigma _{\mathrm{n},\mathrm{m}}$ controls the spread of the distribution around the corresponding key point. Ideally, each key point corresponds to a unique maximum in the confidence map. Therefore, the maximum value $\mathrm{C}_{\mathrm{n}}^{\mathrm{*}}\left(\mathrm{Q}\right)=\max _{\mathrm{m}}\mathrm{C}_{\mathrm{n},\mathrm{m}}^{\mathrm{*}}\left(\mathrm{Q}\right)$ can be taken over all persons to locate the peak, and the pixel coordinates of the peak are used as the coordinates of key point n. The lower branch is used to predict the part affinity field between two neighboring key points, n1 and n2, with the integral value:

(2)
$ \mathrm{B}=\int _{\mathrm{t}=0}^{\mathrm{t}=1}\mathrm{k}_{\mathrm{b}}\left(\mathbf{Q}\left(\mathrm{t}\right)\right)\cdot \frac{\mathbf{n}_{2}-\mathbf{n}_{1}}{\left\| \mathbf{n}_{2}-\mathbf{n}_{1}\right\| _{2}}\mathrm{dt} $

where $\mathbf{Q}\left(\mathrm{t}\right)=\left(1-\mathrm{t}\right)\mathbf{n}_{1}+\mathrm{t}\mathbf{n}_{2},\,\,\mathrm{t}\in \left[0,1\right]$, is a point on the segment between $\mathbf{n}_{1}$ and $\mathbf{n}_{2}$, and $\left\| \mathbf{n}_{2}-\mathbf{n}_{1}\right\| _{2}$ denotes the length of the limb. If the point $\mathbf{Q}\left(\mathrm{t}\right)$ lies on the limb, $\mathrm{k}_{\mathrm{b}}\left(\mathbf{Q}\left(\mathrm{t}\right)\right)=\nu $, where $\nu =\left(\mathbf{n}_{2}-\mathbf{n}_{1}\right)/\left\| \mathbf{n}_{2}-\mathbf{n}_{1}\right\| _{2}$ is the unit vector along the limb; otherwise, $\mathrm{k}_{\mathrm{b}}\left(\mathbf{Q}\left(\mathrm{t}\right)\right)=0$.

The integral value between each key point and its neighboring key points is calculated, and a larger integral value means that the pair of the neighboring key points is closer to the real skeleton connection. Therefore, the correct connection for each type of limb can be obtained by selecting the maximum value of B and connecting the limbs that share the same key points to form the human skeleton.
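Eqs. (1) and (2) can be sketched numerically as follows. This is a minimal illustration rather than the paper's implementation: the function names, the midpoint-rule sampling, and the `paf` callable (which stands in for the predicted field) are assumptions made for the example.

```python
import math


def confidence(q, x, sigma):
    """Eq. (1): Gaussian peak centred on the true key point location x."""
    d2 = (q[0] - x[0]) ** 2 + (q[1] - x[1]) ** 2
    return math.exp(-d2 / sigma ** 2)


def paf_score(n1, n2, paf, samples=10):
    """Eq. (2): approximate the line integral of the field between n1 and n2.

    `paf` is a hypothetical callable that returns the 2-D field vector at a
    point. The traversal direction along the segment does not change the value.
    """
    dx, dy = n2[0] - n1[0], n2[1] - n1[1]
    length = math.hypot(dx, dy)
    if length == 0:
        return 0.0
    ux, uy = dx / length, dy / length  # unit vector along the candidate limb
    total = 0.0
    for i in range(samples):
        t = (i + 0.5) / samples  # midpoint rule on [0, 1]
        point = ((1 - t) * n1[0] + t * n2[0], (1 - t) * n1[1] + t * n2[1])
        kx, ky = paf(point)
        total += kx * ux + ky * uy
    return total / samples


def horizontal_field(point):
    """Toy field: a limb that points along the x-axis everywhere."""
    return (1.0, 0.0)


aligned = paf_score((0, 0), (10, 0), horizontal_field)
perpendicular = paf_score((0, 0), (0, 10), horizontal_field)
print(confidence((3, 4), (3, 4), 2.0), aligned, perpendicular)
```

A candidate connection aligned with the field scores close to 1, while a perpendicular one scores close to 0, which is why the pair with the largest integral is kept as the limb.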

Twenty-one key body parts were identified (Fig. 2), numbered from 0 to 20 for illustration purposes. No. 0 corresponds to the nose; No. 1 corresponds to the neck; Nos. 5 and 2 correspond to the left and right shoulders, respectively; Nos. 6 and 3 correspond to the left and right elbows, respectively; Nos. 7 and 4 correspond to the left and right wrists, respectively; No. 8 corresponds to the center of gravity; Nos. 13 and 9 correspond to the left and right hips, respectively; Nos. 14 and 10 correspond to the left and right knees, respectively; Nos. 15 and 11 correspond to the left and right ankles, respectively; Nos. 16 and 12 correspond to the left and right feet, respectively; Nos. 19 and 17 correspond to the left and right eyes, respectively; Nos. 20 and 18 correspond to the left and right ears, respectively. This numbering system helps describe and analyze the key parts of the human body more accurately and conveniently.

Fig. 2. Human body key point and skeleton detection results.

The algorithm in this paper takes the upper left corner of the image as the coordinate origin and assigns a coordinate to each key point. Specifically, these coordinates are (x0, y0) for the nose, (x1, y1) for the neck, and so on, up to (x18, y18) for the right ear and (x20, y20) for the left ear. Such a definition of the coordinates helps pinpoint the location of each key point in the image.
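For reference, the numbering described above can be written as a lookup table. The identifier names below are illustrative choices, not names used in the paper.

```python
# Index -> body part, following the numbering given in the text (0 to 20).
KEYPOINT_NAMES = [
    "nose", "neck",
    "right_shoulder", "right_elbow", "right_wrist",
    "left_shoulder", "left_elbow", "left_wrist",
    "center_of_gravity",
    "right_hip", "right_knee", "right_ankle", "right_foot",
    "left_hip", "left_knee", "left_ankle", "left_foot",
    "right_eye", "right_ear", "left_eye", "left_ear",
]

# Reverse lookup: body part -> index.
KEYPOINT_INDEX = {name: i for i, name in enumerate(KEYPOINT_NAMES)}

print(len(KEYPOINT_NAMES), KEYPOINT_INDEX["left_knee"], KEYPOINT_INDEX["right_ankle"])
```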

2.2 Principle of OC-SVM Algorithm

The OC-SVM model, proposed by Schölkopf in 1999 [25], is a one-class support vector machine (SVM), an unsupervised learning algorithm used mainly for outlier detection. Unlike traditional SVMs, which are typically used for binary classification tasks, OC-SVM is designed to detect anomalies in an unsupervised manner. The primary advantage of OC-SVM lies in its ability to handle unbalanced datasets, where the number of normal instances far outweighs the number of anomalies. This characteristic makes OC-SVM particularly well-suited for real-time anomaly detection in community policing scenarios, where abnormal behaviors are rare compared to normal activities. The algorithm is described as follows:

Let the sample data be $\left\{\chi _{1},\chi _{2},\cdots ,\chi _{\mathrm{m}}\right\}\in \mathrm{X}^{\mathrm{n}}$, where m is the number of samples. The separating hyperplane is expressed as $\omega ^{\mathrm{T}}\phi \left(\chi \right)-\rho =0$, where $\phi \left(\chi \right)$ is the function that maps the samples to the feature space, and $\omega $ and $\rho $ are the normal vector and offset of the separating hyperplane in the feature space, respectively. The objective is to maximize the distance between the separating hyperplane and the origin. Therefore, the optimization problem solved by OC-SVM can be formulated as

(3)
$$ \left\{\begin{array}{l} \min _{\omega, \rho, \xi} \frac{1}{2}\|\omega\|^2+\frac{1}{v \mathrm{~m}} \sum_{\mathrm{i}=1}^{\mathrm{m}} \xi_{\mathrm{i}}-\rho \\ \text { s.t. } \ \omega^{\mathrm{T}} \phi\left(\chi_{\mathrm{i}}\right) \geqslant \rho-\xi_{\mathrm{i}}, \\ \xi_{\mathrm{i}} \geqslant 0, \ i=1,2, \cdots, m \end{array}\right. $$

where $\xi _{\mathrm{i}}$ is a slack variable that allows outliers to exist; v is a parameter that sets an upper bound on the fraction of outliers and a lower bound on the fraction of support vectors. Introducing Lagrange multipliers yields the dual of this optimization problem, from which the optimal hyperplane is found:

(4)
$$ \left\{\begin{array}{l} \min _{\alpha} \frac{1}{2} \sum_{i=1}^m \sum_{j=1}^m \alpha_i \alpha_j \kappa\left(\chi_i, \chi_j\right) \\ \text { s.t. } \ \sum_{i=1}^{m} \alpha_i=1, \ 0 \leqslant \alpha_i \leqslant \frac{1}{v \mathrm{~m}}, \\ i, j=1,2, \cdots, m \end{array}\right. $$

where $\alpha _{\mathrm{i}}$ is the Lagrange coefficient corresponding to the sample $\chi _{\mathrm{i}}\,,$ and the kernel function $\kappa \left(\chi _{\mathrm{i}},\chi _{\mathrm{j}}\right)=$ $\left\langle \phi \left(\chi _{\mathrm{i}}\right),\phi \left(\chi _{\mathrm{j}}\right)\right\rangle $ replaces the inner product in the feature space.

After solving the optimization problem (4), the samples $\chi _{\mathrm{i}}$ whose Lagrange coefficients satisfy $\alpha _{\mathrm{i}}>0$ are the support vectors. From these support vectors, the normal vector of the hyperplane is $\omega =\sum _{\mathrm{i}=1}^{\mathrm{m}}\alpha _{\mathrm{i}}\phi \left(\chi _{\mathrm{i}}\right)$ and the hyperplane offset is $\rho =\omega ^{\mathrm{T}}\phi \left(\chi _{\mathrm{SV}}\right)=\sum _{\mathrm{i}=1}^{\mathrm{m}}\alpha _{\mathrm{i}}\kappa \left(\chi _{\mathrm{i}},\chi _{\mathrm{SV}}\right)$, where $\chi _{\mathrm{SV}}$ is a support vector. This leads to the classification decision function

(5)
$ f\left(\chi \right)=\mathrm{sgn}\left[\omega ^{\mathrm{T}}\phi \left(\chi \right)-\rho \right]=\mathrm{sgn}\left[\sum _{\mathrm{i}=1}^{\mathrm{m}}\alpha _{\mathrm{i}}\kappa \left(\chi _{\mathrm{i}},\chi \right)-\rho \right] $

The extracted human posture features are substituted into Eq. (5). If the output is 1, the sample is normal data; if the output is ${-}$1, the sample is an anomaly that requires attention.
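The decision rule of Eq. (5) can be sketched as below. The support vectors, the coefficients, the RBF kernel, and its `gamma` value are all hypothetical placeholders chosen for illustration; in practice they would come from solving the dual problem (4) on real posture features.

```python
import math


def rbf_kernel(a, b, gamma=0.5):
    """Gaussian (RBF) kernel; the kernel type and gamma are assumed choices."""
    d2 = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * d2)


def kernel_sum(x, support_vectors, alphas):
    """Sum over i of alpha_i * kappa(chi_i, x)."""
    return sum(a * rbf_kernel(sv, x) for sv, a in zip(support_vectors, alphas))


def decide(x, support_vectors, alphas, rho):
    """Eq. (5): +1 for normal data, -1 for an anomaly."""
    return 1 if kernel_sum(x, support_vectors, alphas) - rho >= 0 else -1


# Hypothetical support vectors and coefficients (the alphas sum to 1).
SUPPORT_VECTORS = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
ALPHAS = [0.4, 0.3, 0.3]
# Offset evaluated at one support vector, as in the expression for rho.
RHO = kernel_sum(SUPPORT_VECTORS[0], SUPPORT_VECTORS, ALPHAS)

print(decide((0.3, 0.3), SUPPORT_VECTORS, ALPHAS, RHO))  # inside the normal region
print(decide((5.0, 5.0), SUPPORT_VECTORS, ALPHAS, RHO))  # far from all support vectors
```

With these placeholder values, the point near the support vectors is labelled 1 and the distant point is labelled -1.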

2.3 Abnormal Behavior Judgment based on OC-SVM

Using the improved OpenPose, the key points of the human body and the relationships between them, such as the posture of the arms (defined by the relative positions and angles between the shoulders, elbows, and wrists) and the posture of the legs (defined by the relative positions and angles between the hips, knees, and ankles), can be extracted from video and image frames. These relationships and angles are transformed into a series of feature vectors that provide the basis for subsequent abnormal behavior judgments.

Fig. 3. Abnormal behavior discriminant feature vector.

The key points involved in the algorithm of this paper are the neck, center of gravity, right knee, right ankle, left knee, and left ankle, whose coordinates are (x1,y1), (x8,y8), (x10,y10), (x11,y11), (x14,y14), and (x15,y15), respectively. As shown in Fig. 3, the feature vectors are $\mathbf{V}_{1}=\left(\mathrm{x}_{8}-\mathrm{x}_{1},\,\mathrm{y}_{8}-\mathrm{y}_{1}\right)$, $\mathbf{V}_{2}=\left(\mathrm{x}_{15}-\mathrm{x}_{14},\,\mathrm{y}_{15}-\mathrm{y}_{14}\right)$, and $\mathbf{V}_{3}=\left(\mathrm{x}_{10}-\mathrm{x}_{11},\,\mathrm{y}_{10}-\mathrm{y}_{11}\right)$, which represent the human spine vector, left calf vector, and right calf vector, respectively.

The solid arrows in Fig. 3 indicate the human skeleton involved in the algorithm. The dashed arrows indicate the direction vector x = (1,0) of the x-axis, which can be used to represent the direction vector of the real ground because it is always parallel to the ground, and the corresponding angles of $\mathbf{V}_{1}$, $\mathbf{V}_{2}$, and $\mathbf{V}_{3}$ to x are

(6)
$ \theta _{\mathrm{i}}=\arccos \frac{\left\langle \mathbf{x},\mathbf{V}_{\mathrm{i}}\right\rangle }{\left\| \mathbf{x}\right\| \cdot \left\| \mathbf{V}_{\mathrm{i}}\right\| },\,\,\mathrm{i}=1,2,3 $

where $\theta _{1}$ is the angle between the human spine and the ground; $\theta _{2}$ is the angle between the left calf and the ground; $\theta _{3}$ is the angle between the right calf and the ground. The aspect ratio of the human body also changes when a person falls or exhibits other abnormal behaviors. In this paper, the extreme values $\mathrm{X}_{\max },\,\,\mathrm{Y}_{\max },\,\,\mathrm{X}_{\min },\,\,\mathrm{Y}_{\min }$ of all the key point coordinates define the calibration frame of the human body, which serves as another feature of abnormal behavior, where $\mathrm{X}_{\max }-\mathrm{X}_{\min }$ is the width of the calibration frame and $\mathrm{Y}_{\max }-\mathrm{Y}_{\min }$ is its height. The aspect ratio of the human body calibration frame can be expressed as

(7)
$ \mathrm{R}=\frac{\mathrm{Y}_{\max }-\mathrm{Y}_{\min }}{\mathrm{X}_{\max }-\mathrm{X}_{\min }} $

In each video frame, $\theta _{1},\,\,\theta _{2},\,\,\theta _{3}$, and $\mathrm{R}$ obtained from the improved OpenPose are substituted into Eq. (5). If the output of the OC-SVM algorithm is ${-}$1, an abnormal behavior, such as a fall, is judged to have occurred. If the output is 1, the human body is in a normal posture, and no warning is required.
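The features of Eqs. (6) and (7) can be computed from the key point coordinates as sketched below. The function names and the two toy poses are assumptions made for illustration, and the calibration frame here is taken over only the six key points supplied, whereas the text uses all coordinate points.

```python
import math


def angle_to_ground(v):
    """Eq. (6): angle in degrees between v and the x-axis direction (1, 0)."""
    norm = math.hypot(v[0], v[1])
    if norm == 0:
        raise ValueError("zero-length vector")
    cos_theta = max(-1.0, min(1.0, v[0] / norm))  # guard against rounding
    return math.degrees(math.acos(cos_theta))


def fall_features(kp):
    """Return (theta1, theta2, theta3, R) from a dict: key point index -> (x, y)."""
    v1 = (kp[8][0] - kp[1][0], kp[8][1] - kp[1][1])      # spine
    v2 = (kp[15][0] - kp[14][0], kp[15][1] - kp[14][1])  # left calf
    v3 = (kp[10][0] - kp[11][0], kp[10][1] - kp[11][1])  # right calf
    xs = [p[0] for p in kp.values()]
    ys = [p[1] for p in kp.values()]
    width = max(xs) - min(xs)
    height = max(ys) - min(ys)
    ratio = height / width if width else math.inf        # Eq. (7)
    return angle_to_ground(v1), angle_to_ground(v2), angle_to_ground(v3), ratio


# Toy poses in image coordinates (origin at the upper left corner).
STANDING = {1: (100, 50), 8: (100, 150), 10: (90, 220),
            11: (90, 300), 14: (110, 220), 15: (110, 300)}
LYING = {1: (50, 200), 8: (150, 200), 10: (220, 190),
         11: (300, 190), 14: (220, 210), 15: (300, 210)}

for name, pose in (("standing", STANDING), ("lying", LYING)):
    theta1, _, _, ratio = fall_features(pose)
    print(name, round(theta1, 1), round(ratio, 2))
```

In the toy data, the upright pose gives a spine angle near 90 degrees and a large aspect ratio, while the lying pose gives a spine angle near 0 degrees and an aspect ratio well below 1.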

2.4 Model Realization Process

The basic idea of this paper for the security anomaly monitoring of video data is to collect all data related to community policing, including video and image data, and to construct the security operation model of each application module from these data. The inter-frame difference method is then used to calculate the gray-value changes of the image and extract the target object [26]. The model replaces the first 10 layers of VGG-19 in the OpenPose model with the lightweight neural network MobileNetV3, which serves as the feature extraction network of the algorithm, to improve the speed and accuracy of detection and capture the human posture accurately in real time. At the same time, the 7${\times}$7 convolutional kernel in the OpenPose two-branch structure is replaced with one 1${\times}$1 and two 3${\times}$3 convolutional kernels to reduce the computational burden. In addition, to define the fall state of the human body more accurately, the coordinates of the key points are processed to generate three vectors representing the human spine and the left and right calves. The angles between these vectors and the ground, together with the aspect ratio of the human body’s calibration frame, are used as the fall discrimination features to monitor fall events accurately. Fig. 4 presents the specific implementation process.
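The inter-frame difference step mentioned above can be sketched in a few lines. This is a simplified illustration on small grayscale frames stored as nested lists; the threshold value and the function names are assumptions, and the paper does not specify its exact settings.

```python
def frame_difference_mask(prev, curr, threshold=25):
    """Binary motion mask: 1 where the gray-level change exceeds the threshold."""
    return [
        [1 if abs(c - p) > threshold else 0 for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev, curr)
    ]


def bounding_box(mask):
    """(x_min, y_min, x_max, y_max) of the moving pixels, or None if none moved."""
    coords = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not coords:
        return None
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)


# Two toy 4 x 4 grayscale frames: a bright 2 x 2 object appears in the second one.
PREV = [[0] * 4 for _ in range(4)]
CURR = [[0, 0, 0, 0],
        [0, 200, 200, 0],
        [0, 200, 200, 0],
        [0, 0, 0, 0]]

mask = frame_difference_mask(PREV, CURR)
print(bounding_box(mask))
```

The resulting box marks the region that changed between the two frames, which is the region that would be passed on for pose estimation.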

Fig. 4. Flowchart of the Security Exception Detection Algorithm.

3. OC-SVM based System Design

This smart community policing security system is designed to detect potential safety hazards and improve the well-being of community residents. Integrating computer vision and machine learning technologies, the architecture of the system designed in this paper is divided into six key layers (Fig. 5): the data sensing layer, network transmission layer, base support layer, data service layer, platform service layer, and application layer.

Fig. 5. Architecture diagram of intelligent community security system.

Data Sensing Layer: The data sensing layer connects the police security system of the smart community directly to the community environment. It collects data from different types of IoT sensors (e.g., RF sensors), video cameras, and vehicle identification cameras, which support the subsequent analysis and monitoring and allow a prompt response to security incidents.

The layer design allows for the easy addition or removal of sensors and cameras, making it suitable for both small neighborhoods and large urban areas. This flexibility is crucial for tailoring the system to the specific needs and characteristics of different communities. Table 1 lists the corresponding data types.

Table 1. Correspondence Between Sensor Types and Data Types.

| Sensor Type | Data Type | Role in Security Monitoring |
|---|---|---|
| Video Camera | Video Stream | Real-time area surveillance |
| RF Sensor | Signal Strength | Positioning and tracking of people and objects |
| Vehicle Recognition Camera | Image/Video Stream | Vehicle identification and tracking |
| Access Control Sensor | Access Control Signal | Personnel entry and exit control |
| RFID [27] | Tag Information | Item tracking and management |
| Facial Recognition Camera | Image/Video Stream | Identity verification and authentication |

Table 2. Data Transport Performance Indicators.

| Network | Bandwidth (Mbps) | Latency (ms) | Throughput (Mbps) | Stability Rating |
|---|---|---|---|---|
| Dedicated Video Network | 500 | 30 | 450 | 9 |
| Local Area Network | 1000 | 10 | 950 | 10 |
| Government External Network | 200 | 50 | 180 | 8 |

Network Transmission Layer: The data then flow to the network transmission layer, which acts as the communication backbone of the smart community policing security system. The dedicated video network transmits continuous video data streams and is characterized by high bandwidth and low latency. The local area network (LAN) mainly transmits face recognition, vehicle recognition, RFID, and access control data, connecting the sensors and data processing units in specific areas and buildings. The government external network is used mainly to transmit sensitive data to public security and medical departments to ensure a prompt response to security emergencies in the community. Encryption protocols and strict access control are essential to protect the privacy of community residents. Table 2 lists the transport performance metrics of the three networks.

Base Support Layer: After receiving data from the network transmission layer, the base support layer uses logical abstraction to simplify the core resources of the system. Computing, storage, and network resources are abstracted and encapsulated into dedicated resource pools, providing a scalable architecture that adapts to growing future demands and improves the reliability of the system. The network architecture is designed to handle varying data loads, ensuring stable performance regardless of the scale of deployment.

Data Service Layer: The data service layer serves the data from the base support layer by cleaning and standardizing the acquired video and image data, unifying them into accessible, structured data, and constructing a multi-sensor database for easy storage and management. It can also manage and analyze large datasets. Implementing cloud computing and virtualization technologies within these layers allows efficient data processing and storage, further enhancing the scalability of the system.

Platform Service Layer: The platform service layer integrates middleware services, such as microservices, container services, and message proxies, providing an environment for efficient development, deployment, and management. Integrating big data analysis tools, improved OpenPose, and OC-SVM algorithms provides powerful support for the real-time monitoring of abnormal behavior.

Application Layer: The application layer obtains interpretable results by calling services from the platform service layer through interfaces and databases from the data service layer. This layer comprises the Community Overview Dashboard, Abnormal Warning, Elderly Fall Warning, and Abnormal Vehicle modules. Among them, the community overview dashboard displays various security anomalies in the current community and can be used as a command center for police security. The anomaly warning includes warnings of falling objects and abnormal behavior of outsiders and uses the OC-SVM algorithm component to mark abnormal behaviors that deviate from normal patterns. The Elderly Fall Warning module monitors falls in real time and sends warnings to the public security department, community managers, and community medical departments.
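The assignment of data types to the three networks, together with the figures in Table 2, can be captured in a small lookup. The sketch below is a back-of-the-envelope estimate only: the identifiers, the `emergency_alert` label, and the payload size are hypothetical, and protocol overhead is ignored.

```python
# Data type -> network, following the description of the network transmission layer.
ROUTES = {
    "video_stream": "dedicated_video_network",
    "face_recognition": "local_area_network",
    "vehicle_recognition": "local_area_network",
    "rfid": "local_area_network",
    "access_control": "local_area_network",
    "emergency_alert": "government_external_network",  # hypothetical label
}

# Latency (ms) and throughput (Mbps) taken from Table 2.
NETWORKS = {
    "dedicated_video_network": {"latency_ms": 30, "throughput_mbps": 450},
    "local_area_network": {"latency_ms": 10, "throughput_mbps": 950},
    "government_external_network": {"latency_ms": 50, "throughput_mbps": 180},
}


def transfer_time(data_type: str, size_megabytes: float) -> float:
    """Rough transfer time in seconds: latency plus payload over throughput."""
    network = NETWORKS[ROUTES[data_type]]
    megabits = size_megabytes * 8
    return network["latency_ms"] / 1000 + megabits / network["throughput_mbps"]


# Example: a hypothetical 9 MB video segment.
print(round(transfer_time("video_stream", 9.0), 3))
```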

4. Results and Discussion

4.1 Experimental Data Collection

The effectiveness of the proposed system was validated by carefully selecting the Multiple Cameras Fall dataset as the primary data source. The rationale behind choosing this specific dataset is its comprehensive representation of real-world scenarios involving falls, a critical safety concern in community environments. The dataset includes a diverse range of fall events captured under various conditions and from multiple angles, making it an ideal benchmark for testing the robustness and accuracy of the anomaly detection system.

Developed by the University of Montreal in 2010, the Multiple Cameras Fall dataset consists of video recordings from eight ordinary IP cameras, capturing twenty-four scenarios. These scenarios include fall events and various non-fall activities, such as cleaning, lying on the couch, and sitting. This diversity provides a realistic and challenging test environment for the proposed system and tests whether it can accurately differentiate between falls and other common daily activities (Fig. 6). Moreover, the dataset presents various complexities typical of real-life settings, such as different fall angles, occlusion by indoor objects, variations in body type, and diverse backgrounds, including different clothing and indoor environments. These factors are crucial for assessing the ability of the system to function effectively in real-world community policing scenarios, where similar challenges are frequently encountered. Table 3 provides details of the dataset.

Fig. 6. Different Fall Positions for The Multiple Cameras Fall Dataset.
Table 3. Content of Multiple Cameras Fall Dataset.

| # | Feature | Description |
|---|---|---|
| 1 | Angle | Falls occur in different directions and are seen from different angles. |
| 2 | Occlusion | Occlusion by indoor objects, cameras, etc. |
| 3 | Body Differences | Physical differences between subjects (height and size). |
| 4 | Background | Different clothing, different indoor environments. |

The experiments on the Multiple Cameras Fall dataset were run on a computer equipped with an Intel Core i7 processor and 16 GB of RAM, running the Windows 10 operating system. PyTorch, an open-source machine learning library, was used for video data processing, algorithm implementation, and performance simulation. The pre-processing of the dataset involves several key steps. Initially, video frames are standardized to a resolution of 640${\times}$480 pixels, and a Gaussian blur is applied to reduce noise, which enhances the clarity of human figures against varying backgrounds. Color normalization is then conducted to adjust the contrast and brightness, improving the distinction between figure and background. Subsequently, human figures are isolated from the background through segmentation techniques. Finally, frames are converted to grayscale to simplify the data for effective feature extraction. These steps prepare the data for accurate abnormal behavior detection.
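Two of the pre-processing steps, Gaussian blurring and grayscale conversion, can be sketched without external libraries. The 3×3 kernel, the luma weights, and the edge handling below are common choices assumed for illustration; the paper does not state its exact parameters, and the resizing, color normalization, and segmentation steps are omitted. The sketch converts to grayscale first for brevity; because both operations are linear, the order does not change the result.

```python
GAUSS_3X3 = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]  # integer weights, sum = 16


def to_gray(rgb_frame):
    """Convert rows of (r, g, b) pixels to gray levels with standard luma weights."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for r, g, b in row]
            for row in rgb_frame]


def gaussian_blur(gray):
    """3 x 3 Gaussian blur; border pixels reuse the nearest valid neighbour."""
    h, w = len(gray), len(gray[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    acc += GAUSS_3X3[dy + 1][dx + 1] * gray[yy][xx]
            out[y][x] = acc / 16
    return out


# Toy 3 x 3 frame with a single bright pixel in the centre.
FRAME = [[(0, 0, 0)] * 3, [(0, 0, 0), (160, 160, 160), (0, 0, 0)], [(0, 0, 0)] * 3]
blurred = gaussian_blur(to_gray(FRAME))
print([[round(v, 1) for v in row] for row in blurred])
```

The bright pixel is spread over its neighbours, which is the noise-smoothing effect described above.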

4.2 Experimental Results

Abnormal behaviors unfold continuously over time; e.g., a significant change in speed occurs during and after a fall. In deep learning, the performance of behavioral anomaly detection models is generally judged by two metrics: precision and sensitivity [28]. Precision is the proportion of correct predictions among the samples predicted as positive, i.e., the proportion of predicted falls that are actual falls. Sensitivity, also known as recall, is the proportion of correct predictions among the samples whose true outcome is positive, i.e., the proportion of actual falls that are predicted as falls. The formulae for precision and sensitivity are as follows:

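(8)
$ \text{Precision}=\frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FP}} $

(9)
$ \text{Sensitivity}=\frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FN}} $

where TP, FP, and FN denote the numbers of true positives, false positives, and false negatives, respectively.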
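Eqs. (8) and (9) translate directly into code. The counts in the example are made up for illustration and are not results from the paper.

```python
def precision(tp: int, fp: int) -> float:
    """Eq. (8): share of predicted positives that are correct."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def sensitivity(tp: int, fn: int) -> float:
    """Eq. (9): share of actual positives that are detected (recall)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical counts: 80 falls detected, 20 false alarms, 10 missed falls.
print(round(precision(80, 20), 3), round(sensitivity(80, 10), 3))
```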

In this study, all the sample videos in the dataset were normalized to a spatial resolution of 224${\times}$224 and a frame rate of 30 fps. The normalized dataset was then divided into a training set and a test set at a 7:3 ratio, and the results on the test set were analyzed according to the evaluation metrics. This study examined three common daily movements of the human body (walking, sitting down, and falling) to verify whether the model can accurately discriminate abnormal behaviors. As a baseline, the traditional algorithm was tested first: the background difference method was used to find the external contour of the human body, the size of the extracted contour was then evaluated, and the OC-SVM algorithm was used to distinguish between falling and non-falling events. Table 4 lists the test results obtained by this method.

Table 4. Results of the Traditional Algorithms for Security Exception Detection.

| Human Behavior State | Precision (%) | Recall (%) | Frame Rate (frames/second) |
|---|---|---|---|
| Fall | 78.9 | 77.4 | 10.96 |
| Walk | 73.0 | 70.1 | 10.96 |
| Sit | 63.2 | 61.8 | 10.96 |
| Average of All Classes | 71.7 | 69.6 | 10.96 |

Table 5. Results of OpenPose and OC-SVM for Security Exception Detection.

| Human Behavior State | Precision (%) | Recall (%) | Frame Rate (frames/second) |
|---|---|---|---|
| Fall | 86.9 | 84.3 | 4.45 |
| Walk | 82.5 | 79.1 | 4.45 |
| Sit | 76.9 | 73.7 | 4.45 |
| Average of All Classes | 82.1 | 79.0 | 4.45 |

Table 5 lists the test results of the proposed method, which obtains images with the inter-frame difference method, extracts features with the MobileNetV3-based improved OpenPose model, and recognizes falling behavior with the OC-SVM algorithm.

The average precision of the traditional and proposed methods was 71.7% and 82.1%, respectively, an improvement of 10.4 percentage points over the traditional method. The average recall of the traditional and proposed methods was 69.6% and 79.0%, respectively, an improvement of 9.4 percentage points. The monitoring speed of the traditional method was 10.96 frames/second, whereas the detection speed of the proposed method was 4.45 frames/second, so the gain in precision and recall comes at a lower frame rate.
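
The comparison above is simple arithmetic on the class averages in Tables 4 and 5, and the short check below reproduces it. Because the gains are differences between two percentages, they are percentage points. The variable names are illustrative only.

```python
# Class-average values copied from Tables 4 and 5.
traditional = {"precision": 71.7, "recall": 69.6, "fps": 10.96}
proposed = {"precision": 82.1, "recall": 79.0, "fps": 4.45}

precision_gain = round(proposed["precision"] - traditional["precision"], 1)
recall_gain = round(proposed["recall"] - traditional["recall"], 1)
speed_ratio = proposed["fps"] / traditional["fps"]

print(f"precision gain: {precision_gain} percentage points")
print(f"recall gain:    {recall_gain} percentage points")
print(f"speed ratio:    {speed_ratio:.2f}x of the traditional frame rate")
```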

Table 6. Comparison of the OpenPose and OC-SVM and the multimodal approaches.

| Method | Accuracy (%) | Precision (%) | F1-Score (%) |
|---|---|---|---|
| Proposed | 88.74 | 82.12 | 75.43 |
| Martínez-Villaseñor et al. [29] | 95.00 | 77.70 | 72.80 |

The proposed method achieved a notable balance in the performance metrics, with an accuracy, precision, and F1-Score of 88.74%, 82.12%, and 75.43%, respectively (Table 6). Although the approach of Martínez-Villaseñor et al. [29] showed a higher accuracy of 95.00%, its precision was 4.42 percentage points lower than that of the proposed method. This indicates a higher rate of true positive detections relative to the total number of positive detections made by the proposed system. The F1-Score, which is the harmonic mean of precision and recall, underscores the effectiveness of the proposed approach, demonstrating its capability to maintain balanced performance between precision and recall.
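
Because the F1-Score is a harmonic mean, it rewards balance: two detectors with the same arithmetic mean of precision and recall can have different F1-Scores. The sketch below illustrates this with hypothetical values that are not taken from Table 6; the function name is an assumption.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical values, not taken from Table 6.
print(round(f1_score(0.80, 0.60), 4))  # unbalanced pair
print(round(f1_score(0.70, 0.70), 4))  # balanced pair, same arithmetic mean
```

The balanced pair scores higher even though both pairs average 0.70, which is why the F1-Score is read as a measure of balanced performance.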

A PR curve was plotted for each category based on precision and recall to assess the overall performance of the model. The average precision (AP) of a category summarizes its PR curve, and mAP is the mean of the APs of multiple categories, i.e., the average accuracy over all categories. Figs. 7 and 8 show the PR curves for the traditional and proposed methods, respectively.

Fig. 7. PR curve of the Traditional Algorithms for Security Exception Detection.
../../Resources/ieie/IEIESPC.2024.13.4.414/fig7.png
Fig. 8. PR curve of the OpenPose and OC-SVM for Security Exception Detection.
../../Resources/ieie/IEIESPC.2024.13.4.414/fig8.png
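
As a rough sketch of how a per-category AP and the resulting mAP can be obtained from PR curves, the example below integrates each curve with the trapezoidal rule and then averages the results. The integration rule is one common convention and is an assumption here; the text does not state which convention was used for Figs. 7 and 8. The PR points and function names are hypothetical.

```python
def average_precision(points):
    """Area under a PR curve given (recall, precision) points.

    Uses the trapezoidal rule over points sorted by recall.
    """
    pts = sorted(points)
    area = 0.0
    for (r0, p0), (r1, p1) in zip(pts, pts[1:]):
        area += (r1 - r0) * (p0 + p1) / 2
    return area


def mean_average_precision(curves):
    """Return (mAP, per-category AP) for a dict of PR curves."""
    aps = {name: average_precision(pts) for name, pts in curves.items()}
    return sum(aps.values()) / len(aps), aps


# Hypothetical PR points per category, for illustration only.
curves = {
    "fall": [(0.0, 1.0), (0.5, 0.9), (1.0, 0.6)],
    "walk": [(0.0, 1.0), (0.5, 0.8), (1.0, 0.5)],
    "sit": [(0.0, 1.0), (0.5, 0.7), (1.0, 0.4)],
}
m_ap, per_class = mean_average_precision(curves)
print(per_class)
print(round(m_ap, 3))
```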

The traditional algorithm had a monitoring accuracy of 0.786 for the falling category, 0.713 for the walking category, and 0.625 for the sitting category, and 0.73 after averaging all the categories. The proposed method, which obtains images using the inter-frame difference method, extracts features with the MobileNetV3-improved OpenPose model, and identifies abnormal behaviors with the OC-SVM algorithm, improved the accuracy in the falling category by 0.106, the walking category by 0.147, and the sitting category by 0.209, while the average accuracy improved by 0.132. Hence, the method proposed in this paper has superior accuracy, although its frame rate (4.45 frames/second, Table 5) is lower than that of the traditional method (10.96 frames/second, Table 4). This outcome is considered sufficient to meet the real-time requirements in designing an intelligent community policing security system that will improve community security and residents’ happiness.

5. Conclusion

This paper presented a significant advance in community policing by developing a smart security system utilizing the OC-SVM algorithm and MobileNetV3-improved OpenPose model. By combining the lightweight convolutional neural network MobileNetV3 and the improved OpenPose model, this system can effectively extract human posture features and realize the real-time detection of abnormal behaviors. This paper first reviewed the traditional OpenPose model and its shortcomings in real-time surveillance applications. Subsequently, this study optimized the computational efficiency of feature extraction and behavioral discrimination by introducing a parallel computing mechanism. The specific conclusions are as follows:

The MobileNetV3-improved OpenPose model was used for feature extraction, effectively improving the speed and accuracy of data processing. Using the optimized model, experiments conducted on the Multiple Cameras Fall dataset showed that the precision and recall of this system in fall detection reached 86.9% and 84.3%, respectively, which were significantly better than those of the traditional behavior monitoring method.

The abnormal behavior monitoring algorithm based on the OC-SVM proposed in this paper realized a rapid response to abnormal behaviors by precisely analyzing feature vectors and human body postures. The experimental results showed that, compared to the traditional algorithm, the method proposed in this study had higher accuracy and real-time performance in recognizing abnormal behavior.

Simulation experiments verified the efficiency and reliability of this system. In terms of processing speed, the system achieved a processing speed of 4.45 frames per second, which meets the requirements of real-time monitoring. In addition, the system maintains stable performance in different environments, which proves its usefulness in smart community security management.

In summary, the smart community policing security system based on the OC-SVM algorithm designed in this study has significant advantages in enhancing community security and residents’ happiness. The successful implementation of this system provides a new technical solution and application model for security monitoring in smart cities, which can effectively assist security managers in security supervision and emergency responses. Future research directions will include exploring the integration of additional sensory inputs to improve the detection capabilities. The potential for adapting this system to different environments and applications, such as healthcare or industrial safety, also presents exciting avenues for expansion. In addition, the system is expected to play an important role in the future construction of smart cities.

REFERENCES

[1] Jonathan O E, Olusola A J, Bernadin T C A, et al. Impacts of crime on socio-economic development. Mediterranean Journal of Social Sciences, 2021, 12(5): 71.
[2] Collins R T, Lipton A J, Kanade T, et al. A system for video surveillance and monitoring. VSAM final report, 2000, (1-68): 1.
[3] Socha R, Kogut B. Urban video surveillance as a tool to improve security in public spaces. Sustainability, 2020, 12(15): 6210.
[4] Li X, Lu R, Liang X, et al. Smart community: an internet of things application. IEEE Communications Magazine, 2011, 49(11): 68-75.
[5] Barrett B F D, DeWit A, Yarime M. Japanese smart cities and communities: Integrating technological and institutional innovation for Society 5.0. Smart Cities for Technological and Social Innovation. Academic Press, 2021: 73-94.
[6] Yao S, Ardabili B R, Pazho A D, et al. Real-World Community-in-the-Loop Smart Video Surveillance--A Case Study at a Community College. arXiv preprint arXiv:2303.12934, 2023.
[7] Shehzed A, Jalal A, Kim K. Multi-person tracking in smart surveillance system for crowd counting and normal/abnormal events detection. 2019 International Conference on Applied and Engineering Mathematics (ICAEM). IEEE, 2019: 163-168.
[8] Bhati B S, Rai C S. Intrusion detection technique using Coarse Gaussian SVM. International Journal of Grid and Utility Computing, 2021, 12(1): 27-32.
[9] Bhati B S, Rai C S. Analysis of support vector machine-based intrusion detection techniques. Arabian Journal for Science and Engineering, 2020, 45: 2371-2383.
[10] Tiwari D, Bhati B S. A deep analysis and prediction of covid-19 in India: using ensemble regression approach. Artificial Intelligence and Machine Learning for COVID-19, 2021: 97-109.
[11] Weaver III A, Ojiambo W, Kemp J, et al. Pedestrian Walkways: Hidden Hazards Related to Common Landscaping Practices. Professional Safety, 2022, 67(07): 14-22.
[12] Li Y, Esmaeili B, Gheisari M, et al. Using Unmanned Aerial Systems (UAS) for Assessing and Monitoring Fall Hazard Prevention Systems in High-rise Building Projects. arXiv preprint arXiv:2209.13137, 2022.
[13] Ang G C, Low S L, How C H. Approach to falls among the elderly in the community. Singapore Medical Journal, 2020, 61(3): 116.
[14] Vaishya R, Vaish A. Falls in older adults are serious. Indian Journal of Orthopaedics, 2020, 54: 69-74.
[15] Johnson J, Rodriguez M A, Al Snih S. Life-space mobility in the elderly: current perspectives. Clinical Interventions in Aging, 2020: 1665-1674.
[16] Carpenter C R, Cameron A, Ganz D A, et al. Older adult falls in emergency medicine: 2019 update. Clinics in Geriatric Medicine, 2019, 35(2): 205-219.
[17] Tanwar R, Nandal N, Zamani M, et al. Pathway of trends and technologies in fall detection: a systematic review. Healthcare. MDPI, 2022, 10(1): 172.
[18] Ramanujam E, Padmavathi S. A vision-based posture monitoring system for the elderly using intelligent fall detection technique. Guide to Ambient Intelligence in the IoT Environment: Principles, Technologies and Applications, 2019: 249-269.
[19] Mirmahboub B, Samavi S, et al. Automatic monocular system for human fall detection based on variations in silhouette area. IEEE Transactions on Biomedical Engineering, 2013, 60(2): 427-436.
[20] Ma X, Wang H, et al. Depth-based human fall detection via shape features and improved extreme learning machine. IEEE Journal of Biomedical and Health Informatics, 2014, 18(6): 1915-1922.
[21] Harrou F, Zerrouki N, Sun Y, et al. Vision-based fall detection system for improving safety of elderly people. IEEE Instrumentation and Measurement Magazine, 2017, 20(6): 49-55.
[22] Chen W, Jiang Z, Guo H, et al. Fall detection based on key points of human-skeleton using Open Pose. Symmetry, 2020, 12(5): 744.
[23] Osokin D. Real-time 2d multi-person pose estimation on cpu: Lightweight openpose. arXiv preprint arXiv:1811.12004, 2018.
[24] Howard A, Sandler M, Chu G, et al. Searching for mobilenetv3. Proceedings of the IEEE/CVF International Conference on Computer Vision. 2019: 1314-1324.
[25] Schölkopf B, Mika S, Burges C J C, et al. Input space versus feature space in kernel-based methods. IEEE Transactions on Neural Networks, 1999, 10(5): 1000-1017.
[26] Weng M, Huang G, Da X. A new interframe difference algorithm for moving target detection. 2010 3rd International Congress on Image and Signal Processing. IEEE, 2010, 1: 285-289.
[27] Weinstein R. RFID: a technical overview and its application to the enterprise. IT Professional, 2005, 7(3): 27-33.
[28] Ringberg H, Soule A, Rexford J, et al. Sensitivity of PCA for traffic anomaly detection. Proceedings of the 2007 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems. 2007: 109-120.
[29] Martínez-Villaseñor L, Ponce H, Brieva J, Moya-Albor E, Núñez-Martínez J, Peñafort-Asturiano C. UP-fall detection dataset: A multimodal approach. Sensors, 2019, 19(9): 1988.

Author

Yanfei Gao
../../Resources/ieie/IEIESPC.2024.13.4.414/au1.png

Yanfei Gao was born in Henan, China, in 1986. From 2005 to 2009, she studied at Northwestern University and received her bachelor's degree in 2009. From 2009 to 2012, she studied at Northwestern University and received her master's degree in 2012. Since 2012, she has been working at Zhengzhou Police University. Her research interests include sociology and public security.