1 Introduction

Biometric recognition verifies a person’s identity automatically based on his or her anatomical and behavioral characteristics [1, 2]. Compared with other biometric characteristics such as fingerprints [3], palms [4], irises [5], voice, gait [6], and ears [7], the face has many distinct advantages [8]. Facial features can be extracted from camera images even at a long distance, which provides a convenient and non-intrusive way to monitor people remotely. Moreover, the face has a richer structure and a larger area than other body parts, so the face region is not easily occluded. Hence, face recognition has become an indispensable biometric authentication method and has attracted much attention in various domains over the past decade.

Compared with two-dimensional (2D) face recognition, three-dimensional (3D) face recognition achieves more robust performance under lighting and pose variations, so utilizing 3D facial information appears to be a promising way to improve recognition accuracy. However, the challenge hindering the real-world deployment of a 3D face recognition system is that the acquisition of 3D face data imposes many restrictions on the scanning equipment: the face must remain completely still for at least several seconds during scanning, which undermines friendliness, convenience, and non-intrusiveness. Therefore, 2D face recognition remains an effective and promising solution for personal identification applications. Under constrained conditions, traditional face recognition algorithms achieve a recognition rate of 98 % or even higher when using high-resolution frontal face images [9, 10].

However, in the wider applications of personal identification, ranging from access control, border control, and forensic investigation to ubiquitous monitoring, 2D face images are usually captured from a long distance in an unconstrained environment [11]. Hence, the captured images usually contain pose and lighting variations that severely degrade the performance of current 2D face recognition systems. Moreover, the diverse images produced by face detection require appropriate methods for storage and use. Current challenges include face detection in remotely sensed images, extraction of accurate recognition features, image storage, and intuitive visualization of large-scale person localization. Therefore, we aim to meet increasingly urgent security requirements such as ubiquitous monitoring and cloud computation for personal recognition [12].

For ubiquitous monitoring, this paper proposes a cloud-based monitoring system that consists of face detection and recognition, cloud storage, and data visualization modules. The face detection and recognition module performs face image capture, face detection, and face recognition at a monitor client. The obtained data, including the number of detected faces, their locations, personal identifications, and other information, are then sent to cloud storage and rendered on a global map such as Google Maps or OpenStreetMap (OSM). The proposed data visualization module operates as the ubiquitous monitoring interface and visualizes the global distribution of the rendered data, such as the face detection and recognition results, via a client Web browser.

Most face images captured from a long distance in unconstrained environments have complex backgrounds and contain multiple faces, which leads to poor visual and recognition results [13]. To satisfy the recognition requirements of the proposed system, face detection is necessary before faces can be recognized. This paper therefore addresses multi-face detection against complicated backgrounds. Face detection consists of detecting, locating, and segmenting the faces present in an image. Existing face detection methodologies include feature-based, statistics-based, template-based, and skin-color segmentation methods. In this paper, a modified AdaBoost algorithm is used to conduct the face detection [14].

Face recognition approaches in 2D images are often implemented using holistic and local features. Holistic features are analyzed and extracted by several typical methods, such as principal component analysis (PCA), linear discriminant analysis (LDA) [15], and independent component analysis (ICA) [16], which have been proved effective for recognition in large databases. Local feature extraction approaches mainly include local binary pattern (LBP) [17], Gabor [18], and SIFT [19] methods and their modified models; they have proved more robust to slight lighting and pose variations, especially the Gabor features. However, most Gabor-based features are defined as a concatenation of the magnitude coefficients obtained at different scales and orientations. Thus, the Gabor model always generates redundant features of extremely high dimensions and lacks the orientation and precise texture information [20] that is conducive to the robustness of the recognition feature. Moreover, center-symmetric local binary pattern (CS-LBP) features not only keep crucial local features like LBP, but also have a lower feature dimensionality. Hence, we propose a novel feature extraction approach based on multi-scale Gabor features combined with CS-LBP. Compared with LBP and Gabor-based features, the proposed algorithm precisely extracts orientation and structural information and avoids the redundant information of the augmented Gabor features.

The rest of this paper is organized as follows. Section 2 describes the automated multi-face detection approach and specifies the proposed feature extraction approach using Gabor and CS-LBP features. Section 3 describes the architecture of the proposed cloud-based monitoring system. Section 4 presents extensive face recognition experiments to evaluate the proposed methods. Finally, concluding remarks are drawn in Sect. 5.

2 Face detection and recognition algorithm

This section describes the face detection method and specifies the proposed novel center-symmetric local Gabor binary pattern (CS-LGBP) feature extraction algorithm.

2.1 Face detection

Because most face images are captured from a long distance in unconstrained environments, multi-face detection under complex background conditions must be conducted to acquire clean faces for the subsequent feature extraction. In practice, multi-face detection has to deal with challenging variations in face size and pose as well as changes in lighting and background. These challenges often invalidate the detection algorithm and cause a high false detection rate. In addition, multi-face detection usually requires considerably more computation than single-face detection.

Fig. 1 The proposed multi-face detection procedure. a Detected face-like regions. b Rough face detection result. c Spatial distribution of facial features. d Refined result

As shown in Fig. 1, the multi-face detection approach integrates a modified AdaBoost algorithm with a cascade structure and component-based post-processing. As shown in Fig. 1a, all possible candidate faces are detected by applying the face detector of the modified AdaBoost algorithm with a low threshold. The rough face detection result in Fig. 1b consequently contains non-face regions. Considering that the four important facial components (left eye, right eye, nose, and mouth) have a strong spatial correlation, as shown in Fig. 1c, a refining process verifies the spatial correlation of these four components on each candidate face to eliminate the non-face regions and obtain the accurate result shown in Fig. 1d. A minimal sketch of this two-stage idea is given below.
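
For concreteness, the following sketch illustrates the two-stage idea in Python using OpenCV's stock Haar cascades as a stand-in for the modified AdaBoost detector; the cascade files, the permissive minNeighbors setting, and the simplified eye-only verification rule are illustrative assumptions, not the exact detector of this paper.

```python
# Sketch of the two-stage detection idea: a permissive cascade pass followed
# by component-based verification. OpenCV's stock Haar cascades stand in for
# the modified AdaBoost detector; all thresholds here are illustrative.
import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
eye_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_eye.xml")

def detect_faces(gray):
    # Stage 1: a low threshold (minNeighbors=1) keeps all face-like
    # candidates at the cost of false positives (cf. Fig. 1a, b).
    candidates = face_cascade.detectMultiScale(
        gray, scaleFactor=1.1, minNeighbors=1, minSize=(24, 24))
    refined = []
    for (x, y, w, h) in candidates:
        # Stage 2: verify the spatial correlation of facial components.
        # Here we only require two eyes in the upper half of the candidate
        # region; the paper also checks the nose and mouth (cf. Fig. 1c, d).
        roi = gray[y:y + h // 2, x:x + w]
        eyes = eye_cascade.detectMultiScale(roi, 1.1, 3)
        if len(eyes) >= 2:
            refined.append((x, y, w, h))
    return refined

if __name__ == "__main__":
    img = cv2.imread("crowd.jpg")  # hypothetical input image
    faces = detect_faces(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY))
    print(f"{len(faces)} face(s) kept after component verification")
```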

2.2 Face recognition based on CS-LGBP

2.2.1 Multi-scale Gabor feature

Gabor wavelets have been widely studied in pattern recognition for extracting local spatial features at different orientations and scales because they capture spatial orientation, localization, and frequency selectivity. Two-dimensional Gabor kernels are usually defined as follows:

$$\begin{aligned} \psi _{m,n}(z) = \frac{\Vert k_{m,n}\Vert ^{2}}{\sigma ^{2}}\, e^{-\frac{\Vert k_{m,n}\Vert ^{2}\Vert z\Vert ^{2}}{2\sigma ^{2}}}\left( e^{\,\mathrm{i} k_{m,n} z} - e^{-\sigma ^{2}/2}\right) . \end{aligned}$$
(1)

Here, m represents the orientation of the Gabor filters, n is the scale of the Gabor filters, \(z=(x, y)\) represents the pixel coordinate in the image, and \(\Vert \cdot \Vert \) denotes the norm operator. In our application, the parameters are set to \(k_{m,n} = k_n e^{i\varphi _m}\), \(k_n = k_{\max }/\mu ^{n}\), \(k_{\max } = \pi /2\), and \(\varphi _{m} = \pi m/8\).

The Gabor feature of a face image, called the Gabor image, is the convolution of the Gabor kernels with the face image:

$$\begin{aligned} G_{m,n}(z) = f(z) * \psi _{m,n}(z). \end{aligned}$$
(2)

Here, \(\psi _{m,n}(z)\) denotes a Gabor kernel function and \(f(z)=f(x, y)\) represents the input face image. Figure 2 shows the Gabor wavelets, a bank of Gabor kernels, each of which comprises a real part and an imaginary part. Figure 2a–c, respectively, shows the real part, imaginary part, and amplitude of the Gabor kernels at three scales and four orientations.
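
To make Eqs. (1) and (2) concrete, the following sketch builds the Gabor kernel bank and convolves it with an image. The paper's settings \(k_{\max } = \pi /2\) and \(\varphi _m = \pi m/8\) are used; the kernel size, \(\sigma = 2\pi \), and \(\mu = \sqrt{2}\) are assumptions borrowed from common Gabor face pipelines, not values stated in this paper.

```python
# Sketch of Eqs. (1) and (2): build the Gabor kernel bank and convolve it
# with an image. sigma = 2*pi, mu = sqrt(2), and the 31x31 kernel size are
# assumed settings; k_max = pi/2 and phi_m = pi*m/8 follow the paper.
import numpy as np
from scipy.signal import fftconvolve

def gabor_kernel(m, n, sigma=2 * np.pi, mu=np.sqrt(2), size=31):
    # Eq. (1) with k_{m,n} = k_n e^{i phi_m}, k_n = k_max / mu^n.
    k_n = (np.pi / 2) / mu ** n
    phi_m = np.pi * m / 8
    kx, ky = k_n * np.cos(phi_m), k_n * np.sin(phi_m)
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    z2 = x ** 2 + y ** 2                                    # ||z||^2
    return (k_n ** 2 / sigma ** 2
            * np.exp(-k_n ** 2 * z2 / (2 * sigma ** 2))
            * (np.exp(1j * (kx * x + ky * y)) - np.exp(-sigma ** 2 / 2)))

def gabor_responses(img):
    # Eq. (2): G_{m,n}(z) = f(z) * psi_{m,n}(z) for the four orientations
    # and three scales used in Sect. 2.2.3.
    return {(m, n): fftconvolve(img, gabor_kernel(m, n), mode="same")
            for n in (1, 2, 3) for m in (1, 2, 3, 4)}
```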

Fig. 2 Gabor wavelets. a Real part of the Gabor kernels. b Imaginary part of the Gabor kernels. c Amplitude of the Gabor kernels

2.2.2 CS-LBP feature

LBP features have been proved effective for texture description, and Heikkilä et al. later introduced the center-symmetric variant CS-LBP [21]. However, texture features extracted by LBP are overly sensitive to small intensity changes and thus lack robustness in flat image regions. An illustration of the original LBP and CS-LBP patterns is shown in Fig. 3. CS-LBP encodes image changes in four different directions using the center-symmetric principle, which compensates for this disadvantage of LBP.

Fig. 3 Original LBP and CS-LBP features

CS-LBP features can be described by the following two equations:

$$\begin{aligned} \mathrm{CSLBP}_{R,N,T}(x,y) = \sum _{j=0}^{(N/2)-1} s\left( n_{j} - n_{j+(N/2)}\right) 2^{j} \end{aligned}$$
(3)
$$\begin{aligned} s(x) = \begin{cases} 1, &{} x > T \\ 0, &{} \text {otherwise} \end{cases} \end{aligned}$$
(4)

Here, \(n_{j}\) and \(n_{j+(N/2)}\) are the gray values of a center-symmetric pair of pixels, N is the number of sampling pixels on a circle of radius R, and T is a small threshold. Compared with traditional LBP, CS-LBP has a lower dimensionality and lower computational complexity, and it is also robust to noise. Hence, CS-LBP is used for feature extraction to preserve more useful image information and reduce the impact of interference such as noise and pose variation.
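
A minimal sketch of Eqs. (3) and (4) for the common setting \(R = 1\), \(N = 8\) follows; the threshold value T and the neighbour ordering are illustrative assumptions.

```python
import numpy as np

def cs_lbp(img, T=3.0):
    # Eqs. (3) and (4) for R = 1, N = 8: each interior pixel receives a
    # 4-bit code from its four center-symmetric neighbour pairs, giving
    # only 16 bins (versus 256 for plain LBP). T = 3.0 is an assumed
    # threshold for 8-bit gray values.
    f = np.asarray(img, dtype=np.float64)
    pairs = [
        (f[1:-1, 2:],  f[1:-1, :-2]),   # east      vs west
        (f[2:,  2:],   f[:-2, :-2]),    # southeast vs northwest
        (f[2:,  1:-1], f[:-2, 1:-1]),   # south     vs north
        (f[2:,  :-2],  f[:-2, 2:]),     # southwest vs northeast
    ]
    code = np.zeros((f.shape[0] - 2, f.shape[1] - 2), dtype=np.uint8)
    for j, (a, b) in enumerate(pairs):
        code |= ((a - b) > T).astype(np.uint8) << j   # s(n_j - n_{j+4}) 2^j
    return code
```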

2.2.3 CS-LGBP feature extraction

Gabor features have been proved robust to slight lighting and pose variations as well as other factors that influence face recognition. At the same time, Gabor features have a high dimensionality because they are usually defined as a concatenation of the magnitude coefficients obtained at different orientations and scales. To reduce the accompanying computational complexity, dimension reduction approaches such as PCA, LDA, and KFDA are usually applied as a post-processing step. However, these dimension reduction approaches inevitably discard useful visual perception information from the Gabor features. In addition, Gabor features lack precise texture information, which is crucial for the robustness of a face recognition feature.

We therefore propose a novel feature extraction method called the center-symmetric local Gabor binary pattern (CS-LGBP), which combines improved multi-scale Gabor features with CS-LBP. Figure 4 illustrates the algorithm used to extract the CS-LGBP features of a face image.

Fig. 4 CS-LGBP feature extraction

First, the input face image is convolved with the Gabor kernel functions to obtain the magnitude information at different orientations and scales. In our application, we chose four orientations and three scales; thus, the parameters of the Gabor kernel function \(\psi _{m,n}(z)\) are set to \(n\in \{1, 2, 3\}\) for the three scales and \(m\in \{1, 2, 3, 4\}\) for the four orientations.

Next, the magnitude responses of the four orientations at the same scale are accumulated into a new scale feature, \(\mathrm{Gaborscale}(z, n)\), to reduce redundancy:

$$\begin{aligned} \mathrm{Gaborscale}(z,n) = \sum _{m} \left| G_{m,n}(z)\right| . \end{aligned}$$
(5)

The CS-LGBP features of each scale are then computed using the CS-LBP descriptor from the obtained Gabor scale images, formulated as:

$$\begin{aligned} \mathrm{CSLGBP}(z, n) = \mathrm{CSLBP}\left[ \mathrm{Gaborscale}(z, n)\right] . \end{aligned}$$
(6)

Finally, the face in the image is recognized by comparing its extracted features against a face database using a chi-square distance classifier.
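
Putting Eqs. (2), (5), and (6) together with the chi-square matcher, a minimal end-to-end sketch of CS-LGBP recognition might look as follows. It reuses the gabor_responses and cs_lbp sketches above; the whole-image 16-bin histogram (rather than, e.g., block-wise histograms), the rescaling before thresholding, and the gallery interface are assumptions, not the paper's exact configuration.

```python
import numpy as np

def cs_lgbp_feature(img, bins=16):
    # Eqs. (5) and (6): per scale, accumulate the magnitudes of the four
    # orientation responses, encode the Gabor-scale image with CS-LBP, and
    # concatenate the normalised code histograms. The whole-image histogram
    # and the rescaling step are illustrative assumptions.
    G = gabor_responses(img)                                 # Eq. (2)
    hists = []
    for n in (1, 2, 3):
        gscale = sum(np.abs(G[(m, n)]) for m in (1, 2, 3, 4))    # Eq. (5)
        gscale = 255.0 * gscale / max(gscale.max(), 1e-10)       # fit T's range
        codes = cs_lbp(gscale)                                   # Eq. (6)
        h, _ = np.histogram(codes, bins=bins, range=(0, bins))
        hists.append(h / max(h.sum(), 1))
    return np.concatenate(hists)

def chi_square(h1, h2, eps=1e-10):
    # Chi-square distance: d(h1, h2) = sum_i (h1_i - h2_i)^2 / (h1_i + h2_i).
    return float(np.sum((h1 - h2) ** 2 / (h1 + h2 + eps)))

def recognise(probe_img, gallery):
    # Nearest-neighbour matching: `gallery` maps identity -> stored feature.
    feat = cs_lgbp_feature(probe_img)
    return min(gallery, key=lambda who: chi_square(feat, gallery[who]))
```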

3 Cloud-based monitoring system

This section describes the cloud-based monitoring system, shown in Fig. 5a, which uses the proposed face recognition method based on CS-LGBP features. The proposed system consists of three modules: face detection, face recognition, and monitoring visualization. In the face detection module, images are captured by a camera installed at a distributed client. After the faces in the images are detected by the client computer, the computed results, including the client IP address, face count, and image ID, are uploaded to the cloud storage. Using the cloud storage, the system provides a data visualization interface that intuitively displays the global distribution of detected persons. In the face recognition module, the system provides an HTTP interface through which an administrator inputs a specific face image to be localized. The submitted image is transmitted to all the clients through the cloud. The distributed clients use the proposed face recognition algorithm to determine whether the person in the face image has been sensed by their cameras; if so, the recognition result is delivered to the cloud.
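
As a sketch of the client-to-cloud exchange just described, a client could upload each detection result as a small JSON record over HTTP; the endpoint URL and field names below are hypothetical, since the paper does not specify a wire format.

```python
# Hypothetical client-side upload of detection results to the cloud storage.
# The endpoint URL and the JSON field names are illustrative assumptions.
import json
import socket
import urllib.request

def upload_detection(face_count, image_id,
                     endpoint="http://cloud.example.com/api/detections"):
    record = {
        # Note: this may resolve to a local address on some hosts; a real
        # deployment would report the client's public IP.
        "client_ip": socket.gethostbyname(socket.gethostname()),
        "face_count": face_count,
        "image_id": image_id,
    }
    req = urllib.request.Request(
        endpoint,
        data=json.dumps(record).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST")
    with urllib.request.urlopen(req) as resp:
        return resp.status == 200   # True if the cloud accepted the record
```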

Fig. 5 The proposed cloud-based monitoring system. a Architecture of the cloud-based monitoring system. b Data flow of the data visualization module

In the data visualization module shown in Fig. 5b, we implemented data visualization technology on a Web browser that renders the geographic locations to provide a global distribution of the face detection and recognition results. In the cloud storage, each received client IP address is translated into geographic information. Using the converted geographic datasets, the person counts computed by the distributed clients are located on a global map, implemented with OSM in this system. After the person distribution information is visualized on the OSM, the administrator can browse the face detection results. Meanwhile, the administrator can ask the system to find a certain person by sending a face image to the distributed clients; the face recognition result is also rendered on the Web-based geographic map to mark the location of the person of interest.
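
The IP-to-location step can be sketched, for example, with the MaxMind geoip2 package; the database path is an assumption, and any equivalent geolocation service would serve the same role.

```python
# Translate a client IP address into map coordinates, as done in the cloud
# storage before rendering. Uses the MaxMind GeoLite2 City database; the
# .mmdb path and the example IP are illustrative assumptions.
import geoip2.database

def locate_client(ip, db_path="GeoLite2-City.mmdb"):
    reader = geoip2.database.Reader(db_path)
    try:
        rec = reader.city(ip)
        return rec.location.latitude, rec.location.longitude
    finally:
        reader.close()

# e.g. lat, lon = locate_client("203.0.113.7")   # hypothetical client IP
```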

4 Experiments and results

In this section, we analyze the performance of the proposed face detection and recognition methods as well as the cloud-based face monitoring system. The experiments were run on a computer with an Intel Core i3-3240 3.4 GHz CPU, 4 GB of RAM, and the Windows 7 operating system, and were implemented using MATLAB 2015b and MyEclipse Professional 2014.

4.1 Face detection results

To demonstrate the effectiveness of the detection algorithm, we tested 122 images of a multi-face dataset and 120 images of a single-face dataset using the proposed methods. These images featured complex backgrounds, posture variations, and diverse subjects. The multi-face and single-face detection results are shown in Fig. 6a, b, respectively. The face detection algorithm is able to track and detect frontal or pose-varied single and multiple faces against complex backgrounds.

Fig. 6 Results of face detection. a Example images of the multi-face dataset. b Example images of the single-face dataset

Table 1 Results of the detection algorithm

Dataset       Detection rate (%)   False acceptance rate (%)
Multi-face    91.23                7.60
Single-face   96.67                4.47

Table 1 shows the detection rate and false acceptance rate of the detection algorithm on both datasets. The detection rates were 91.23 and 96.67 % on the multi-face and single-face datasets, respectively, and the corresponding false acceptance rates were 7.60 and 4.47 %. Face detection in the multi-face images suffers from more interference, such as more complex lighting conditions and backgrounds, smaller face sizes, and greater numbers of similar non-face regions. Therefore, the accuracy of face detection in the multi-face images was lower than that in the single-face images.

4.2 Face recognition results

A number of face recognition experiments were conducted on the ORL face database to evaluate the proposed CS-LGBP feature. The ORL face database consists of 40 subjects, with 10 images per subject in various poses and with different expressions. The approximately frontal face image of each subject was selected to form the test dataset (40 images), and the remaining nine images of each subject formed the training dataset (360 images). Figure 7 shows ten example images from the ORL database. A sketch of this evaluation protocol is given below.
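
Under the assumptions of the sketches in Sect. 2, this evaluation protocol can be scripted as follows; the directory layout matches the standard ORL distribution, while treating image 1 of each subject as the frontal probe is an illustrative assumption.

```python
# Rank-1 evaluation under the ORL protocol described above: image 1 of each
# subject serves as the probe, images 2-10 form the gallery. The layout
# s1..s40/1.pgm..10.pgm matches the standard ORL distribution; treating
# image 1 as the frontal image is an assumption. Requires imageio.
import numpy as np
from imageio.v2 import imread

def rank1_orl(root="orl_faces"):
    train, test = [], []                          # (subject, feature) pairs
    for s in range(1, 41):
        for i in range(1, 11):
            img = np.asarray(imread(f"{root}/s{s}/{i}.pgm"), dtype=np.float64)
            (test if i == 1 else train).append((s, cs_lgbp_feature(img)))
    hits = sum(min(train, key=lambda t: chi_square(f, t[1]))[0] == s
               for s, f in test)
    return hits / len(test)                       # rank-1 recognition rate
```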

Fig. 7 Example images of the ORL database

Figure 8 compares the recognition rates of the proposed CS-LGBP feature extraction method with those of three other methods: traditional LBP, CS-LBP, and scale Gabor features. The rank-1 recognition rate of the proposed method is 100 %, at least 5 % higher than those of the other three typical algorithms. Hence, the proposed method clearly outperforms the other three typical methods.

Fig. 8 Recognition rate comparison on the ORL database

To further illustrate the effectiveness of CS-LGBP, more experiments were conducted on the Yale-B database. The Yale-B face database was captured under 576 viewing conditions (nine poses \(\times \) 64 illumination conditions). It is usually divided into five subsets according to the angle between the lighting direction and the camera axis: subset 1 \((0^{\circ }\)–\(12^{\circ })\), subset 2 \((13^{\circ }\)–\(25^{\circ })\), subset 3 \((26^{\circ }\)–\(50^{\circ })\), subset 4 \((51^{\circ }\)–\(77^{\circ })\), and subset 5 (above \(78^{\circ })\). Specifically, we randomly selected one image from each subset for training, and the remaining samples of that subset were used for testing. Subsets 1–5, with different illumination conditions, are shown in Fig. 9.

Fig. 9 Yale-B database for different lighting conditions. a Subset 1. b Subset 2. c Subset 3. d Subset 4. e Subset 5

Figure 10a–d shows our experimental results on the Yale-B database compared with those of the LBP, CS-LBP, and scale Gabor features. The proposed CS-LGBP method achieved a 100 % rank-1 recognition rate on subsets 1, 2, and 3, which is 2.5–12.5 % higher than the results of the other three algorithms. The proposed CS-LGBP method clearly performs remarkably better than the CS-LBP and scale Gabor algorithms, even on subsets 4 and 5, which are severely affected by illumination variation. Compared with the other recognition methods, the proposed method is more robust to illumination variations and has lower complexity.

Fig. 10 Recognition rate comparison of different methods on the Yale-B database. a Subsets 1 and 2. b Subset 3. c Subset 4. d Subset 5

The Yale database contains 165 frontal face grayscale images of 15 individuals, with 11 images per subject captured under various lighting conditions, with and without glasses, and with expressions such as surprised, sad, and sleepy. The images of the Yale database, shown in Fig. 11, have a resolution of \(320 \times 243\) pixels.

Fig. 11 Example images of the Yale database

Figure 12 compares the performance of our approach with that of the LBP, CS-LBP, and scale Gabor methods. The proposed CS-LGBP descriptor improved the recognition rate by around 20 % with respect to the original LBP and achieved around a 10 % higher recognition rate than the CS-LBP and scale Gabor methods.

Fig. 12 Recognition rate comparison on the Yale database

4.3 Data visualization results

To illustrate the proposed data visualization, we implemented the proposed face detection and recognition algorithms to acquire the data. The simulation environment was built with JDK 1.8.0_60, MySQL Server 5.5, MyEclipse Professional 2014, and Tomcat 7.x. As shown in Fig. 13a, OSM was imported into the MyEclipse environment as the geographic background. Moreover, the programming interface between Java and MATLAB was implemented via the Java Native Interface (JNI), so the Java-based Web application was able to call the proposed face detection and recognition algorithms executed in MATLAB.

In the cloud-based monitoring system, the face detection and recognition algorithms were installed at the distributed clients. The face detection results were uploaded to the cloud storage together with the client IP address, face count, and image ID. After the cloud storage was updated, the monitoring system presented the person count detected at each client as a circle: the more people monitored at a client, the larger the circle rendered at its location on the map, as shown in Fig. 13b.
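
A minimal Python equivalent of this count-proportional rendering on OSM tiles can be sketched with the folium library; the client coordinates and counts below are made up for illustration.

```python
# Sketch of the count-proportional circle rendering on OSM tiles using
# folium; the client coordinates and person counts are made-up examples.
import folium

clients = [
    {"lat": 39.90, "lon": 116.40, "count": 12},   # hypothetical client A
    {"lat": 31.23, "lon": 121.47, "count": 3},    # hypothetical client B
]

m = folium.Map(location=[35.0, 119.0], zoom_start=4, tiles="OpenStreetMap")
for c in clients:
    folium.CircleMarker(
        location=[c["lat"], c["lon"]],
        radius=4 + 2 * c["count"],       # larger circle = more people
        popup=f"{c['count']} person(s) detected",
        fill=True,
    ).add_to(m)
m.save("monitor_map.html")               # view in a Web browser
```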

When the system manager was interested in a face image, he or she could send it to all the clients through the cloud. The distributed clients used the proposed face recognition algorithm to determine whether this face had been sensed by their cameras; if so, the recognition results were delivered to the cloud. The geographic locations of the requested face were rendered on the map to show where that person was located, as shown in Fig. 13c.

Fig. 13 Ubiquitous monitoring visualization. a OSM background image. b Face detection results. c Face recognition results

5 Conclusions

This paper proposed a cloud-based ubiquitous monitoring system based on face recognition. A modified AdaBoost algorithm combined with a cascade structure and facial-component verification was implemented to achieve multi-face detection against complex backgrounds. A novel feature extraction method, CS-LGBP, was proposed by combining the scale Gabor feature with CS-LBP. Compared with the LBP, CS-LBP, and scale Gabor features, the proposed CS-LGBP method achieved significantly improved recognition performance on the ORL, Yale-B, and Yale databases. Both the face detection and recognition algorithms were verified to be suitable for ubiquitous monitoring. In addition, we proposed a monitoring system that visualizes the person counts and surveillance results on a Web browser. The proposed system provides a feasible way to meet increasingly urgent security requirements. In the future, we aim to improve the recognition speed of the CS-LGBP algorithm to enable effective online real-time recognition.