Добавил:
Опубликованный материал нарушает ваши авторские права? Сообщите нам.
Вуз: Предмет: Файл:
[2.1] 3D Imaging, Analysis and Applications-Springer-Verlag London (2012).pdf
Скачиваний:
12
Добавлен:
11.12.2021
Размер:
12.61 Mб
Скачать

330

A. Mian and N. Pears

8.5.8 Classifiers for 3D Face Matching

In the large array of published 3D face recognition work, all of the well-known classification schemes have been applied by various researchers. These include k-nearest neighbors (k-NN) in various subspaces, such as those derived from PCA and LDA, neural nets, Support Vector Machines (SVM), Adaboost and many others. The choice (type, complexity) of classifier employed is related to the separation of subjects (classes) within their feature space. Good face recognition performance depends on choosing a classifier that fits the separation of the training data well without overfitting and hence generalizes well to unseen 3D face scans, either within testing or in a live operational system.

Pattern classification is a huge subject in its own right and we can only detail a selection of the possible techniques in this chapter. We refer the reader to the many excellent texts on the subject, for example [10, 28].

8.6 ICP-Based 3D Face Recognition

In this section and the following three sections, we will present a set of wellestablished approaches to 3D face recognition, with a more tutorial style of presentation. The aim is to give the reader a solid grounding before going on to more modern and more advanced techniques that have better performance. We will highlight the strengths and limitations of each technique and present clear implementation details.

One of the earliest algorithms employed for matching surfaces is the iterative closest points (ICP) algorithm [9], which aims to iteratively minimize the mean square error between two point sets. In brief, ICP has three basic operations is its iteration loop, as follows.

1.Find pairs of closest points between two point clouds (i.e. probe and gallery face scans).

2.Use these putative correspondences to determine a rigid 6-DOF Euclidean transformation that moves one point cloud closer to the other.

3.Apply the rigid transformation to the appropriate point cloud.

The procedure is repeated until the change in mean square error associated with the correspondences falls below some threshold, or the maximum number of iterations is reached. In the context of 3D face recognition, the remaining mean square error then can be used as a matching metric.

Due to its simplicity, ICP has been popular for rigid surface registration and 3D object recognition and many variants and extensions of ICP have been proposed in the literature [77]. Readers interested in a detailed description of ICP variants are referred to Chap. 6 of this book, which discusses ICP extensively in the context of surface registration.

Here, we will give a presentation in the context of 3D face recognition. Surface registration and matching are similar procedures except that, in the latter case, we

8 3D Face Recognition

331

are only interested in the final registration error between the two surfaces. A fuller and more formal outline of ICP is presented in the following subsection.

8.6.1 ICP Outline

Let pj = [xj yj zj ]T where j = 1 . . . M be a zero-mean set of 3D points of a probe face scan and let gi = [xi yi zi ]T where i = 1 . . . N be a zero-mean set of 3D points of a gallery face scan. Since both point clouds are zero mean, they have an implicit coarse translational alignment. An initial coarse orientation alignment may also be required to avoid convergence to a local minimum but, for now, we will assume that it is not required and we will discuss this issue later.

We can stack the points in M × 3 and N × 3 data matrices to give:

 

p1T

g1T

 

.

 

.

 

(8.1)

.

,

.

.

P = .

G = .

pMT

 

gNT

 

 

Let F be a function that, for each point in P , finds the nearest point in G:

 

(c, d) = F (P, G),

 

(8.2)

where c and d are vectors of size M each, such that c and d contain, respectively, the index number and distance of the j th point of P to its nearest point in G. This is called the closest points step where the pairs of closest points within the two point clouds are assigned as tentative correspondences. (We use the word tentative because they are not accurate correspondences until ICP has converged to the correct global minimum.) This is the most critical and computationally the most expensive step of the algorithm. It is critical because highly inaccurate tentative correspondences could, in the worst case, cause ICP convergence to a local minimum. Even if this is not the case, they will affect the accuracy of the final registration and hence the matching metric between the two point clouds. Typically, the following filters are applied to minimize these effects:

1.Tentative correspondences whose distance is greater than a threshold are removed. The threshold is usually chosen as a multiple of the point cloud resolution which is initially set relatively high, depending on the quality of the initial registration. It is then gradually reduced through the iterations until it is equal to the resolution.

2.Tentative correspondences whose surface normals have a mutual angle above a threshold are removed. The threshold is chosen based on the accuracy of the normals and the quality of the initial registration. Also, it is possible to reduce this threshold through the iterations.

3.Tentative correspondences at the boundary points are avoided. This step is useful when the two point clouds only partially overlap, as occurs with partial views of the probe face.

332

A. Mian and N. Pears

4.Sometimes additional information such as texture is also used to remove poor quality tentative correspondences.

To speed up the search for nearest points within the gallery scan, the points in each gallery scan are arranged in a k-d tree (see Chap. 4) and these structures are precomputed offline. A further speed up can be achieved by storing the index of the nearest neighbor from the centers of a set of voxels in a voxelized space around the gallery scan [91]. Again this can be precomputed offline. (Note that the ICP technique based on Levenberg-Marquardt minimization [34], described in Chap. 6,

also uses a pre-computed distance transform.)

Let pk = [xk yk zk ]T and gk = [xk yk zk ]T (where k = 1 . . . m) be the remaining m corresponding pairs of points of the probe and gallery face respectively. The next step is to find the rotation matrix R and translation vector t that minimizes the mean square error (MSE) between these tentative correspondences. The error to be minimized is given by:

 

1

m

 

 

 

 

 

 

 

 

 

e

 

 

Rpk

+

t

gk

 

2.

(8.3)

 

 

 

= m k=1

 

 

 

 

 

The unknowns (R and t) in Eq. (8.3) can be calculated using a number of approaches including quaternion methods and Singular Value Decomposition (SVD). We will give details of the widely-used SVD approach [7]. The means of pk and gk are given by

 

 

 

 

 

1

 

m

 

 

 

 

 

p

 

 

pk

(8.4)

 

 

 

 

 

 

 

 

 

 

¯

= m k=1

 

 

 

 

 

 

 

1

 

m

 

 

 

 

 

g

 

 

gk

(8.5)

 

 

 

 

 

 

 

 

 

 

¯

= m k=1

 

 

and the cross-covariance matrix C is given by the mean of outer products:

 

 

1

m

 

 

 

 

 

 

 

C

(pk

p)(gk

g)T .

(8.6)

 

 

 

= m k=1

 

¯

− ¯

 

Equivalently, this can be expressed as:

 

 

 

 

 

 

 

 

 

 

 

1

 

T

 

 

 

 

 

C =

 

PmGm,

(8.7)

 

 

 

m

where Pm, Gm are the zero-centered data matrices in Eq. (8.1) trimmed to m × 3 matrices of m tentative correspondences. Performing the SVD of C gives us:

USVT = C,

(8.8)

where U and V are two orthogonal matrices and S is a diagonal matrix of singular values. The rotation matrix R can be calculated from the orthogonal matrices as [7]:

R = VUT .

(8.9)

8 3D Face Recognition

333

Note that if this is indeed a rotation matrix, then det(R) = +1. However, in principle, it is possible to obtain det(R) = −1, which corresponds to a reflection. This degeneracy is not what we want but, according to Arun et al. [7] and in our own experience, it usually does not occur. Once the rotation matrix has been computed, the translation vector t can be calculated as:

t = g¯ − Rp¯ .

(8.10)

Then the original point cloud of the probe face is transformed as:

 

P T = RPT + tJ1,n,

(8.11)

where J1,n is a 1 × n matrix of ones.

Subsequently, tentative correspondences are established again between the transformed probe face and the gallery face (Eq. (8.2)). This process is iterated until e approaches a minimum value, detected by its change over one iteration falling below some threshold. If the initial coarse alignment is within the convergence zone of the global minimum MSE, the final value of e is a good measure of the similarity between two face scans, where smaller values of e mean that the faces are more similar. In identification scenarios, the probe face is matched with every gallery face and the identity of the one with the minimum value of e is declared as the probe’s identity. If two gallery identities have very similar low e values, sometimes the number of correspondences, m, in the final iteration can also be used to make a better judgement. Alternatively, a probe can be verified against a claimed gallery identity if the computed value of e falls below some appropriate threshold.

8.6.2 A Critical Discussion of ICP

The main advantage of the ICP algorithm is that it iteratively corrects registration errors as it matches two faces, provided that the initial alignment of the surfaces is within the zone of convergence of the global minimum. If the initial coarse registration is not good enough ICP can converge to a local minimum and hence the algorithm fails. Iterative realignment comes at a heavy computational cost, particularly in the establishment of closest points. For this step, a basic search has complexity O(MN ), where M and N are the number of points in the probe and gallery scans respectively. This is reduced to O(M log N ) when using k-d tree structures for the closest points search in the gallery scans. It is important to remember that the average time per gallery scan match is also scaled by the average number of iterations per match which is often different for different ICP variants.

Another disadvantage of ICP is that it does not extract features from the face, thus ruling out the possibilities of training classifiers on multiple instances of a face, and ruling out indexing of features from gallery faces for faster matching or feature-level fusion. Thus, the probe must be matched to the complete gallery, thereby making the recognition time linear to the gallery size. This means that, in relation to scan resolution and gallery size, NG, a single ICP-based probe to gallery match has overall complexity O(NGM log N ).