Paper Notes: The DeepID Series


DeepID has evolved through several versions:

  • DeepID1: Deep Learning Face Representation from Predicting 10,000 Classes [1]
  • DeepID2: Deep Learning Face Representation by Joint Identification-Verification [2]
  • DeepID2+: Deeply Learned Face Representations Are Sparse, Selective, and Robust [3]
  • DeepID3: Face Recognition with Very Deep Neural Networks [4]


The main lever for improving DeepID across versions is enlarging the training dataset.

Whole Process
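In this pipeline, DeepID can be replaced by traditional feature extractors such as HOG or LBP, and the classifier can be any machine learning classification algorithm, such as SVM, Joint Bayesian, LR, or NN.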



face patches ---- ConvNet ----> high-level features of the last hidden layer
features ---- Joint Bayesian or neural network ----> face verification
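Because the feature extractor and the classifier are interchangeable, the pipeline can be sketched with both as plug-in callables. This is a minimal illustration, not the paper's code: `face_feature`, `verify`, `mean_extractor`, and `neg_sq_dist` are hypothetical names, patches are plain lists of pixel values, and the toy extractor and scorer merely stand in for a DeepID ConvNet and a Joint Bayesian or NN classifier.

```python
from typing import Callable, List, Sequence

Patch = List[float]                                    # a patch as flat pixel values (simplification)
Extractor = Callable[[Patch], List[float]]             # e.g. a DeepID ConvNet, HOG, or LBP
Scorer = Callable[[List[float], List[float]], float]   # e.g. Joint Bayesian or a small NN


def face_feature(patches: Sequence[Patch], extractors: Sequence[Extractor]) -> List[float]:
    """Run one extractor per patch and concatenate their outputs into one face feature."""
    if len(patches) != len(extractors):
        raise ValueError("expected exactly one extractor per patch")
    feature: List[float] = []
    for patch, extract in zip(patches, extractors):
        feature.extend(extract(patch))
    return feature


def verify(patches_a: Sequence[Patch], patches_b: Sequence[Patch],
           extractors: Sequence[Extractor], scorer: Scorer, threshold: float) -> bool:
    """Return True if the scorer judges the two faces to be the same identity."""
    fa = face_feature(patches_a, extractors)
    fb = face_feature(patches_b, extractors)
    return scorer(fa, fb) >= threshold


# Toy stand-ins, purely illustrative.
def mean_extractor(patch: Patch) -> List[float]:
    """A 1-d 'feature': the mean intensity of the patch."""
    return [sum(patch) / len(patch)]


def neg_sq_dist(u: List[float], v: List[float]) -> float:
    """Similarity score: negative squared Euclidean distance (higher means more similar)."""
    return -sum((a - b) ** 2 for a, b in zip(u, v))
```

Swapping `mean_extractor` for a real feature extractor, or `neg_sq_dist` for a trained classifier, leaves `verify` unchanged, which is the point the text makes about the pipeline.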

DeepID features -> last hidden layer of each ConvNet (160-d)
200+ ConvNets (each ConvNet corresponds to one patch)
Feature Extraction Process

Deep ConvNets

ConvNet Structure

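Note the second-to-last layer, the DeepID feature layer: it is connected to both convolutional layer 4 and max-pooling layer 3. This reduces information loss, so the feature accounts for both local and global characteristics.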

The last hidden layer of DeepID is fully connected to both the third and fourth convolutional layers (after max-pooling) such that it sees multi-scale features. This is critical to feature learning because after successive down-sampling along the cascade, the fourth convolutional layer contains too few neurons and becomes the bottleneck for information propagation.
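The multi-scale connection can be written as y_j = max(0, Σ_i x¹_i·w¹_ij + Σ_i x²_i·w²_ij + b_j), where x¹ is the flattened output of max-pooling layer 3 and x² that of convolutional layer 4. Below is a minimal pure-Python sketch of that layer under the assumption of a ReLU activation and plain nested lists for weights; `deepid_layer` is a hypothetical name, and in DeepID1 the output would have 160 units.

```python
from typing import List, Sequence


def deepid_layer(pool3: Sequence[float], conv4: Sequence[float],
                 w1: Sequence[Sequence[float]], w2: Sequence[Sequence[float]],
                 b: Sequence[float]) -> List[float]:
    """Fully connected layer fed by two scales at once, followed by ReLU.

    pool3: flattened output of max-pooling layer 3 (finer scale, more neurons)
    conv4: flattened output of convolutional layer 4 (coarser scale, fewer neurons)
    w1[i][j], w2[i][j]: weight from input i of each source to DeepID unit j
    b[j]: bias of DeepID unit j
    """
    out: List[float] = []
    for j in range(len(b)):
        s = b[j]
        s += sum(x * w1[i][j] for i, x in enumerate(pool3))  # contribution of pool3
        s += sum(x * w2[i][j] for i, x in enumerate(conv4))  # contribution of conv4
        out.append(max(0.0, s))                              # ReLU
    return out
```

Because conv4 alone has few neurons, summing both contributions lets the feature layer draw on pool3's finer-grained information instead of being bottlenecked by conv4.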

Feature Extraction

Face Regions
Face images are preprocessed by alignment and patch cropping:

  • Faces are globally aligned by similarity transformation according to the two eye centers and the mid-point of the two mouth corners.
  • Features are extracted from 60 face patches with ten regions, three scales, and RGB or gray channels.
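Global alignment by a similarity transformation from three landmarks (two eye centers and the mouth-corner midpoint) can be solved in closed form by least squares. The sketch below treats 2-D points as complex numbers so the transform is z ↦ a·z + b, where a encodes scale and rotation and b encodes translation. The `TEMPLATE` coordinates and the function names are hypothetical; the paper's canonical positions are not given here.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

# Hypothetical canonical landmark positions: left eye, right eye, mouth-corner midpoint.
TEMPLATE: List[Point] = [(30.0, 40.0), (70.0, 40.0), (50.0, 80.0)]


def fit_similarity(src: Sequence[Point], dst: Sequence[Point]) -> Tuple[complex, complex]:
    """Least-squares similarity transform mapping src onto dst as z -> a*z + b."""
    zs = [complex(x, y) for x, y in src]
    ws = [complex(x, y) for x, y in dst]
    n = len(zs)
    z_mean = sum(zs) / n
    w_mean = sum(ws) / n
    # Minimizing sum |a*(z - z_mean) - (w - w_mean)|^2 over complex a gives:
    num = sum((z - z_mean).conjugate() * (w - w_mean) for z, w in zip(zs, ws))
    den = sum(abs(z - z_mean) ** 2 for z in zs)
    a = num / den
    b = w_mean - a * z_mean
    return a, b


def apply_similarity(a: complex, b: complex, pts: Sequence[Point]) -> List[Point]:
    """Apply z -> a*z + b to each point."""
    out: List[Point] = []
    for x, y in pts:
        w = a * complex(x, y) + b
        out.append((w.real, w.imag))
    return out
```

Usage: `a, b = fit_similarity(detected_landmarks, TEMPLATE)` estimates the transform, and applying it to pixel coordinates warps the face into the canonical frame before the patches are cropped.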

Face Verification

Joint Bayesian

Joint Bayesian for Face Verification
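Joint Bayesian models a face feature as x = μ + ε, with an identity term μ ~ N(0, S_μ) and an intra-personal term ε ~ N(0, S_ε), and scores a pair by the log-likelihood ratio of "same identity" versus "different identities". Real DeepID features are high-dimensional and need covariance matrices. For illustration, this sketch assumes the scalar, zero-mean case, where everything reduces to 2×2 Gaussian densities; `joint_bayesian_llr_1d` is a hypothetical name.

```python
import math


def joint_bayesian_llr_1d(x1: float, x2: float, s_mu: float, s_eps: float) -> float:
    """log P(x1, x2 | same) - log P(x1, x2 | different) for scalar, zero-mean features.

    Same identity: x1 and x2 share mu, so Cov = [[s, s_mu], [s_mu, s]] with s = s_mu + s_eps.
    Different identities: x1 and x2 are independent, so Cov = diag(s, s).
    """
    s = s_mu + s_eps
    # Same-identity hypothesis: bivariate Gaussian with correlated components.
    det_same = s * s - s_mu * s_mu
    quad_same = (s * x1 * x1 - 2.0 * s_mu * x1 * x2 + s * x2 * x2) / det_same
    log_p_same = -0.5 * (quad_same + math.log(det_same)) - math.log(2.0 * math.pi)
    # Different-identity hypothesis: two independent 1-D Gaussians.
    quad_diff = (x1 * x1 + x2 * x2) / s
    log_p_diff = -0.5 * (quad_diff + 2.0 * math.log(s)) - math.log(2.0 * math.pi)
    return log_p_same - log_p_diff
```

A positive ratio favors "same identity". Verification then thresholds this score, with the threshold chosen on held-out pairs.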

Neural Network

Neural Network for Face Verification
Input layer: 60 groups, each containing 2 (a patch pair) × 160 (DeepID dimensions per ConvNet) × 2 (the patch and its horizontally flipped counterpart) = 640 features.
Features within the same group are highly correlated.
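Since features within a group are highly correlated, a natural first hidden layer connects each group only to its own hidden units (locally connected) rather than to all 60 × 640 inputs. The sketch below builds one input group and applies such a group-local ReLU layer. The number of hidden units per group, the weight range, and all function names are hypothetical, and biases are omitted for brevity.

```python
import random
from typing import List, Sequence


def build_group(f1: Sequence[float], f1_flip: Sequence[float],
                f2: Sequence[float], f2_flip: Sequence[float]) -> List[float]:
    """One input group: DeepID vectors of a patch pair and their flipped versions (4 x 160 = 640)."""
    return list(f1) + list(f1_flip) + list(f2) + list(f2_flip)


def random_weights(n_groups: int, hidden_per_group: int, group_dim: int,
                   seed: int = 0) -> List[List[List[float]]]:
    """Hypothetical small random init: weights[g][h] is the weight vector of unit h in group g."""
    rng = random.Random(seed)
    return [[[rng.uniform(-0.1, 0.1) for _ in range(group_dim)]
             for _ in range(hidden_per_group)]
            for _ in range(n_groups)]


def locally_connected(groups: Sequence[Sequence[float]],
                      weights: Sequence[Sequence[Sequence[float]]]) -> List[float]:
    """Each group feeds only its own hidden units (ReLU); outputs are concatenated in group order."""
    out: List[float] = []
    for g, x in enumerate(groups):
        for w in weights[g]:
            s = sum(xi * wi for xi, wi in zip(x, w))
            out.append(max(0.0, s))
    return out
```

With 60 groups and, say, 4 hidden units per group, the first layer yields 240 values. Changing one group's input changes only that group's 4 outputs, which is what "locally connected" means here.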


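  • ConvNets trained on multi-scale patches outperform a ConvNet that uses only a single patch covering the whole face.
  • DeepID's own classification error rate fluctuates between 40% and 60%. Although this is high, it does not matter: DeepID is used to learn features, so its own classification error is not the quantity of interest.
  • Using the final softmax layer of the DeepID network as the feature representation performs poorly.
  • As the number of identities in DeepID's training set grows, both DeepID's own classification accuracy and the verification accuracy on LFW increase.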


face identification signal + face verification signal

In DeepID1, the final softmax layer of the ConvNet uses logistic regression as the objective function, i.e., the face identification signal.
DeepID2 adds a face verification signal to the objective, and the two signals are combined with a weighted sum.

Identification-Verification Guided Deep Feature Learning

The ConvNet structure for DeepID2 extraction
$f = \mathrm{Conv}(x, \theta_c)$
where x is the input face patch, f is the DeepID2 vector, and $\theta_c$ denotes the ConvNet parameters to be learned.

Two supervisory signals:

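  • Face identification signal
    Classifies each face image into one of n different identities:
    $\mathrm{Ident}(f, t, \theta_{id}) = -\sum_{i=1}^{n} p_i \log \hat{p}_i = -\log \hat{p}_t$
    f is the DeepID2 vector, t is the target class, $\theta_{id}$ denotes the softmax layer parameters, p is the target probability distribution (1 for class t, 0 otherwise), and $\hat{p}$ is the predicted probability distribution.
  • Face verification signal
    Encourages DeepID2 vectors extracted from faces of the same identity to be similar.
    It regularizes DeepID2 to reduce intra-personal variations. The signal can be based on the L1 norm, the L2 norm, or cosine similarity. The L2 form is:
    $\mathrm{Verif}(f_1, f_2, y, \theta_{ve}) = \frac{1}{2}\lVert f_1 - f_2 \rVert_2^2$ if $y = 1$, and $\frac{1}{2}\max\left(0,\, m - \lVert f_1 - f_2 \rVert_2\right)^2$ if $y = -1$
    f1 and f2 are the DeepID2 vectors of two images. y = 1 means the same identity, so the L2 distance between them is minimized; y = -1 means different identities, whose distance is pushed to exceed the margin m.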
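The two signals can be sketched directly from the formulas: identification is softmax cross-entropy on the predicted class scores, and verification is the margin-based L2 loss on a feature pair. This is a minimal sketch, not the authors' code. It assumes the logits are already computed by the softmax layer, and the names `ident_loss`, `verif_loss`, `joint_loss` and the weighting parameter `lam` are hypothetical.

```python
import math
from typing import Sequence


def ident_loss(logits: Sequence[float], t: int) -> float:
    """Softmax cross-entropy: -log p_hat_t, computed stably via log-sum-exp."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[t]


def verif_loss(f1: Sequence[float], f2: Sequence[float], y: int, m: float) -> float:
    """L2 verification loss: pull same-identity pairs together, push others beyond margin m."""
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(f1, f2)))
    if y == 1:
        return 0.5 * d * d
    return 0.5 * max(0.0, m - d) ** 2


def joint_loss(logits1: Sequence[float], t1: int, logits2: Sequence[float], t2: int,
               f1: Sequence[float], f2: Sequence[float], m: float, lam: float) -> float:
    """Identification loss of both samples plus lam times the verification loss of the pair."""
    y = 1 if t1 == t2 else -1
    return ident_loss(logits1, t1) + ident_loss(logits2, t2) + lam * verif_loss(f1, f2, y, m)
```

For example, with f1 = (0, 0) and f2 = (3, 4) the distance is 5: a same-identity pair costs 12.5, while a different-identity pair costs 0.5 with m = 6 and nothing with m = 4, since it is already beyond the margin.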

The DeepID2 learning algorithm
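Roughly, each training step samples two faces, runs both through the ConvNet, and back-propagates into θ_c, for each sample, its identification gradient plus λ times the pair's verification gradient. The sketch below computes those per-feature gradients analytically. It assumes a bias-free linear softmax layer `W` (one row per class) and returns zero verification gradient at d = 0 for different-identity pairs, where the direction is undefined. `W`, `lam`, and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec = List[float]


def softmax(z: Sequence[float]) -> Vec:
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def ident_grad_f(W: Sequence[Sequence[float]], f: Sequence[float], t: int) -> Vec:
    """d(-log p_hat_t)/df for logits = W f: equals W^T (p_hat - onehot(t))."""
    logits = [sum(w * x for w, x in zip(row, f)) for row in W]
    p_hat = softmax(logits)
    delta = [p - (1.0 if k == t else 0.0) for k, p in enumerate(p_hat)]
    return [sum(delta[k] * W[k][i] for k in range(len(W))) for i in range(len(f))]


def verif_grads(f1: Sequence[float], f2: Sequence[float], y: int, m: float) -> Tuple[Vec, Vec]:
    """Gradients of the L2 verification loss with respect to f1 and f2."""
    diff = [a - b for a, b in zip(f1, f2)]
    if y == 1:                               # loss = 0.5 * ||f1 - f2||^2
        g1 = list(diff)
    else:                                    # loss = 0.5 * max(0, m - ||f1 - f2||)^2
        d = math.sqrt(sum(v * v for v in diff))
        if d >= m or d == 0.0:               # hinge inactive, or direction undefined
            g1 = [0.0] * len(diff)
        else:
            g1 = [-(m - d) * v / d for v in diff]
    g2 = [-v for v in g1]                    # symmetric in f1 and f2
    return g1, g2


def feature_grads(W, f1, t1, f2, t2, m: float, lam: float) -> Tuple[Vec, Vec]:
    """dIdent/df_i + lam * dVerif/df_i for both samples; these flow back into the ConvNet."""
    y = 1 if t1 == t2 else -1
    v1, v2 = verif_grads(f1, f2, y, m)
    i1 = ident_grad_f(W, f1, t1)
    i2 = ident_grad_f(W, f2, t2)
    return ([a + lam * b for a, b in zip(i1, v1)],
            [a + lam * b for a, b in zip(i2, v2)])
```

For a different-identity pair inside the margin, the verification gradient points from f1 toward f2, so a gradient-descent step moves the two features apart, as intended.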





Patches selected for feature extraction

Variance Comparison


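  • The more identities in DeepID's training set, the higher the final verification accuracy.
  • Experiments with different verification signals, including L1, L2, and cosine, showed that the L2 norm works best.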
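The compared verification signals differ in the dissimilarity measure applied to the pair (f1, f2). The sketch below computes the three measures named above. How each is turned into a training loss (for instance the margin form used with L2) is not reproduced here, and the function names are hypothetical.

```python
import math
from typing import Sequence


def l1_distance(f1: Sequence[float], f2: Sequence[float]) -> float:
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(f1, f2))


def l2_distance(f1: Sequence[float], f2: Sequence[float]) -> float:
    """Euclidean distance."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f1, f2)))


def cosine_similarity(f1: Sequence[float], f2: Sequence[float]) -> float:
    """Cosine of the angle between the vectors; ignores their magnitudes."""
    dot = sum(a * b for a, b in zip(f1, f2))
    n1 = math.sqrt(sum(a * a for a in f1))
    n2 = math.sqrt(sum(b * b for b in f2))
    return dot / (n1 * n2)
```

One design difference worth noting: scaling a feature vector changes its L1 and L2 distances to other vectors but leaves cosine similarity unchanged, so the cosine variant constrains only the direction of the features.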


Compared with DeepID2, DeepID2+ adds supervisory signals to the early layers and increases the dimension of the hidden representation.
With DeepID2+, the authors discovered three nice properties of the learned neural activations: sparsity, selectivity, and robustness.

  • Sparsity
  • Selectivity
  • Robustness

DeepID2+ Nets

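Compared with DeepID2, there are three changes:

  • The DeepID layer is enlarged from 160 to 512 dimensions.
  • The training set merges the CelebFaces+ and WDRef datasets, for a total of 12,000 identities and 290,000 images.
  • The DeepID layer is connected not only to the max-pooling layers after the third and fourth layers but also to the max-pooling layers after the first and second layers.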

DeepID2+ Neural Net
Joint face identification-verification supervisory signals

Moderate Sparsity of Neural Activations

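  • Sparsity for each image
    Each image activates roughly half of the neurons, which makes faces of different identities more distinguishable.
  • Sparsity for each neuron
    Each neuron is activated by roughly half of the images, which gives it greater discriminative power.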
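Both sparsity statistics can be read off an activation matrix with one row per image and one column per neuron. This sketch assumes that "activated" means a value strictly greater than zero (as after a ReLU); the function names are hypothetical.

```python
from typing import List, Sequence


def image_activation_ratios(acts: Sequence[Sequence[float]]) -> List[float]:
    """For each image (row), the fraction of neurons with activation > 0."""
    return [sum(1 for a in row if a > 0) / len(row) for row in acts]


def neuron_activation_ratios(acts: Sequence[Sequence[float]]) -> List[float]:
    """For each neuron (column), the fraction of images on which it is activated (> 0)."""
    n_images = len(acts)
    n_neurons = len(acts[0])
    return [sum(1 for row in acts if row[j] > 0) / n_images for j in range(n_neurons)]
```

"Moderate sparsity" in the sense above means both lists would concentrate around 0.5 on real DeepID2+ activations over many images.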


Activation patterns are more important than precise activation values. Accordingly, the final 512-dimensional output vector was binarized with a threshold, and performance dropped only slightly.
Comparison of the Original and the Binarized Features
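Binarization keeps only the activation pattern, and binary codes can then be compared by counting agreeing positions. This is a sketch under the assumption of a threshold of 0 (activated versus not activated) and a simple agreement ratio as the similarity; the threshold value and function names are hypothetical choices here.

```python
from typing import List, Sequence


def binarize(f: Sequence[float], threshold: float = 0.0) -> List[int]:
    """1 where the neuron is activated (value > threshold), else 0."""
    return [1 if v > threshold else 0 for v in f]


def hamming_similarity(b1: Sequence[int], b2: Sequence[int]) -> float:
    """Fraction of positions where two binary codes agree (1 - normalized Hamming distance)."""
    if len(b1) != len(b2):
        raise ValueError("codes must have the same length")
    return sum(1 for x, y in zip(b1, b2) if x == y) / len(b1)
```

A 512-dimensional binary code also fits in 64 bytes and can be compared with bitwise operations, which is one practical appeal of the small accuracy loss reported above.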

Selectiveness on Identities and Attributes


Robustness of DeepID2+ Features

Occluded Images

Occlusion Ratio
Occlusion Block


DeepID3 explores two very deep neural network architectures:
  • Stacked convolutions, as in VGG net
  • Inception layers, as in GoogLeNet

DeepID3 Net

DeepID3 does not show a clear advantage over DeepID2+.

[1] https://www.researchgate.net/publication/283749931_Deep_Learning_Face_Representation_from_Predicting_10000_Classes
[2] https://arxiv.org/abs/1406.4773
[3] https://arxiv.org/abs/1412.1265
[4] https://arxiv.org/abs/1502.00873
[5] https://blog.csdn.net/stdcoutzyx/article/details/42091205