Figure 1. Facial action units can encode nearly any anatomically possible facial expression and can be used for emotion recognition in an ambient-intelligence environment.
Overview of the proposed IdenNet
Figure 2. The method is implemented as a multi-task network cascade in which the two sub-tasks, face clustering and AU detection, share a common network while each has its own task-specific CNN layers.
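The layout described in Figure 2 can be sketched as a shared trunk feeding two task-specific heads. Below is a minimal PyTorch sketch under that assumption; the layer sizes, depths, and head dimensions are illustrative stand-ins, not the paper's actual configuration.

```python
# Minimal sketch of a shared-trunk / task-specific-head layout.
# All layer sizes are illustrative assumptions, not IdenNet's exact design.
import torch
import torch.nn as nn

class MultiTaskCascade(nn.Module):
    def __init__(self, num_aus: int = 12, embed_dim: int = 128):
        super().__init__()
        # Common network shared by both sub-tasks.
        self.trunk = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Task-specific layers: one head per sub-task.
        self.cluster_head = nn.Linear(64, embed_dim)  # face-clustering embedding
        self.au_head = nn.Linear(64, num_aus)         # per-AU logits (multi-label)

    def forward(self, x):
        shared = self.trunk(x)
        identity_embedding = self.cluster_head(shared)
        au_logits = self.au_head(shared)
        return identity_embedding, au_logits
```

Training such a cascade would jointly optimize a clustering objective on the identity embedding and a multi-label objective on the AU logits, with gradients from both tasks shaping the shared trunk.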
Local distribution in a feature space
Figure 3. The proposed method learns an image representation in which feature vectors of images captured from the same subject lie close to one another and far from the feature vectors of other subjects' images; the blue, green, and yellow ovals depict three clusters of features captured from three different subjects.
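One standard way to realize this pull-together, push-apart objective is a triplet margin loss; whether the paper uses exactly this loss is an assumption made here for illustration.

```python
# Hedged sketch of a triplet margin loss: anchor and positive come from the
# same subject, negative from a different subject. This is one common choice
# for the objective in Figure 3, assumed here for illustration.
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin: float = 0.2):
    d_pos = F.pairwise_distance(anchor, positive)  # same-subject distance
    d_neg = F.pairwise_distance(anchor, negative)  # cross-subject distance
    # Penalize triplets where the same-subject pair is not closer by `margin`.
    return F.relu(d_pos - d_neg + margin).mean()

# Toy usage with random 128-D embeddings:
a, p, n = (torch.randn(8, 128) for _ in range(3))
print(triplet_loss(a, p, n))
```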
Identity normalization for improved AU representation
Figure 4. Although facial expressions contribute only a small part of each feature vector, they form an AU-specific local distribution that is highly similar across subjects in the feature space. Because these local structures are similar, we normalize the vectors by removing their identity-level signals so that the resulting feature vectors better represent the AUs contained in the images.
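Taken literally, the normalization described above removes an identity-level component from each feature vector. A minimal sketch follows, assuming the identity-level signal is approximated by the per-subject mean feature; this estimator is an illustrative choice, not necessarily the paper's.

```python
# Minimal sketch of identity normalization: subtract a per-subject mean so the
# residual vector better reflects the AUs. The per-subject-mean approximation
# of the identity-level signal is an assumption for illustration.
import torch

def identity_normalize(features: torch.Tensor, subject_ids: torch.Tensor):
    """features: (N, D) image features; subject_ids: (N,) integer labels."""
    normalized = features.clone()
    for sid in subject_ids.unique():
        mask = subject_ids == sid
        # Remove this subject's identity-level signal (its mean feature).
        normalized[mask] = features[mask] - features[mask].mean(dim=0)
    return normalized

# Toy usage: six images from two subjects.
feats = torch.randn(6, 128)
ids = torch.tensor([0, 0, 0, 1, 1, 1])
print(identity_normalize(feats, ids).shape)  # torch.Size([6, 128])
```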
Andy Cheng-Hao Tu
Department of Computer Science and Information Engineering
Chih-Yuan Yang
Post-doctoral Researcher, Department of Computer Science and Information Engineering
Jane Yung-jen Hsu
Professor, Department of Computer Science and Information Engineering