This article has Open Peer Review reports available.
3D facial landmarks: Inter-operator variability of manual annotation
- Jens Fagertun1,
- Stine Harder1,
- Anders Rosengren2,
- Christian Moeller2,
- Thomas Werge2,
- Rasmus R Paulsen†1 and
- Thomas F Hansen†2
© Fagertun et al.; licensee BioMed Central Ltd. 2014
Received: 7 May 2014
Accepted: 29 September 2014
Published: 11 October 2014
Manual annotation of landmarks is a known source of variance that exists in all fields of medical imaging and influences the accuracy and interpretation of the results. However, the variability of human facial landmarks is only sparsely addressed in the current literature, as opposed to, e.g., the research fields of orthodontics and cephalometrics. We present a full facial 3D annotation procedure and a sparse set of manually annotated landmarks, in an effort to reduce operator time and minimize the variance.
Facial scans from 36 voluntary unrelated blood donors from the Danish Blood Donor Study were randomly chosen. Six operators each manually annotated 73 anatomical and pseudo-landmarks twice, using a three-step scheme producing a dense point correspondence map. We analyzed both the intra- and inter-operator variability, using mixed-model ANOVA. We then compared four sparse sets of landmarks in order to construct a dense correspondence map of the 3D scans with a minimum point variance.
The anatomical landmarks of the eye were associated with the lowest variance, particularly the center of the pupils, whereas points of the jaw and eyebrows had the highest variation. We saw only marginal variability with regard to intra-operator repeats and portraits. Using a sparse set of landmarks (n=14) that captures the whole face, the dense point mean variance was reduced from 1.92 to 0.54 mm.
The inter-operator variability was primarily associated with particular landmarks, where more leniently defined landmarks had the highest variability. The variables embedded in the portrait and the reliability of a trained operator had only marginal influence on the variability. Further, using 14 of the annotated landmarks we were able to reduce the variability and create a dense correspondence mesh that captures all facial features.
The research field of facial morphology has advanced rapidly over the last ten years, with the introduction of better, faster, and cheaper systems for facial 3D scanning. These systems have enabled more accurate and objective methods of capturing differences in facial morphology. Analysis of facial morphology is based on facial distances, i.e. the distances between facial landmarks [1–3], or on statistical models [1, 4]. One widely used statistical method uses Principal Component Analysis (PCA) to assess the population variance and is referred to as a Point Distribution Model (PDM). Both methods rely on manually annotated landmarks that are used directly or as a basis for constructing a dense point correspondence [1, 4–6]. This means that both direct distances and statistically based methods are prone to human operator annotation errors. Several surface-based automatic registration methods for point correspondence exist; still, manual annotation, at least of a sparse set of landmarks, is widely used when facial analysis is applied clinically. Understanding the variance (noise) introduced by manually annotated landmarks is important for knowing the statistical power of such studies, i.e. for their interpretation and application, and for aiding future study design in this field.
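The PCA-based Point Distribution Model mentioned above can be illustrated with a minimal sketch. It assumes the landmark sets have already been aligned (for example by a Procrustes fit) and uses NumPy only; the function names `fit_pdm` and `synthesize` are hypothetical and are not taken from the study.

```python
import numpy as np

def fit_pdm(shapes):
    """Fit a Point Distribution Model to pre-aligned shapes.

    shapes: array of shape (n_shapes, n_points, 3).
    Returns the mean shape vector, the PCA modes (one per row)
    and the variance explained by each mode.
    """
    n = shapes.shape[0]
    X = shapes.reshape(n, -1)            # one flattened shape per row
    mean = X.mean(axis=0)
    # SVD of the centered data gives the principal modes directly.
    _, s, modes = np.linalg.svd(X - mean, full_matrices=False)
    variances = s ** 2 / (n - 1)
    return mean, modes, variances

def synthesize(mean, modes, b):
    """Generate a shape from mode weights b (uses the first len(b) modes)."""
    b = np.asarray(b, dtype=float)
    return (mean + b @ modes[: len(b)]).reshape(-1, 3)
```

Annotation noise enters such a model directly: any operator-dependent scatter in the input landmarks contributes to the variances of the modes alongside true population variation.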
The reliability of facial landmark annotation has not been as thoroughly studied as landmark annotation in other fields, e.g. cephalometry. For example, Buschang et al. used ANOVA to assess the inter-operator annotation variability of anatomical landmarks on the skull for use in orthodontics and cephalometric analysis. Similarly, recent studies have addressed the reliability of cranial-anatomical landmarks [9–11]. Larsen et al. included the inter-operator annotation variance in the PCA when analyzing cranial growth. Here the landmark variance was addressed using a weighting scheme that gives most weight to annotation landmarks with low variance.
In this study, we work exclusively with human facial features. We address the reliability of facial feature annotation with respect to inter- and intra-operator differences and samples (portraits). To the best of our knowledge, this is the first report on variability of face morphology with respect to measurements of the face surface per se. In an effort to reduce annotation variability, i.e. to improve the signal-to-noise ratio, we suggest a subset of landmarks that, based on the reliability of facial landmarks, yields a superior dense point correspondence compared to the original landmarks.
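As a rough illustration of what intra- and inter-operator variability mean per landmark, the sketch below computes two simple descriptive summaries from repeated annotations. It is not the mixed-model ANOVA used in this study; the array layout `(subjects, operators, repeats, landmarks, 3)` and the function names are assumptions made for the example.

```python
import numpy as np

def intra_operator_error(ann):
    """Mean distance of each repeat to the operator's own mean position.

    ann: array of shape (subjects, operators, repeats, landmarks, 3), in mm.
    Returns one value per landmark.
    """
    repeat_mean = ann.mean(axis=2, keepdims=True)
    dist = np.linalg.norm(ann - repeat_mean, axis=-1)
    return dist.mean(axis=(0, 1, 2))

def inter_operator_error(ann):
    """Mean distance of each operator's mean position to the operator consensus.

    Returns one value per landmark.
    """
    operator_mean = ann.mean(axis=2)                       # average over repeats
    consensus = operator_mean.mean(axis=1, keepdims=True)  # average over operators
    dist = np.linalg.norm(operator_mean - consensus, axis=-1)
    return dist.mean(axis=(0, 1))
```

For the design described here, `ann` would have shape `(36, 6, 2, 73, 3)`. A variance-components model is still needed to separate operator, portrait and landmark effects formally.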
Sample and image data
The data used in this work consist of 36 facial scans of healthy unrelated subjects, recruited among volunteers in the Danish Blood Donor Study (DBDS). The 36 subjects were chosen by simple random sampling from our database of facial scans from 641 subjects, of whom 50% were male. The facial scans were captured using a Canfield Vectra M3 Imaging System at the DBDS facility at Glostrup University Hospital. Each 3D facial scan contains about 70,000 to 100,000 3D points and has shape information (x-, y-, z-point positions) and texture information (red, green, blue intensities) for every 3D point. The study was approved ethically by the Danish Scientific Committee and was reported to the Danish Data Protection Agency. All participants gave written informed consent prior to inclusion in the project. The facial image used in the figures is a statistically average face and does not picture any participant.
Description of annotation points
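The annotation follows a three-step scheme:
- 1. Pre-processing (see section “Data pre-processing by automatic annotations”): automatic pre-annotation of the landmarks.
- 2. Manual annotation (see section “Manual annotation tools and standard”): correction by human operator of the pre-annotated landmarks.
- 3. Post-processing (see section “Dense point correspondence”): creation of dense point correspondence meshes.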
Data pre-processing by automatic annotations
A 2D image was created by orthographic projection of the 3D scan. The face and eyes are automatically detected by Viola-Jones rapid object detection [17, 18] and serve as a starting point for an Active Appearance Model (AAM) search. When the AAM converges, the 73 2D annotation points (Figure 1) can be extracted. These annotation points are then transformed from the 2D image to the 3D scan. The 2D to 3D transformation is likely to fail in high curvature areas like the jaw, as points from the 2D image are wrongly projected onto the neck. To compensate for this limitation, an Active Shape Model (ASM) search, initialized by an Iterative Closest Point search, is performed to locate the jaw in 3D. The annotation points are then manually corrected by an operator (see section “Manual annotation tools and standard”). In summary, the low curvature points are found by a 2D AAM and transformed to the 3D scan, while high curvature points are found by a 3D ASM.
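The 2D to 3D transfer can be sketched as follows. This is a simplified illustration and not the authors' implementation: it assumes the orthographic projection looks along the z-axis with the camera on the +z side, so a 2D landmark is lifted to the front-most scan vertex near its (x, y) position. The function name `lift_to_3d` and the `radius` parameter are hypothetical.

```python
import numpy as np

def lift_to_3d(points_2d, vertices, radius=1.0):
    """Lift 2D landmarks to 3D by picking the front-most nearby vertex.

    points_2d: (n, 2) landmark positions in the projection plane.
    vertices:  (m, 3) scan vertices; the view direction is assumed to be -z.
    radius:    search radius in the (x, y) plane, same unit as the scan.
    """
    points_2d = np.asarray(points_2d, dtype=float)
    vertices = np.asarray(vertices, dtype=float)
    lifted = []
    for p in points_2d:
        d_xy = np.linalg.norm(vertices[:, :2] - p, axis=1)
        near = np.flatnonzero(d_xy <= radius)
        if near.size == 0:
            # Fall back to the single closest vertex in the plane.
            near = np.array([np.argmin(d_xy)])
        front = near[np.argmax(vertices[near, 2])]
        lifted.append(vertices[front])
    return np.array(lifted)
```

Along the jaw line the surface turns away from the camera, so a small error in (x, y) can select a vertex on the neck instead of the jaw. This is the failure mode that the separate 3D search is meant to address.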
The 2D AAM and 3D ASM were constructed based on 605 individuals recorded by a Nikon D90 in 2D and a Canfield Vectra M3 Imaging System in 3D, respectively. Both the 2D and 3D data were annotated to create correspondence between individuals, in the same fashion as described in the following section.
Manual annotation tools and standard
The objective of the manual annotation was to reach a consistent and stable standard for annotation. Prior to the study, the annotation scheme was explained and discussed during a three-hour workshop (common training program) to ensure a common frame of reference. Further, all operators had annotated more than 100 scans prior to this training program. The manual annotation is a two-step process. First, the annotation is performed in a fixed frontal view with a custom-made annotation tool. The fixed view was chosen over free-flying mode to allow faster annotation. Second, points in high curvature areas are adjusted in fixed frontal, profile, and top/down views. The high curvature points are the jaw and nose points (35-45 and 63-73).
Dense point correspondence
To analyze facial shape variation at positions not annotated by landmarks, a dense point correspondence is created. A variety of methods exist for establishing dense correspondence. In this work we employ a method that has previously produced excellent results when a sparse set of landmarks exists.
This method is based on propagating a well-formed template mesh to all shapes in the training set. For each shape, the template mesh is initially deformed using a volumetric thin-plate spline warp driven by the sparse set of corresponding landmarks. In the next step, the mesh vertices of the deformed template mesh are propagated to the target shape. This approach is very similar to the method used to create the dense surface models described by Hutton et al. [1, 4, 5]. While propagating each vertex to the closest point in Euclidean distance on the target surface works for simple anatomy, it fails in regions with moderate curvature. A proven solution is to regularize the correspondence field and add curvature information in the propagation step. In Paulsen and Hilger this regularization is cast into a Markov Random Field (MRF) framework, where a prior and an observation term are defined. The prior model imposes a Gaussian prior on the deformation field that favors smooth deformation fields. The curvature of the deformed template mesh and the target shape is used in the observation term to guide the correspondence to areas with similar curvature. The mean curvature is estimated from the radius of a locally fitted sphere. Finally, the regularization is bounded so that the projected points lie on the surface of the target shape. The optimal correspondence field is found using stochastic optimization. The weighting between the prior and observation terms is chosen as the weight that creates the most compact shape model, as described by Hilger. The result is a regularized dense correspondence between the template and all the shapes in the training set. In our experiments, the dense correspondence consists of 39,653 points and the associated mesh connectivity from the template mesh.
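The first two steps of this pipeline, the landmark-driven thin-plate spline warp and the plain closest-point propagation, can be sketched in a few lines. This is an illustration under stated assumptions rather than the implementation used in the study: it uses the standard 3D thin-plate spline kernel U(r) = r, a brute-force nearest-neighbor search, and it omits the MRF regularization and curvature term entirely. All function names are hypothetical.

```python
import numpy as np

def _pairwise_dist(a, b):
    """Euclidean distances between all rows of a and all rows of b."""
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)

def fit_tps_3d(src, dst):
    """Fit a 3D thin-plate spline mapping landmarks src onto dst.

    src, dst: (n, 3) arrays with n >= 4 non-coplanar landmarks.
    Returns kernel weights (n, 3) and affine coefficients (4, 3).
    """
    n = src.shape[0]
    K = _pairwise_dist(src, src)                 # kernel U(r) = r
    P = np.hstack([np.ones((n, 1)), src])        # affine basis [1, x, y, z]
    A = np.zeros((n + 4, n + 4))
    A[:n, :n] = K
    A[:n, n:] = P
    A[n:, :n] = P.T
    rhs = np.zeros((n + 4, 3))
    rhs[:n] = dst
    params = np.linalg.solve(A, rhs)
    return params[:n], params[n:]

def apply_tps_3d(points, src, weights, affine):
    """Warp arbitrary points (for example template vertices) with the spline."""
    U = _pairwise_dist(points, src)
    P = np.hstack([np.ones((points.shape[0], 1)), points])
    return U @ weights + P @ affine

def snap_to_closest(points, target):
    """Propagate each point to its closest target vertex (no regularization)."""
    idx = np.argmin(_pairwise_dist(points, target), axis=1)
    return target[idx]
```

With scans of 70,000 to 100,000 points the brute-force distance matrix becomes impractically large, so a spatial index such as a k-d tree would normally replace `snap_to_closest`. The unregularized snap is also the step that the MRF formulation described above is designed to improve.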
All results were produced with SAS version 9.4 and Matlab version R2010b.
Intra/inter operator variability
Statistical model fit
Mean annotation reconstruction errors in mm
Dense point correspondence optimization
Dense point mean variance
Selection 2’ prediction error and inter-operator error of all 73 landmarks
Scheme 2 error
To the best of our knowledge, this is the first study to address the variation of human-annotated 3D facial landmarks. Understanding the variation of manual annotations is important, as components of registration, recognition, and machine learning are influenced by manual annotation errors. However, the current literature is sparse in the area pertaining to 3D facial morphology and variation. We expect that an increase in the availability, accuracy, and user friendliness (i.e. fewer operator demands) of 3D imaging scanners will promote the use of shape models in clinical diagnostics, as seen for example in orthopedic surgery. However, to assess the putative clinical impact of such tools, it is important to understand the variability embedded in manual annotation. Our analysis, focused on facial morphology, suggests a procedure to retrieve a dense correspondence mesh of the face with low variance and a minimal number of operator-assigned annotation points.
We first addressed the variability of 73 facial 3D landmarks and found that the variability is highly correlated with the specific annotation point. As expected, landmarks that are easier to define in consensus (here, the landmarks of the pupils) have the lowest inter- and intra-operator variability. More leniently defined landmarks, such as the points defining the jaw line, are associated with the highest variation. The portrait itself was associated with relatively low annotation variability; thus it seems that variables associated with the portrait, such as age and gender, do not influence the annotations.
One obvious application of the annotated points is to identify minor facial abnormalities that may assist in the clinical diagnosis of syndromes. Such abnormalities can be identified by using absolute measures or ratios between manually annotated landmarks, or by using a dense correspondence mesh. Our study supports the preferential use of a dense correspondence mesh for identification of minor abnormalities, as this facilitates the use of landmarks/points not manually annotated and thus a larger data set. In a clinical setting, different operators will be used, and although such operators will ideally be trained, the variability will lead to a decreased signal-to-noise ratio and reduced analytical power. Therefore, we suggest an approach that limits the number of annotation points, minimizes variability, and is able to capture facial features. This approach uses 14 landmarks to create a dense correspondence mesh with a point mean variance of 0.54 mm. Further, this approach shows less variability in 16 of the manually annotated points not included in creating the correspondence mesh. Using fewer annotation points will decrease the operator time, thus improving feasibility of use.
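The absolute measures and ratios mentioned above are straightforward to compute from annotated landmarks. The sketch below is only an illustration: the landmark names and coordinates are invented for the example and do not correspond to the 73-point scheme used in this study.

```python
import numpy as np

def landmark_distance(landmarks, name_a, name_b):
    """Euclidean distance (in the scan's unit, e.g. mm) between two named landmarks."""
    a = np.asarray(landmarks[name_a], dtype=float)
    b = np.asarray(landmarks[name_b], dtype=float)
    return float(np.linalg.norm(a - b))

def distance_ratio(landmarks, pair_num, pair_den):
    """Ratio of two inter-landmark distances, which is independent of overall scale."""
    return landmark_distance(landmarks, *pair_num) / landmark_distance(landmarks, *pair_den)

# Hypothetical landmark names and coordinates in mm, for illustration only.
example = {
    "pupil_left": (-30.0, 0.0, 0.0),
    "pupil_right": (30.0, 0.0, 0.0),
    "nasion": (0.0, 0.0, 0.0),
    "chin": (0.0, -80.0, 0.0),
}
```

Because each distance depends on only two annotated points, operator error in either point propagates directly into the measure, which is one reason poorly defined landmarks such as jaw points are problematic for this kind of analysis.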
There is one obvious limitation with regard to the generalizability of the study. We used subjects who are Caucasian with a Scandinavian background; thus we cannot exclude that the variability of the annotation landmarks differs in other ethnicities, e.g. the texture of blonde eyebrows on light skin may be difficult to separate, whereas dark eyebrows may not. Another limitation of our study is that annotation was performed only twice; thus we cannot address whether additional repeated measures (>2) would notably influence the annotation variation.
We found that the variability of manually annotated facial landmarks was associated with the specific landmark and did not seem to be influenced by the portrait, i.e. gender and age, or by the (trained) operator. Using 14 of the 73 landmarks, we were able to decrease the mean variance and create a dense correspondence mesh capturing all facial features.
The authors of this manuscript express their gratitude to participants and staff involved in the Danish Blood Donor Study.
- Chinthapalli K, Bartolini E, Novy J, Suttie M, Marini C, Falchi M, Fox Z, Clayton LMS, Sander JW, Guerrini R, Depondt C, Hennekam R, Hammond P, Sisodiya SM: Atypical face shape and genomic structural variants in epilepsy. Brain: J Neurol. 2012, 135 (10): 3101-3114.
- Liu F, van der Lijn F, Schurmann C, Zhu G, Chakravarty MM, Hysi PG, Wollstein A, Lao O, de Bruijne M, Ikram MA, van der Lugt A, Rivadeneira F, Uitterlinden AG, Hofman A, Niessen WJ, Homuth G, de Zubicaray G, McMahon KL, Thompson PM, Daboul A, Puls R, Hegenscheid K, Bevan L, Pausova Z, Medland SE, Montgomery GW, Wright MJ, Wicking C, Boehringer S, Spector TD, et al: A genome-wide association study identifies five loci influencing facial morphology in Europeans. PLoS Genet. 2012, 8 (9): e1002932.
- Paternoster L, Zhurov AI, Toma AM, Kemp JP, St Pourcain B, Timpson NJ, McMahon G, McArdle W, Ring SM, Smith GD, Richmond S, Evans DM: Genome-wide association study of three-dimensional facial morphology identifies a variant in PAX3 associated with nasion position. Am J Hum Genet. 2012, 90 (3): 478-485.
- Hammond P, Hutton TJ, Allanson JE, Campbell LE, Hennekam RCM, Holden S, Patton MA, Shaw A, Temple IK, Trotter M, Murphy KC, Winter RM: 3D analysis of facial morphology. Am J Med Genet Part A. 2004, 126A (4): 339-348.
- Hutton TJ, Buxton BR, Hammond P: Dense surface point distribution models of the human face. Proceedings IEEE Workshop on Mathematical Methods in Biomedical Image Analysis (MMBIA 2001). 2001, IEEE, 153-160.
- Paulsen RR, Hilger KB: Shape modelling using markov random field restoration of point correspondences. Inf Process Med Imaging. 2003, 18: 1-12.
- Douglas TS: Image processing for craniofacial landmark identification and measurement: a review of photogrammetry and cephalometry. Comput Med Imaging Graph. 2004, 28 (7): 401-409.
- Buschang PH, Tanguay R, Demirjian A: Cephalometric reliability: a full ANOVA model for the estimation of true and error variance. Angle Orthod. 1987, 57 (2): 168-175.
- von Cramon-Taubadel N, Frazier BC, Lahr MM: The problem of assessing landmark error in geometric morphometrics: theory, methods, and modifications. Am J Phys Anthropol. 2007, 134 (1): 24-35.
- Sholts S, Flores L, Walker P, Wärmländer S: Comparison of coordinate measurement precision of different landmark types on human crania using a 3D laser scanner and a 3D digitiser: implications for applications of digital morphometrics. Int J Osteoarchaeol. 2011, 21 (5): 535-543.
- Barbeito-Andrés J, Anzelmo M, Ventrice F, Sardi ML: Measurement error of 3D cranial landmarks of an ontogenetic sample using computed tomography. J Oral Biol Craniof Res. 2012, 2 (2): 77-82.
- Larsen R, Baggesen K: Statistical shape analysis using non-euclidean metrics. Med Image Anal. 2003, 7 (4): 417-423.
- Pedersen OB, Erikstrup C, Kotzé SR, Sørensen E, Petersen MS, Grau K, Ullum H: The Danish blood donor study: a large, prospective cohort and biobank for medical research. Vox Sang. 2012, 102 (3): 271-
- Fagertun J: Face Recognition. M.Sc.Eng. thesis. Technical University of Denmark; 2005.
- Cootes TF, Edwards GJ, Taylor CJ: Active appearance models. IEEE Trans Pattern Anal Mach Intell. 2001, 23 (6): 681-685.
- Cootes TF, Taylor CJ, Cooper DH, Graham J: Active shape models-their training and application. Comput Vis Image Underst. 1995, 61 (1): 38-59.
- Viola P, Jones MJ: Robust real-time face detection. Int J Comput Vis. 2004, 57 (2): 137-154.
- Viola P, Jones MJ: Robust real-time object detection. Proc. of IEEE workshop on Statistical and Computational Theories of Vision. 2001, IEEE.
- Zhang Z: Iterative point matching for registration of free-form curves and surfaces. Int J Comput Vis. 1994, 13 (2): 119-152.
- Bookstein FL: Principal warps: thin-plate splines and the decomposition of deformations. Pattern Anal Mach Intell IEEE Trans. 1989, 11 (6): 567-585.
- Hilger KB, Paulsen RR, Larsen R: Markov random field restoration of point correspondences for active shape modeling. Medical Imaging 2004. 2004, International Society for Optics and Photonics, 1862-1869.
- Li SZ: Markov Random Field Modeling in Image Analysis. 2009, Springer.
- Paulsen RR: Statistical shape analysis of the human ear canal with application to in-the-ear hearing aid design. Ph.D. thesis. IMM, Informatik og Matematisk Modellering, Danmarks Tekniske Universitet; 2004.
- Sugano N: Computer-assisted orthopedic surgery. J Orthopaedic Sci. 2003, 8 (3): 442-448.
- The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1471-2342/14/35/prepub
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.