Automatic colorimetric calibration of human wounds

Background: Digital photography is now considered an acceptable tool in many clinical domains, e.g. wound care. Although ever higher resolutions are available, reproducibility is still poor and visual comparison of images remains difficult. This is even more the case for measurements performed on such images (colour, area, etc.). This problem is often neglected and images are freely compared and exchanged without further thought.
Methods: The first experiment checked whether camera settings or lighting conditions could negatively affect the quality of colorimetric calibration. Digital images containing a calibration chart were acquired under a variety of conditions. Precision and accuracy of colours after calibration were quantitatively assessed with a probability distribution for perceptual colour differences (dE_ab). The second experiment was designed to assess the impact of the automatic calibration procedure (i.e. chart detection) on real-world measurements. Forty different images of real wounds were acquired and a region of interest was selected in each image. Three rotated versions of each image were automatically calibrated and colour differences were calculated.
Results: In the first experiment, colour differences between the image measurements and real spectrophotometric measurements had a median dE_ab of 6.40 for the proper patches of calibrated normal images and 17.75 for uncalibrated images, demonstrating an important improvement in accuracy after calibration. The reproducibility, visualized by the probability distribution of the dE_ab errors between two measurements of the patches of the images, had a median of 3.43 dE_ab for all calibrated images and 23.26 dE_ab for all uncalibrated images. Restricted to the proper patches of normal calibrated images, the median was only 2.58 dE_ab. Wilcoxon rank-sum testing (significance level p < 0.05) between uncalibrated normal images and calibrated normal images with proper squares returned a value equal to 0, demonstrating a highly significant improvement in reproducibility. In the second experiment, the reproducibility of the chart detection during automatic calibration is presented using a probability distribution of dE_ab errors between two measurements of the same ROI.
Conclusion: The investigators proposed an automatic colour calibration algorithm that ensures reproducible colour content of digital images. Evidence was provided that images taken with commercially available digital cameras can be calibrated independently of any camera settings and illumination features.


Background
Chronic wounds are a major health problem, not only because of their incidence, but also because of their time- and resource-consuming management. This study was undertaken to investigate the possible use of colorimetric imaging during the assessment of human wound repair. The outline design of the current study is based on the system requirements for colorimetric diagnostic tools, published previously [1,2].
Digital photography is considered an acceptable and affordable tool in many clinical disciplines such as wound care and dermatology [3-12], forensics [13,14], pathology [15], traumatology, and orthodontics [16,17]. Although the technical features of most digital cameras are impressive, they are unable to produce reproducible and accurate images with regard to spectrophotometry [18-22]. Taking two pictures of a wound with the same camera and settings, immediately after one another, normally results in two slightly different images. These differences are exacerbated when the lighting, the camera or its settings are different. Therefore, reproducibility is poor. This may be less important when photographs are taken for documentation purposes, but when digital photography becomes part of medical evaluation or is used to perform measurements, it becomes critically important [5,6,18,23-29]. In our view, the quality of medical photography is principally defined by its reproducibility and accuracy [21]. Without reproducibility and accuracy of images, any attempt to measure colour or geometric properties is of little use [27]. A simple, practical and validated algorithm to solve this problem is necessary (Figure 1).
Almost all colours can be reconstructed using a combination of three base colours: red, green and blue (RGB) [30]. Together, these three base colours define a 3-dimensional colour space that can be used to describe colours.
The accurate handling of colour characteristics of digital images is a non-trivial task because RGB signals generated by digital cameras are 'device-dependent', i.e. different cameras produce different RGB signals for the same scene. In addition, these signals will change over time as they are dependent on the camera settings, and some of these may be scene dependent, such as the shutter speed and aperture diameter. In other words, each camera defines a custom device-dependent RGB colour space for each picture taken. As a consequence, the term RGB (as in RGB-image) is clearly ill-defined and meaningless for anything other than trivial purposes. Measurements of colours and colour differences in this paper are based on a standard colorimetric observer as defined by the CIE (Commission Internationale de l'Eclairage), the international standardizing body in the field of colour science. It is therefore not possible to make such measurements on RGB images unless the relationship between the varying camera RGB colour spaces and the colorimetric colour spaces (colour spaces based on said human observer) is determined. However, there is a standard RGB colour space (sRGB) that is fixed (device-independent) and has a known relationship with the CIE colorimetric colour spaces. Furthermore, sRGB should display more or less realistically on most modern display devices without extra manipulation or calibration (look for an 'sRGB' or '6500K' setting) [31]. One disadvantage of sRGB is that it cannot represent all the colours detected by the human eye. We believe that finding the relationship between the varying and unknown camera RGB and the sRGB colour space will eliminate most of the variability introduced by the camera and lighting conditions.
The transformation between the input RGB colour space and the sRGB colour space was achieved via a colour target-based calibration using a 'reference chart', namely the MacBeth Colour Checker Chart Mini [MBCCC] (GretagMacBeth AG, Regensdorf, Switzerland). This chart provides a checkerboard array of 24 scientifically prepared coloured squares or patches in a wide range of colours with known colorimetric properties under a CIE D65, noon daylight illuminant (6504 K). Many of these squares represent natural objects of special interest, such as human skin, foliage and blue sky. These squares are not only the same colour as their counterparts, but also reflect light the same way in all parts of the visible spectrum. Different calibration algorithms defining the relationship between the input RGB colour space of the camera and the sRGB colour space have been published using various methods such as 3D look-up tables and neural networks. The algorithm in this study is based on three 1D look-up tables and polynomial modelling, as previously published by Vander Haeghen et al. [32] (Figure 2). This differs somewhat from, e.g., the general methods used in the well-known ICC profiles (http://www.color.org/index.xalter). In ICC profiles the relationship of an unknown colour space to the so-called 'profile connection space' (PCS, usually CIE XYZ) is computed and stored. Output is then generated by going from this PCS to the desired output colour space, which in our case would be sRGB. This means two colour space transformations are required (RGB to PCS to sRGB), while our algorithm needs only one. Although it is an inherently more flexible system, ICC profiling seems overkill for our intended application (a straight camera RGB to sRGB transformation, without the need to determine and store or embed a device profile). However, it must be said that the advent of e.g. LittleCMS (http://www.littlecms.com/), a free colour management system that focuses on the determination and immediate application of profiles on images, may change this view in the future, and such a system could be a viable alternative to the current colour space transformation algorithms in our system.
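To make the idea of a per-channel 1D look-up table concrete, the following minimal Python sketch builds one table per channel from the grey patches of a chart by piecewise linear interpolation. It is not the published algorithm of [32]: the polynomial modelling step is omitted entirely, and the function names (`build_lut`, `apply_luts`) and all numeric patch values are hypothetical, chosen only for illustration.

```python
def build_lut(measured, reference):
    """Build a 256-entry look-up table for one colour channel.

    measured  : channel values of the grey patches as seen by the camera
    reference : target values of the same patches (assumed known, distinct)
    Values outside the measured range are clamped to the end points.
    """
    pairs = sorted(zip(measured, reference))
    lut = []
    for v in range(256):
        if v <= pairs[0][0]:
            out = pairs[0][1]
        elif v >= pairs[-1][0]:
            out = pairs[-1][1]
        else:
            # find the segment containing v and interpolate linearly
            for (m0, r0), (m1, r1) in zip(pairs, pairs[1:]):
                if m0 <= v <= m1:
                    out = r0 + (r1 - r0) * (v - m0) / (m1 - m0)
                    break
        lut.append(round(out))
    return lut


def apply_luts(pixel, luts):
    """Map one (R, G, B) pixel through three independent 1D tables."""
    return tuple(lut[c] for c, lut in zip(pixel, luts))


# Hypothetical grey-patch values for one channel (illustration only).
measured_grey = [40, 70, 110, 150, 190, 230]
reference_grey = [52, 85, 122, 160, 200, 243]
example_lut = build_lut(measured_grey, reference_grey)
```

In such a sketch each channel is corrected independently, which is why three tables suffice for the tone response; any cross-channel correction would have to come from a separate modelling step.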

Methods
The research has been carried out in accordance with the Helsinki Declaration; the methods used were subject to ethical committee approval (B32220083450 Commissie voor Medische Ethiek Faculteit Geneeskunde Leuven Belgium). Patients received detailed written and verbal explanation and patient authorization was required before inclusion and analysis of the images.

Experiment 1
The purpose of the first experiment was to investigate whether camera settings or lighting conditions negatively affect the quality of the colorimetric calibration [33]. Chronic wounds are assessed in different locations and environments. Therefore, we assessed the calibration algorithm under extreme lighting conditions and with inappropriate camera settings.

Image Acquisition
Digital images of the MBCCC on a grey-coloured background, in a Colour Assessment Cabinet CAC 120-5 (VeriVide, Leicester, UK), were taken using two digital cameras: the Nikon D200 SLR (10.2 effective megapixels) with a 60 mm AF Micro Nikkor lens, and the Canon Eos D10 (6.3 million effective pixels) with a 50 mm Canon EF lens. All images were recorded in high-quality JPEG mode, i.e. after the camera had applied its own processing (demosaicing, colour correction curves, matrixing, etc.) to the images (Table 1).

Calibration Procedure
During the calibration procedure uniform illumination is assumed, as is the presence of a reference chart in the image of interest. The calibration provides a means of transforming the acquired images (defined in an unknown colour space, which is normally RGB) to a standard, well-defined colour space, i.e. sRGB [34]. sRGB has a known relationship to the CIE L*a*b* colorimetric space, allowing computation of perceptual colour differences. The CIE L*a*b* colorimetric space, or CIELAB space with coordinates L*, a* and b*, is a colour-opponent space; L* refers to lightness, and a* and b* refer to the colour-opponent dimensions [34-36]. The detection of the MBCCC in the digital image can be done manually or automatically. The algorithm behind MBCCC detection is based on the initial detection of all the bright areas in an image (areas with pixel values close to 255), followed by a shape analysis. Shapes that are not rectangular, or that are too small or too large compared with the image dimensions, are discarded (sizes are compared in pixels, as the real dimensions are not yet known). The remaining areas are candidates for the MBCCC white patch. For each of the white patch candidates, the corresponding MBCCC black patch is searched for, taking into account the typical layout of the colour chart and the dimensions of the white patch candidate. If this succeeds, the patches are checked for saturation (average pixel value > 255 - δ or < δ, with δ a small number, e.g. 3) in each of the colour channels individually. If the number of saturated patches is acceptable (typically fewer than 6 out of 24 patches), calibration proceeds and its quality is assessed. Quality assessment consists of examining various conditions relating to the colour differences between the known spectrophotometric and the computed sRGB values, in accepted and rejected patches. If any of these tests fail, the algorithm rejects the calibration and continues the search.
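The known relationship between sRGB and CIE L*a*b* can be illustrated with a short Python sketch. It assumes 8-bit sRGB input, the standard sRGB transfer function, the usual four-decimal sRGB-to-XYZ matrix and a D65 white point taken as the row sums of that matrix; the function name `srgb_to_lab` is our own and is not part of the calibration software described here.

```python
def srgb_to_lab(r, g, b):
    """Convert 8-bit sRGB values to CIE L*a*b* (D65 white point)."""

    def to_linear(c):
        # undo the sRGB transfer function (gamma encoding)
        c = c / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    rl, gl, bl = to_linear(r), to_linear(g), to_linear(b)

    # linear sRGB -> CIE XYZ (D65), four-decimal sRGB matrix
    x = 0.4124 * rl + 0.3576 * gl + 0.1805 * bl
    y = 0.2126 * rl + 0.7152 * gl + 0.0722 * bl
    z = 0.0193 * rl + 0.1192 * gl + 0.9505 * bl

    # white point = row sums of the matrix, so that sRGB white maps
    # to L* = 100, a* = b* = 0
    xn, yn, zn = 0.9505, 1.0000, 1.0890

    def f(t):
        d = 6.0 / 29.0
        return t ** (1.0 / 3.0) if t > d ** 3 else t / (3 * d * d) + 4.0 / 29.0

    fx, fy, fz = f(x / xn), f(y / yn), f(z / zn)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)


white_lab = srgb_to_lab(255, 255, 255)  # L* close to 100, a* and b* close to 0
```

Once both colours are expressed in L*a*b*, a perceptual colour difference reduces to a distance computation in that space.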

Analysis
In this experiment precision is defined as a measure of the proximity of consecutive colour measurements on an image of the same subject. This is also known as reproducibility. The precision of the MBCCC chart detection, together with the calibration process, was evaluated by computing the perceptual colour differences between all the possible pairs of measurements of each colour square of the MBCCC chart. These perceptual colour differences are expressed in CIE units (dE_ab), and are computed using the Euclidean metric in the CIE L*a*b* colour space. Theoretically, one unit corresponds to the 'just noticeable colour difference' and anything above five units is 'clearly noticeable'.
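As an illustration of this precision measure, the following Python sketch computes dE_ab as the Euclidean distance in L*a*b* for every possible pair of measurements of one patch. The helper names and the three example measurements are hypothetical and serve only to show the computation.

```python
from itertools import combinations
from math import dist
from statistics import median


def delta_e_ab(lab1, lab2):
    """Perceptual colour difference: Euclidean distance in CIE L*a*b*."""
    return dist(lab1, lab2)


def pairwise_delta_e(measurements):
    """dE_ab for all possible pairs of measurements of the same patch."""
    return [delta_e_ab(p, q) for p, q in combinations(measurements, 2)]


# Hypothetical L*a*b* measurements of one patch in three images.
patch = [(50.0, 10.0, 10.0), (50.0, 13.0, 14.0), (52.0, 10.0, 10.0)]
errors = pairwise_delta_e(patch)   # three pairs for three measurements
typical_error = median(errors)
```

For n measurements of a patch this yields n(n-1)/2 differences, and pooling them over all patches gives the error distribution from which medians are reported.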
The accuracy of a procedure is a measure of how close its results are to the 'real' values, i.e. those obtained using the 'standard' procedure or measurement device. For colour measurements this would be a spectrophotometer. Consequently, the accuracy of the chart detection and colour calibration can be assessed by computing the perceptual colour differences between the measurements of the colour squares of the MBCCC chart and the spectrophotometric values of these squares. For this assessment the calibration was performed using half the colour patches of the MBCCC chart, while the other half were used to evaluate accuracy. Accuracy is likely to be higher when the whole chart is used for calibration purposes. Precision and accuracy are each reported as a probability distribution of the dE_ab errors. Tukey's five-number summary of the dE_ab colour differences of each patch (the minimum, the lower quartile, the median, the upper quartile and the maximum) was also calculated and visualized using a box plot. The Wilcoxon rank-sum test, which compares the locations of two populations to determine whether one has been shifted with respect to the other, was used to test the calibration. The test ranks the combined data sets of dE_ab values, sums the ranks for each set, and compares the rank sum with the significance values for the chosen decision alpha (p < 0.05).
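Both statistics can be sketched in a few lines of plain Python. The sketch below is an assumption-laden illustration, not the software used in the study: the five-number summary uses Tukey's hinges (medians of the lower and upper halves), and the rank-sum test uses the large-sample normal approximation without a tie correction of the variance. The function names are our own.

```python
from math import erfc, sqrt
from statistics import median


def five_number_summary(values):
    """Minimum, lower hinge, median, upper hinge, maximum (Tukey)."""
    data = sorted(values)
    n = len(data)
    lower = data[: (n + 1) // 2]   # includes the median when n is odd
    upper = data[n // 2:]
    return data[0], median(lower), median(data), median(upper), data[-1]


def rank_sum_test(x, y):
    """Wilcoxon rank-sum test, normal approximation, two-sided p-value.

    Returns (W, z, p), where W is the sum of the ranks of sample x.
    Tied values receive average ranks; the variance is not tie-corrected.
    """
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1   # average of 1-based ranks i+1..j+1
        i = j + 1
    w = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    n1, n2 = len(x), len(y)
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    p = erfc(abs(z) / sqrt(2))
    return w, z, p
```

With well-separated samples such as calibrated versus uncalibrated errors, the rank sum of the smaller-valued sample falls far below its expected value and the p-value approaches zero.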

Experiment 2
The second experiment was designed to quantify the impact of the automatic calibration procedure, i.e. the chart detection, on real-world measurements. This may be of importance in a clinical setting, where automatic calibration of large batches of images in a single run is required. To examine this, 40 different images of real wounds were acquired, and a region of interest (ROI) was selected within each image. Three rotated versions (at 90°, 180° and 270°) of each image were created and automatically calibrated (Figure 3). Comparisons between the colour measurements of the ROIs of the rotated versions of each image highlighted the errors introduced by the automatic chart detection component of the calibration procedure.

Image Acquisition
Digital images (n = 40) of the chronic wounds were taken using a Sony Cybershot DSC-F828 digital camera (8.0 million effective pixels) and Carl Zeiss 28-200 mm equivalent lens, with fully automatic settings at different indoor locations, as is usually the case in daily clinical practice.

Calibration Procedure and Analysis
The calibration procedure was carried out in accordance with that described for experiment 1. The dE_ab colour differences between the average colour of the ROI of the four rotated versions of each image were computed and visualized using a probability distribution graph.

[Figure caption fragment: ... different cameras. The images contained many saturated patches (see the 'x's on the patches) that were not used for the calibration, resulting in a lower quality calibration.]

Results

Experiment 1
The accuracy and reproducibility of the colour calibration using different cameras, camera settings and illumination conditions are presented using a probability distribution of dE_ab errors of all the MBCCC patches (Figure 10). A distinction is made between the full set of images and the 'normal' images, which were acquired with proper camera settings: correct manual or automatic white balance and no exposure bias. Indeed, the full set contains several images that were strongly over- or underexposed, or had a mismatched white balance. These images demonstrated the effectiveness of the calibration method, but are not representative of day-to-day photography. Moreover, the term 'proper patch' was used to indicate patches that were not saturated during acquisition, i.e. patches whose pixel values were not too close to 255 or 0. The true pixel values of saturated patches could not be determined, so their calibration was unfeasible.
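The distinction between proper and saturated patches can be sketched as follows, using the saturation rule stated in the calibration procedure (a channel average above 255 - δ or below δ, with δ = 3, and fewer than 6 saturated patches out of 24 considered acceptable). The function names and the example patch values are hypothetical.

```python
def is_saturated(mean_rgb, delta=3):
    """True if any channel average is within delta of 255 or of 0."""
    return any(c > 255 - delta or c < delta for c in mean_rgb)


def proper_patches(patch_means, delta=3, limit=6):
    """Return indices of non-saturated patches and an acceptance flag.

    patch_means : list of (R, G, B) average pixel values, one per patch
    limit       : calibration is attempted only if fewer than `limit`
                  patches are saturated (6 of 24 in the text above)
    """
    proper = [i for i, rgb in enumerate(patch_means)
              if not is_saturated(rgb, delta)]
    n_saturated = len(patch_means) - len(proper)
    return proper, n_saturated < limit


# Hypothetical chart: 20 mid-grey patches, 3 overexposed, 1 underexposed.
chart = [(128, 128, 128)] * 20 + [(255, 254, 253)] * 3 + [(1, 40, 40)]
usable, acceptable = proper_patches(chart)
```

Checking each channel separately matters because a deep red or yellow patch can clip in one channel only while its overall brightness looks unremarkable.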
The accuracy and reproducibility results for the set of proper patches of normal images are representative of colours in properly photographed images, which are different from the colours of the patches that were disregarded during calibration due to saturation (marked by 'x' on the calibrated image) (Figure 11). Saturation in normal images or skin imaging is rare, but if it does occur it normally manifests itself as an overexposure of white, deep red, yellow and orange MBCCC patches. If this problem is frequent with a particular camera, it can be remedied by slightly underexposing images by, for example, half an f-stop (exposure bias).
Tukey's five-number summary of the dE_ab colour differences of each proper patch of the normal images (the minimum, the lower quartile, the median, the upper quartile and the maximum) was calculated and visualized using a box plot (Figure 12). Outliers were marked with a red 'x'. To evaluate accuracy, the chart patches were split in two groups of 12 patches and only the second group was used for calibration, resulting in a lower quality calibration than if 24 patches had been used. The first group of 12 patches was used to check the accuracy.
Colour differences between the image measurements and real spectrophotometric measurements revealed median dE_ab values of 6.40 for proper patches of calibrated normal images and 17.75 for uncalibrated images, demonstrating an important improvement in accuracy after calibration (Figure 10). The result for the patches used in the calibration was also included, and they had a median of 1.59 dE_ab. Figure 12 presents the accuracy box plot for the proper patches of the normal images. As mentioned above, only patches that had not been used in computing the calibration could be used to check accuracy; therefore only 12 patches are shown in this figure.
As Figure 11 demonstrates, the reproducibility, visualized by the probability distribution of the dE_ab errors between two measurements of the patches of the images, had a median of 3.43 dE_ab for all calibrated images, 23.26 dE_ab for all uncalibrated images, 2.83 dE_ab for all 'normal' calibrated images, and 14.25 dE_ab for all 'normal' uncalibrated images. Restricting the calculation to the proper patches of normal calibrated images, the median was 2.58 dE_ab. Wilcoxon rank-sum testing (significance level p < 0.05) between uncalibrated normal images and calibrated normal images with proper squares returned a value equal to zero, demonstrating a highly significant improvement in reproducibility.
Examining dE_ab errors for each MBCCC patch individually revealed that the greatest errors were found in the red, orange-yellow, orange and yellow patches. Cyan patches were excluded from this examination as they cannot be represented accurately in the sRGB colour space (Figure 13).

Experiment 2
The reproducibility of the chart detection during automatic calibration is presented using a probability distribution of dE_ab errors between two measurements of the same ROI. Ideally this should be as close to zero as possible and comparable to the measurements of the same ROI depicted in the presented figures. The rotated versions of an image should all be equal. Any deviation from this would indicate variability in the chart detection, leading to a slightly different calibration and thus different measurements (Figure 14).

[Figure caption: Reproducibility: box plot for the proper patches of the normal images.]

Discussion
The research presented here provides evidence that images taken with commercially available digital cameras can be calibrated independently of camera settings and illumination features, provided that illumination in the field of view is uniform and a calibration chart is used. This may be particularly useful during chronic wound assessment, as this is often performed in different locations and under variable lighting conditions. The proposed calibration transforms the acquired images from an unknown colour space (usually RGB) to a standard, well-defined colour space (sRGB) that allows images to be displayed properly and has a known relationship to the CIE colorimetric colour spaces. First, we challenged the calibration procedure with a large collection of images containing both 'normal' images with proper camera settings and images that were purposely over- or underexposed and/or had white balance mismatches. The reproducibility and accuracy of the calibration procedure are presented and demonstrate marked improvements. The calibration procedure works very well on the images with improper camera settings, as evidenced by the minimal differences between the error distributions of the complete set of images and the set with only the 'normal' images. An innovative feature demonstrated during our research is the automatic detection of the MacBeth Colour Checker Chart Mini [MBCCC] in the digital image, followed by calibration. Secondly, we tested the effect of this MBCCC chart detection on subsequent real-world colour measurements. Figure 14 shows the probability distribution of errors between two colour measurements of the same region of interest that can be attributed to variations in the chart detection process. The majority of these errors were below 1 dE_ab, demonstrating that the chart detection is robust.
This experiment is part of the research presented by the Woundontology Consortium, which is a semi-open, international, virtual community of practice devoted to advancing the field of research in non-invasive wound assessment by image analysis, ontology and semantic interpretation and knowledge extraction (http://www.woundontology.com). The interests of this consortium are related to the establishment of a community-driven, semantic content analysis platform for digital wound imaging with special focus on wound bed surface area and colour measurements in clinical settings. Current research by the Woundontology Consortium stems from our concerns about the interpretation of clinical wound images without any calibration or reference procedure. Therefore we are investigating techniques to promote standardization. The platform used by this Consortium is based on Wiki technology, a collaborative environment to develop a "woundontology" using the Collaborative Ontology Development Service (CODS) and an image server. Research on wound bed texture analysis is performed using the computer program "MaZda". This application has been under development since 1998, to satisfy the needs of the participants of the COST B11 European project "Quantitative Analysis of Magnetic Resonance Image Texture" (1998-2002). Additionally, wound bed texture parameter data-mining is performed using "Rapid-Miner", which is one of the leading open-source data mining solutions worldwide.

Conclusions
To our knowledge, the proposed technology is the first demonstration of a fundamental, and in our opinion, essential tool for enabling intra-individual (in different phases of wound healing) and inter-individual (for features and properties) comparisons of digital images in human wound healing. By implementing this step in the assessment, we believe that scientific standards for research in this domain will be improved [37].