A consistency evaluation of signal-to-noise ratio in the quality assessment of human brain magnetic resonance images

Background Quality assessment of medical images is closely related to quality assurance, image interpretation and decision making. For magnetic resonance (MR) images, signal-to-noise ratio (SNR) is routinely used as a quality indicator, yet little is known about its consistency across observers.
Methods In total, 192, 88, 76 and 55 brain images are acquired using T2*, T1, T2 and contrast-enhanced T1 (T1C) weighted MR imaging sequences, respectively. For each imaging protocol, the consistency of SNR measurement is verified between and within two observers, and white matter (WM) and cerebral spinal fluid (CSF) are alternately used as the tissue region of interest (TOI) for SNR measurement. The procedure is repeated on another day within 30 days. First, overlapped voxels in TOIs are quantified with the Dice index. Then, test-retest reliability is assessed in terms of the intra-class correlation coefficient (ICC). After that, four models (BIQI, BLIINDS-II, BRISQUE and NIQE) primarily used for the quality assessment of natural images are borrowed to predict the quality of MR images. Finally, the correlation between SNR values and predicted results is analyzed.
Results For the same TOI in each MR imaging sequence, less than 6% of voxels overlap between manual delineations. In the quality estimation of MR images, statistical analysis indicates no significant difference between observers (Wilcoxon rank sum test, pw ≥ 0.11; paired-sample t test, pp ≥ 0.26), and good to very good intra- and inter-observer reliability is found (ICC, picc ≥ 0.74). Furthermore, the Pearson correlation coefficient (rp) suggests that SNRwm correlates strongly with BIQI, BLIINDS-II and BRISQUE in T2* (rp ≥ 0.78), BRISQUE and NIQE in T1 (rp ≥ 0.77), BLIINDS-II in T2 (rp ≥ 0.68) and BRISQUE and NIQE in T1C (rp ≥ 0.62) weighted MR images, while SNRcsf correlates strongly with BLIINDS-II in T2* (rp ≥ 0.63) and in T2 (rp ≥ 0.64) weighted MR images.
Conclusions The consistency of SNR measurement is validated across observers and MR imaging protocols. When SNR measurement serves as the quality indicator of MR images, BRISQUE and BLIINDS-II can be conditionally used for the automated quality estimation of human brain MR images.


Background
Medical image quality is closely related to many clinical applications, such as screening, abnormality detection and disease diagnosis. Nowadays, various imaging modalities are in daily use, such as computerized tomography (CT) and magnetic resonance (MR) imaging, not to mention devices under development [1][2][3]. At the same time, large volumes of medical images are collected every day to support clinical decision making. Therefore, how to evaluate medical image quality is attracting increasing attention [4,5].
Medical image quality assessment (MIQA) is crucial in equipment quality assurance [6][7][8], comparison of algorithms for image restoration [9][10][11][12][13], image interpretation [14][15][16][17] and disease diagnosis [18,19]. MIQA algorithms can be grouped into full-reference and no-reference categories [19][20][21][22][23]. Full-reference algorithms require access to a reference image, which is often unavailable in the medical imaging domain. To tackle this problem, images from advanced devices have been used as the reference to validate proposed methods on images from common devices [24,25]. However, this kind of approach introduces new obstacles due to uncontrollable motion and, in particular, the different imaging characteristics. Comparatively, no-reference MIQA algorithms are more useful but also more challenging, since no reference information can be borrowed [20,23,26].
As a quality indicator of medical images, signal-to-noise ratio (SNR) is widely used to evaluate the development of new hardware and image processing algorithms [19,23,[26][27][28][29][30][31]. The most common approach for SNR measurement, known as the "two-region" approach, is based on the signal statistics in two separate regions of interest (ROIs) from a single image. One is the tissue ROI (TOI), which determines the signal, and the other ROI is localized in the object-free region, which measures the noise [27,28,32]. Comparing the quality of medical images across studies by means of SNR measurement is still difficult [23]. Above all, SNR values might vary according to the delineation of ROIs. Different purposes concern different tissues, and even for the same purpose, it is impossible to delineate an identical tissue region twice. Moreover, the quality of MR image acquisition is closely related to the magnetic field strength (1.5 T, 3 T, etc.), imaging protocol (T1, T2, etc.), field of view (FOV), reconstruction methods and other significant factors. Furthermore, medical imaging is prone to unavoidable noise and artifacts. Besides, imaging characteristics differ widely across modalities, which poses a further challenge. Therefore, a consistency evaluation of SNR measurement is helpful for the further comparison of medical image quality.
In this paper, we evaluate the reliability of SNR measurement across different observers. At this preliminary stage, the study is confined to human brain MR images, and four MR imaging sequences are analyzed. To the best of our knowledge, the most similar work is [26], which conducted a correlation analysis between subjective evaluation and 13 full-reference models. These models are primarily used for natural image quality assessment (NIQA). However, that study generalizes poorly. First, the experiment was based on synthesized distortions of 25 reference MR images, and the result might not be convincing with regard to real-life medical images. Second, the study relied on subjective estimation to score the image quality, which is time consuming and expensive. In contrast, in this study, 411 in vivo human brain MR images are collected and 2 observers are involved to localize the tissue regions of white matter (WM) and cerebral spinal fluid (CSF) as the TOI for SNR measurement. Most importantly, this study investigates the SNR consistency across observers. After the reliability of SNR measurement is verified, 4 no-reference NIQA models are borrowed from the computer vision community to predict the MR image quality, and the correlation between the predicted results and SNR values is explored. On the whole, this study might shed some light on automated objective MIQA with less time and expenditure.

Data collection
In total, 192 T2* weighted MR images of healthy brains, and 88 T1, 76 T2 and 55 contrast-enhanced T1 (T1C) weighted MR images of brains with cancerous tumors, are collected. Participants were scanned with a 3.0 T scanner (Siemens, Erlangen, Germany) using an 8-channel brain phased-array coil.
Specifically, T2* weighted images are acquired using a gradient-echo pulse sequence. The repetition time (TR) is 200 ms and the echo time (TE) varies from 2.61 ms to 38.91 ms at equal intervals of 3.3 ms. The flip angle is 15°, FOV is 220 × 220 mm², slice thickness is 3.0 mm and the resultant image matrix is 384 × 384. Note that the original purpose of the multi-echo T2* weighted image acquisition is tissue dissimilarity analysis [12]. T1, T2 and T1C weighted images are acquired using a spin-echo protocol with different TR and TE pairs (535 ms and 8 ms; 3500 ms and 105 ms; 650 ms and 9 ms, respectively). The flip angle is 15°, FOV is 220 × 220 mm² and slice thickness is 1 mm or 2 mm. The resultant image size of T1 and T1C weighted MR images varies from 512 × 432 to 668 × 512, while the matrix size of T2 weighted MR images ranges from 384 × 324 to 640 × 640.

Image pre-processing
For each image, pixel intensity is linearly scaled to [0, 255]. Then, two TOIs (WM and CSF) are outlined in addition to two air regions. A non-physician (observer A, OA) and a radiologist with more than 15 years of experience (observer B, OB) are asked to determine the ROIs manually. Although the observers work separately and independently, they agree that the outlined ROIs should be as large as possible. Furthermore, for T1, T2 and T1C weighted MR images, they also agree that TOIs should be homogeneous and kept away from the tumor areas. The initial shape of each ROI is approximated with six points (the red sparkles in Fig. 1) and further refined using a freeform curve-fitting method [33,34]. The curve-fitting method takes the six points as the control points and uses a Hermite cubic curve [35] for smooth interpolation between the points. In the end, the outlined regions are input to our in-house algorithm built with MATLAB (Mathworks, Natick, MA, USA) to measure the WM-based SNR (SNRwm) and CSF-based SNR (SNRcsf) values. Note that the procedure is repeated on another day within 30 days for intra-observer reliability analysis. Figure 1 shows T2* (A), T1 (B), T2 (C) and T1C (D) weighted MR images. In each image, WM, CSF and AIR regions are enclosed by closed curves highlighted with pink, blue and yellow lines, respectively. Note that the red sparkles are the points initially placed by the observers and that the images have been cropped for display purposes.
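The two pre-processing operations described above, linear intensity scaling and smooth interpolation through six control points, can be illustrated with a minimal Python sketch. It is not the in-house MATLAB implementation: the function names are hypothetical, and the tangent rule (Catmull-Rom, a common choice for cubic Hermite curves) is an assumption, since the text does not state how the tangents are chosen.

```python
def rescale_intensity(image):
    """Linearly map the pixel intensities of a 2D list to the range [0, 255]."""
    flat = [v for row in image for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # constant image: nothing to stretch
        return [[0.0 for _ in row] for row in image]
    scale = 255.0 / (hi - lo)
    return [[(v - lo) * scale for v in row] for row in image]


def closed_hermite_curve(points, samples_per_segment=10):
    """Sample a closed cubic Hermite curve that passes through `points`.

    Assumption: tangents follow the Catmull-Rom rule,
    m_i = (p_{i+1} - p_{i-1}) / 2, with indices wrapping around.
    """
    n = len(points)
    curve = []
    for i in range(n):
        p0, p1 = points[i], points[(i + 1) % n]
        before, after = points[i - 1], points[(i + 2) % n]
        m0 = tuple((b - a) / 2.0 for a, b in zip(before, p1))  # tangent at p0
        m1 = tuple((b - a) / 2.0 for a, b in zip(p0, after))   # tangent at p1
        for s in range(samples_per_segment):
            t = s / samples_per_segment
            # Cubic Hermite basis functions
            h00 = 2 * t**3 - 3 * t**2 + 1
            h10 = t**3 - 2 * t**2 + t
            h01 = -2 * t**3 + 3 * t**2
            h11 = t**3 - t**2
            curve.append(tuple(
                h00 * a + h10 * ma + h01 * b + h11 * mb
                for a, ma, b, mb in zip(p0, m0, p1, m1)
            ))
    return curve


# Six hypothetical control points (x, y) placed around a tissue region
control_points = [(0, 0), (4, 0), (6, 3), (4, 6), (0, 6), (-2, 3)]
outline = closed_hermite_curve(control_points, samples_per_segment=10)
```

The sampled outline passes through every control point, so an observer's six clicks remain on the final ROI boundary.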

SNR measurement
Two approaches exist for SNR measurement. The most common one requires two separate ROIs from a single image [27,28]. By taking the signal (S) to be the average intensity in a tissue ROI ($\mu_{\mathrm{TOI}}$) and the noise (σ) to be the standard deviation of the pixel intensity in a background ROI ($\sigma_{\mathrm{AIR}}$), we can approximate the SNR value of the image as
$$\mathrm{SNR} = 0.655 \times \frac{\mu_{\mathrm{TOI}}}{\sigma_{\mathrm{AIR}}} \qquad (1)$$
Due to the Rician distribution of the background noise in a magnitude image, the factor of 0.655 is applied: the underlying noise variations can be negative and positive, whereas a magnitude image records only non-negative values [27,28].
If the image is not homogeneous, the SNR can be measured with the second approach [36,37]. First, two images are acquired by consecutive scans with identical imaging settings. Then, a difference image is derived by subtracting one image from the other. Since the images are acquired consecutively without any instability, noise should be the only difference between the two original images. Taking the signal (S) as the mean pixel intensity in a tissue ROI ($\mu_{\mathrm{oTOI}}$) on one original image and the noise as the standard deviation (σ) in the same ROI on the subtracted image ($\sigma_{\mathrm{sTOI}}$), SNR can be estimated as
$$\mathrm{SNR} = \sqrt{2} \times \frac{\mu_{\mathrm{oTOI}}}{\sigma_{\mathrm{sTOI}}} \qquad (2)$$
where the factor of $\sqrt{2}$ arises because the standard deviation (σ) is derived from the subtraction image rather than from the original image.
This study utilizes Eq. (1) to measure SNR values of MR images, since image homogeneity is warranted in this study. In addition, the second approach is commonly used for equipment quality assurance and requires scanning the object twice.
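Both SNR definitions in this section, the two-region estimate with the 0.655 factor and the difference-image estimate with the √2 factor, can be written as a short Python sketch. Images and binary ROI masks are represented as nested lists, the function names are hypothetical, and the use of the sample standard deviation (n - 1 in the denominator) is an assumption, since the text does not specify the estimator.

```python
from math import sqrt
from statistics import mean, stdev


def roi_values(image, mask):
    """Collect the pixel values of `image` where the binary `mask` is set."""
    return [v
            for img_row, mask_row in zip(image, mask)
            for v, m in zip(img_row, mask_row) if m]


def snr_two_region(image, tissue_mask, air_mask):
    """Two-region SNR: 0.655 * mean(tissue ROI) / std(air ROI)."""
    mu_toi = mean(roi_values(image, tissue_mask))
    sigma_air = stdev(roi_values(image, air_mask))  # sample std (assumption)
    return 0.655 * mu_toi / sigma_air


def snr_difference(image_a, image_b, tissue_mask):
    """Difference-image SNR: sqrt(2) * mean(ROI on one scan) / std(ROI on a - b)."""
    values_a = roi_values(image_a, tissue_mask)
    values_b = roi_values(image_b, tissue_mask)
    difference = [a - b for a, b in zip(values_a, values_b)]
    return sqrt(2) * mean(values_a) / stdev(difference)


# Toy example: left two columns are tissue, right two columns are air
image = [[100, 102, 2, 4],
         [98, 100, 6, 4]]
tissue = [[1, 1, 0, 0],
          [1, 1, 0, 0]]
air = [[0, 0, 1, 1],
       [0, 0, 1, 1]]
print(round(snr_two_region(image, tissue, air), 2))
```

Because the air ROI enters only through its standard deviation, the estimate is sensitive to artifacts in the background region, which is one reason the ROIs are placed manually.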

No-reference NIQA
A large number of NIQA models are developed each year, while few are used in the medical imaging community [38][39][40]. This study makes use of four automated no-reference NIQA methods to predict the MR image quality. The correlation analysis between SNR values and NIQA results aims to find potential no-reference NIQA models for MIQA applications.
Fig. 1 Manual outline of tissue regions and air regions. a, b, c, d are T2*, T1, T2 and T1C weighted MR images, respectively. b, c, d demonstrate one example of a subject. The points initially placed by the observers are marked with red sparkles. Outlined WM, CSF and AIR regions are enclosed by closed curves with pink, blue and yellow lines, respectively. Note that the images have been cropped for display purposes.
The NIQA models involved utilize natural scene statistics (NSS) to estimate the general quality of natural images. Specifically, the blind image quality index (BIQI) [41] estimates the image quality based on statistical features extracted in the discrete wavelet transform (DWT) domain. It requires no knowledge of the distortion types and can be extended to any kind of distortion. The second indicator (BLIINDS-II) [42] is an improved version of the blind image integrity notator using discrete cosine transform (DCT) statistics [38]. It adopts a general statistical model for score prediction. The third one, the blind/referenceless image spatial quality evaluator (BRISQUE) [43], makes use of locally normalized luminance coefficients and quantifies possible losses of "naturalness", which is a holistic measure of image quality. The last one is the natural image quality evaluator (NIQE) [44]. It builds a "quality-aware" selector that collects statistical features for natural image quality estimation.
These NIQA models are implemented with MATLAB (the Mathworks, Natick, MA, USA) and the codes provided by the authors are accessible online. The models are evaluated without modifications in this study. Full details of these algorithms can be found in the corresponding literature [41][42][43][44].
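To give a feel for the "locally normalized luminance coefficients" mentioned for BRISQUE, the following Python sketch computes a simplified version of that normalization: each pixel has its local mean subtracted and is divided by its local standard deviation plus a stabilizing constant. This is only an illustration of the normalization step, not the BRISQUE model or the authors' released code. The function name is hypothetical, and the uniform square window is a simplifying assumption made here (a weighted window is typically used in practice).

```python
from math import sqrt


def local_normalized_coefficients(image, radius=3, c=1.0):
    """Subtract the local mean and divide by (local std + c) at every pixel.

    `radius` sets a (2 * radius + 1)-wide square window, clipped at the
    image borders; `c` prevents division by zero in flat regions.
    """
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            window = [image[y][x]
                      for y in range(max(0, i - radius), min(rows, i + radius + 1))
                      for x in range(max(0, j - radius), min(cols, j + radius + 1))]
            mu = sum(window) / len(window)
            var = sum((v - mu) ** 2 for v in window) / len(window)
            out[i][j] = (image[i][j] - mu) / (sqrt(var) + c)
    return out


# A flat region yields zeros; an edge yields values of opposite sign
patch = [[10, 10, 200, 200],
         [10, 10, 200, 200]]
coefficients = local_normalized_coefficients(patch, radius=1)
```

Statistical models of such coefficients are what a quality predictor would then be built on; that modelling stage is outside the scope of this sketch.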

Experiment design
The experiment is divided into three steps. First, the overlapping ratio of manually outlined TOIs between and within observers is quantified with the Dice index. The index is defined as
$$d = 2 \times \frac{|X \cap Y|}{|X| + |Y|} \times 100\%$$
where X and Y stand for the two TOIs and $|\cdot|$ denotes the number of voxels in a region. A Dice index of 100% means the two TOIs are identical, while a Dice index of 0% indicates that the two TOIs do not overlap at all.
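As a worked illustration of the Dice index defined above, the following Python sketch compares two binary TOI masks. The function name and the toy masks are hypothetical, and raising an error when both masks are empty is a design choice made here, since the index is undefined in that case.

```python
def dice_index(mask_x, mask_y):
    """Dice index (in percent) between two binary masks of equal shape."""
    x = {(r, c) for r, row in enumerate(mask_x) for c, v in enumerate(row) if v}
    y = {(r, c) for r, row in enumerate(mask_y) for c, v in enumerate(row) if v}
    if not x and not y:
        raise ValueError("Dice index is undefined for two empty regions")
    return 2.0 * len(x & y) / (len(x) + len(y)) * 100.0


# Two 4-voxel TOIs that share 2 voxels: d = 2 * 2 / (4 + 4) * 100% = 50%
toi_a = [[1, 1, 0],
         [1, 1, 0]]
toi_b = [[0, 1, 1],
         [0, 1, 1]]
print(dice_index(toi_a, toi_b))  # 50.0
```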
Then, with respect to the same TOI in each imaging sequence, the inter-observer difference is assessed with the Wilcoxon rank sum test [45,46] and the paired-sample t-test [47]. The statistical analysis is performed using R (http://www.Rproject.org) and the significance level is set to 0.05. Moreover, the test-retest reliability is evaluated in terms of the intra-class correlation coefficient (ICC, picc) using a two-way mixed-effects model [48]. Values of picc from 0.81 to 1.00 suggest very good reliability, and values from 0.61 to 0.80 suggest good reliability.
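The ICC computation can be sketched in Python as follows. The text specifies a two-way mixed-effects model but not the exact ICC form, so this sketch assumes the single-measurement consistency form, often written ICC(3,1); the function names are hypothetical, and the reliability labels simply encode the bands stated above.

```python
def icc_two_way_mixed(ratings):
    """ICC(3,1): two-way mixed effects, consistency, single measurement.

    `ratings` is a list of rows, one row per image, one column per
    observer (or per measurement session).
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


def reliability_label(p_icc):
    """Map an ICC value to the reliability bands used in this study."""
    if p_icc >= 0.81:
        return "very good"
    if p_icc >= 0.61:
        return "good"
    return "below good"  # hypothetical label; lower bands are not defined in the text


# Hypothetical SNR values of five images measured by two observers
snr_pairs = [[41.2, 40.1], [35.7, 36.9], [52.3, 50.8], [28.4, 29.0], [47.5, 46.2]]
p_icc = icc_two_way_mixed(snr_pairs)
print(reliability_label(p_icc))
```

With the consistency form, a constant offset between two observers does not lower the coefficient; an absolute-agreement form would penalize it.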
Finally, the correlation between SNR values and NIQA results is analyzed using the Pearson correlation coefficient (rp) [49]. Note that values of rp from 0.81 to 1.00 indicate very strong correlation, while values from 0.61 to 0.80 indicate strong correlation.
Table 1 summarizes the number of voxels in TOIs in each MR sequence (the mean and standard deviation, μ ± σ). Hundreds of voxels are outlined for each SNR measurement, and the minimum is 330 ± 72.
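The Pearson coefficient used in the correlation step of the experiment design, together with the strength bands stated for it, can be sketched in Python as follows. The function names and the example numbers are hypothetical and serve only to show the computation.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def correlation_label(r_p):
    """Map a Pearson coefficient to the strength bands used in this study."""
    if r_p >= 0.81:
        return "very strong"
    if r_p >= 0.61:
        return "strong"
    return "below strong"  # hypothetical label; lower bands are not defined in the text


# Hypothetical SNR values and NIQA scores for five images
snr_values = [1, 2, 3, 4, 5]
niqa_scores = [2, 1, 4, 3, 5]
r_p = pearson_r(snr_values, niqa_scores)
print(round(r_p, 2), correlation_label(r_p))  # 0.8 strong
```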

Overlapped voxels in TOIs
Specifically, the overlapping ratio is described with the Dice index, as shown in Table 2. It indicates that less than 6% of voxels overlap between and within observers in the manual delineation of TOIs. Figure 2 shows the first-time measurement of SNR values using Bland & Altman plots [50]. Each plot is a scatter diagram of the differences plotted against the averages of two SNR observations. In each plot, the average and the

Inter-observer difference
Inter-observer difference of SNR observations is analyzed with the Wilcoxon rank sum test (pw) and the paired-sample t test (pp). Corresponding results are shown in Table 3. Note that the minimum value is boldfaced in each test. It is observed that the minimal pw is 0.11 and the minimal pp is 0.26. It is also found that both pw and pp from SNRwm are larger than the corresponding values from SNRcsf.
Table 4 lists the results of test-retest reliability. Note that ICC1 and ICC2 stand for the intra- and inter-observer correlation coefficients, respectively. As shown in the table, very good intra-observer reliability of the experienced radiologist (OB) is found (picc ≥ 0.81). Similar results are found for the non-physician (OA), except that only good reliability is achieved for SNRcsf on T2* (picc ≥ 0.79) and T2 (picc ≥ 0.76) weighted MR images. Furthermore, good to very good inter-observer reliability is found (picc ≥ 0.80), but only good inter-observer reliability is found for SNRcsf in the T2* weighted MR imaging sequence (picc ≥ 0.74).
Table 5 shows the correlation coefficients (rp) between the mean SNR values of each TOI (two measurements per observer) and the NIQA results. The bold-faced rp values in red and blue denote rp ≥ 0.60. Specifically, for SNRwm, BIQI, BLIINDS-II and BRISQUE on T2* (rp ≥ 0.78), BRISQUE and NIQE on T1 (rp ≥ 0.77), BLIINDS-II on T2 (rp ≥ 0.68), and BRISQUE and NIQE on T1C (rp ≥ 0.62) images show strong correlation, while for SNRcsf, BLIINDS-II correlates well on T2* (rp ≥ 0.63) and T2 (rp ≥ 0.64) weighted MR imaging sequences.

Discussion
This paper has validated the consistency of SNR measurement in the quality assessment of human brain MR images. Moreover, the correlation between TOI-based SNR measurement and NIQA models has been analyzed.
The study suggests that off-the-shelf NIQA models used in the computer vision community hold potential for automated and objective MIQA applications.
The consistency evaluation indicates that SNR measurement is reliable across observers in each MR imaging sequence. In image pre-processing, TOIs are randomly localized. When two TOIs do not overlap at all, the Dice index is zero. On average, TOIs overlap by no more than 6% (Table 2), while the statistical analysis indicates that SNR values do not differ significantly between observers (Table 3). That means independent localization of TOIs makes no significant difference to SNR measurement. Moreover, the test-retest reliability study suggests good to very good intra- and inter-observer reliability (Table 4). That might be the reason why SNR is widely used in clinical situations. Accordingly, a non-physician can independently perform the SNR measurement of MR images as well as an experienced physician does.
The correlation between SNR values and NIQA models shows that BLIINDS-II correlates well with SNRcsf on T2* and T2 weighted MR images, since CSF presents relatively higher voxel intensity than other tissues, which leads to a robust estimation of SNRcsf. In comparison to SNRcsf, more NIQA results are in good correlation with SNRwm (Table 5). Therefore, the authors suggest that tissue regions with higher intensities should function as the TOI in SNR measurement. On the whole, BRISQUE performs well as an automated no-reference NIQA model for the quality assessment of T2*, T1 and T1C weighted MR brain images, and BLIINDS-II is superior in assessing the quality of T2* and T2 MR images independent of the TOI selection. Consequently, modifying NIQA models developed in the computer vision community holds potential for MIQA applications in the medical imaging domain [51]. It should be mentioned that the correlation between SNR values and predicted results is not very good (rp ≤ 0.85), and further improvement or modification of existing NIQA models is needed.
SNR is frequently used as an image quality indicator in clinical practice. It is a local measure with respect to the whole MR image. The SNR measurement can also be formulated from the global signal by using the whole object region as the tissue region. An overview of existing definitions of SNR measurement can be found in [23]. More general and automated MIQA algorithms include using Shannon's theory to describe the image content and then modeling the spatial spectral power density of the image as the quality indicator [21], or analyzing the background of magnitude images of structural brain scans to represent the image quality [52]. In particular, some researchers attempt to bridge the gap between SNR measurement and diagnostic accuracy or detectability [9,18]. These studies show superiority over the physical measure of image quality, since the ultimate goal of medical imaging is abnormality detection and disease diagnosis.

Conclusions
The consistency of SNR measurement is validated across different observers. The correlation between SNR measurement and NIQA models indicates that BRISQUE works well for automated MIQA of T2*, T1 and T1C weighted brain MR images, and BLIINDS-II is superior on T2* and T2 weighted images independent of the TOI selection. Our future work will focus on the connection of SNR measurement, NIQA models and MIQA applications.

Availability of data and materials
The datasets analyzed during the current study are not publicly available. To ensure participant confidentiality, these data can be accessed only by the physicians and researchers.