Image quality assessment of pediatric chest and abdomen CT by deep learning reconstruction

Background: Efforts to reduce the radiation dose have continued steadily, with new reconstruction techniques. Recently, image denoising algorithms using artificial neural networks, termed deep learning reconstruction (DLR), have been applied to CT image reconstruction to overcome the drawbacks of iterative reconstruction (IR). The purpose of our study was to compare the objective and subjective image quality of DLR and IR on pediatric abdomen and chest CT images.
Methods: This retrospective study included pediatric body CT images from February 2020 to October 2020, performed on 51 patients (34 boys and 17 girls; age 1–18 years). Non-contrast chest CT (n = 16), contrast-enhanced chest CT (n = 12), and contrast-enhanced abdomen CT (n = 23) images were included. Standard 50% adaptive statistical iterative reconstruction V (ASIR-V) images were compared to images with 100% ASIR-V and DLR at medium and high strengths. Attenuation, noise, contrast-to-noise ratio (CNR), and signal-to-noise ratio (SNR) measurements were performed. Overall image quality, artifacts, and noise were subjectively assessed by two radiologists using a four-point scale (superior, average, suboptimal, and unacceptable). A phantom scan was performed including the dose range of the clinical images used in our study, and the noise power spectrum (NPS) was calculated. Quantitative and qualitative parameters were compared using repeated-measures analysis of variance (ANOVA) with Bonferroni correction and Wilcoxon signed-rank tests.
Results: DLR had better CNR and SNR than 50% ASIR-V in both pediatric chest and abdomen CT images. When compared with 50% ASIR-V, high strength DLR was associated with noise reduction in non-contrast chest CT (33.0%), contrast-enhanced chest CT (39.6%), and contrast-enhanced abdomen CT (38.7%), with increases in CNR of 149.1%, 105.8%, and 53.1%, respectively. The subjective assessments of overall image quality and noise were also better on DLR images (p < 0.001). However, there was no significant difference in artifacts between reconstruction methods. From NPS analysis, DLR methods showed a pattern of reducing the magnitude of noise while maintaining the texture.
Conclusion: Compared with 50% ASIR-V, DLR improved pediatric body CT images with significant noise reduction. However, artifacts were not improved by DLR, regardless of strength.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12880-021-00677-2.

in emergency rooms and in tumor patients, is an important imaging test in children. With the development of technology, efforts to reduce the radiation dose have continued steadily, with the development and use of iterative reconstruction (IR) as a typical example.
Over the past decade, IR algorithms have been used to produce high-resolution images by decreasing image noise through computational processing, resulting in better image quality at a lower radiation dose than filtered back projection (FBP) reconstruction alone in adults [1,2] and children [3][4][5][6]. The recently developed adaptive statistical iterative reconstruction-V (ASIR-V) technique provides a short reconstruction time with better image quality and a lower radiation dose than other IR algorithms [7,8]. However, ASIR-V still does not overcome excessive image smoothing and unnatural image appearance. Hybrid IR images that blend IR with FBP can be used to mitigate this texture problem, although a trade-off between image noise and image texture occurs [9].
Recently, image denoising algorithms using artificial neural networks, termed deep learning reconstruction (DLR), have been applied to CT image reconstruction to overcome the drawbacks of IR while achieving good image quality [10][11][12][13][14][15]. However, there have been a limited number of studies evaluating this technique in a small number of children and the technique was only evaluated in abdomen CT images [16][17][18]. The purpose of our study was to compare the objective and subjective image quality of DLR and IR on pediatric abdomen and chest CT images.

Methods
This was a retrospective study approved by the institutional review board at our institution, and the need for informed consent was waived.

Study population
We included all consecutive pediatric patients who underwent chest or abdomen CT at our institution between February 2020 and October 2020 with the same CT system (Revolution CT; GE Healthcare), which has a routine protocol including DLR. We retrospectively reviewed 51 patients. There were 34 boys and 17 girls with a mean age of 11.5 ± 4.6 years (range 1-18 years). Non-enhanced chest CT (n = 16), contrast-enhanced chest CT (n = 12), and contrast-enhanced abdomen CT (n = 23) images were included. Height and weight were recorded at the time of CT examination and BMI was calculated. Patients were divided into three body weight groups: < 20 kg, 20-60 kg, and > 60 kg.

Phantom study
In general, the signal-to-noise ratio (SNR) and contrast-to-noise ratio (CNR) are used to measure the magnitude of noise in images. However, the standard deviation (SD) used in the SNR and CNR calculations takes different values depending on the position of the region of interest (ROI) in a patient image, where the medium is non-homogeneous, and SNR and CNR evaluate only the noise magnitude. The noise power spectrum (NPS) evaluates both the magnitude and the texture of image noise in the spatial frequency domain [19], and it can overcome the drawbacks of SD measurement in SNR and CNR calculation. For NPS analysis, we scanned the uniformity module of the Catphan 500 phantom (Catphan 500, The Phantom Laboratory, NY, USA), and performed three scans covering the dose levels of the patient images used in this study. We directly implemented a 3D NPS based on the method presented by the American Association of Physicists in Medicine (AAPM) [20], and used Matlab (Version R2017a, The MathWorks, Inc., MA, USA) for this calculation.
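To make the NPS idea concrete, the following is a minimal sketch rather than the implementation used in this study: it computes a simplified 2D NPS in Python with NumPy, whereas the study used a 3D NPS in Matlab following the AAPM method [20]. The function names, the ROI size, the number of radial bins, and the use of simple mean subtraction for detrending are all assumptions made for illustration.

```python
import numpy as np


def nps_2d(rois, pixel_mm):
    """Ensemble-averaged 2D NPS (HU^2 mm^2) from uniform-phantom ROIs.

    rois: array of shape (n_roi, ny, nx) holding HU values.
    pixel_mm: in-plane pixel spacing in mm (assumed isotropic).
    """
    rois = np.asarray(rois, dtype=float)
    _, ny, nx = rois.shape
    # Subtract each ROI's mean so that only noise fluctuations remain.
    detrended = rois - rois.mean(axis=(1, 2), keepdims=True)
    power = np.abs(np.fft.fft2(detrended)) ** 2  # FFT over the last two axes
    return power.mean(axis=0) * pixel_mm ** 2 / (nx * ny)


def radial_profile(nps, pixel_mm, n_bins=32):
    """Radially average a 2D NPS; returns (bin-center frequencies, profile)."""
    ny, nx = nps.shape
    fx = np.fft.fftfreq(nx, d=pixel_mm)
    fy = np.fft.fftfreq(ny, d=pixel_mm)
    fr = np.hypot(*np.meshgrid(fx, fy)).ravel()  # radial frequency in mm^-1
    edges = np.linspace(0.0, 0.5 / pixel_mm, n_bins + 1)  # up to Nyquist
    idx = np.digitize(fr, edges) - 1
    keep = idx < n_bins  # drop samples at or beyond the Nyquist radius
    sums = np.bincount(idx[keep], weights=nps.ravel()[keep], minlength=n_bins)
    counts = np.bincount(idx[keep], minlength=n_bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, sums / np.maximum(counts, 1)


def peak_and_average_frequency(freqs, profile):
    """NPS peak (noise magnitude) and power-weighted mean frequency (texture)."""
    peak = float(profile.max())
    f_avg = float((freqs * profile).sum() / profile.sum())
    return peak, f_avg
```

With this normalization, summing the 2D NPS over all frequency samples and multiplying by the frequency bin area recovers the mean ROI variance, which is a convenient sanity check. A shift of the average spatial frequency toward lower values indicates coarser, smoother-looking noise.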

Scanning technique and radiation dose measurements
All patients were examined using a 256-slice CT (Revolution CT; GE Healthcare). Peak kilovoltage (kVp) was divided into three groups by weight: 100 kVp for > 40 kg, 80 kVp for 15-40 kg, and 70 kVp for < 15 kg. An automatic dose modulation technique (Smart mA; GE Healthcare) was used with a range of 50-200 mAs. The noise index was 33 for abdomen CT and 22 for chest CT. Other parameters used to generate images were as follows: gantry rotation time, 0.35 s; coverage speed, 226.79 mm/s; pitch, 0.992:1; and slice thickness, 2.5 mm.
Weight-based IV contrast injection was used with settings of 1.5-2.0 ml/kg with a maximum of 100 ml, using 300 mg iodine/ml concentration intravenous contrast iobitridol (Xenetix; Laboratoires Guerbet). The contrast was injected through an upper extremity peripheral intravenous line, followed by a saline chaser of 0.5 ml/kg. Injection speed was adjusted for a total injection time of 15 s or less. For contrast-enhanced abdomen CT, a fixed delay of 60 s after contrast injection was used for the portal phase, without bolus tracking. For contrast-enhanced chest CT, a circular ROI was placed at the main pulmonary artery and the CT scan began 4 s after the threshold attenuation of 100 Hounsfield units (HU) was reached.
Four axial reconstructions were generated for each patient with a 2.5 mm slice thickness and 2.5 mm slice interval according to the standard algorithm: 50% ASIR-V, 100% ASIR-V, and medium- and high-strength DLR (TrueFidelity; GE Healthcare). We set the blending factors to 50% and 100% according to previous experience [3,4]. DLR provides three selectable reconstruction strength levels (low, medium, and high) to control the amount of noise reduction with a standard reconstruction kernel. We chose medium and high based on our preliminary experience. TrueFidelity is the first clinically available deep learning-based CT reconstruction technique and is based on a deep neural network trained with low-dose raw CT projection data. The ground truth data used to train the algorithm were filtered back projection CT images resulting from ideal data acquisition conditions, both from phantoms and patients in a clinical setting. The output is a reconstructed image that appears as if it had been reconstructed from high-dose raw CT data. However, the details of the network architecture and the training process are not publicly available [21].
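The weight-based rules above can be written out as a small lookup, shown here as an illustrative sketch only. The function names are hypothetical, and two points are assumptions: patients weighing exactly 15 kg or 40 kg are assigned to the 80 kVp group, and the 15 s limit is applied to the contrast bolus alone, since the text does not say whether the saline chaser is included.

```python
def select_kvp(weight_kg):
    """Tube voltage by weight group: >40 kg, 15-40 kg, <15 kg."""
    if weight_kg > 40:
        return 100
    if weight_kg >= 15:  # assumption: boundaries 15 and 40 kg fall in this group
        return 80
    return 70


def contrast_plan(weight_kg, dose_ml_per_kg=2.0, max_injection_s=15.0):
    """Contrast volume, saline chaser, and minimum rate for the time limit."""
    if not 1.5 <= dose_ml_per_kg <= 2.0:
        raise ValueError("dose must be within the 1.5-2.0 ml/kg range")
    contrast_ml = min(dose_ml_per_kg * weight_kg, 100.0)  # capped at 100 ml
    saline_ml = 0.5 * weight_kg
    # Assumption: the time limit applies to the contrast bolus only.
    min_rate_ml_per_s = contrast_ml / max_injection_s
    return {
        "contrast_ml": contrast_ml,
        "saline_ml": saline_ml,
        "min_rate_ml_per_s": min_rate_ml_per_s,
    }
```

For a 30 kg child at 2.0 ml/kg, this gives 60 ml of contrast, a 15 ml saline chaser, and a minimum injection rate of 4 ml/s.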
The CT dose index volume (CTDIvol, mGy) and dose-length product (DLP, mGy × cm) of all patients were recorded in both CT examinations. CTDIvol was converted to size-specific dose estimates (SSDE) based on the American Association of Physicists in Medicine Report 204 [22]. Patient-specific dimensions were obtained from axial CT images at the carina on chest CT and at the main portal vein on abdomen CT. We used the sum of anteroposterior and lateral dimensions to determine patient effective diameter and conversion factors. The following equation was used to calculate the effective dose (ED, mSv): $\mathrm{ED} = \mathrm{DLP} \times W_T$, where $W_T$ is the tissue-weighting factor, which varies according to kVp, organ, and age [23]. Tissue-weighting factors for less than 80 kVp are unknown, so the tissue-weighting factor for 80 kVp was adopted for 70 kVp studies.
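The dose arithmetic described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the function names are hypothetical, the conversion-factor table must be supplied by the caller from AAPM Report 204 [22], linear interpolation between table entries is an assumption, and the tissue-weighting factor must be taken from the cited source [23]. No real table values are reproduced here.

```python
def interpolate_factor(ap_plus_lat_cm, table):
    """Linearly interpolate a conversion factor from (AP+LAT sum, factor) pairs.

    table: caller-supplied pairs taken from AAPM Report 204 (not included here).
    """
    pts = sorted(table)
    if len(pts) < 2:
        raise ValueError("table needs at least two entries")
    if not pts[0][0] <= ap_plus_lat_cm <= pts[-1][0]:
        raise ValueError("patient size is outside the table range")
    for (x0, f0), (x1, f1) in zip(pts, pts[1:]):
        if x0 <= ap_plus_lat_cm <= x1:
            return f0 + (f1 - f0) * (ap_plus_lat_cm - x0) / (x1 - x0)


def ssde_mgy(ctdi_vol_mgy, conversion_factor):
    """Size-specific dose estimate: CTDIvol scaled by the size-based factor."""
    return ctdi_vol_mgy * conversion_factor


def kvp_for_weighting_factor(kvp):
    """Factors below 80 kVp are unknown, so 70 kVp studies use the 80 kVp factor."""
    return max(kvp, 80)


def effective_dose_msv(dlp_mgy_cm, w_t):
    """ED = DLP x W_T, with W_T chosen by kVp, organ, and age."""
    return dlp_mgy_cm * w_t
```

As a usage example with placeholder numbers (not values from Report 204): a table of `[(20.0, 2.0), (30.0, 1.5)]` and an AP + lateral sum of 25 cm give a factor of 1.75, so a CTDIvol of 4.0 mGy corresponds to an SSDE of 7.0 mGy.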

Quantitative image analysis
Quantitative analysis of axial images was performed by a board-certified radiologist with 9 years of experience. The mean attenuation (HU) and SD were measured by manually placing a round ROI (8-10 mm in diameter) using a picture archiving and communication system (PACS) workstation (Centricity Radiology RA1000; GE Healthcare) in the mediastinal/soft-tissue window setting (window level, 50 HU; window width, 350 HU). On chest CT images, ROIs were placed in the lung and paraspinal muscles at the level of the carina. On abdomen CT images, ROIs were placed in the liver, aorta, and paraspinal muscles at the level of the main portal vein on axial images. To obtain reliable measurements, each ROI was positioned to encompass a homogeneous portion of the structure and did not include surrounding structures or vessels. Image noise was defined as the SD of the pixel values obtained from the paraspinal muscle. The contrast-to-noise and signal-to-noise ratios (CNR and SNR) were defined as $\mathrm{CNR} = |\mathrm{HU}_{\text{object}} - \mathrm{HU}_{\text{muscle}}| / \mathrm{SD}_{\text{noise}}$ and $\mathrm{SNR} = \mathrm{HU}_{\text{object}} / \mathrm{SD}_{\text{noise}}$ [24]. We also calculated the NPS peak ($\mathrm{HU}^2\,\mathrm{mm}^2$) and NPS average spatial frequency ($\mathrm{mm}^{-1}$) from each NPS curve measured using the phantom. The NPS peak shows the magnitude of the noise, and the NPS average spatial frequency shows the texture of the noise.
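The CNR and SNR definitions above translate directly into a few lines of Python. This is a minimal sketch with hypothetical function names. The `roi_stats` helper uses the population SD, which is an assumption, since the text does not state how the PACS workstation computes SD. The `percent_change` helper illustrates how relative changes against the 50% ASIR-V reference could be expressed.

```python
from statistics import mean, pstdev


def roi_stats(pixels_hu):
    """Mean attenuation and SD of the HU values inside one ROI.

    Assumption: population SD; the workstation's convention is not stated.
    """
    return mean(pixels_hu), pstdev(pixels_hu)


def cnr(hu_object, hu_muscle, sd_noise):
    """CNR = |HU_object - HU_muscle| / SD_noise (noise from paraspinal muscle)."""
    return abs(hu_object - hu_muscle) / sd_noise


def snr(hu_object, sd_noise):
    """SNR = HU_object / SD_noise."""
    return hu_object / sd_noise


def percent_change(value, reference):
    """Relative change versus a reference reconstruction, in percent."""
    return 100.0 * (value - reference) / reference
```

For example, an object at 120 HU against muscle at 60 HU with a noise SD of 10 HU gives a CNR of 6.0 and an SNR of 12.0. If a second reconstruction lowers the noise SD from 10.0 to 6.7 HU, `percent_change(6.7, 10.0)` is about -33, that is, a 33% noise reduction.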

Qualitative image analysis
CT images were independently reviewed by two board-certified pediatric radiologists with 17 and 9 years of experience who were blinded to the clinical findings and the CT reconstruction methods. Images were displayed on the PACS in random order and the two radiologists independently recorded their opinions on overall image quality, noise, and motion or beam hardening artifacts. A four-point scale was used: 4 was superior, 3 was average, 2 was suboptimal, and 1 was unacceptable.

Statistical analysis
All statistical analyses were performed using MedCalc software (version 12.1.0; MedCalc Software). Patient demographic characteristics and dose descriptors (CTDIvol, DLP, SSDE, and ED) are summarized and presented as the mean and SD. Repeated measures ANOVA with pairwise comparisons and Bonferroni correction were performed to compare the reconstructions concerning attenuation, noise, CNR, and SNR. Wilcoxon signed

Results
The
Medium strength DLR also showed decreased noise in abdomen CT, but no significant difference was found in noise in chest CT when compared with 50% ASIR-V. Medium strength DLR showed better CNR and SNR in both non-contrast and contrast-enhanced chest CT; however, there was no significant difference in CNR and SNR in abdomen CT.
When compared with 100% ASIR-V, high strength DLR improved CNR in chest CT images without contrast enhancement by 24%. However, there was no significant improvement in CNR in both chest CT and  Table S1). Figure 3 shows the NPS curves according to the clinical dose levels and image reconstruction methods, and the NPS peak and average spatial frequency for each NPS curve are summarized in Table 2. In all image reconstruction methods, as the dose increased (1 to 5 mGy), the NPS peak decreased, and the rate of decrease was similar, at about 21%. At the same dose level, the NPS peaks of the reconstruction methods decreased in the order of 50% ASIR-V, DLR-M, 100% ASIR-V, and DLR-H. However, the peaks of 100% ASIR-V and DLR-M were nearly the same. In all image reconstruction methods, the NPS average spatial frequency showed no significant difference with the change in dose. However, DLR methods overall showed higher average spatial frequency values than ASIR-V, and 100% ASIR-V in particular showed the lowest average spatial frequency. Overall, the DLR methods showed a pattern of markedly reducing the magnitude of noise while maintaining the texture.
We also analyzed the effects of body weight on noise reduction. In the DLR group, the paraspinal muscle noise reduction was better in patients over 20

Qualitative image assessment
The results of the subjective image quality analyses are summarized in Table 3 and Fig. 4. The subjective assessments of overall image quality and noise were also better on DLR images, at both medium and high strength, than on 50% ASIR-V images (p < 0.001). The agreement was moderate for overall image quality and good for noise in high strength DLR (p < 0.001). However, there was poor There was no significant difference in motion or beam hardening artifacts between reconstruction methods, with excellent interobserver agreement (κ = 0.944, p < 0.001) (Fig. 5).
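For readers unfamiliar with the κ statistic reported above, the following sketch computes an unweighted Cohen's kappa for two readers' scores on the four-point scale. The text does not state which kappa variant (unweighted or weighted) was used, so the unweighted form is an assumption, and the function name is hypothetical.

```python
from collections import Counter


def cohen_kappa(reader_a, reader_b):
    """Unweighted Cohen's kappa for two readers scoring the same images."""
    if len(reader_a) != len(reader_b) or not reader_a:
        raise ValueError("both readers must score the same non-empty image set")
    n = len(reader_a)
    # Observed agreement: fraction of images given identical scores.
    p_observed = sum(a == b for a, b in zip(reader_a, reader_b)) / n
    # Chance agreement from each reader's marginal score frequencies.
    count_a, count_b = Counter(reader_a), Counter(reader_b)
    p_chance = sum(count_a[k] * count_b[k] for k in count_a) / n ** 2
    if p_chance == 1.0:
        return 1.0  # both readers gave one identical score to every image
    return (p_observed - p_chance) / (1.0 - p_chance)
```

With scores `[4, 4, 4, 3, 3, 2]` and `[4, 4, 3, 3, 3, 2]`, the observed agreement is 5/6, the chance agreement is 13/36, and kappa is 17/23, or about 0.74.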

Discussion
Our study found that DLR can improve the quantitative and qualitative image quality in pediatric chest and abdomen CT relative to an advanced IR technique, our standard 50% ASIR-V. High-strength DLR showed significant noise reduction with increased CNR and SNR. DLR also scored significantly better for subjective image quality and noise. However, motion or beam hardening artifacts were not decreased with the deep learning method, regardless of strength.
There have been efforts to improve the image quality of low dose CT imaging by decreasing noise and artifacts with various reconstruction methods [25][26][27]. Recently, the DLR algorithm has been developed for CT to remove image noise. The effect of DLR on image quality and its potential to lower patient radiation dose are being investigated. A phantom study demonstrated that DLR had superior noise magnitude, noise texture, and spatial resolution [11]. Another study also showed that DLR improves the image quality through noise reduction and increased CNR without altering the image texture on abdomen CT [12]. They demonstrated that subjective diagnostic confidence was increased in all DLR images when compared with ASIR-V with a 30% blending factor, and that higher DLR strength lowers the noise with increased sharpness [13]. The SNR and CNR values of high-strength DLR images were higher than those of ASIR-V with an 80 or 100% blending factor. Similar results were also reported in studies with different vendor systems and algorithms [10,14,15].
DLR has been introduced to pediatric patients in a few studies of abdomen CT [16][17][18]. Lim et al. [16] studied a 5-year-old patient's phantom and pediatric abdomen CT exams using a vendor-neutral DLR technique and demonstrated image quality similar to that of a hybrid IR technique. Brady et al. [17] used contrast-enhanced abdomen CT with a DLR algorithm, showing improved object detectability, reduced image noise, and high radiologist preference when compared to conventional IR images. About a 51% dose reduction using DLR was hypothesized based on mathematical extrapolation from this retrospective study. Lee et al. [18] used DLR with low iodine concentration abdominal dual-energy CT and showed decreased noise in DLR images without difference in CNR, overall image quality, and diagnostic quality of lesions. The CTDIvol and total iodine administration were lower in dual energy CT with DLR. Both studies suggested that DLR has the potential to improve image quality and potentially reduce patient radiation dose. However, no   Our study shows similar results in noise reduction and quality improvement. High strength DLR was associated with noise reduction in non-contrast chest CT, contrast-enhanced chest CT, and contrast-enhanced abdomen CT with an increase in both CNR and SNR. The subjective assessments of overall image quality and noise were also better on DLR images, at both medium and high strength, than on 50% ASIR-V images. Our study showed no significant difference in attenuation values of the organs in pediatric chest and abdomen. This result is comparable with a previous report in an adult population [12]. Therefore, we can use CT images with DLR for attenuation analyses such as emphysema index measurements.
Previous studies have focused on noise reduction and image quality improvement of DLR with little focus on artifacts. DLR scored better on artifacts than 30% ASIR-V images in a previous study [12]. Another study reported no DLR-related image artifacts [14]. A prior study has reported more frequent distortion artifacts with DLR [28]. In our study, there was no significant difference in artifacts between reconstruction methods, with excellent inter-observer agreement on artifacts. These artifacts were mainly beam hardening artifacts from metal or dense contrast media in vessels. TrueFidelity did not significantly reduce motion or beam hardening artifacts in our study. This may be due to a lack of training on these artifacts and may suggest that TrueFidelity is weak in this respect. Future training on these artifacts may be required for better image reconstruction. However, unlike the previous study, there were no significant distortion artifacts in our study. Depending on the purpose and input data of the DLR technology, the role of DLR may vary. It would be preferable if DLR algorithms were developed as open source so that they could be used on various equipment and undergo further development by other researchers.
Our study has limitations. First, the sample size of our retrospective study was small, and we could not evaluate lesion detectability or diagnostic accuracy. Second, the data are from a single vendor's DLR algorithm. Since it was hard to get the projection data from the vendors directly, we could not compare other DLR methods, such as the image-domain-based method. Third, patients with artifacts made up only a minority of the patient population. Fourth, owing to the retrospective nature of our study, we could not compare images between FBP and DLR. Fifth, our study cannot suggest an estimated radiation dose reduction using DLR. Additional prospective studies with more patients are needed.

Conclusions
Compared with 50% ASIR-V, DLR improved the CT evaluation of pediatric chest and abdomen images with significant noise reduction. However, motion or beam hardening artifacts were not decreased by DLR, regardless of strength.