75% radiation dose reduction using deep learning reconstruction on low-dose chest CT

Objective
Few studies have explored the clinical feasibility of using deep-learning reconstruction to reduce the radiation dose of CT. We aimed to compare the image quality and lung nodule detectability of chest CT acquired at a quarter of the low dose (QLD) and reconstructed with vendor-agnostic deep-learning image reconstruction (DLIR) with those of conventional low-dose (LD) CT reconstructed with iterative reconstruction (IR).

Materials and methods
We retrospectively included 100 patients (median age, 61 years [IQR, 53–70 years]) who underwent LDCT on a dual-source scanner in which the total radiation dose was split between the two tubes in a 1:3 ratio. QLD CT images were generated from the quarter-dose data and reconstructed with DLIR (QLD-DLIR), whereas LDCT images were generated from the full-dose data and reconstructed with IR (LD-IR). Three thoracic radiologists rated subjective noise, spatial resolution, and overall image quality, and image noise was measured in five areas. The radiologists were also asked to detect all Lung-RADS category 3 or 4 nodules, and their performance was evaluated using the area under the jackknife free-response receiver operating characteristic curve (AUFROC).

Results
The median effective dose was 0.16 (IQR, 0.14–0.18) mSv for QLD CT and 0.65 (IQR, 0.57–0.71) mSv for LDCT. The radiologists’ ratings showed no significant differences in subjective noise (QLD-DLIR vs. LD-IR, lung-window setting; 3.23 ± 0.19 vs. 3.27 ± 0.22; P = .11), spatial resolution (3.14 ± 0.28 vs. 3.16 ± 0.27; P = .12), or overall image quality (3.14 ± 0.21 vs. 3.17 ± 0.17; P = .15). QLD-DLIR demonstrated lower measured noise than LD-IR in most areas (P < .001 for all). No significant difference was found between QLD-DLIR and LD-IR in sensitivity (76.4% vs. 72.2%; P = .35) or AUFROC (0.77 vs. 0.78; P = .68) for detecting Lung-RADS category 3 or 4 nodules. With a noninferiority limit of -0.1, QLD-DLIR showed noninferior detection performance (95% CI for the AUFROC difference, -0.04 to 0.06).

Conclusion
QLD-DLIR images showed comparable image quality and noninferior nodule detectability relative to LD-IR images.

Supplementary Information
The online version contains supplementary material available at 10.1186/s12880-023-01081-8.


Introduction
Low-dose chest CT (LDCT) is widely used for the diagnosis and follow-up of various lung diseases. Specifically, lung cancer screening with LDCT has been shown to reduce lung cancer mortality in several large-scale randomized trials [1][2][3]; therefore, an increasing number of countries are implementing lung cancer screening programs and recommending annual LDCT screening for high-risk asymptomatic individuals [4][5][6]. However, the cumulative radiation dose may become a major concern as the number of LDCT examinations increases. With technical advances by CT vendors and improvements in reconstruction techniques, the radiation dose required to acquire reliable chest CT images has steadily decreased. In particular, iterative reconstruction (IR), which substantially reduces image noise by repeatedly updating the estimated image to reduce the discrepancy between its forward projections and the measured projections, has become a standard reconstruction technique for most CT vendors [7,8].
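The basic IR loop described above (forward-project the current estimate, compare it with the measured projections, back-project the discrepancy to correct the estimate) can be illustrated with the classic Landweber iteration. This is a minimal sketch on a toy three-ray, two-pixel system, assuming a noise-free linear model; it is not ADMIRE or any vendor's algorithm, which are proprietary and considerably more elaborate. The names `landweber`, `A`, and `b` are hypothetical.

```python
def landweber(A, b, step, n_iter=200):
    """Toy iterative reconstruction (Landweber iteration).

    A: system matrix as a list of rows (one row per ray, one column per pixel).
    b: measured projections, one value per ray.
    step: relaxation factor; must be below 2 / (largest eigenvalue of A^T A).
    """
    n_rays, n_pixels = len(A), len(A[0])
    x = [0.0] * n_pixels  # start from an empty image
    for _ in range(n_iter):
        # Forward-project the current image estimate.
        projected = [sum(a * xj for a, xj in zip(row, x)) for row in A]
        # Discrepancy between measured and estimated projections.
        residual = [bi - pi for bi, pi in zip(b, projected)]
        # Back-project the residual and update every pixel.
        for j in range(n_pixels):
            x[j] += step * sum(A[i][j] * residual[i] for i in range(n_rays))
    return x


# Two pixels with true values 2 and 3, probed by three rays:
# ray 1 crosses pixel 1, ray 2 crosses pixel 2, ray 3 crosses both.
A = [[1, 0], [0, 1], [1, 1]]
b = [2.0, 3.0, 5.0]
x = landweber(A, b, step=0.3)
print([round(v, 6) for v in x])  # [2.0, 3.0]
```

With `step=0.3`, the eigenvalues of A^T A (1 and 3) give per-iteration error contraction factors of 0.7 and 0.1, so the estimate converges to the true pixel values. Practical IR methods add noise modeling and regularization so that noise in `b` is not amplified.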
Deep learning has been widely applied in medical imaging for tasks including lesion detection, classification, segmentation, and noise reduction [9][10][11]. Several deep learning-based noise-reduction algorithms have been proposed and tested in LDCT, and these algorithms have been reported to substantially reduce noise and improve image quality [12,13]. Jiang et al. recently reported that lung nodule detection performance on ultralow-dose chest CT improved with deep-learning reconstruction compared with IR [14]. However, evidence is insufficient as to whether deep-learning reconstruction can preserve image quality and lesion detectability when images are acquired at a reduced radiation dose. In this study, we aimed to evaluate the image quality and lung nodule detectability of CT images generated using a quarter of the low dose (QLD) and reconstructed with a commercial vendor-agnostic deep-learning image reconstruction (DLIR) algorithm, in comparison with those of standard LDCT images reconstructed with a dedicated IR algorithm.

Patients and LDCT
Data from patients who underwent LDCT on a dual-source scanner (SOMATOM Force; Siemens Healthineers) between August 2018 and September 2018 were retrospectively collected at a tertiary care center (Seoul National University Hospital). All CT images were reviewed by a radiology resident (G.D.J., with 4 years of experience in chest CT interpretation). Patients were included consecutively; patients with more than five nodules, acute lung disease (including pneumonia or pneumothorax), or severe architectural distortion were excluded to keep the analysis focused on the aims of the present study (Fig. 1). Demographic information (age, sex) and CT radiation dose information (CTDIvol, DLP, and effective dose) were recorded. The presence of lung nodules on all CT images was determined by consensus between a thoracic radiologist (J.G.N., with 9 years of experience) and a radiology resident (G.D.J.). The size of each nodule was calculated as the average of its maximal long-axis diameter and the maximal perpendicular short-axis diameter.
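The size rule above, combined with this study's target definition for Lung-RADS category 3 or 4 nodules (solid or part-solid nodules ≥ 6 mm), can be written as two small helpers. This is a sketch under a stated simplification: it applies only the nodule type and 6-mm threshold given in this paper and does not implement the full Lung-RADS categorization (for example, growth criteria or the separate rules for part-solid components). The function names are hypothetical.

```python
def mean_diameter_mm(long_axis_mm: float, short_axis_mm: float) -> float:
    """Nodule size: average of the maximal long axis and the perpendicular short axis."""
    return (long_axis_mm + short_axis_mm) / 2


def is_target_nodule(nodule_type: str, long_axis_mm: float, short_axis_mm: float) -> bool:
    """Simplified positive-nodule rule for illustration only:
    solid or part-solid nodule with a mean diameter of at least 6 mm."""
    if nodule_type not in {"solid", "part-solid"}:
        return False
    return mean_diameter_mm(long_axis_mm, short_axis_mm) >= 6.0


print(mean_diameter_mm(8.0, 5.0))                    # 6.5
print(is_target_nodule("solid", 7.0, 5.0))           # True  (mean 6.0 mm)
print(is_target_nodule("ground-glass", 10.0, 8.0))   # False (nonsolid)
```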

CT Acquisition and Image Reconstruction
For all LDCT scans, radiation was delivered by two X-ray tubes, with the radiation dose split between them in a 1:3 ratio. QLD CT images were generated from the data acquired by a single tube, which delivered a quarter of the total dose, whereas standard-dose LDCT images were generated from the combined data of both tubes. For each scan, the CT parameters were as follows: tube voltage, 120 kVp; automatic tube current modulation with the Care Dose 4D system (Siemens Healthineers), with a quality reference tube current-time product of 15 mAs for one tube and 5 mAs for the other, and a target CTDIvol of 1.36 mGy; detector collimation, 0.6 mm; detector pitch, 1.15; and gantry rotation time, 285 ms. The median scan length was 41.1 cm (IQR, 38.6–42.6 cm). No contrast medium was used.
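The dose bookkeeping behind the quarter-dose images can be checked with a few lines of arithmetic. This sketch assumes that the 5 and 15 mAs quality-reference settings translate proportionally into delivered dose, that image noise before denoising is quantum-limited (noise SD proportional to 1/sqrt(dose)), and that effective dose is estimated as DLP multiplied by a chest conversion coefficient k. The value k = 0.014 mSv/(mGy·cm) is a commonly used adult chest coefficient assumed here; the paper does not state which coefficient was used. Function names are hypothetical.

```python
import math


def dose_fraction(ref_mas_low: float, ref_mas_high: float) -> float:
    """Share of the total quality-reference mAs delivered by the lower-dose tube."""
    return ref_mas_low / (ref_mas_low + ref_mas_high)


def relative_noise(fraction: float) -> float:
    """Noise SD relative to full dose, assuming quantum-limited noise (SD ~ 1/sqrt(dose))."""
    return 1.0 / math.sqrt(fraction)


def effective_dose_msv(dlp_mgy_cm: float, k: float = 0.014) -> float:
    """Effective dose = DLP x k. The default k is an assumed adult chest coefficient."""
    return dlp_mgy_cm * k


frac = dose_fraction(5, 15)
print(frac)                   # 0.25 -> quarter-dose (QLD) data
print(relative_noise(frac))   # 2.0  -> about twice the noise SD before denoising
# Relative reduction implied by the reported median effective doses (0.16 vs. 0.65 mSv).
reduction = 1 - 0.16 / 0.65
print(f"{reduction:.1%}")     # 75.4%
```

The doubling of noise SD at a quarter dose is the idealized gap that DLIR must close for QLD-DLIR images to match LD-IR images.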
For image reconstruction, advanced modeled iterative reconstruction (ADMIRE, strength level 3) was applied to the standard-dose LDCT images (hereafter, LD-IR images) [7], and deep-learning image reconstruction (high level for soft-kernel reconstruction, intermediate level for sharp-kernel reconstruction) was applied to the QLD CT images (hereafter, QLD-DLIR images). Representative images are shown in Figs. 2, 3 and 4.

Deep-learning image reconstruction
We used commercial, vendor-agnostic deep-learning image reconstruction software (DLIR; ClariCT.AI, ClariPI Inc.), which has received European (CE Mark) and Korean (Korean Food and Drug Administration) regulatory approval [15]. The software takes filtered back-projection images as input and generates denoised images, and users can adjust the denoising level. It can be applied to filtered back-projection images regardless of CT vendor, scan protocol, reconstruction kernel, or section thickness. A detailed description of the software is provided in Appendix E1 (online).

Quantitative and qualitative image quality assessment
Image noise and the signal-to-noise ratio (SNR) were measured by a radiology resident (G.D.J.) at five locations: the lung parenchyma, trachea, aorta, muscle, and axillary fat (Figure E1). Image noise was defined as the standard deviation of the HU values within a region of interest (ROI) larger than 0.5 cm², and the SNR was calculated as the absolute mean HU value divided by the noise. To assess spatial resolution, the edge-rise distance (ERD) was measured semiautomatically at pulmonary vessels running in the axial plane. The ERD was defined as the distance between the two points yielding 10% and 90% of the maximal intravascular HU value [12,16]. All quantitative measures were evaluated for both 3-mm section-thickness standard-kernel images and 1-mm section-thickness sharp-kernel images.
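The ROI and edge-profile definitions above translate directly into code. This is a minimal sketch that assumes the HU values of an ROI are already extracted as a flat list, and that an edge profile is sampled at a fixed pixel spacing from the lung outside a vessel toward its center, so that it rises across the edge. It also assumes the 10% and 90% levels are taken relative to the rise from the profile minimum (background) to its maximum (intravascular peak), a common ERD convention; the study's semiautomatic tool may normalize differently. All function names are hypothetical.

```python
import statistics


def roi_area_cm2(n_pixels: int, pixel_spacing_mm: float) -> float:
    """ROI area in cm^2 (square pixels); the study required ROIs larger than 0.5 cm^2."""
    return n_pixels * pixel_spacing_mm ** 2 / 100.0


def roi_noise(hu_values):
    """Image noise: (sample) standard deviation of HU values within the ROI."""
    return statistics.stdev(hu_values)


def roi_snr(hu_values):
    """SNR: absolute mean HU divided by the noise of the same ROI."""
    return abs(statistics.fmean(hu_values)) / roi_noise(hu_values)


def _first_crossing(profile, level):
    """Sub-pixel index where the profile first rises through `level` (linear interpolation)."""
    for i in range(len(profile) - 1):
        a, b = profile[i], profile[i + 1]
        if a <= level <= b and a != b:
            return i + (level - a) / (b - a)
    raise ValueError("profile does not rise through the requested level")


def edge_rise_distance_mm(profile, pixel_spacing_mm, low=0.10, high=0.90):
    """Distance between the points reaching 10% and 90% of the edge rise."""
    background, peak = min(profile), max(profile)
    rise = peak - background
    x_low = _first_crossing(profile, background + low * rise)
    x_high = _first_crossing(profile, background + high * rise)
    return (x_high - x_low) * pixel_spacing_mm


print(roi_noise([10, 20, 30]), roi_snr([10, 20, 30]))  # 10.0 2.0
profile = [-900, -900, -700, -300, 100, 100]  # lung -> vessel center, 0.5-mm samples
print(edge_rise_distance_mm(profile, 0.5))            # 1.125
```

A shorter ERD means a steeper edge, which is why a lower ERD is read as better spatial resolution.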
For a qualitative assessment, three fellowship-trained thoracic radiologists (J.H.H., D.S.K., and J.P., with 8-10 years of experience) evaluated the image quality of 200 randomly arranged image sets (QLD-DLIR and LD-IR) from 100 patients. Each radiologist assessed all 200 image sets, which were randomly distributed so that the QLD-DLIR and LD-IR images of the same patient were not presented together. The radiologists independently rated subjective noise, spatial resolution, the presence of artifacts (distortion and beam-hardening artifacts), and overall image quality on a 4-point scale (1-4; a higher score indicated better image quality; Table E1 [online]). A distortion artifact was defined as image distortion generated by the reconstruction algorithm, typically false miliary nodules in the lung-window setting and granular distortion of mediastinal structures [12]. The primary evaluation was conducted using 3-mm section-thickness standard-kernel images, and 1-mm section-thickness sharp-kernel images were provided as paired images. The radiologists were blinded to patients' demographics, clinical indications, and the reconstruction technique of the images. Inter-reader agreement was assessed using the intraclass correlation coefficient (ICC) based on a two-way mixed-effects model with consistency and average measures. Agreement levels were categorized as poor (< 0.50), fair (0.50-0.75), good (0.75-0.90), or excellent (0.90-1.00).
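The agreement statistic described above, an ICC from a two-way mixed-effects model with consistency and average measures, corresponds to ICC(3,k) in the Shrout and Fleiss notation and can be computed from a two-way ANOVA decomposition. The sketch below assumes a complete subjects × readers score matrix and uses the standard formula (MS_subjects - MS_error) / MS_subjects; MedCalc's exact implementation and its confidence intervals are not reproduced. Function names are hypothetical; the category cut-offs are the ones stated in the text.

```python
def icc_3k(scores):
    """ICC(3,k): two-way mixed effects, consistency, average of k readers.

    scores: one row per subject (image set), one column per reader; no missing values.
    """
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    subject_means = [sum(row) / k for row in scores]
    reader_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares (no replication).
    ss_subjects = k * sum((m - grand) ** 2 for m in subject_means)
    ss_readers = n * sum((m - grand) ** 2 for m in reader_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_subjects - ss_readers

    ms_subjects = ss_subjects / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    # Reader main effects are excluded (consistency definition).
    return (ms_subjects - ms_error) / ms_subjects


def agreement_level(icc):
    """Agreement categories used in the study."""
    if icc < 0.50:
        return "poor"
    if icc < 0.75:
        return "fair"
    if icc < 0.90:
        return "good"
    return "excellent"


# Classic 6-subject x 4-rater example from Shrout and Fleiss (1979).
scores = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc = icc_3k(scores)
print(round(icc, 2), agreement_level(icc))  # 0.91 excellent
```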

Lung nodule detectability assessment
For nodule detectability, a performance test was conducted using the same 200 randomly arranged image sets used in the qualitative assessment. The three fellowship-trained thoracic radiologists detected and localized all clinically significant nodules (solid or part-solid nodules ≥ 6 mm; Lung-RADS category 3 or 4) [17]. The radiologists rated their confidence in each detection on a 5-point scale, with a higher score indicating higher confidence in the presence of a Lung-RADS category 3 or 4 nodule [18]. The 3-mm section-thickness standard-kernel images were provided as the main images, and 1-mm section-thickness sharp-kernel images were provided as paired images for further evaluation of nodule morphology (i.e., solid or subsolid) and for accurate measurement. The radiologists were blinded to patients' demographics, clinical indications, and the reconstruction technique of the images.
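The readers' marks and 5-point confidence ratings are the raw material for the AUFROC analysis. As an illustration only, the sketch below computes the empirical figure of merit of the original JAFROC method: the probability that a true target nodule receives a higher rating than the highest-rated false-positive mark on a nodule-free case, with ties counting one half and unmarked items treated as rated below every mark. It assumes that marks have already been classified as lesion localizations or false positives (the acceptance criterion is not modeled), and it omits the jackknife variance estimation and significance testing performed by the JAFROC software; the paper does not state which JAFROC figure-of-merit variant was used. Names are hypothetical.

```python
NEG_INF = float("-inf")  # rating assigned to "no mark"


def _psi(fp_rating: float, lesion_rating: float) -> float:
    """Kernel: 1 if the lesion outranks the false positive, 0.5 for a tie, else 0."""
    if lesion_rating > fp_rating:
        return 1.0
    if lesion_rating == fp_rating:
        return 0.5
    return 0.0


def jafroc_fom(normal_case_fp_ratings, lesion_ratings):
    """Empirical JAFROC figure of merit for one reader and one reconstruction.

    normal_case_fp_ratings: for each case without a target nodule, the list of
        confidence ratings (1-5) of that reader's false-positive marks.
    lesion_ratings: for each true target nodule, the rating of the mark that
        localized it, or None if the nodule was not marked.
    """
    highest_fp = [max(ratings, default=NEG_INF) for ratings in normal_case_fp_ratings]
    lesions = [NEG_INF if r is None else r for r in lesion_ratings]
    total = sum(_psi(x, y) for x in highest_fp for y in lesions)
    return total / (len(highest_fp) * len(lesions))


# Hypothetical reads: two nodule-free cases (one FP mark rated 2, one unmarked)
# and three target nodules rated 4, 1, and not marked.
print(round(jafroc_fom([[2], []], [4, 1, None]), 4))  # 0.5833
```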

Statistical analysis
Image quality metrics of QLD-DLIR and LD-IR were compared using the Wilcoxon signed-rank test or the paired t-test, as appropriate. To assess lung nodule detectability, the areas under the jackknife free-response receiver operating characteristic curves (AUFROCs) were calculated and compared between QLD-DLIR and LD-IR images, with the noninferiority limit set at -0.1 [19,20]. Sensitivity and specificity were compared using the McNemar test for individual radiologists and generalized estimating equations with an exchangeable correlation matrix for the pooled radiologists. AUFROC analysis was performed using JAFROC version 4.2.1, ICC was calculated using MedCalc version 20.218 (MedCalc Software, Mariakerke, Belgium), and other statistical analyses were performed using SPSS version 25 (IBM Corp., Armonk, NY, USA). The statistical analyses were conducted by two radiologists (G.D.J. and J.G.N.) with 4 and 9 years of experience in medical statistical analysis, respectively. For all tests, P < .05 indicated statistical significance.
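Two of the decision rules above are simple enough to sketch directly: the paired McNemar comparison of per-reader sensitivity and the noninferiority criterion for the AUFROC difference. This sketch assumes the continuity-corrected chi-square form of the McNemar test (an exact binomial version is also common, and the paper does not say which form was used) and defines noninferiority as the lower bound of the 95% CI of (test minus reference) lying above the -0.1 limit. It also includes the false-positive rate as defined in the Table 4 note. The GEE analysis for pooled readers is not shown. Function names and input counts are hypothetical.

```python
import math


def mcnemar_p(b: int, c: int) -> float:
    """Continuity-corrected McNemar test for paired binary outcomes.

    b: nodules detected only on reconstruction A; c: detected only on reconstruction B.
    Concordant pairs do not enter the statistic.
    """
    if b + c == 0:
        return 1.0
    chi2 = max(0, abs(b - c) - 1) ** 2 / (b + c)
    # Upper tail of a chi-square distribution with 1 df: erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def is_noninferior(ci_lower: float, margin: float = -0.1) -> bool:
    """Noninferior if the lower 95% CI bound of (test - reference) exceeds the margin."""
    return ci_lower > margin


def fp_rate(total_fp_marks: int, n_patients: int) -> float:
    """FP rate as defined in the Table 4 note: total FP marks / number of patients."""
    return total_fp_marks / n_patients


# Hypothetical numbers for illustration only.
print(round(mcnemar_p(10, 2), 3))                          # 0.043
print(is_noninferior(-0.05), is_noninferior(-0.12))        # True False
print(fp_rate(12, 100))                                    # 0.12
```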

Discussion
In this study, we compared the image quality of chest CT generated using a quarter dose of radiation and reconstructed with commercial deep-learning software (QLD-DLIR) with that of conventional LDCT images generated using the full radiation dose and reconstructed with a dedicated IR technique (LD-IR). In the quantitative analysis, the QLD-DLIR images generally showed lower noise, higher SNR, and shorter ERD (indicating higher spatial resolution) than the LD-IR images. In the subjective, qualitative assessment, the QLD-DLIR and LD-IR images received comparable scores in the lung evaluation, whereas the QLD-DLIR images were rated as having lower spatial resolution, more noise, and more artifacts in the mediastinal evaluation. The three thoracic radiologists found no significant difference in overall image quality between the QLD-DLIR and LD-IR images.
We also evaluated the detection performance for significant lung nodules. The radiologists' performance in detecting Lung-RADS category 3 or 4 nodules did not differ significantly between the QLD-DLIR and LD-IR images, and the noninferiority of QLD-DLIR relative to LD-IR was confirmed.
The radiation dose required for reliable lung evaluation has decreased substantially since the introduction of IR, enabling LDCT to become the mainstream CT protocol for screening for lung diseases, including lung cancer. More recently, deep learning-based reconstruction has demonstrated excellent noise-reduction performance, surpassing that of IR and suggesting the possibility of further radiation dose reduction [21][22][23]. Several studies have reported the feasibility of ultralow-dose chest CT images reconstructed with deep-learning software [12,14,24]. However, those

The commercial deep-learning software used in this study, ClariCT.AI, has the advantage of being applicable to any CT images (vendor-agnostic) without vendor-specific adaptation. The software was trained with multi-vendor images using a synthetic sinogram-based low-dose simulation technique [25], producing generalizable denoising quality across diverse images. It has also been reported to produce less deep learning-specific image distortion [12], possibly by preserving the noise frequency spectrum during denoising. We tested a single software package rather than various deep-learning reconstruction models, as it was the only commercial deep-learning denoising software technically applicable to the CT images from our dual-source scanner; however, further comparison with other deep-learning models using multi-vendor CT scans is warranted.
To assess the clinical feasibility of the images, we compared nodule detectability between QLD-DLIR and LD-IR. Detecting lung nodules is one of the major indications for CT, especially in the screening setting. We considered Lung-RADS category 3 or 4 nodules as positive, since those nodules alter follow-up plans in lung cancer screening programs [17]. The three thoracic radiologists showed no significant differences in sensitivity, specificity, false-positive rate, or AUFROC between QLD-DLIR and LD-IR. When the nodules missed by the radiologists (false negatives) were reviewed, they were visualized comparably on the LD-IR and QLD-DLIR images (Figure E3), suggesting that the misses were mainly attributable to random reader error rather than to the imaging technique. JAFROC analysis demonstrated noninferior nodule detection performance of QLD-DLIR relative to LD-IR. Jiang et al. also demonstrated the feasibility of deep learning-reconstructed ultralow-dose CT for nodule detection [14]; however, that study did not compare performance with conventional LDCT images. Our results suggest that a radiation dose reduction of up to 75% may be feasible for LDCT performed for lung nodule screening. However, further validation across CT vendors and for abnormalities other than lung nodules is warranted.
While the QLD-DLIR images outperformed the LD-IR images for most parameters in the quantitative image quality assessment, the radiologists rated the image quality of LD-IR as comparable to or better than that of QLD-DLIR for most parameters. Notably, the radiologists gave LD-IR higher overall image quality scores for the 1-mm sharp-kernel images. This discrepancy may have two main explanations. First, the radiologists were more accustomed to the texture of IR-reconstructed images and relatively unfamiliar with DLIR-reconstructed images. In addition, the radiologists rated beam-hardening artifacts as considerably more pronounced on the QLD-DLIR images (scores, 3.01 vs. 3.47; P < .001; a higher score indicates better image quality); this was not captured by the quantitative measures and may have affected the overall image quality assessment. Second, because DLIR is trained to optimize quantitative metrics, typically measured image noise, it may have an inherent advantage in quantitative assessment. As we optimized the DLIR settings mainly for 3-mm standard-kernel images, the 1-mm sharp-kernel images generally received lower scores. Further optimization of DLIR for each image type and the addition of a beam-hardening artifact-reduction algorithm could improve the image quality of, and reader preference for, QLD-DLIR images.
Our study has some limitations. First, because of its retrospective design, selection bias could have affected the comparison of image quality and lung nodule detection performance. To minimize selection bias, patients were selected consecutively. In addition, patients with six or more nodules were excluded to focus the review on nodule detection performance, which may have introduced additional selection bias. Second, images were acquired using a single CT scanner, limiting the generalizability of the results. Third, only one deep-learning reconstruction software package was tested. Fourth, the number of subsolid nodules included in this study was small (n = 4), so meaningful subgroup analysis of subsolid nodules was not possible. Further studies assessing whether DLIR properly preserves the morphology and size of subsolid nodules would be beneficial. Fifth, the radiologists showed poor inter-reader agreement for some qualitative parameters, including subjective noise and the presence of distortion artifacts. In addition, the radiologists gave QLD-DLIR comparable or lower subjective noise scores despite its lower measured noise, possibly because of different image textures or variable reader familiarity with the technique. Lastly, the noninferiority limit for nodule detection performance was set empirically rather than derived from preliminary analyses of nodule detection performance.
In conclusion, deep learning-reconstructed QLD images showed comparable image quality and noninferior nodule detectability to standard LDCT images reconstructed with IR.

Fig. 2 A woman underwent low-dose chest CT for follow-up of a previously detected 8-mm ground-glass nodule in the left upper lobe. (A-C) Conventional low-dose chest CT images reconstructed with iterative reconstruction and (D-F) images generated using a quarter dose of radiation and reconstructed with commercial deep-learning software. (A, D) Soft-tissue structures, including the aorta, subcutaneous fat, and paraspinal muscles, were visualized with a lower noise level in the mediastinal-window setting. (C, F) A ground-glass nodule in the left upper lobe was well visualized with a sharp margin on both images (arrow).

Table 1
Patient Characteristics and Radiologic Findings
Note. Categorical variables are presented as counts (%) and continuous variables as median [IQR]. CTDIvol = volume CT dose index, LDCT = low-dose CT, IQR = interquartile range, Lung-RADS = Lung Imaging Reporting and Data System, QLD = quarter of the low dose.

Table 2
Quantitative Image Quality Assessment Results
Note. Data are presented as means ± standard deviations. Italicized data indicate values that are lower (for noise and edge-rise distance) or higher (for SNR) than those of the compared counterpart. DLIR = deep-learning image reconstruction, IR = iterative reconstruction, LD = low dose, QLD = quarter of the low dose, SNR = signal-to-noise ratio.
*P values were calculated using the paired t-test.

Table 3
Qualitative Image Quality Assessment Results

Table 4
Nodule-Based Estimates of the Detection of Lung-RADS Category 3 or 4 Nodules
Note. AUFROC values are presented with 95% confidence intervals. FP rates were calculated as the total number of FP nodules divided by the total number of patients (n = 100). AUFROC = area under the jackknife free-response receiver operating characteristic curve, DLIR = deep-learning image reconstruction, FP = false positive, IR = iterative reconstruction, LD = low dose, Lung-RADS = Lung Imaging Reporting and Data System, QLD = quarter of the low dose.
*P values were calculated using jackknife free-response receiver operating characteristic curve analysis (for the AUFROC), the McNemar test (for the sensitivity of individual radiologists), the chi-square test (for the FP rate of individual radiologists), or generalized estimating equations (for the sensitivity and specificity of pooled radiologists).