
Evaluation of the models generated from clinical features and deep learning-based segmentations: Can thoracic CT on admission help us to predict hospitalized COVID-19 patients who will require intensive care?



The aim of the study was to predict the probability of intensive care unit (ICU) care for inpatient COVID-19 cases using clinical and artificial intelligence segmentation-based volumetric and CT-radiomics parameters on admission.


Twenty-eight clinical/laboratory features, 21 volumetric parameters, and 74 radiomics parameters obtained by deep learning (DL)-based segmentations from CT examinations of 191 severe COVID-19 inpatients admitted between March 2020 and March 2021 were collected. Patients were divided into Group 1 (117 patients discharged from the inpatient service) and Group 2 (74 patients transferred to the ICU), and the differences between the groups were evaluated with the t-test and the Mann–Whitney test. The sensitivities and specificities of significantly different parameters were evaluated by ROC analysis. Subsequently, 152 (79.6%) patients were assigned to the training/cross-validation set, and 39 (20.4%) patients were assigned to the test set. Clinical, radiological, and combined logit-fit models were generated by using the Bayesian information criterion from the training set and optimized via tenfold cross-validation. To simultaneously use all of the clinical, volumetric, and radiomics parameters, a random forest model was produced, and this model was trained by using a balanced training set created by adding synthetic data to the existing training/cross-validation set. The results of the models in predicting ICU patients were evaluated with the test set.


No parameter individually created a reliable classifier. When the test set was evaluated with the final models, the AUC values were 0.736, 0.708, and 0.794, the specificity values were 79.17%, 79.17%, and 87.50%, the sensitivity values were 66.67%, 60%, and 73.33%, and the F1 values were 0.67, 0.62, and 0.76 for the clinical, radiological, and combined logit-fit models, respectively. The random forest model that was trained with the balanced training/cross-validation set was the most successful model, achieving an AUC of 0.837, specificity of 87.50%, sensitivity of 80%, and F1 value of 0.80 in the test set.


By using a machine learning algorithm that was composed of clinical and DL-segmentation-based radiological parameters and that was trained with a balanced data set, COVID-19 patients who may require intensive care could be successfully predicted.



Severe COVID-19 patients who are admitted to the inpatient ward due to the need for supplemental oxygen or due to evidence of systemic inflammation must be monitored for the development of critical illness, a rapid increase in oxygen needs and/or an increasing systemic deterioration [1]. For patients who progress to a critical illness level, transfers to the intensive care unit (ICU) are required; additionally, depending on the severity of the condition, the patient may also need oxygen delivery through a high-flow device, noninvasive ventilation, invasive mechanical ventilation, or extracorporeal membrane oxygenation [1]. Planning the ICU bed capacity is of primary importance during pandemic surges [2] since limitations in the ICU bed capacity have been reported to have an effect on mortality [3]. Thus, it is important to predict the need for ICUs, especially for patients with a severe clinical condition that requires inpatient treatment [4, 5]. Additionally, starting remdesivir in the ward was recommended if disease progression was predicted [1].

Although models for identifying ICU candidate patients have been reported, most of these models are based only on clinical data [6,7,8,9,10,11]. In a study where the candidate parameters included the presence or absence of chest X-ray findings, it was noted that this parameter was not included in the final model [12]. Promising results were achieved by combining the clinical data with semiquantitative visual severity scores (VSS) based on the volume, type, and extent of the infiltration measured on chest X-ray and CT [13,14,15].

Radiomics analysis extracts different quantitative data from medical images with various algorithms, and these data are used in further analyses for decision support [16]. Studies using radiomics models and machine learning methods have shown that these methods can diagnose COVID-19 [17, 18] and can determine its prognosis [19]. Although combined models of the clinical and radiomics parameters in RT-PCR-positive cohorts were reported [20], studies evaluating the efficacy of models that include clinical, quantitative volumetric and radiomics parameters for predicting disease progression in hospitalized COVID-19 patients are lacking.

The use of deep learning (DL) for COVID-19 diagnosis has previously been studied with chest X-ray and CT data in pretrained or customized models, with successful results [21]. DL networks are also used for automated segmentation, and the U-Net architecture has shown high accuracy on CT images of COVID-19 patients [22].

The aim of this study was to generate and compare models that predict the need for ICUs in hospitalized COVID-19 patients using clinical features and volumetric and radiomics data that were calculated by automated segmentations.

Materials and methods

This retrospective, cross-sectional, single-center study was approved by our institution’s review board (EK-E1-21-2090), and written informed consent was waived. All of the procedures that were performed in this study were in accordance with the 1964 Helsinki Declaration and its later amendments.

Study population

A total of 268 RT-PCR-positive severe COVID-19 patients hospitalized consecutively in our inpatient ward between March 2020 and March 2021 were evaluated (Fig. 1). All these patients had one or more of the criteria for severe illness [1]: an SpO2 < 94% when breathing room air, a < 300 mmHg arterial partial oxygen pressure to fraction of inspired oxygen (PaO2/FiO2) ratio, a respiratory rate > 30 per minute or infection involving more than 50% of the lung parenchyma. Patients were transferred to the ICU when one or more of the signs of critical illness [1], including acute respiratory distress syndrome, septic shock, and multiorgan failure, had developed.

Fig. 1

Flowchart of the study. The Clinical, Radiological, and Combined models are the final models in cross-validation. LR: logistic regression

The inclusion criteria were an age older than 18 years, a thoracic CT scan performed at our hospital, no steroid or antiviral treatment before the CT study, no interstitial pulmonary disease, and no history of pulmonary surgery; 222 patients met these criteria. From this group, patients with contrast-enhanced CT examinations (n = 7, all suspicious for embolism), respiratory artifacts (n = 14), massive pleural effusion (over two-thirds of the hemithorax) (n = 4), pneumothorax (n = 1), and cystic lung disease (n = 1) were excluded.

CT protocol

CT studies were performed with a 128-detector system (GE Revolution, General Electric, Milwaukee, WI) from the first rib to the adrenal glands, nonenhanced by using the following parameters: 100 kV, 110 mAs, body filter, a 1.25 mm slice thickness, a 512 × 512 reconstruction matrix, a spiral pitch factor of 1.375:1, BonePlus convolution kernel, adaptive statistical iterative reconstruction of 70%.

Deep learning segmentation and radiomics feature calculation

The entire lung parenchyma and the pneumonic lesions were segmented using Quibim’s U-Net model, a convolutional neural network with a ResNet-34 backbone that was developed for the ‘A European initiative for automated diagnosis and quantitative analysis of COVID-19 on imaging’ project.

Slices of the studies were preprocessed as the segmentation model input by applying a constant lung window (WW = 1600, WL = − 600), normalization to the range [0, 1], and the Balance Contrast Enhancement Technique (BCET). Thus, the basic shapes of the image histograms were maintained.
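The window-and-rescale step can be sketched as follows. This is an illustrative NumPy version, not the project's actual preprocessing code, and the BCET step is omitted:

```python
import numpy as np

def preprocess_slice(hu_slice, window_width=1600, window_level=-600):
    """Clip a CT slice (in Hounsfield units) to the stated lung window
    and rescale the result to [0, 1]."""
    lo = window_level - window_width / 2   # -1400 HU
    hi = window_level + window_width / 2   #  200 HU
    clipped = np.clip(hu_slice, lo, hi)
    return (clipped - lo) / (hi - lo)
```

With WW = 1600 and WL = −600, any density at or below −1400 HU maps to 0, anything at or above 200 HU maps to 1, and −600 HU (the window center) maps to 0.5.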

Several metrics were used to evaluate the segmentation model. Dice similarity coefficient (DSC) and intersection over union (IoU) values were calculated on all of the scans with at least 1000 voxels in the ground truth segmentation. Because DSC and IoU are zero in cases without a ground truth mask, the final test set for these metrics was determined by using a histogram-based threshold of more than 1000 positive voxels. Average false positive and false negative volumes were calculated for all of the scans. In addition, Pearson's correlation coefficients between the positive predictions and the ground truth were determined.
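For reference, DSC and IoU for a pair of binary masks can be computed as below. This is a generic NumPy sketch, not Quibim's evaluation code; note that the study's 1000-voxel ground-truth threshold avoids the degenerate case in which both denominators vanish:

```python
import numpy as np

def dice_and_iou(pred, truth):
    """Dice similarity coefficient and intersection-over-union for
    two binary segmentation masks (arrays of 0/1 or booleans)."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    inter = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    dice = 2.0 * inter / (pred.sum() + truth.sum())
    iou = inter / union
    return dice, iou
```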

Two authors (MG, 16 years of experience, and EO, last year of residency training) checked whether all ground glass opacities (GGO), consolidation or crazy paving areas in the CT studies were segmented by the DL algorithm. It was noted that the DL algorithm did not segment GGOs smaller than 1 mL, and patients who only had such lesions were excluded from the study (n = 4). Thus, the final study population consisted of 191 patients.

The radiomics features were calculated from the obtained segmentations using Quibim Texture Analysis software (Quibim SL, Valencia, Spain) with the following parameters: (1) a resampled voxel size of 1 × 1 × 1 mm3 using bicubic interpolation; (2) a fixed bin width of 25 for gray-value discretization; (3) density normalization according to Eq. (1):

$$f(x) = \frac{x - \mu_{x}}{\sigma_{x}} \times S$$

where f(x) is the normalized voxel density, x is the original density, μx is the mean density, σx is the standard deviation and S is the scaling factor (set to 500). (4) A voxel array shift of 1024 was added to prevent the negative values from being squared. (5) Second-order matrices were calculated using a distance of 1 voxel and 13 isotropic displacement vectors at angles of 0°, 45°, 90° and 135°.
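The density normalization of Eq. (1), together with the array shift of step (4), can be sketched as follows. This is an illustrative NumPy version; the actual computation was performed by the Quibim software:

```python
import numpy as np

def normalize_density(voxels, scale=500, shift=1024):
    """Z-score normalize voxel densities, apply the scaling factor S
    (Eq. 1), then add the array shift so no value is negative before
    squaring in the second-order matrices."""
    normalized = (voxels - voxels.mean()) / voxels.std() * scale
    return normalized + shift
```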

Statistical analysis

Patients were categorized into Group 1 (patients who recovered with treatment in the inpatient ward) and Group 2 (patients transferred to the ICU for progressive disease from the inpatient ward).

The data obtained from the patients were divided into (1) clinical data consisting of the demographic data of the patients, comorbid disease history, therapeutics given to the patient in the inpatient service, oxygen saturation, complete blood count, biochemical parameters, and acute phase reactants obtained at admission and (2) radiological data consisting of the volumetric data of the whole lung, the inflamed lung parenchyma as segmented by DL and the first- and second-order radiomics parameters calculated from the segmented lesions.

Comparison of the nominal data between the two groups was performed with the chi-squared test or Fisher's exact test. For continuous data, the values of normally distributed parameters were given as the mean ± SD, and the values of nonnormally distributed parameters were given as the median (IQR). Comparisons of the groups were conducted with the t-test or the Mann–Whitney test, accordingly.

If a parameter differed significantly between the two groups, the area under the curve (AUC) was calculated with the receiver operator characteristic (ROC) test, and the cutoff value, optimal sensitivity, and specificity were determined by using the Youden index. Logistic regression was used for the univariate nominal parameters to calculate the sensitivity and specificity.
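The Youden-index cutoff selection can be illustrated with scikit-learn (an assumption for illustration only; the study's analyses were performed in SPSS and MedCalc rather than Python):

```python
import numpy as np
from sklearn.metrics import roc_curve

def youden_cutoff(y_true, scores):
    """Return the cutoff maximizing Youden's J = sensitivity + specificity - 1,
    along with the sensitivity and specificity at that cutoff."""
    fpr, tpr, thresholds = roc_curve(y_true, scores)
    j = tpr - fpr                      # Youden's J at each candidate threshold
    best = np.argmax(j)
    return thresholds[best], tpr[best], 1 - fpr[best]
```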

After the patient population was randomly divided into a training and cross-validation set (n = 152, 79.6%) and a test set (n = 39, 20.4%), logit fit models were created by using the Bayesian information criterion (BIC) from the training set. The clinical model was selected from the clinical data, and the radiological model was selected from the radiological data. A combined model from both clinical and radiological data was also constructed. The adequacy of the model’s parameters in predicting the categorical outcomes was evaluated with the Hosmer–Lemeshow goodness-of-fit test. Multicollinearity was evaluated by calculating the variance inflation factor (VIF).

Models were optimized by minimizing the cost function (log loss) with the gradient descent algorithm, and the initial theta vectors were replaced with the optimized ones. By using tenfold cross-validation, the mean sensitivity, specificity, and accuracy values of each model were calculated by averaging all of the cross-validation results, and the model-specific cutoff values were calculated via the Youden index of the ROC analyses [23]. The test set results were obtained by using the optimized models and the model-specific cutoff values. The C-index and 95% CI values of the models were further calculated separately for the training and cross-validation sets via 1000 bootstrapping runs.
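The tenfold cross-validation loop can be sketched as below, using scikit-learn's log-loss-minimizing logistic regression as a stand-in for the study's optimized logit fits (an assumption: the original analysis was done in R/XLStat, and the 0.5 cutoff here is a placeholder for the model-specific Youden cutoffs):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

def cross_validate_logit(X, y, n_splits=10, cutoff=0.5, seed=0):
    """Tenfold cross-validation of a logistic model; returns the mean
    sensitivity, specificity, and accuracy over the folds."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    sens, spec, acc = [], [], []
    for tr, va in skf.split(X, y):
        model = LogisticRegression(max_iter=1000).fit(X[tr], y[tr])
        pred = (model.predict_proba(X[va])[:, 1] >= cutoff).astype(int)
        tp = np.sum((pred == 1) & (y[va] == 1))
        tn = np.sum((pred == 0) & (y[va] == 0))
        fp = np.sum((pred == 1) & (y[va] == 0))
        fn = np.sum((pred == 0) & (y[va] == 1))
        sens.append(tp / (tp + fn))
        spec.append(tn / (tn + fp))
        acc.append((tp + tn) / len(va))
    return np.mean(sens), np.mean(spec), np.mean(acc)
```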

To solve the class imbalance problem, the Synthetic Minority Oversampling Technique (SMOTE) algorithm was used by using the “smotefamily” package in the R statistical computing environment (R Foundation for Statistical Computing, Vienna, Austria) [24]. During the generation of the synthetic data, k = 3 was selected for the K-nearest neighbor algorithm.
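The SMOTE interpolation idea with k = 3 nearest neighbors can be sketched as below. The study used the R "smotefamily" package; this minimal NumPy version is only illustrative:

```python
import numpy as np

def smote(minority, n_new, k=3, seed=0):
    """Minimal SMOTE sketch: each synthetic sample is a random point on
    the segment between a minority instance and one of its k nearest
    minority-class neighbors (k = 3, as in the study)."""
    rng = np.random.RandomState(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randint(len(minority))
        x = minority[i]
        d = np.linalg.norm(minority - x, axis=1)  # distances within the class
        d[i] = np.inf                              # exclude the point itself
        neighbors = np.argsort(d)[:k]
        z = minority[rng.choice(neighbors)]
        gap = rng.rand()                           # interpolation factor in [0, 1)
        synthetic.append(x + gap * (z - x))
    return np.array(synthetic)
```

In the study's setting, the minority class (ICU transfers, n = 59) would be augmented with 34 such synthetic rows to match the 93 majority-class patients.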

Another model including all of the clinical and radiological parameters in the study was created via the random forest classification algorithm, and this model was trained with the balanced training set containing the synthetic data. The effectiveness of the final random forest model was evaluated with the same test set that was used for the logit-fit models.
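This final step can be sketched with scikit-learn's random forest (an illustrative stand-in; the number of trees and other hyperparameters are assumptions, as the study does not report them):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score, f1_score

def train_and_evaluate_rf(X_balanced, y_balanced, X_test, y_test, seed=0):
    """Fit a random forest on the SMOTE-balanced training set and report
    test-set AUC and F1, mirroring the study's final model evaluation."""
    rf = RandomForestClassifier(n_estimators=500, random_state=seed)
    rf.fit(X_balanced, y_balanced)
    prob = rf.predict_proba(X_test)[:, 1]
    return roc_auc_score(y_test, prob), f1_score(y_test, rf.predict(X_test))
```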

Statistical analyses were performed using IBM SPSS v23 (IBM Corp, Armonk, NY), MedCalc v20.011 (MedCalc Software bvba, Ostend, Belgium), R v4.0.2 (R foundation, Vienna, Austria) and XLStat statistical and data analysis v2021.3.1 (Addinsoft, NY, USA). The power analysis was conducted using G*Power 3.1.


Group features, demographics, symptoms and findings

There were 117 patients in Group 1 (61.6%) and 74 in Group 2 (38.4%). The mean age of the patients was 65.45 ± 14.02 years (range 26–96), 57.6% of the patients were male, and 42.4% were female. ICU patients were followed in the ward for an average of 3.2 days (range 1–12) prior to transfer to the ICU. Whereas Group 1 patients were discharged after a mean duration of 8.8 ± 4.7 days (range 2–29), Group 2 patients had a mean hospitalization of 19.2 ± 13.8 days (range 7–41, including the ICU stay) that resulted in either death (n = 3) or discharge (n = 71).

The mean age of the patients and the number of males were higher in Group 2, and the differences were significant (Table 1). Regarding the symptoms and findings, only fever was significantly different between the two groups. Among the comorbidities, patients diagnosed with chronic renal failure or coronary heart disease required significantly more ICU admissions (Table 1). Patients who needed corticosteroids in the ward were more frequently transferred to the ICU (Table 1).

Table 1 Demographics, symptoms, findings, comparison between the groups and discrimination assessment

Laboratory findings

Most of the laboratory findings differed significantly between the two groups (Table 2). However, at the time of admission, it was observed that patients who needed ICU care did not have a lower oxygen saturation value. Among the blood tests, procalcitonin was the most effective univariate classifier (Table 2).

Table 2 Laboratory findings, comparison between the groups and results of ROC analysis

DL segmentation findings and radiomics

The median time between the onset of symptoms and the CT examination was 6 (IQR 7) days in Group 1 and 5 (IQR 5) days in Group 2 (p = 0.775, Mann–Whitney test). The positive RT-PCR test and the CT study were performed on the same day.

The DL algorithm segmented both the whole lung tissue and the pneumonic areas of COVID-19 infection in the patients. In Group 2 patients, both the percentage and volume of pneumonic tissue secondary to COVID-19 were significantly higher (Table 3) than those in Group 1 patients. Additionally, in Group 2, the mean total lung volume was decreased by 11.2% compared to that in Group 1.

Table 3 Volumetric data, comparison between the groups and results of ROC analysis

Eighteen first-order and 58 second-order radiomics parameters were calculated from the segmentations (Table 4). The skewness was higher in Group 1, and the mean density was higher in Group 2 (Table 4).

Table 4 Radiomic parameters, comparison between the groups and results of ROC analysis

Predictive logit-fit models

None of the clinical, volumetric or radiomics parameters provided a dependable univariate classifier. Therefore, logit-fit models were created (Table 5).

Table 5 Features of clinical, radiological and combined models in detecting ICU candidate COVID-19 patients

The clinical model’s PP was calculated using Eq. (2):

$$\begin{aligned} \text{PP} & = 1/(1 + \exp(-(-5.180 + 0.046 \times \text{Age} - 0.0041 \times \text{PLT} - 0.010 \times \text{GFR} \\ & \quad + 0.011 \times \text{AST} + 0.008 \times \text{LDH} + 2.556 \times \text{PCT} + 0.787 \times \text{Fever}))) \\ \end{aligned}$$

where PLT is the platelet count, GFR is the estimated glomerular filtration rate, AST is aspartate aminotransferase, LDH is lactate dehydrogenase and PCT is procalcitonin. Although the clinical model had good specificity, its sensitivity was limited in the training and validation sets (Table 5).
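Eq. (2) can be evaluated directly; the function below encodes the published coefficients, and the example inputs are hypothetical patient values, not cases from the study:

```python
import math

def clinical_model_pp(age, plt_count, gfr, ast, ldh, pct, fever):
    """Prediction probability from the clinical logit model (Eq. 2).
    fever is coded 1 if present, 0 otherwise."""
    z = (-5.180 + 0.046 * age - 0.0041 * plt_count - 0.010 * gfr
         + 0.011 * ast + 0.008 * ldh + 2.556 * pct + 0.787 * fever)
    return 1.0 / (1.0 + math.exp(-z))
```

Consistent with its large coefficient, raising PCT sharply increases the predicted probability while the other inputs are held fixed.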

The radiological model’s PP was calculated using Eq. (3):

$$\begin{aligned} \text{PP} & = 1/(1 + \exp(-(-48.219 + 0.045 \times \text{PIL} - 0.030 \times \text{RMAD} - 3.664 \times \text{Skewness} \\ & \quad + 8.532 \times \text{GLCM SumEntropy} + 82.320 \times \text{LDLGLE}))) \\ \end{aligned}$$

where PIL is the percent of infected lung, RMAD is Robust Mean Absolute Deviation, and LDLGLE is GLDM-LargeDependenceLowGrayLevelEmphasis. This model showed a better sensitivity but a worse specificity than the clinical model (Table 5).

The combined model’s PP was calculated using Eq. (4):

$$\begin{aligned} \text{PP} & = 1/(1 + \exp(-(2.633 + 0.051 \times \text{PIL} - 4.596 \times \text{S} + 0.000303 \times \text{C} + 0.435 \times \text{GLE} \\ & \quad + 0.085 \times \text{NLR} - 0.007 \times \text{PLT} - 0.039 \times \text{GFR} + 0.020 \times \text{AST}))) \\ \end{aligned}$$

where PIL is the percent of infected lung, S is the skewness, C is GLCM-Clustershade, GLE is GLSZM-LargeAreaLowGrayLevelEmphasis, NLR is the neutrophil-to-lymphocyte ratio, PLT is the platelet count, GFR is the estimated glomerular filtration rate, and AST is aspartate aminotransferase. This model had the highest AUC and specificity (Table 5).

The calculated p values for the clinical, radiological, and combined models were 0.376, 0.399, and 0.631, respectively, in the Hosmer–Lemeshow test. Radiologic and combined models showed better calibration than the clinical model (Fig. 2). The VIF value was less than 3.0 for all parameters in the models; thus, there was no significant multicollinearity.

Fig. 2

Calibration plots of the Clinical (a), Radiological (b), and Combined (c) models

The optimal cutoff values were calculated for the clinical, radiological, and combined models as 0.565, 0.444, and 0.429, respectively (Fig. 3).

Fig. 3

Prediction probability score versus model score graphs of the clinical (a), radiological (b), and combined (c) models. The optimal cutoff values of the models were determined via the Youden index in the ROC analysis of the cross-validation sets and are marked

Test set features and test set results of logit-fit models

Fifty-nine patients in the training set and 15 in the test set were transferred to the ICU (p = 0.908, chi-squared test). The median age of the patients was 65 (IQR 22.8) years in the training set and 61 (IQR 20) years in the test set (p = 0.056, Mann–Whitney test). The training set included 88 males and 64 females, and the test set included 22 males and 17 females (p = 0.867, chi-squared test).

In the test set, the combined model produced the best AUC, followed by the clinical model (Table 5).

Synthetic data generation and random forest algorithm

Although they were well calibrated, the logit-fit models had problems stemming from the study population: the sample size was low, and the data were affected by class imbalance. Because our sample size was low, the BIC method, which penalizes complex models with more parameters [25], was preferred for logit-fit model selection to avoid overfitting [26]. In addition to the logit-fit models, a random forest classification algorithm, a method that is resistant to overfitting, was used to generate a model that uses all of the study parameters.

While 93 (61%) patients in the training set did not require intensive care, 59 (39%) patients were transferred to the ICU. The use of an unbalanced training set, especially for high-dimensional data, has been reported to bias models in favor of the majority class [27]. To solve this problem, the number of instances of the minority class (patients transferred to the ICU) was increased to 93 by using the SMOTE algorithm.

On the test set, the model trained with the random forest algorithm on the more balanced training set did not increase specificity (87.5%); however, sensitivity increased to 80%, with accompanying increases in accuracy (84.6%), AUC (0.837), precision (0.80), and F1 score (0.80).

A feature importance study was also conducted for the random forest model (Fig. 4). Overall importance was highest for PCT, followed by skewness, LDH, PIL, CK, and GLDM-LargeDependenceLowGrayLevelEmphasis. Most of the parameters that were included in the logit-fit models also had high mean decrease accuracy values in the random forest model.

Fig. 4

Mean decrease accuracy of the study parameters in the random forest model. Features with the highest overall importance are indicated. PIL: Percent infected lung, PLT: Platelet count, LDH: Lactate dehydrogenase, CK: Creatine kinase, PCT: Procalcitonin

When the ROC curves of the models were evaluated by pairwise comparison [28], no significant difference was found between the RF model and the combined logit-fit model (Fig. 5).

Fig. 5

ROC curves and the nonparametric pairwise comparison table of the machine learning models in the study. The p values of the comparisons are given. RF: Random Forest model

Power analysis

In the post hoc analysis, for a difference of means between two independent groups of 117 and 74 patients, with a medium effect size (Cohen's d = 0.5), two tails, and alpha = 0.05, the calculated power was 0.91.
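This power figure can be reproduced from the noncentral t distribution. The sketch below uses SciPy rather than G*Power and yields a value close to the reported 0.91 for these inputs:

```python
from scipy import stats

def ttest_power(d, n1, n2, alpha=0.05):
    """Post hoc power of a two-sided two-sample t-test, computed from
    the noncentral t distribution (Cohen's d, unequal group sizes)."""
    df = n1 + n2 - 2
    ncp = d * (n1 * n2 / (n1 + n2)) ** 0.5       # noncentrality parameter
    t_crit = stats.t.ppf(1 - alpha / 2, df)       # two-sided critical value
    # power = P(|T'| > t_crit) under the noncentral alternative
    return (1 - stats.nct.cdf(t_crit, df, ncp)) + stats.nct.cdf(-t_crit, df, ncp)
```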


COVID-19 surges in the United States over the past two years were assessed by the CDC as three periods [29]: the Winter 2020–2021 period, the Delta period from July to November 2021, and the Omicron period that we are currently in, which started in December 2021. The maximum 7-day moving average of ICU beds in use for COVID-19 was reported as 27,958 (January 9–16, 2021) in the Winter 2020–2021 period, 24,775 (September 6–13, 2021) in the Delta period, and 24,776 (January 15, 2022) in the Omicron period [29]. It has been reported that the number of patients who required intensive care due to the Omicron variant is one-fourth of that due to the Delta variant [30]. The lack of difference in ICU admission numbers between the Omicron and Delta periods is due to the difference in the number of COVID-19 cases: while the maximum 7-day moving average of cases in the Delta period was 164,249, this value was 798,976 in the Omicron period [29].

Since none of the 28 clinical, 21 volumetric, and 74 radiomics parameters could reliably predict patients who would require ICU admission, clinical, radiological, and combined models were built, and the combined model provided the best predictions.

Models based solely on clinical data offer easy accessibility and usability. High fever, older age, elevated LDH, increased acute phase reactants and a decreased lymphocyte count were frequently reported in ICU candidates [4,5,6,7,8,9,10,11]. In our study, procalcitonin stood out as the parameter with the highest odds ratio, and patients with chronic renal failure showed a significantly higher need for ICU care, which is consistent with previous publications [12].

In a study involving cases from 100 hospitals in South Korea, the presence or absence of chest X-ray findings did not significantly improve the clinical model outcomes when used as a parameter [12]. On the other hand, it has been reported that a better prediction was obtained when a deep-learning model, which was trained to discriminate critical and noncritical chest X-ray findings, was combined with a clinical model [31].

In prognostic studies that have evaluated CT findings, both increased volumes in pulmonary involvement [32] and an increased ratio of consolidation in pulmonary lesions [33] were associated with an unfavorable prognosis. In our study, the percentage of infected lung parenchyma was included in the radiological and combined models. The mean and median densities of the lesions were significantly higher in ICU patients, suggesting a higher frequency of consolidation. However, this parameter was not included, and skewness was selected for the models by BIC. It has been previously shown that as the GGO areas in the lesions increase, the skewness value of the lesion also increases [18]. We showed that the skewness of the lesions was significantly higher in patients who did not require ICU admission. VSS, which includes the classification of lesions as GGO or consolidation, has been reported as an effective method in the evaluation of COVID-19 prognosis [34, 35]. However, this method is reported to have reliability and reproducibility problems due to issues such as difficulty classifying lesions containing both areas of consolidation and GGO, and radiomics models were found to be more useful for predicting prognosis [36].

The total lung volume was 12% lower in the ICU group. Alveolar collapse is known to occur in patients with SARS-CoV-2 infection [37, 38], and surfactant reduction that results from the loss of alveolar type 2 cells, increased inflammatory cell migration to the interstitial space, and microvascular thrombosis may be responsible for this outcome [39]. Although the total lung volume was not directly entered into the models in the parameter selections, it participated in the calculation of the percent of infected lung parameter.

Although segmentation can be performed manually in radiomics modeling studies related to COVID-19 [18, 40], this method takes considerable time due to the large number of lesions per patient, and the reproducibility problem needs to be overcome. Methods such as the segmentation of the entire lung (healthy and diseased), rather than individual lesions, have been suggested [36].

Automated segmentation solves all of these problems. While the use of AI on CT sections is not recommended as a screening test, its use as a predictive and prognostic decision support system in hospitalized patients has been suggested [41]. In a study examining clinical data and the radiomic features calculated from automated segmentations, a combination model produced the best predictions [19]. In our study, apart from the radiomic parameters, the percent of infected lung parameter was included in the models. Thus, the subjective estimation of critical parameters, such as lesion classification and the ratio of diseased parenchyma in the VSS method, was avoided, and the models were based on objective criteria. Additionally, volumetric parameters produced by automated segmentations are reportedly more accurate than human semiquantitative estimates [42].

The model that we propose for predicting the risk of ICU admission in COVID-19 patients has two important features. First, it does not rely solely on clinical data. Models that are based only on laboratory parameters do not consider lung parenchyma involvement, whereas all of the combined models in our study included more than one radiological parameter, regardless of the machine learning algorithm that was used. Second, the reproducibility problem of VSS methods was resolved by using the segmentations of the deep learning algorithm, which was trained with CT studies from multiple hospitals in affected countries across Europe. Thus, we believe that models based on nonsubjective clinical and radiological data that require no parameter calculation effort and that provide reproducible results could be more widely used in the field and can help healthcare providers make decisions and better organize hospitals' resources.

This study has some limitations. First, these are the results of a single center. However, we used automated segmentation, and the CT data were resampled during the radiomics parameter calculation. Second, the patient population was retrospectively selected from patients who had an indication for hospitalization, which could introduce selection bias. Third, the relationship between antiviral treatment efficacy and ICU requirements was not evaluated in this study since there is no definitively proven antiviral treatment for COVID-19. Fourth, we used an unbalanced data set; however, we increased the sensitivity by adding synthetic data to the training set. Finally, patients with a contrast-enhanced examination were not included in this study since the radiomics parameters would be affected. We believe that a different model is required for embolism cases.


The model that was created by combining the radiological parameters obtained by automated segmentation and the clinical parameters in COVID-19 patients requiring hospitalization was found to be useful as an objective method in predicting the risk of developing critical illness.

Availability of data and materials

All data generated or analyzed during this study are included in this published article.



Abbreviations

AUC: Area under curve
DCA: Decision curve analysis
DL: Deep learning
FiO2: Fraction of inspired oxygen
GGO: Ground glass opacities
GLCM: Gray level co-occurrence matrix
GLDM: Gray level dependence matrix
GLRLM: Gray level run-length matrix
GLSZM: Gray level size zone matrix
IQR: Interquartile range
NLR: Neutrophil to lymphocyte ratio
PaO2: Partial pressure of oxygen
RMAD: Robust mean absolute deviation
ROC: Receiver operator characteristics
SD: Standard deviation
SMOTE: Synthetic Minority Oversampling TEchnique
SpO2: Oxygen saturation (estimated by pulse oximetry)
VIF: Variance inflation factor
VSS: Visual severity score


  1. COVID-19 Treatment Guidelines Panel. Coronavirus disease 2019 (COVID-19) treatment guidelines. National Institutes of Health. Accessed 24 Oct 2021.

  2. Goic M, Bozanic-Leal MS, Badal M, et al. COVID-19: short-term forecast of ICU beds in times of crisis. PLoS ONE. 2021;16(1):e0245272.


  3. Pascarella G, Strumia A, Piliego, et al. COVID-19 diagnosis and management: a comprehensive review (review). J Intern Med. 2020;288(2):192–206.


  4. Ayaz A, Arshad A, Malik H, et al. Risk factors for intensive care unit admission and mortality in hospitalized COVID-19 patients. Acute Crit Care. 2020;35(4):249–54.


  5. Vanhems P, Gustin MP, Elias C, et al. Factors associated with admission to intensive care units in COVID-19 patients in Lyon-France. PLoS ONE. 2021;16(1):e0243709.

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  6. Gong J, Ou J, Qiu X, et al. A tool to early predict severe corona virus disease 2019 (COVID-19): a multicenter study using the risk nomogram in Wuhan and Guangdong. China Clin Infect Dis. 2020;71(15):833–40.

    CAS  Article  PubMed  Google Scholar 

  7. Yan L, Zhang H-T, Xiao Y, Wang M, Sun C, Liang J, et al. Prediction of criticality in patients with severe Covid-19 infection using three clinical features: a machine learning-based prognostic model with clinical data in Wuhan. medRxiv. 2020.

  8. Liang W, Liang H, Ou L, Chen B, Chen A, Li C, et al. Development and validation of a clinical risk score to predict the occurrence of critical illness in hospitalized patients with COVID-19. JAMA Intern Med. 2020;180(8):1081–9.

    CAS  Article  PubMed  Google Scholar 

  9. Caramelo F, Ferreira N, Oliveiros B. Estimation of risk factors for COVID-19 mortality - preliminary results. medRxiv. 2020.

  10. Ji D, Zhang D, Xu J, et al. Prediction for progression risk in patients with COVID-19 pneumonia: the CALL score. Clin Infect Dis. 2020.

    Article  PubMed  Google Scholar 

  11. Xie J, Hungerford D, Chen H, Development and external validation of a prognostic multivariable model on admission for hospitalized patients with COVID-19. medRxiv. 2020.

  12. Heo J, Han D, Kim HJ, et al. Prediction of patients requiring intensive care for COVID-19: development and validation of an integer-based score using data from Centers for Disease Control and Prevention of South Korea. J Intensive Care. 2021;9:16.

    Article  PubMed  PubMed Central  Google Scholar 

  13. Liu S, Nie C, Xu Q, et al. Prognostic value of initial chest CT findings for clinical outcomes in patients with COVID-19. Int J Med Sci. 2021;18(1):270–5.

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  14. Wasilewski PG, Mruk B, Mazur S, Półtorak-Szymczak G, Sklinda K, Walecki J. COVID-19 severity scoring systems in radiological imaging: a review. Pol J Radiol. 2020;17(85):e361–8.

    Article  Google Scholar 

  15. Yu Q, Wang Y, Huang S, et al. Multicenter cohort study demonstrates more consolidation in upper lungs on initial CT increases the risk of adverse clinical outcome in COVID-19 patients. Theranostics. 2020;10(12):5641–8.

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  16. Gillies RJ, Kinahan PE, Hricak H. Radiomics: images are more than pictures. They Are Data Radiol. 2016;278(2):563–77.

    Article  Google Scholar 

  17. Zhang X, Wan D, Shao J, et al. A deep learning integrated radiomics model for identification of coronavirus disease 2019 using computed tomography. Sci Rep. 2021;11:3938.

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  18. Gülbay M, Özbay BO, Mendi BA, et al. A CT radiomics analysis of COVID-19-related ground-glass opacities and consolidation: Is it valuable in a differential diagnosis with other atypical pneumonias? PLoS ONE. 2021;16(3):e0246582.

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  19. Wang D, Huang C, Bao S, et al. Study on the prognosis predictive model of COVID-19 patients based on CT radiomics. Sci Rep. 2021;11:11591.

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  20. Wu Q, Wang S, Li L, et al. Radiomics analysis of computed tomography helps predict poor prognostic outcome in COVID-19. Theranostics. 2020;10(16):7231–44.

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  21. Huang S, Yang J, Fong S, et al. Artificial intelligence in the diagnosis of COVID-19: challenges and perspectives. Int J Biol Sci. 2021;17(6):1581–7.

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  22. Saood A, Hatem I. COVID-19 lung CT image segmentation using deep learning methods: U-Net versus SegNet. BMC Med Imaging. 2021;21:19.

    Article  PubMed  PubMed Central  Google Scholar 

  23. Taylor S. Logistic regression: application to clinical classification. UC Davis Health—Clinical and Translational Science Center. Accessed 24 Oct 2021.

  24. Package ‘smotefamily’. Accessed 23 Jan 2022.

  25. Bayesian information criterion. Accessed 23 Jan 2022.

  26. Peduzzi P, Concato J, Kemper E, et al. A simulation study of the number of events per variable in logistic regression analysis. J Clin Epidemiol. 1996;49(12):1373–9.

    CAS  Article  PubMed  Google Scholar 

  27. Blagus R, Lusa L. SMOTE for high-dimensional class-imbalanced data. BMC Bioinform. 2013.

    Article  Google Scholar 

  28. DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics. 1988;44(3):837–45.

    CAS  Article  PubMed  Google Scholar 

  29. Trends in Disease Severity and Health Care Utilization During the Early Omicron Variant Period Compared with Previous SARS-CoV-2 High Transmission Periods—United States, December 2020–January 2022. Accessed 26 Mar 2022.

  30. Abdullah F, Myers J, Basu D, et al. Decreased severity of disease during the first global omicron variant covid-19 outbreak in a large hospital in Tshwane, South Africa. Int J Infect Dis. 2022;116:38–42.

    CAS  Article  PubMed  Google Scholar 

  31. Jiao Z, Choi JW, Halsey K, et al. Prognostication of patients with COVID-19 using artificial intelligence based on chest x-rays and clinical data: a retrospective study. Lancet Digit Health. 2021;3:e286–94.

    Article  PubMed  PubMed Central  Google Scholar 

  32. Yuan M, Yin W, Tao Z, et al. Association of radiologic findings with mortality of patients infected with 2019 novel coronavirus in Wuhan, China. PLoS ONE. 2020;15(3):e0230548.

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  33. Cau R, Falaschi Z, Paschè A, et al. Computed tomography findings of COVID-19 pneumonia in Intensive Care Unit-patients. J Public Health Res. 2021.

    Article  PubMed  PubMed Central  Google Scholar 

  34. Wang X, Hu X, Tan W, et al. Multicenter study of temporal changes and prognostic value of a CT visual severity score in hospitalized patients with coronavirus disease (COVID-19). Am J Roentgenol. 2021;217:83–92.

    Article  Google Scholar 

  35. Zhao W, Zhong Z, Xie X, et al. Relation between chest CT findings and clinical conditions of coronavirus disease (COVID-19) pneumonia: a multicenter study. Am J Roentgenol. 2020;214(5):1072–7.

    Article  Google Scholar 

  36. Homayounieh F, Ebrahimian S, Babaei R, et al. CT radiomics, radiologists, and clinical information in predicting outcome of patients with COVID-19 pneumonia. Radiol Cardiothorac Imaging. 2020;2:4.

    Article  Google Scholar 

  37. Iwasawa T, Sato M, Yamaya T, et al. Ultra-high-resolution computed tomography can demonstrate alveolar collapse in novel coronavirus (COVID-19) pneumonia. Jpn J Radiol. 2020;38:394–8.

    CAS  Article  PubMed  Google Scholar 

  38. Shi F, Wei Y, Xia L, et al. Lung volume reduction and infection localization revealed in Big data CT imaging of COVID-19. Int J Infect Dis. 2021;102:316–8.

    CAS  Article  PubMed  Google Scholar 

  39. Savaş R, Öz ÖA. Evaluation of lung volume loss with 3D CT volumetry in COVID-19 patients. Diagn Interv Radiol. 2021;27:155–6.

    Article  PubMed  Google Scholar 

  40. Wang L, Kelly B, Lee EH, et al. Multi-classifier-based identification of COVID-19 from chest computed tomography using generalizable and interpretable radiomics features. Eur J Radiol. 2021;136:109552.

    Article  PubMed  PubMed Central  Google Scholar 

  41. Neri E, Miele V, Coppola F, et al. Use of CT and artificial intelligence in suspected or COVID-19 positive patients: statement of the Italian Society of Medical and Interventional Radiology. Radiol Med. 2020;125(5):505–8.

    Article  PubMed  Google Scholar 

  42. Kanne JP, Bai H, Bernheim A, et al. COVID-19 imaging: what we know now and what remains unknown. Radiology. 2021;299(3):E262–79.

    Article  PubMed  Google Scholar 



The authors would like to thank Angel Alberich-Bayarri for his support during the calculation of the radiomics parameters and Cristobal Bautista Hernandez for his support during the development of the models.


The authors state that this work has not received any funding.

Author information

Contributions



MG: Conceptualization, data curation, evaluation of the AI results, statistics, writing. AB: Conceptualization, data curation, case selection, ethical evaluation of the study. EO: Data curation, evaluation of the AI results. BYO: Data curation, collection and evaluation of the clinical findings of the cases. BARM: Data curation, evaluation of the radiologic findings of the cases. HB: Conceptualization, data curation, ethical evaluation of the study. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Mutlu Gülbay.

Ethics declarations

Competing interests

The authors declare no competing interests.

Ethics approval and consent to participate

Institutional Review Board approval was obtained from Ankara City Hospital Institutional Review Board No. 1 (Fax: +90 312 552 99 82; Study No: 2090; IRB decision number: EK-E1-21-2090). All procedures performed in this study were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards. This study has IRB approval of an informed consent waiver.

Consent for publication

Not applicable.

Conflict of interest

The authors of this manuscript declare no relationships with any companies whose products or services may be related to the subject matter of the article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and Permissions

About this article


Cite this article

Gülbay, M., Baştuğ, A., Özkan, E. et al. Evaluation of the models generated from clinical features and deep learning-based segmentations: Can thoracic CT on admission help us to predict hospitalized COVID-19 patients who will require intensive care?. BMC Med Imaging 22, 110 (2022).


  • COVID-19
  • Deep learning
  • Artificial intelligence
  • Computed tomography
  • Radiomics
  • Machine learning
  • Logistic regression models