Performances of clinical characteristics and radiological findings in identifying COVID-19 from suspected cases

Background To identify effective factors and establish a model to distinguish COVID-19 patients from suspected cases. Methods The clinical characteristics, laboratory results and initial chest CT findings of suspected COVID-19 patients in 3 institutions were retrospectively reviewed. Univariate and multivariate logistic regression were performed to identify significant features. A nomogram was constructed, with calibration validated internally and externally. Results 239 patients from 2 institutions were enrolled in the primary cohort, including 157 COVID-19 and 82 non-COVID-19 patients. 11 features were selected by LASSO, and 8 of them were significant in the multivariate logistic regression analysis. In the multivariate model, COVID-19 was independently associated with fever (OR 4.22), contact history (OR 284.73), WBC count (OR 0.63), left lower lobe involvement (OR 9.42), multifocal lesions (OR 8.98), pleural thickening (OR 5.59), peripheral distribution (OR 0.09), and mediastinal lymphadenopathy (OR 0.037). The nomogram developed accordingly for clinical practice showed satisfactory internal and external validation. Conclusions Fever, contact history, decreased WBC count, left lower lobe involvement, pleural thickening, multifocal lesions, peripheral distribution, and absence of mediastinal lymphadenopathy are able to distinguish COVID-19 patients from other suspected patients. The corresponding nomogram is a useful tool in clinical practice.


Introduction
In December 2019, a few pneumonia cases of unknown etiology were reported in Wuhan, Hubei Province, China [1]. The disease, now named coronavirus disease 2019 (COVID-19), then spread worldwide at a striking speed. The causative organism was identified as a novel coronavirus named severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) because of its phylogenetic similarity to SARS-CoV [2]. As of October 15th, 2021, there were a total of 238,940,176 cumulative cases and 4,882,066 cumulative deaths worldwide. COVID-19 was declared a public health emergency of international concern (PHEIC) by the World Health Organization (WHO) as early as January 30th, 2020 [3,4].
The confirmation of COVID-19 relies on a positive nucleic acid amplification test (NAAT) of upper respiratory tract specimens using real-time reverse transcriptase-polymerase chain reaction (RT-PCR) [5]. However, RT-PCR tests have several limitations: 1) They cannot quantify the severity and progression of the disease. 2) They have long turnaround times, especially in less developed regions. 3) They require certified laboratories, expensive equipment and trained technicians [6,7].
In contrast, a chest CT scan is relatively easy to perform and gives a fast diagnosis, and its sensitivity for COVID-19 reached as high as 97% according to a study of 1014 patients in Wuhan [8]. Chest CT abnormalities have also been identified in patients even prior to the development of symptoms or the detection of viral RNA [9,10]. Thus, chest CT has great value in the early identification of COVID-19 [8,11,12]. Chest CT imaging is also a useful tool for monitoring COVID-19 progression and therapeutic effect in clinical settings [13]. The Diagnosis and Treatment Program of COVID-19 (trial version 8) [14,15] formulated by the National Health Commission of China has summarized the typical CT manifestations of COVID-19 as follows and incorporated them into the diagnostic criteria: multiple small patchy shadows and interstitial changes are seen, mainly in the lung periphery. These may progress into bilateral multiple ground glass opacities (GGOs) and infiltrations. In severe cases, consolidation may occur, but pleural effusion is rare. In multisystem inflammatory syndrome in children (MIS-C), patients with cardiac insufficiency can show an enlarged heart silhouette and pulmonary edema.
Patients with the above CT manifestations are suspected of having COVID-19 and therefore need further examinations. Until the RT-PCR result is available, the patient needs to be isolated, but the quarantine of such patients may waste medical resources and delay essential treatment. Hence, effective and convenient methods to better distinguish COVID-19 patients are needed.
The aim of our study is to identify the useful clinical, laboratory and radiographic features that are able to distinguish COVID-19 patients from other suspected cases and generate a nomogram as a useful tool for clinical practice.

Materials and methods
The schematic workflow is depicted in Figure 1.

Patient cohort
Data were de-identified to guarantee the patients' confidentiality. From January 21st to March 5th, 2020, patients admitted to a hospital in Anhui Province, China, and to our institution in Shanghai, China, who met the following requirements were enrolled as the primary cohort of our study: (1) Patients with a suspicion of COVID-19 based on the chest CT manifestations described in the Diagnosis and Treatment Program of COVID-19 (trial version 8) [15]. (2) Patients who underwent laboratory examination at admission. (3) Patients diagnosed with COVID-19 by a positive RT-PCR result for SARS-CoV-2, or patients in whom COVID-19 was excluded by 2 consecutive negative RT-PCR results separated by at least 1 day. Exclusion criteria were: (1) Patients who had been hospitalized before (n = 4). (2) Significant motion artefacts in CT images (n = 12). (3) Patients lacking essential data (n = 21). The epidemiological history, the symptoms, the laboratory test results and the imaging features of the first CT scan after onset were recorded.
From February 6th to March 13th, 2020, an independent cohort of CT-suspected patients from another institution in Anhui Province was retrospectively studied, using the same inclusion and exclusion criteria. These patients formed the validation cohort.
The laboratory tests were carried out in the outpatient department or in the wards on admission, mostly on the same day as the CT scan. The collected laboratory indices included the white blood cell (WBC) count, lymphocyte count, lactate dehydrogenase (LDH), C-reactive protein (CRP), procalcitonin (PCT), alanine aminotransferase (ALT), and aspartate aminotransferase (AST).

CT protocol
105 patients from Huashan Hospital Affiliated to Fudan University were imaged at 1.5 mm slice thickness with a 256-slice spiral CT scanner (Philips). 134 patients from Fuyang No.2 People's Hospital were imaged at 1 mm slice thickness with a 64-section CT scanner (Aquilion 64, Toshiba Medical Systems). 59 patients from Bozhou People's Hospital in the validation cohort were imaged at 5 mm slice thickness with a 64-section CT scanner (Siemens Somatom Sensation).

CT manifestation analysis
All imaging data were analyzed in consensus by two experienced radiologists (Y.L. and D.W., general radiologists with 12 and 7 years of experience in CT interpretation). 23 features from 18 aspects were collected as listed below: (a) The involved pulmonary lobes, comprising five features: right upper, right middle, right lower, left upper, and left lower lobes; (b) Distribution of lesions, comprising two features: anterior and posterior part of the lungs; (c) The location of lesions, coded as dummy variables: peripheral (the outer one-third of the lung), central (the inner two-thirds of the lung) or both; (d) The extent of the lesions, coded as dummy variables: unifocal (only one lesion can be observed), multifocal (multiple lesions separated from each other by uninvolved lung tissue) and diffuse (dispersed over a large area); (e) An extent score, calculated semi-quantitatively. Both lungs were divided into upper (above the tracheal carina), lower (below the inferior pulmonary vein) and middle (in between) zones, and the involved percentage in each zone was scored: 0, 0%; 1, < 25%; 2, 25%-49%; 3, 50%-74%; 4, > 75%. The zone scores were added up to give the extent score (range 0-24); (f) The type of opacification, coded as dummy variables: GGO, mixed (mainly GGO), mixed (mainly consolidation) and consolidation; (g) The shape of the lesions, including nodular (characterized by a rounded or irregular opacity, well or poorly defined, measuring up to 3 cm in diameter), linear (fine linear opacity), patchy (isolated focal lesions with no nodular/linear shape in the segment) and large patchy (large fused lesions involving multi- [...] Pleural traction; (p) Pleural effusion; (q) Mediastinal lymphadenopathy (short-axis diameter > 1 cm); (r) Liver-spleen ratio (LS ratio), calculated as CT_liver / CT_spleen to indicate the relative density. Five 1 cm² regions of interest (ROIs) were drawn in the liver and spleen parenchyma to obtain the mean CT values of the liver and spleen.
The description of the radiological features of the lungs followed the definition compiled by the Fleischner Society [16].
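The semi-quantitative extent score described in item (e) can be illustrated with a minimal Python sketch. The zone names and the function names (`zone_score`, `extent_score`) are illustrative assumptions, not part of the study's software. The text lists the top category as "> 75%", so treating exactly 75% as a score of 4 is also an assumption.

```python
# Six zones in total: upper, middle and lower zone of each lung.
# Zone names are hypothetical labels chosen for this sketch.
ZONES = [
    "right_upper", "right_middle", "right_lower",
    "left_upper", "left_middle", "left_lower",
]


def zone_score(percent_involved: float) -> int:
    """Map the involved percentage of one zone to a 0-4 score."""
    if not 0 <= percent_involved <= 100:
        raise ValueError("percentage must be between 0 and 100")
    if percent_involved == 0:
        return 0
    if percent_involved < 25:
        return 1
    if percent_involved < 50:
        return 2
    if percent_involved < 75:
        return 3
    # Assumption: exactly 75% is scored 4 (the text only lists "> 75%").
    return 4


def extent_score(involvement: dict) -> int:
    """Sum the six zone scores; zones missing from the dict count as 0%."""
    return sum(zone_score(involvement.get(zone, 0.0)) for zone in ZONES)


# Hypothetical patient: 30% and 60% involvement of the lower zones,
# 10% of the left middle zone, other zones clear.
example = {"right_lower": 30.0, "left_lower": 60.0, "left_middle": 10.0}
print(extent_score(example))  # 2 + 3 + 1 = 6
```

With six zones each scored from 0 to 4, the total necessarily falls in the stated range of 0-24.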

Feature selection
The clinical (n = 8), laboratory (n = 7) and CT (n = 23) features were analyzed together, but with the limited sample size, a total of 38 features would lead to overfitting in multivariate analysis. Thus, the least absolute shrinkage and selection operator (LASSO) method was adopted to select the most relevant features. This method shrinks the coefficients and sets some of them to exactly zero, and can therefore be used for feature reduction and selection. R software (version 3.6.0; R Foundation for Statistical Computing) and the "glmnet" package were used.
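To illustrate how LASSO sets some coefficients to exactly zero, the following is a minimal pure-Python sketch of L1-penalized logistic regression fitted by proximal gradient descent. It is not the "glmnet" implementation used in the study (glmnet uses coordinate descent and cross-validation to choose the penalty); the function names, the step size, and the synthetic data are all assumptions made for illustration.

```python
import math
import random


def soft_threshold(z: float, t: float) -> float:
    """Shrink z toward zero by t; values with |z| <= t become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def sigmoid(v: float) -> float:
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def lasso_logistic(X, y, lam, step=0.5, n_iter=500):
    """L1-penalized logistic regression via proximal gradient descent.

    The intercept is not penalized. Returns (intercept, coefficients).
    The default step size assumes a handful of features scaled to [0, 1].
    """
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        # Residuals: predicted probability minus observed label.
        resid = [
            sigmoid(b0 + sum(bj * xj for bj, xj in zip(beta, row))) - yi
            for row, yi in zip(X, y)
        ]
        b0 -= step * sum(resid) / n
        for j in range(p):
            grad_j = sum(r * row[j] for r, row in zip(resid, X)) / n
            # Gradient step, then soft-thresholding (the L1 penalty).
            beta[j] = soft_threshold(beta[j] - step * grad_j, step * lam)
    return b0, beta


# Synthetic data (hypothetical): 4 binary features, only feature 0 matters.
random.seed(0)
X, y = [], []
for _ in range(200):
    row = [float(random.random() < 0.5) for _ in range(4)]
    p_true = sigmoid(-1.0 + 2.5 * row[0])
    X.append(row)
    y.append(1.0 if random.random() < p_true else 0.0)

for lam in (0.0, 0.05, 10.0):
    _, coefs = lasso_logistic(X, y, lam)
    kept = [j for j, b in enumerate(coefs) if b != 0.0]
    print(f"lambda={lam}: features kept = {kept}")
```

As the penalty `lam` grows, more coefficients are held at zero by the soft-thresholding step, which is the mechanism that turns shrinkage into feature selection.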

Statistical analysis
All statistical analyses were performed with R software. The Shapiro-Wilk test was used to assess normality and Bartlett's test was used to assess the homogeneity of variance. Normally distributed data are presented as mean ± standard deviation. Non-normally distributed data and ordinal data are presented as median (interquartile range). Categorical variables are summarized as counts and percentages. Both univariate and multivariate logistic regression analyses were performed to assess the association of the features with the diagnosis of COVID-19. Each regression coefficient (β) is related to the corresponding odds ratio (OR) by β = ln(OR). The model was estimated as follows: ln[P/(1 − P)] = β0 + β1X1 + β2X2 + ... + βnXn, where P is the predicted probability of COVID-19 and X1, ..., Xn are the selected features. A nomogram was established on the basis of this model. Calibration was assessed internally with the bootstrapping method, and the Hosmer-Lemeshow test (HL test) was performed to test the goodness of fit.
For the external validation of the nomogram, the predicted value of each case was calculated according to the nomogram and compared with the observed diagnosis. The accuracy was assessed by the proportion of correctly predicted cases and the HL goodness-of-fit test. A P-value of < 0.05 was considered statistically significant.
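The link between odds ratios, regression coefficients and predicted probabilities in a logistic model can be sketched as follows. The odds ratios are those reported in the abstract; the intercept is not given in the text, so the value used here is purely hypothetical, as are the feature names and the assumption that binary features are coded 0/1.

```python
import math

# Odds ratios from the multivariate analysis, as reported in the abstract.
ODDS_RATIOS = {
    "fever": 4.22,
    "contact_history": 284.73,
    "wbc_count": 0.63,
    "left_lower_lobe": 9.42,
    "multifocal": 8.98,
    "pleural_thickening": 5.59,
    "peripheral": 0.09,
    "mediastinal_lymphadenopathy": 0.037,
}

# beta = ln(OR): positive when OR > 1, negative when OR < 1.
BETAS = {name: math.log(or_value) for name, or_value in ODDS_RATIOS.items()}

# HYPOTHETICAL intercept: the source does not report it.
INTERCEPT = -2.0


def predicted_probability(features: dict, intercept: float = INTERCEPT) -> float:
    """P = 1 / (1 + exp(-(b0 + sum(beta_i * x_i))))."""
    linear = intercept + sum(BETAS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-linear))


def odds(p: float) -> float:
    """Convert a probability to odds."""
    return p / (1.0 - p)


p_base = predicted_probability({})
p_fever = predicted_probability({"fever": 1})
# Whatever the intercept, adding fever multiplies the odds by its OR.
print(round(odds(p_fever) / odds(p_base), 2))
```

The absolute probabilities depend on the unknown intercept and should not be read as study results; only the ratio of odds is fixed by the reported OR.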
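As a rough illustration of the HL goodness-of-fit test, the statistic compares observed and expected event counts within groups of cases sorted by predicted risk. The sketch below computes only the chi-square statistic; the grouping rule, the function name and the toy data are assumptions, and the study's R implementation may group cases differently.

```python
def hosmer_lemeshow_statistic(probs, outcomes, n_groups=10):
    """Return (chi-square statistic, degrees of freedom) for the HL test.

    Cases are sorted by predicted probability and split into n_groups
    groups of near-equal size. Degrees of freedom are n_groups - 2, the
    usual choice for the data the model was fitted on (for external
    validation data, n_groups is often used instead).
    """
    pairs = sorted(zip(probs, outcomes))
    n = len(pairs)
    stat = 0.0
    for g in range(n_groups):
        group = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        if not group:
            continue
        size = len(group)
        expected = sum(p for p, _ in group)
        observed = sum(o for _, o in group)
        mean_p = expected / size
        # Skip groups with mean risk 0 or 1 to avoid division by zero.
        if 0.0 < mean_p < 1.0:
            stat += (observed - expected) ** 2 / (size * mean_p * (1.0 - mean_p))
    return stat, n_groups - 2


# Toy data (hypothetical): two risk groups of four cases each.
probs = [0.25] * 4 + [0.75] * 4
well_calibrated = [1, 0, 0, 0, 1, 1, 1, 0]   # 1 of 4 and 3 of 4 events
poorly_calibrated = [1, 1, 1, 1, 1, 1, 1, 0]  # 4 of 4 and 3 of 4 events

print(hosmer_lemeshow_statistic(probs, well_calibrated, n_groups=2))
print(hosmer_lemeshow_statistic(probs, poorly_calibrated, n_groups=2))
```

A small statistic (large P-value) indicates that predicted and observed event rates agree. The P-value itself comes from the chi-square distribution, for example via `scipy.stats.chi2.sf(stat, df)` if SciPy is available.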

IRB approval
This multi-center retrospective study was approved by the institutional review board (IRB) and the requirement of written informed consent was waived.

Results

Clinical information
The clinical information, laboratory tests, and chest CT imaging findings were compared between the primary cohort and the validation cohort (Tables 1 and 2). In the primary cohort, 239 patients (134 males and 105 females) were included, with an average age of 46.31 ± 15.90 years. 28.87% of the patients had had direct contact with confirmed COVID-19 patients before the onset or had travelled to or lived in Hubei Province. 17.57% of the patients had had indirect contact. The most common presenting symptoms were fever (70.29%), cough (44.35%), and chest distress (11.30%). Some patients had digestive symptoms such as diarrhea (2.09%) and anorexia (2.09%) (Table 1). The median interval between the onset and the date of the CT scan was 8 (range 1-22) days. 157 patients were confirmed as COVID-19 by RT-PCR and were allocated to the COVID-19 group. They were put in quarantine and treated with antiviral therapy based on the evolving recommendations [17]. The other 82 patients had negative RT-PCR results. They were eventually diagnosed with other conditions such as viral pneumonia (influenza A virus, respiratory syncytial virus), bacterial infection (Staphylococcus aureus, Streptococcus pneumoniae), fungal infection (Pneumocystis jirovecii pneumonia), Mycoplasma pneumoniae pneumonia, and other respiratory conditions (acute eosinophilic pneumonia, Goodpasture syndrome, etc.). The clinical information of the two groups was compared using univariate analysis (Table 3). COVID-19 patients were found to be younger (P = 0.037), more likely to have fever (P = 0.001) or cough (P < 0.001), and more likely to have a contact history (P < 0.001).

Chest CT imaging findings
Imaging characteristics were assessed and compared between the two groups (Table 3). Regarding the location and the distribution of the lesions, the lesions of COVID-19 patients were more often located in the posterior part of the lungs (P < 0.001) than those of non-COVID-19 patients. COVID-19 patients had more involvement in every lobe of the lung (P < 0.05) owing to a more multifocal distribution (P < 0.001). In addition, they were more likely to have specific signs including reticular changes (P = 0.04), vascular enlargement (P < 0.001), air bronchogram (P = 0.043), and pleural thickening (P < 0.001). They were less likely to show pleural effusion (OR 0.16, P = 0.007) or mediastinal lymphadenopathy (P < 0.001). Other parameters were not significantly different.

Nomogram
A nomogram was constructed based on the multivariate logistic regression model. The adjusted C-index of the nomogram was 0.97 (Fig. 3A). The calibration curve was determined with bootstrap analysis to obtain a bias-corrected estimate. It indicated good agreement between the predicted probability and the actual diagnosis (Fig. 3B). The HL goodness-of-fit test showed good calibration as well (P = 0.4797). The CT images of two cases illustrate the application of the nomogram (Fig. 4).
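For a binary outcome, the C-index is the probability that a randomly chosen COVID-19 case receives a higher predicted probability than a randomly chosen non-COVID-19 case. The following minimal sketch computes the apparent (uncorrected) C-index on hypothetical predictions; the adjusted value reported above additionally involves a bootstrap optimism correction that is not reproduced here.

```python
def c_index(probs, outcomes):
    """Fraction of (case, non-case) pairs in which the case has the higher
    predicted probability; tied pairs count as one half."""
    cases = [p for p, o in zip(probs, outcomes) if o == 1]
    controls = [p for p, o in zip(probs, outcomes) if o == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for p_case in cases:
        for p_control in controls:
            if p_case > p_control:
                concordant += 1.0
            elif p_case == p_control:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


# Hypothetical predictions for two cases (1) and two non-cases (0).
print(c_index([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0]))  # 3 of 4 pairs concordant: 0.75
```

A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of the two groups.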

External validation
The validation cohort included 59 cases with 43 COVID-19 and 16 non-COVID. The baseline data were collected in Tables 1 and 2. 56 out of 59 cases were correctly predicted using the nomogram, reaching an accuracy of

Discussion
An ongoing outbreak of COVID-19 that originated in Hubei Province, China, has been spreading worldwide. Experts in infectious and respiratory diseases, critical care, and radiology from all over the world have been making a joint effort to contain the epidemic [18]. Presently, RT-PCR is the standard confirmatory method in spite of a few flaws, including a long turnaround time in underdeveloped regions and low sensitivity, especially in the early phase of the disease [10,19,20]. In contrast, chest CT is able to recognize the lesions at earlier stages with high sensitivity, and is thus considered an important tool for the early diagnosis and isolation of infected patients [8].
Until the RT-PCR results are available, quarantine is needed, but isolation sites are in short supply, and quarantine may delay essential treatment. In this study, the CT manifestations summarized by the National Health Commission of China were used as the inclusion criteria. We investigated the differential value of clinical characteristics, laboratory results and CT features to better distinguish COVID-19 patients among those with suspicious CT findings, and developed a model with a nomogram as a practical tool. The most common symptom in the patients we enrolled was fever, followed by cough and chest distress. As a differential feature, fever was significant in both univariate and multivariate analysis. This echoes previous studies, and fever is the leading symptom listed in the case definition for surveillance of COVID-19 by the Chinese Health Commission [17,21,22]. Therefore, it is necessary to monitor body temperature, and at-home temperature measurement is a useful and easy way for the public to notice the disease early. Additionally, we noticed a small portion of patients with digestive disorders such as diarrhea and anorexia, which occurred more often in the COVID-19 group. Increasing evidence shows that the manifestations of COVID-19 are not always confined to respiratory symptoms, but may also involve other systems, e.g., the central nervous system [23,24]. Liver function abnormalities have been reported in COVID-19 patients with a pooled prevalence of 19% (95% confidence interval, 9-32%) and an association with disease severity. Hepatocyte degeneration, focal necrosis, and fatty infiltration were reported in COVID-19 patients [25,26]. The LS ratio was assessed in this study since the CT values were obtainable from the chest CT images, but it was not significant here.
The contact history is another valuable factor for COVID-19, including direct contact with COVID-19 patients, direct exposure in districts with confirmed cases, and indirect contact with those who were exposed [27]. According to the National Health Commission of China, a patient with one exposure or contact history and two clinical conditions can be regarded as a suspected case [17]. However, with the swift spread of the disease, some contact history remains unrevealed, making it harder to contain the epidemic [28]. More active precaution and isolation are needed. Among the laboratory parameters, the WBC count was significantly lower in the COVID-19 group in both univariate and multivariate analysis, and the lymphocyte count was lower in univariate analysis. This is consistent with previous findings and the criteria of the Chinese Health Commission [1,12,17]. We also found lower levels of CRP and PCT in the COVID-19 group. They are useful indicators of infection or inflammation, and CRP was previously reported by some researchers to increase in COVID-19 patients [8,29]. Our finding may result from a greater increase of these indices in non-COVID-19 patients, since they had other inflammatory conditions including bacterial infection, whereas other studies used healthy controls. Typical radiographic features on chest CT in COVID-19 patients were reported to predominantly include bilateral and peripheral GGOs and consolidative pulmonary opacities. The location of the lesions varied among studies, yet the peripheral site is most frequently reported [8,30-32]. These widely accepted imaging characteristics constituted the most important inclusion criterion in this study, and thus were seen in both groups. Less typical signs in previous studies included linear opacities, the "crazy-paving" pattern and the reverse halo sign, etc. [8,33-37]. We found that COVID-19 lesions are more commonly seen in both lower lobes, which echoes the existing literature.
We also found that the right lower lobe was more often involved in both COVID-19 and non-COVID-19 groups, which may be related to the shorter and thicker structure of the right lower lobe bronchus that may make it easier for pathogens to enter this lobe [38]. There are also studies that found the left lower lobe to be most often involved [39,40]. Involvement of every lobe differed significantly between the two groups, but only left lower lobe involvement remained after the two-step feature selection, making it a significant feature in differentiating COVID-19 patients from other conditions. Although it is unclear at this time why it is useful, further investigation of the common distribution and the corresponding mechanisms of each disease in the non-COVID-19 group will be helpful. In addition, compared with non-COVID-19 cases, COVID-19 is more likely to exhibit multifocal distribution rather than unifocal changes, and more likely to show reticular changes, vascular enlargement, and pleural thickening. The pooled prevalence of pleural thickening in COVID-19 patients was 30.0-52.46% [39,41,42]. COVID-19 patients are also less likely to have pleural effusion and mediastinal lymphadenopathy, which is consistent with prior research [30].
Fever, contact history, decreased WBC count, left lower lobe location, pleural thickening, multifocal lesions, peripheral distribution, and absence of mediastinal lymphadenopathy were found to be features independently associated with COVID-19. On the basis of these parameters, a nomogram, a tool that has become popular in cancer research in recent years, was built to better interpret our findings [35]. According to our nomogram, the points of each feature add up to a total score with a corresponding probability of COVID-19. A nomogram can be validated by both internal and external validation [36]. In this study, internal validation used the data of the same cohort that generated the nomogram, and external validation used the data from another institution. Both internal and external validation indicated good agreement between the predicted probability and the actual diagnosis.
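The additive use of the nomogram can be illustrated with the approximate per-feature points given for the two representative cases of Fig. 4. The sketch below only sums those points; the dictionary keys are illustrative labels, and the mapping from total points to probability is read from the nomogram axis and is not reproduced here.

```python
# Approximate points per feature as quoted in the Fig. 4 caption.
# Keys are hypothetical labels chosen for this sketch.
CASE_A = {
    "fever": 80,
    "contact_history": 100,
    "wbc_count": 92,
    "left_lower_lobe": 83,
    "multifocal": 85,
    "location": 80,
    "mediastinal_lymphadenopathy": 80,
    "pleural_thickening": 85,
}
CASE_B = {
    "fever": 80,
    "contact_history": 60,
    "wbc_count": 60,
    "left_lower_lobe": 63,
    "multifocal": 68,
    "location": 80,
    "mediastinal_lymphadenopathy": 60,
    "pleural_thickening": 70,
}


def total_points(case: dict) -> int:
    """Add up the nomogram points of all features of one patient."""
    return sum(case.values())


print(total_points(CASE_A))  # 685
print(total_points(CASE_B))  # 541
```

The sum for case A (685) is slightly below the "around 687" quoted in the caption, which is expected because the per-feature points are themselves rounded readings from the nomogram.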
Since the COVID-19 outbreak, researchers have focused more on the clinical and radiological findings of COVID-19 infection, whereas only a few studies have investigated the differential diagnoses. Three studies from Europe presented a vast spectrum of differential diagnoses with abundant figures and elaborate illustrations to help radiologists with differentiation [43-45]. Another study evaluated the performance of radiologists from the US and China in differentiating COVID-19 from other viral pneumonia [46]. Researchers from Japan compared COVID-19 and other diseases with similar symptoms, and proposed useful laboratory indicators [47]. The studies above investigated the differential diagnosis of COVID-19, but did not construct a practical model. One study built a diagnostic model, but with a small sample size, and only included non-COVID-19 pneumonia patients in the control group [48]. Our study has a different design from those of existing papers. In this study, the typical CT manifestations of COVID-19 were used as the inclusion criteria, so a wider spectrum of diseases that need to be differentiated from COVID-19 was included, which reflects a realistic problem encountered in clinical practice.
In summary, this study is the first to investigate the features that distinguish confirmed COVID-19 patients from patients with other conditions and similar CT findings, which is an important clinical issue. The nomogram can be used as an instant tool that provides a practical reference for the individualized management of every suspected patient, and it is likely to offer an effective and scientific basis for empirical treatment.
Fig. 4 Two representative cases illustrating the application of the nomogram. A A 40-year-old male patient complained of fever for 4 days (score ≈ 80). He had travelled to Huangshi, a city in Hubei Province, China, a week before the onset (score ≈ 100). His laboratory tests indicated leukocytopenia (1.99 × 10^9/L, score ≈ 92). His chest CT showed patchy ground glass opacities with vascular enlargement and reticular changes in the bilateral lower lobes (left lower lobe involvement: score ≈ 83; multifocal: score ≈ 85). Lesions were located both centrally and peripherally (score ≈ 80). No mediastinal lymphadenopathy was observed (score ≈ 80). Slight pleural thickening was observed (score ≈ 85). The total estimated score reached around 687, indicating a > 99.8% probability of being a COVID-19 case. He was later confirmed by RT-PCR. B A 60-year-old female patient complained of fever for 3 days (score ≈ 80). She claimed no contact or exposure history (score ≈ 60). Her WBC count was slightly elevated (10.52 × 10^9/L, score ≈ 60). Her chest CT showed unifocal (score ≈ 68) large patchy ground glass opacities with consolidation involving only the right upper lobe (score ≈ 63), but with both central and peripheral distribution (score ≈ 80). Mediastinal lymphadenopathy was observed in the mediastinal window (score ≈ 60). No pleural thickening was observed (score ≈ 70). The total estimated score reached around 541, indicating a < 0.2% probability of being a COVID-19 case. She was radiologically suspected of having COVID-19, but the diagnosis was ruled out by 2 consecutive negative RT-PCR test results. She was finally diagnosed with respiratory syncytial virus infection.
Our study had several limitations. Firstly, in this multicenter study, the normal ranges and results of the laboratory data might differ owing to differences in the kits, equipment, and environmental conditions. However, the three institutions are all Grade-A Tertiary Hospitals in China, with laboratories of the highest qualifications, and similar protocols are adhered to, so the results are relatively stable. Secondly, the sample size is relatively small, since no data were obtained from the epicenter of the outbreak, and the spread of COVID-19 was successfully suppressed within a few months in China as appropriate precautions were taken. Thirdly, despite being the standard confirmatory test, RT-PCR can give false-negative results; therefore, our results might be biased, since the non-COVID-19 group might include infected patients. Future prospective investigations on a larger scale with international data and improved diagnostic techniques are expected.

Conclusion
In conclusion, fever, contact history, decreased WBC count, left lower lobe involvement, pleural thickening, multifocal lesions, peripheral distribution, and absence of mediastinal lymphadenopathy are able to distinguish COVID-19 patients from other suspected patients. The nomogram based on these features is a useful tool in clinical practice.