RSNA and BSTI grading systems of COVID-19 pneumonia: comparison of the diagnostic performance and interobserver agreement

Background This study aimed to compare the diagnostic performance and interobserver agreement of chest CT classification based on the British Society of Thoracic Imaging (BSTI) guidance statement for COVID-19 and the Radiological Society of North America (RSNA) expert consensus statement. Methods This study included 903 patients who were admitted to the emergency department with a pre-diagnosis of COVID-19 between 1 and 18 July 2020 and underwent chest CT. Two radiologists classified the chest CT findings according to the RSNA and BSTI statements. The sensitivity and specificity of the two classification systems were calculated, and the agreement between the observers was assessed using kappa analysis. Results With the RT-PCR test result as the gold standard, the sensitivity, specificity and positive predictive values were high for the two observers according to the BSTI guidance statement and the RSNA expert consensus statement (83.3%, 89.7%, 89.0% and 81.2%, 89.7%, 88.7%, respectively). According to the BSTI and RSNA classifications, there was good agreement between the two radiologists in the PCR-positive group (κ: 0.707; p < 0.001 for BSTI and κ: 0.716; p < 0.001 for RSNA) and in the PCR-negative group (κ: 0.645; p < 0.001 for BSTI and κ: 0.743; p < 0.001 for RSNA). Conclusion The RSNA and BSTI statements provided reasonable performance and interobserver agreement in reporting CT findings of COVID-19. However, the number of patients classified as false negative or indeterminate in both classification systems is too high to be neglected. Supplementary Information The online version contains supplementary material available at 10.1186/s12880-021-00668-3.


Background
The diagnosis of COVID-19 is made primarily by reverse transcription polymerase chain reaction (RT-PCR), a nucleic acid amplification test (NAAT) for SARS-CoV-2 RNA sampled from the upper respiratory tract [1]. NAATs detect SARS-CoV-2 RNA in patient samples and are highly specific. Although they can detect even low levels of viral RNA, their sensitivity in the clinical setting is likely to depend on the type and quality of the sample obtained, the duration of the disease at the time of testing and the individual test. Their estimated false-negative rates range from 5 to 40% [2,3]. In early or mild disease, chest radiographs may be normal. Chest computed tomography (CT) is more sensitive than chest radiography, and some CT findings can be considered characteristic of COVID-19. However, the absence of typical signs on CT or the presence of atypical manifestations cannot completely rule out the diagnosis of COVID-19 [4,5].
Many articles and case reports have attempted to identify chest CT findings in COVID-19. New findings and reports are presented every day for discussion in the light of the existing literature. Although the common opinion in the radiology community is to reject the use of chest CT as a diagnostic tool in COVID-19, a growing number of studies classify the CT findings of the disease. Currently, many studies focus on chest CT findings, changes in these findings over the course of the disease, and the practices of radiologists who use these findings in the differential diagnosis of COVID-19 [6][7][8][9][10].
COVID-19 may present with a broad spectrum of findings ranging from ground-glass opacities (GGO) to an apparent pneumonia [11]. Furthermore, additional findings, including linear or curvilinear bands, nodular densities, GGO and consolidations, and cobblestone pattern, can also be seen in other viral or bacterial pneumonias, drug toxicities, inhalation exposure and some systemic diseases [12][13][14]. Because etiologies other than COVID-19 produce similar imaging patterns, the diagnostic process may be relatively complicated. This can also lead to stress and loss of labor for patients and their relatives, as well as healthcare professionals [15]. For a better diagnostic process, standardized reporting and direct contact with the attending physician are important for increasing productivity. Many institutes and radiology societies have proposed classifications based on CT findings that may reveal the presence and severity of the disease and may predict its prognosis, but a clear consensus has yet to be reached. To help radiologists recognize and correctly identify chest CT findings in COVID-19, similar guidelines that differ slightly from each other have been approved and published by associations such as the Society of Thoracic Radiology, the Radiological Society of North America (RSNA), the British Society of Thoracic Imaging (BSTI), and the Dutch Radiological Society (Additional file 1: Tables E1-2) [16][17][18].
This study aimed to describe the imaging findings of patients admitted to hospital with a pre-diagnosis of COVID-19 and of those with suspicious findings on chest CT, together with their epidemiological data; to classify the patients in terms of COVID-19 findings based on the BSTI guidance statement and the RSNA expert consensus statement; and to compare the two classification systems.

Methods
This monocentric, retrospective study was approved by the COVID-19 Scientific Research Committee of the Republic of Turkey Ministry of Health and the Sağlık Bilimleri Üniversitesi Gazi Yaşargil Training and Research Hospital Clinical Trials Ethics Board in accordance with the Declaration of Helsinki. Approval was granted by the Institutional Ethics Committee (Gazi Yaşargil Training and Research Hospital, decision no: 588/2020). The requirement for informed consent was waived because the study posed no risk and no subjects or groups were adversely affected.

Patient selection
In this study, 903 of the 5708 patients who presented to Gazi Yaşargil Training and Research Hospital, Sağlık Bilimleri Üniversitesi (Diyarbakır, Turkey) with a history of contact or symptoms of COVID-19 and underwent chest CT examination between 1 and 18 July 2020 were included (see flow chart, Fig. 1). An RT-PCR test was performed in a total of 3561 patients, and the result was recorded as positive in 637 patients. Of the positive patients, 612 who met the appropriate criteria were included in the study, along with 291 patients with a negative test.
The most common reasons for admission to the emergency room included high fever, cough, weakness, shortness of breath, and contact history. Subjects were diagnosed according to the COVID-19 (SARS-CoV-2 Infection) Guide (Study of Scientific Board, Ministry of Health, Republic of Turkey, April 14, 2020). After history taking and physical examination, laboratory tests including RT-PCR and radiological imaging of the respiratory system were performed. Those who had a positive RT-PCR test and pathological findings on thoracic CT were diagnosed with COVID-19. Subjects with pathological findings on thoracic CT but with two consecutive negative RT-PCR tests obtained at least 24 h apart were accepted as non-COVID-19.

Acquisition of thoracic CT images and technical parameters
All examinations were performed using 16- and 64-slice multi-slice CT scanners [Emotion 16, Somatom Sensation 64 (Siemens Healthineers, Erlangen, Germany)]. All patients were admitted to the imaging unit wearing a surgical mask. Radiology technicians acquired the images wearing an N95 mask, face shield, gloves, and disposable apron. After each examination, the surfaces were disinfected with sodium hypochlorite solution and the room was ventilated for at least 10 min. CT examinations were non-contrast and were performed in the supine position during a deep-inspiration breath-hold. The CT imaging protocol was set to 120 kV, 80 mA, and 1 mm slice thickness, with a standard or lower dose adjusted by using automatic dosing. Axial images were reconstructed with a slice thickness of 1.5 mm to obtain coronal images. Window settings were a window level of −600 HU and a window width of 1500 HU for lung parenchyma, and a window level of 40 HU and a window width of 350 HU for the mediastinum.
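The window settings above can be turned into explicit display limits. The following is a minimal sketch, assuming the usual convention that a window spans level ± width/2 and that values outside it are clamped; the function names and the 8-bit gray mapping are illustrative and are not taken from the study's workstation software.

```python
def window_bounds(level_hu, width_hu):
    """Return the (lower, upper) HU limits of a display window."""
    half = width_hu / 2.0
    return (level_hu - half, level_hu + half)


def to_gray(hu, level_hu, width_hu):
    """Map one HU value to an 8-bit gray level, clamping outside the window."""
    lower, upper = window_bounds(level_hu, width_hu)
    clamped = min(max(hu, lower), upper)
    return int(255 * (clamped - lower) / (upper - lower))


# Window settings reported in the text as (level, width) in HU.
LUNG = (-600, 1500)
MEDIASTINUM = (40, 350)

print(window_bounds(*LUNG))         # (-1350.0, 150.0)
print(window_bounds(*MEDIASTINUM))  # (-135.0, 215.0)
```

Under this convention the lung window displays −1350 to 150 HU and the mediastinal window −135 to 215 HU.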

Image interpretation
Thoracic CT examinations were reported by two radiologists with 14 and 15 years of experience, respectively (S.K. and R.D.), who were blinded to the RT-PCR test results.
Although repeat chest CT examinations were performed in some patients for various reasons, such as sudden deterioration in clinical course or laboratory values, only the chest CT obtained at first admission was evaluated. Chest CTs were assessed for the presence of GGOs, consolidations, nodular densities, crazy-paving pattern, reticulation, subpleural bands, interlobular septal thickening, halo and reversed halo signs, enlarged vascular sign at the level of the lesion, bronchial wall changes, traction bronchiectasis, enlarged mediastinal lymph nodes, etc., which are considered to be related to COVID-19. In addition, findings atypical for COVID-19, such as infectious cavitation, tree-in-bud pattern, lobar pneumonia and consolidation, and pleural effusion, were also evaluated. Interpretation was based on the classifications recommended by the RSNA expert consensus statement and the BSTI guidance statement, which were prepared based on chest CT findings of COVID-19 [16,17]. The findings were recorded as typical, indeterminate, atypical and negative according to the RSNA guidelines, and as classical COVID-19, probable COVID-19, indeterminate and non-COVID-19 according to the BSTI classification (Figs. 2, 3, 4, 5, 6). The agreement between the two radiologists and the data obtained according to each classification system were compared.

Statistical analysis
The Statistical Package for Social Sciences (SPSS for Windows 24.0, SPSS Inc, Chicago, Illinois, USA) was used for statistical analyses. Number, percentage, and mean ± standard deviation (SD) values were used for descriptive statistics. The Shapiro-Wilk test was used to check the assumptions of normality and homogeneity of variances. The chi-square test was used to compare categorical data. The results were evaluated using hazard ratios with 95% confidence intervals. Kappa statistics were used to determine interobserver reliability. The degree of interobserver agreement was interpreted using the following kappa intervals: 0-0.20 for poor, 0.21-0.40 for fair, 0.41-0.60 for moderate, 0.61-0.80 for good, and 0.81-1.00 for excellent agreement. To measure the diagnostic power of both classification systems, receiver operating characteristic (ROC) curve analysis was also performed. Sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were calculated by setting different cut-off points for each criterion. In these analyses, p values < 0.05 were considered statistically significant.
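As an illustration of the kappa analysis described above, the following minimal sketch computes Cohen's kappa from a two-observer contingency table and applies the agreement intervals listed in the text. The study itself used SPSS; the function names and the example table are hypothetical, and treating each interval's upper limit as inclusive is an assumption made here to cover values that fall between the listed bands.

```python
def cohen_kappa(table):
    """Cohen's kappa for a square table: rows = observer A, columns = observer B."""
    k = len(table)
    n = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(k)) / n
    row_totals = [sum(table[i]) for i in range(k)]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance from the marginal totals.
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / (n * n)
    return (observed - expected) / (1.0 - expected)


def agreement_label(kappa):
    """Map kappa to the agreement bands given in the text (upper limits inclusive)."""
    if kappa <= 0.20:
        return "poor"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "good"
    return "excellent"


# Hypothetical 2 x 2 table, not data from this study.
example = [[40, 5],
           [5, 50]]
kappa = cohen_kappa(example)
print(round(kappa, 3), agreement_label(kappa))
```

Applied to the kappa values reported in this study (0.645 to 0.743), `agreement_label` returns "good" in every case.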

Radiological findings
In accordance with the literature, the most common finding was GGOs, which were detected in 685 patients (75.8%). Ground-glass opacities were mostly patchy and peripherally located [8,10,19]. In addition, they had a multilobar distribution and were mostly located in the lower lobes and in the posterior zone. Ground-glass opacities or consolidation isolated to the upper and middle zones were detected in 45 patients (5.0%). Consolidation together with GGO was present in 276 (30.6%) subjects. Ground-glass opacities were bilateral in 561/685 subjects (81.9%). The number of subjects with unilateral right lung involvement was 75 (10.9%), while the number of those with unilateral left lung involvement was 49/685 (7.2%). The most common concomitant finding was enlarged vascular sign (≥ 3 mm) at the lesion level in 414/685 patients (60.4%), followed by interlobular septal thickening, crazy-paving pattern, pleural and reticular bands, traction bronchiectasis, atelectasis and air bronchogram. The tree-in-bud pattern was detected in 39 patients (4.3%), centrilobular nodules in 50 patients (5.5%) and lobar consolidation in 33 patients (3.7%).
There was good agreement between the two radiologists in the PCR-positive and PCR-negative groups according to the BSTI classification (κ: 0.707; p < 0.001, κ: 0.645; p < 0.001, respectively; see Table 2). Similarly, good agreement was found in the PCR-positive and PCR-negative groups according to the RSNA classification (κ: 0.716; p < 0.001, κ: 0.743; p < 0.001, respectively; see Table 3).
With the RT-PCR test result as the gold standard, Observer A yielded a sensitivity of 83.3%, a specificity of 89.7%, a PPV of 89.0%, an NPV of 84.3%, an accuracy of 86.5%, and a positive likelihood ratio (LR+) of 8.1 for the Classic and Probable COVID-19 categories of the BSTI guidance statement, with a p value of < 0.001. According to the same observer, for the typical COVID-19 category of the RSNA expert consensus statement, the sensitivity was 81.2%, the specificity was 89.7%, the PPV was 88.7%, the NPV was 82.7%, the accuracy was 85.4%, the LR+ value was 7.9, and the p value was < 0.001 (Table 5).
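The relationship between these measures can be sketched as follows. This is an illustrative calculation, not the authors' SPSS procedure: it uses the standard definitions LR+ = sensitivity / (1 − specificity) and Bayes' rule for the predictive values. The function names are hypothetical, and the prevalence passed to `predictive_values` is an assumption, since the text does not state which prevalence underlies the reported PPV and NPV.

```python
def positive_likelihood_ratio(sensitivity, specificity):
    """LR+ = sensitivity / (1 - specificity)."""
    return sensitivity / (1.0 - specificity)


def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a given disease prevalence using Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Observer A values reported in the text.
bsti = (0.833, 0.897)
rsna = (0.812, 0.897)

print(round(positive_likelihood_ratio(*bsti), 1))  # 8.1
print(round(positive_likelihood_ratio(*rsna), 1))  # 7.9

# Prevalence of 0.5 is an illustrative assumption, not a value from the source.
ppv, npv = predictive_values(*bsti, prevalence=0.5)
print(round(ppv, 3), round(npv, 3))
```

The LR+ values reproduce the reported 8.1 and 7.9. With the assumed prevalence of 0.5, the sketch gives a PPV and NPV close to the 89.0% and 84.3% reported for BSTI; predictive values shift with prevalence, so other settings would give different figures.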
In addition, the diagnostic performances of both observers in RT-PCR-positive and RT-PCR-negative groups according to the BSTI guidance statement and RSNA expert consensus statement were determined by ROC analysis, and the area under the curve (AUC) was calculated. Although the diagnostic performance of both classification systems was similar, the power of the BSTI guidance statement was higher than that of the RSNA expert consensus statement for both observers (observer A for BSTI: AUC 0.910 (95% CI 0.889-0.932) and for RSNA: AUC 0.884 (95% CI 0.858-0.909); observer B for BSTI: AUC 0.903 (95% CI 0.881-0.925) and for RSNA: AUC 0.876 (95% CI 0.850-0.902)) (Fig. 7).
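For ordinal CT categories, the AUC can be read as the probability that a randomly chosen RT-PCR-positive patient receives a higher category than a randomly chosen RT-PCR-negative patient, with ties counted as one half. The sketch below illustrates that definition only; the numeric coding of the categories (0 for the least suspicious to 3 for the most suspicious) and the example scores are hypothetical and are not data from this study.

```python
def auc_from_scores(positive_scores, negative_scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties = 0.5)."""
    wins = 0.0
    for pos in positive_scores:
        for neg in negative_scores:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical category codes: 0 = least suspicious ... 3 = most suspicious.
pcr_positive = [3, 3, 2, 1]
pcr_negative = [0, 1, 1, 2]

print(auc_from_scores(pcr_positive, pcr_negative))  # 0.84375
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of the two groups.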

Discussion
Peripheral and bilateral GGOs, one of the most important descriptive findings in COVID-19 pneumonia, were the most common finding in this study and are considered the most important parameter for identifying the disease in the RSNA and BSTI classification systems. In accordance with the literature, GGOs predominated in the posterior zone in this study. Other important findings that both classification systems use in predicting the disease include consolidation co-occurring with GGOs, findings of organizing pneumonia and halo signs [8,11,16,20,21]. In terms of these findings, the success of the diagnostic process in both classification systems was similar. The findings classified as typical in the RSNA guidelines and those classified as classic COVID-19 and probable COVID-19 in the BSTI guidelines match to a great extent. Positive radiological findings that indicate the disease may prompt a repeat test when the RT-PCR result is falsely negative, and may thereby lead to earlier treatment and reduced morbidity and mortality. In addition, in a recently published study, it was reported that structured radiological (SR) reporting for chest CT is more reliable than conventional [...] .5% for BSTI, respectively). In our study, the agreement between the two radiologists in RT-PCR-positive patients was κ: 0.716 for RSNA and κ: 0.707 for BSTI. De Jaegere et al., in their study of 96 patients, compared the RSNA and CO-RADS classifications and found that they had similar identification power. The three readers assigned 62.2% (28/45), 37.8% (17/45), and 44.4% (20/45), respectively, to the typical category for RSNA. Kappa values for agreement between the readers for RSNA ranged from 0.564 to 0.663 and were similar to those in our study [23].
In two other studies based on the RSNA expert consensus statement, interobserver agreement was reported to vary between good and excellent (κ = 0.822-0.924). In both studies, interobserver agreement was significantly high in the typical and negative categories for RSNA, whereas it was poor or moderate in the indeterminate and atypical categories [24,25]. In these studies, the high number of patients in the negative category for RSNA may have led to a partial increase in interobserver agreement. Conversely, in our study, the low number of patients in the negative category for RSNA may have lowered the interobserver agreement.
Regarding the diagnostic power of both classification systems, the sensitivity, specificity and PPV values were high, and the diagnostic power of BSTI was slightly higher than that of RSNA (83.3%, 89.7%, 89.0% for BSTI; 81.2%, 89.7%, 88.7% for RSNA). In a similar study, Inui et al. compared the typical category of RSNA with the classic COVID-19 and possible COVID-19 categories of BSTI in terms of sensitivity, specificity, PPV, and NPV. They reported that the sensitivity was higher in RSNA (73.5% versus 71.3%), whereas the specificity was higher in BSTI (87.3% versus 82.8%) [26]. Ciccarese et al. reported that the 'typical' pattern (n = 151/211) had a sensitivity of 71.6%, a specificity of 91.6%, and a PPV of 87.8% for COVID-19. There was excellent agreement between the two observers for typical and negative results (84.7% and 91.3%, respectively) [27]. Özer et al. reported high sensitivity and specificity values in the typical category for RSNA. In addition, ROC analysis was performed in RT-PCR-positive and RT-PCR-negative patients in that study, and the AUC was reported to be 0.878 (95% CI 0.852-0.903) [23].
However, very small GGOs that are peripheral and not round, and GGOs with unilateral involvement, which are assigned to the indeterminate group in the RSNA classification, can be classified as probable COVID-19 in the BSTI classification. Another difference between the two classification systems is that although some diseases, such as pre-existing interstitial pneumonia, are not considered by the RSNA expert consensus statement, the BSTI guidance statement suggests an increase of one degree in those patients [16,17]. In both classification systems, the findings considered typical or highly suspicious for COVID-19 were not detected in a significant number of patients, who were therefore classified in the indeterminate category. In RT-PCR-positive patients, 10 (15.9%) of the 63 patients assigned by observer A to the indeterminate group according to the RSNA classification system were in the classical or probable COVID-19 group in the BSTI classification. The cases considered indeterminate according to the RSNA and BSTI classifications in RT-PCR-positive subjects were examined in detail, and the majority [40/63 (63.5%) and 39/60 (65.0%), respectively] were found to be in the early stage of the disease. In light of the available data, due to the nature of the disease, there may be no CT findings or ambiguous CT findings at the early stage, although ground-glass opacities might be present along with consolidation and other findings in later stages [9]. CT findings in COVID-19 have a wide spectrum depending on the epidemiological characteristics of the patient (age, sex, race), the presence of comorbid disease, any concomitant parenchymal lung disease, and the duration, stage and severity of the disease [6,28]. This may explain the higher percentages of patients classified as indeterminate in both classification systems.
In RT-PCR-negative subjects, chest CT findings in a considerable number of subjects were compatible with COVID-19 according to both classification systems (30/291 (10.3%) in the typical category for RSNA and 30/291 (10.3%) in the classical and probable categories for BSTI). In particular, a high number of false-positive subjects can cause problems in the diagnostic process. When the indeterminate group was added to the false-positive group, the rate of exclusion of COVID-19 among RT-PCR-negative subjects was calculated to be 76.6% for RSNA and 74.3% for BSTI, and in that respect the two classifications were comparable. One reason for the relatively low rates may be the lower number of cases without chest CT findings in the positive group and in the negative group. In addition, in our study, the agreement between the two radiologists in RT-PCR-negative patients was κ: 0.743 for RSNA and κ: 0.645 for BSTI. Based on both classification systems, good agreement was found between the observers in RT-PCR-positive and RT-PCR-negative subjects.
The most important limitation of this study is that few subjects with aetiologies other than COVID-19 that may show similar CT findings were included. Therefore, studies with larger cohorts including other aetiologies might reveal more accurate specificity and sensitivity rates.

Conclusions
Many radiology associations disapprove of the use of chest CT examination in the diagnostic process of COVID-19. However, currently, it is widely used for diagnostic purposes and for predicting and monitoring the course of the disease. Since it is a newly defined disease, complete detection and accurate identification of CT findings, eliminating other aetiologies that lead to similar findings, and standardization in reporting to help radiologists and other clinicians are of importance. Several radiology societies have therefore proposed a classification system for COVID-19 based on CT findings to create a reliable and accurate diagnostic process for radiologists. In this study, chest CT findings of RT-PCR-positive and RT-PCR-negative subjects were revealed in detail. In conclusion, based on these findings, the guidelines proposed by RSNA and BSTI were successful in detecting COVID-19, and they can be reliable references for radiologists since both present similar results.