Diagnostic performance of radiomics model for preoperative risk categorization in thymic epithelial tumors: a systematic review and meta-analysis

Background Incidental masses in the thymic region are not uncommon on thoracic examinations, and the clinician's treatment decision depends largely on imaging findings. Because specific indicators are lacking, the role of radiomics in risk categorization of thymic epithelial tumors (TETs) is worth exploring. Methods Four databases (PubMed, Web of Science, EMBASE and the Cochrane Library) were searched to identify eligible articles reporting the diagnostic performance of radiomics models for risk categorization in patients with TETs. The Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) tool and the radiomics quality score (RQS) were used for methodological quality assessment. The pooled area under the receiver operating characteristic curve (AUC), sensitivity and specificity were calculated with their 95% confidence intervals (CIs). Results A total of 2134 patients in 13 studies were included in this meta-analysis. The pooled AUC of the 11 studies differentiating high-risk from low-risk histologic subtypes was 0.855 (95% CI, 0.817–0.893), while the pooled AUC of the 4 studies differentiating stage classification was 0.826 (95% CI, 0.817–0.893). Meta-regression revealed no significant source of heterogeneity. Subgroup analysis showed that contrast-enhanced computed tomography (CECT) had the largest pooled AUC (0.873; 95% CI, 0.832–0.914). Deeks' funnel plot showed no significant publication bias. Conclusions Based on currently available evidence, radiomics signatures show promise for preoperative identification of patients with high-risk TETs. However, the methodological quality of future studies needs to improve before the feasibility and clinical application of radiomics-based models for risk categorization of TETs can be confirmed.


Introduction
Thymic epithelial tumors (TETs) are among the most common primary tumors of the mediastinum, accounting for up to 50% of all anterior mediastinal neoplasms in adults [1,2]. Incidental thymic tumors are almost always asymptomatic and lack appropriate serum markers, so the clinician's treatment decision depends largely on imaging findings. Risk categorization of thymic neoplasms is therefore of great value.
According to previous studies, the prognosis of thymic tumors depends largely on four factors: WHO histologic type, pathological stage, surgical margin status and treatment pattern [3,4]. In a large multicenter cohort (n = 907), Liu and colleagues from the Chinese Alliance for Research in Thymomas (ChART) confirmed that WHO classification and T stage were independent prognostic factors for recurrence of thymic tumors [5]. Patients with high-risk thymic tumors need closer attention, with timely surgery and comprehensive perioperative treatment [6].
As early as 2004, Jeong et al. [7] tried to correlate the CT findings of thymic epithelial tumors with histologic classification, but found that CT was of limited value in distinguishing WHO histopathological subtypes. White and colleagues evaluated how well preoperative CT predicted the pathological stage of TETs and concluded that its diagnostic accuracy was about two-thirds for TNM stage and less than 50% for the Masaoka system [8]. One possible reason for the poor predictive performance is that these studies relied on subjective CT findings, and readers often disagreed when judging pericardial, vascular, lung or pleural invasion. A more objective predictive model is therefore urgently needed.
Radiomics refers to the use of computer technology to extract high-throughput quantitative features from medical images and transform the images into high-dimensional data, so as to reflect biological characteristics in a non-invasive and objective way. Researchers have applied radiomics models in patients with TETs for differential diagnosis, grading, staging and survival analysis [9]. For differentiating risk subgroups of TETs, several studies have shown that radiomics models are a potential tool with acceptable diagnostic accuracy. However, relevant studies are limited compared with other tumors, and predictive power varies greatly, with the AUC (area under the curve) values of radiomics algorithms ranging from 70% to 90% [10,11]. We therefore conducted a systematic review and meta-analysis of the published literature to investigate the predictive performance of radiomics as an imaging biomarker for risk categorization of thymic tumors. The results might serve as a benchmark for future prospective radiomics trials aimed at clinical translation. This study followed the Cochrane Handbook for Systematic Reviews of Interventions and was conducted in accordance with the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) statement.

Search strategy
A systematic literature search of PubMed, Web of Science, EMBASE and the Cochrane Library was conducted manually from their inception until November 2022 to identify relevant reports. The databases were searched using the terms [(thymic or anterior mediastinal or thymoma or thymus neoplasm) and (radiomics or machine learning or deep learning or artificial intelligence or neural network)]. The search strategies incorporated Medical Subject Headings terms and keywords. We omitted words such as tumors, mass and lesions in order to retrieve more relevant articles and gain more power for analyzing this neglected issue. The search was limited to humans and performed with no language restrictions. Reference lists of the relevant articles were also screened.

Study selection
The main outcome of our study was the assessment of risk categorization of thymic tumors. Risk categorization included histologic subtype classification and clinical/pathological stage classification. Histologic subtypes were classified into low-risk (A, AB, B1) and high-risk (B2, B3, C) groups according to the WHO classification, while stage was divided into early (I/II) and advanced (III/IV) according to the TNM or Masaoka staging systems.
The inclusion criteria were as follows: (1) cohort or case-control studies; (2) patients with pathologically proven thymoma or thymic carcinoma; (3) imaging-based radiomics studies using computed tomography (CT), magnetic resonance (MR) or positron emission tomography-computed tomography (PET-CT); (4) diagnostic outcomes (sensitivity, specificity, accuracy, etc.) were reported or could be calculated.
We excluded (1) non-original studies such as case reports, reviews, letters and commentaries; (2) studies involving other types of mediastinal tumors; (3) studies with a duplicated population; (4) studies in which diagnostic efficacy indicators were missing or could not be calculated.
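Inclusion criterion (4) and exclusion criterion (4) hinge on whether diagnostic indicators can be derived from the reported data. As a minimal sketch, assuming a study reports (or allows reconstruction of) a 2x2 table of true/false positives and negatives, the usual indicators could be computed as below. The function name, the 0.5 continuity correction for zero cells and the example counts are illustrative assumptions, not details taken from the included studies.

```python
def diagnostic_metrics(tp, fp, fn, tn, correction=0.5):
    """Derive diagnostic accuracy measures from a 2x2 table.

    If any cell is zero, `correction` is added to every cell so that the
    ratios stay defined (a common convention, assumed here).
    """
    if min(tp, fp, fn, tn) == 0:
        tp, fp, fn, tn = (x + correction for x in (tp, fp, fn, tn))
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "plr": sensitivity / (1 - specificity),   # positive likelihood ratio
        "nlr": (1 - sensitivity) / specificity,   # negative likelihood ratio
        "dor": (tp * tn) / (fp * fn),             # diagnostic odds ratio
    }


# Hypothetical counts for illustration only (not from any included study).
example = diagnostic_metrics(tp=40, fp=10, fn=10, tn=40)
print(example)
```

A study for which none of these cells can be reconstructed from the reported sensitivity, specificity and group sizes would fall under exclusion criterion (4).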

Data extraction and quality assessment
Two investigators (XF Lu and TY Zhu) independently screened the titles and abstracts of all relevant studies. The following data were extracted from each study: author, country, year of publication, study design, imaging modality, population for diagnostic accuracy, age, parameter extraction software and risk categorization. The quality of each study was evaluated with the Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) tool and the radiomics quality score (RQS) [12,13]. The QUADAS-2 tool covers four domains: (a) patient selection; (b) index test; (c) reference standard; and (d) flow and timing. The RQS assessment comprises 16 items with a maximum of 36 points. All disagreements between the two reviewers were resolved by discussion.

Statistical analysis
Quality profiles of the included studies were drawn according to the QUADAS-2 standards using RevMan 5.4 software. Meta-Disc 1.4 software was used to calculate the Spearman correlation coefficient for evaluating heterogeneity caused by threshold effects between studies. If there was no significant threshold effect, Cochran's Q test and the I² statistic for the diagnostic odds ratio were used to evaluate heterogeneity caused by non-threshold effects. I² values were interpreted as no heterogeneity (0-25%), low heterogeneity (26-50%), moderate heterogeneity (51-75%) and high heterogeneity (76-100%) [14]. If P > 0.1 and I² < 50%, heterogeneity between studies was considered small and a fixed-effects model was used; if I² > 50% and P < 0.1, indicating high heterogeneity, a random-effects model was used for the pooled analysis. The pooled sensitivity, specificity, positive likelihood ratio, negative likelihood ratio and diagnostic odds ratio were calculated with their 95% confidence intervals (95% CI). A summary receiver operating characteristic (SROC) curve was drawn, and the area under the curve (AUC) was calculated. StataSE 16 software was used to draw Deeks' funnel plot to assess publication bias [15], and P > 0.05 was taken to indicate no significant bias.
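The heterogeneity rule described above can be illustrated with a short sketch. It assumes per-study effect sizes (for example ln DOR) with known variances, computes Cochran's Q and I², and switches from an inverse-variance fixed-effects estimate to a DerSimonian-Laird random-effects estimate when I² exceeds 50%. This is a simplification and not the Meta-Disc implementation: the P-value condition on Q is omitted because it needs a chi-square distribution function from a statistics library, and the function name and input values are hypothetical.

```python
import math


def pool_effects(effects, variances, i2_cutoff=50.0):
    """Pool study effects; choose fixed or random effects from I² (sketch)."""
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum_w

    # Cochran's Q and the I² statistic (percentage of variation beyond chance).
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    if i2 > i2_cutoff:
        # DerSimonian-Laird estimate of the between-study variance tau².
        c = sum_w - sum(w * w for w in weights) / sum_w
        tau2 = max(0.0, (q - df) / c)
        weights = [1.0 / (v + tau2) for v in variances]
        sum_w = sum(weights)
        pooled = sum(w * y for w, y in zip(weights, effects)) / sum_w
        model = "random"
    else:
        tau2, pooled, model = 0.0, fixed, "fixed"

    se = math.sqrt(1.0 / sum_w)
    return {
        "model": model,
        "pooled": pooled,
        "ci95": (pooled - 1.96 * se, pooled + 1.96 * se),
        "q": q,
        "i2": i2,
        "tau2": tau2,
    }


# Hypothetical ln(DOR) values and variances, for illustration only.
result = pool_effects([1.0, 2.0, 3.0], [0.1, 0.1, 0.1])
print(result["model"], round(result["i2"], 1), round(result["pooled"], 3))
```

Pooling on the log scale and exponentiating the result afterwards is the usual choice for ratios such as the DOR; that back-transformation is left out here for brevity.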

Study characteristics and quality assessment
A total of 2134 patients with thymic neoplasms were analyzed in the 13 included studies, comprising 1603 patients for histologic subtype classification (11 studies) and 531 patients for stage classification (4 studies). The characteristics of the included studies are summarized in Table 1.
The data of the 13 included studies were complete. Quality evaluation and mapping of all included studies were performed according to the QUADAS-2 standard using RevMan 5.3 software. The main risk of bias came from flow and timing, which was rated unclear in all 13 studies. The detailed quality evaluation results are shown in Fig. 2. On the RQS scale, the mean score of the included studies was 9.2 (range, 1 to 17), corresponding to a mean RQS percentage of 25.6% of the 36 available points (Table 2).
We performed sensitivity analyses to test the robustness of the pooled results in two ways: (1) when the pooled value was recalculated using a fixed-effects model (Mantel-Haenszel), it remained the same (AUC = 0.855); (2) when each study was omitted in turn, the results remained stable (AUC ranged from 0.843 to 0.864).
The inconsistency indexes (I²) of the above five pooled indicators were 67.0%, 75.8%, 65.4%, 49.4% and 55.9%, indicating mainly moderate heterogeneity among studies. Possible sources of heterogeneity were explored with a meta-regression model. The covariates were the number of patients in the study (n < 100/≥100), country (Asian/Western), study design (single-center/multicenter), imaging modality (NECT/CECT/MR/PET-CT) and machine learning. Meta-regression of the relative DOR did not identify a significant source of heterogeneity (Table 3). As imaging modality showed a trend toward significance (P = 0.06), subgroup analysis was used to calculate the diagnostic accuracy of each modality (Table 4). All subgroups showed high diagnostic efficacy, although performance differed among modalities. Contrast-enhanced CT had the largest pooled AUC (0.873, 95% CI 0.832-0.914).
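The leave-one-out procedure in step (2) can be sketched as follows. The review does not spell out how each reduced set was pooled, so this example assumes simple inverse-variance weighting; the function names and the AUC values and standard errors are hypothetical and serve only to show the mechanics.

```python
def inverse_variance_pool(effects, variances):
    """Inverse-variance weighted mean of study-level estimates."""
    weights = [1.0 / v for v in variances]
    return sum(w * y for w, y in zip(weights, effects)) / sum(weights)


def leave_one_out(effects, variances):
    """Re-pool the estimates once per study, each time omitting that study."""
    pooled = []
    for i in range(len(effects)):
        kept_effects = effects[:i] + effects[i + 1:]
        kept_variances = variances[:i] + variances[i + 1:]
        pooled.append(inverse_variance_pool(kept_effects, kept_variances))
    return pooled


# Hypothetical study-level AUCs and standard errors, for illustration only.
aucs = [0.80, 0.90, 0.85, 0.88]
std_errors = [0.04, 0.05, 0.03, 0.06]
loo = leave_one_out(aucs, [se ** 2 for se in std_errors])
print(f"leave-one-out range: {min(loo):.3f} to {max(loo):.3f}")
```

A narrow range between the smallest and largest re-pooled value, as reported above, suggests that no single study drives the pooled estimate.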

Publication bias
The Deeks' funnel plot for subtype classification is shown in Fig. 5. The plot showed no obvious asymmetry (P > 0.05), indicating no significant publication bias in the included studies. Publication bias was not assessed for stage classification, as the small number of included studies (n = 4) might lead to an inconclusive funnel plot [29].
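For readers unfamiliar with the asymmetry test behind the funnel plot, a minimal sketch is given here. It follows the usual description of Deeks' test: ln(DOR) is regressed on 1/sqrt(ESS), weighted by the effective sample size ESS = 4·n1·n2/(n1 + n2), and the slope is tested against zero. The analysis in this review was run in StataSE 16; the code below is not that implementation, the function name and inputs are hypothetical, and the P value is left to a statistics library (a t distribution with k − 2 degrees of freedom).

```python
import math


def deeks_regression(ln_dors, n_diseased, n_nondiseased):
    """Weighted regression of ln(DOR) on 1/sqrt(ESS); needs >= 3 studies
    with differing effective sample sizes."""
    # Effective sample size per study, also used as the regression weight.
    ess = [4.0 * a * b / (a + b) for a, b in zip(n_diseased, n_nondiseased)]
    x = [1.0 / math.sqrt(e) for e in ess]

    sum_w = sum(ess)
    x_bar = sum(w * xi for w, xi in zip(ess, x)) / sum_w
    y_bar = sum(w * yi for w, yi in zip(ess, ln_dors)) / sum_w
    sxx = sum(w * (xi - x_bar) ** 2 for w, xi in zip(ess, x))
    sxy = sum(w * (xi - x_bar) * (yi - y_bar)
              for w, xi, yi in zip(ess, x, ln_dors))

    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Weighted residual variance and the standard error of the slope.
    rss = sum(w * (yi - intercept - slope * xi) ** 2
              for w, xi, yi in zip(ess, x, ln_dors))
    se_slope = math.sqrt(rss / (len(x) - 2) / sxx)
    return slope, intercept, se_slope


# Hypothetical inputs for illustration only.
slope, intercept, se_slope = deeks_regression(
    ln_dors=[2.1, 2.6, 1.9, 2.4, 2.8],
    n_diseased=[40, 25, 90, 60, 20],
    n_nondiseased=[50, 30, 110, 45, 35],
)
print(f"slope = {slope:.3f}, SE = {se_slope:.3f}")
```

The test statistic is slope / se_slope; a slope that is not significantly different from zero corresponds to the symmetric funnel described above.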

Discussion
In the present systematic review, based on current evidence from more than two thousand patients, we found that radiomics has excellent diagnostic performance for risk categorization in patients with thymic neoplasms. The highest predictive accuracy was obtained with contrast-enhanced CT, with a pooled AUC of 0.873 (95% CI 0.832-0.914). Radiomics research based on MR or PET-CT is still in its infancy and needs to be confirmed by further studies.
To the best of our knowledge, this is the first systematic review and meta-analysis of the diagnostic accuracy of preoperative radiomics models for predicting risk classification in patients with TETs. We investigated risk categorization from two aspects, high-risk histopathological subtype and high-risk pathological stage, both of which predict more complex treatment and worse outcomes [30]. We finally included 13 predictive radiomics studies [16-28]. All were published within the last three years, nearly half of them in the last year (2022). Except for three studies from Switzerland, Japan and the United States [22,25,28], all came from different provinces of China [16-21,23,24,26,27]. This might be related to the low incidence of TETs and the difficulty of conducting multicenter radiomics studies. All the studies were retrospective, and only two [18,19] included data from two medical centers. We used the QUADAS-2 tool and the RQS to evaluate the quality of the literature. Because these studies were relatively recent and shared a broadly similar design, their quality was relatively high according to the QUADAS-2 tool. The main risk of bias came from flow and timing, as no study reported the time interval between radiomics analysis and postoperative pathology results. A small proportion of studies (3/13, 23.1%) had unclear patient selection [19,20,28] or an unclear index test [18,19,27], which might introduce a small risk of bias.
A meta-regression was applied to the radiomics-based prediction of low-risk versus high-risk WHO histologic subtypes, but it showed no statistically significant differences. Possible sources of heterogeneity might be considered from the following aspects: (a) scanners and scanning protocols varied between institutions, which might influence image acquisition; (b) regions of interest were manually delineated in most studies, which introduces a subjective component; (c) image feature extraction methods and software differed; (d) multiple machine learning approaches were used for modeling, such as random forest (RF), support vector machine (SVM) and k-nearest neighbor (kNN). We performed a subgroup analysis by imaging modality, which showed a trend toward a difference (P = 0.06); all modalities had good predictive performance, and contrast-enhanced chest CT remained the best.
Because CT is the most commonly used chest examination, most of the radiomics studies were based on CT images. Wang et al. [17] compared the performance of radiomics signatures based on NECT and CECT for predicting high-risk thymoma. Both signatures showed excellent ability for risk categorization, with high AUCs (CECT 0.827 vs. NECT 0.801, P = 0.365). When compared with radiologists, only the CECT-based radiomics signature showed a statistically significant difference. However, other researchers [16] found that the radiomics features of the NECT scan outperformed those of CECT in risk grading of anterior mediastinal masses (AUC, CECT 0.741 vs. NECT 0.842). In the present meta-analysis, the pooled AUC was slightly higher for the CECT radiomics signature than for NECT (0.873 vs. 0.809), suggesting that CECT-based radiomics may provide a more accurate diagnostic model. The underlying mechanism might be that texture analysis of enhanced CT reflects the internal heterogeneity of TETs better than that of unenhanced CT. Sui et al. [16] pointed out that some features, such as the tumor shape feature shape-Spherical Disproportion, were selected from CECT, probably because the enhanced scan highlights the outline of the lesion.
In the present study, only one included study was based on MRI radiomics [21] and one on PET-CT [28]. Research on TETs based on these two types of radiomics is still in its infancy. Xiao et al. [31] published a pilot study exploring the performance of MRI-based radiomics in risk stratification of TETs. The radiomics signatures achieved high AUC values of 0.880 and 0.948 for differentiating WHO high-risk subtypes and advanced stage, respectively. The outcomes of MRI-based radiomics studies are promising, and the approach is particularly suitable for patients who are allergic to iodinated contrast agents or concerned about radiation. Radiomics based on PET-CT for risk categorization in patients with TETs has been supported by only a few studies, and its high cost limits routine use.
Several non-radiomics indicators were included in the combined clinical-radiomics models, such as gender, age, myasthenia gravis and conventional imaging findings (tumor size, pleural effusion, pericardial effusion, infiltration, etc.) [18,19,21,23]. Although the combined models did not differ significantly from the radiomics models in predictive efficiency, the absolute values improved in all of these studies. Future preoperative radiomics prediction studies are therefore encouraged to incorporate clinical indicators.
Although this study provides the first comprehensive investigation of the diagnostic performance of radiomics algorithms for risk categorization in patients with TETs, it has some limitations. First, the included studies had design drawbacks: all were retrospective and most had small populations. Second, the overall quality of the included studies was not optimal (mean RQS 25.6%), which might influence the subsequent analysis. Third, heterogeneity was obvious among the included studies, although meta-regression found no source of it. Finally, publication bias may be another limitation, because unreported non-significant radiomics models are unavailable for analysis; however, the Deeks' funnel plot suggested no significant publication bias in our study. For these reasons, radiomics-based tools for risk categorization of TETs should be rigorously developed and evaluated in future prospective, multicenter and well-designed radiomics studies.
In conclusion, this systematic review and meta-analysis shows promise for preoperative identification of patients with high-risk TETs based on radiomics signatures, given the currently available evidence. However, the methodological quality of future studies needs to improve before the feasibility and clinical application of radiomics-based models for risk categorization of thymic epithelial tumors can be confirmed.

Fig. 5
The Deeks' funnel plot asymmetry test for subtype classification

Table 1
Characteristics of included studies
CECT: contrast-enhanced computed tomography; NECT: non-contrast-enhanced computed tomography; MR: magnetic resonance; PET: positron emission tomography; NA: not available. Risk classification: A, low-risk vs. high-risk histological subtype of thymic tumor; B, early-stage vs. advanced-stage thymic tumor

Table 3
Meta-regression of heterogeneity in included studies

Table 4
Subgroup analysis of low/high-risk thymic tumors by preoperative imaging modality
No.: number; AUC: area under the receiver operating characteristic curve; CECT: contrast-enhanced computed tomography; NECT: non-contrast-enhanced computed tomography; MR: magnetic resonance; PET: positron emission tomography