Differentiation of Pseudoprogression from True Progressionin Glioblastoma Patients after Standard Treatment: A Machine Learning Strategy Combinedwith Radiomics Features from T1-weighted Contrast-enhanced Imaging

Background Based on conventional MRI images, it is difficult to differentiatepseudoprogression from true progressionin GBM patients after standard treatment, which isa critical issue associated with survival. The aim of this study was to evaluate the diagnostic performance of machine learning using radiomics modelfrom T1-weighted contrast enhanced imaging(T1CE) in differentiating pseudoprogression from true progression after standard treatment for GBM. Methods Seventy-sevenGBM patients, including 51 with true progression and 26 with pseudoprogression,who underwent standard treatment and T1CE, were retrospectively enrolled.Clinical information, including sex, age, KPS score, resection extent, neurological deficit and mean radiation dose, were also recorded collected for each patient. The whole tumor enhancementwas manually drawn on the T1CE image, and a total of texture 9675 features were extracted and fed to a two-step feature selection scheme. A random forest (RF) classifier was trained to separate the patients by their outcomes.The diagnostic efficacies of the radiomics modeland radiologist assessment were further compared by using theaccuracy (ACC), sensitivity and specificity. Results No clinical features showed statistically significant differences between true progression and pseudoprogression.The radiomic classifier demonstrated ACC, sensitivity, and specificity of 72.78%(95% confidence interval [CI]: 0.45,0.91), 78.36%(95%CI: 0.56,1.00) and 61.33%(95%CI: 0.20,0.82).The accuracy, sensitivity and specificity of three radiologists’ assessment were66.23%(95% CI: 0.55,0.76), 61.50%(95% CI: 0.43,0.78) and 68.62%(95% CI: 0.55,0.80); 55.84%(95% CI: 0.45,0.66),69.25%(95% CI: 0.50,0.84) and 49.13%(95% CI: 0.36,0.62); 55.84%(95% CI: 0.45,0.66), 69.23%(95% CI: 0.50,0.84) and 47.06%(95% CI: 0.34,0.61), respectively. Conclusion T1CE–based radiomics showed better classification performance compared with radiologists’ assessment.The radiomics modelwas promising in differentiating pseudoprogression from true progression.

Sun et al. BMC Med Imaging (2021) 21:17 Background Glioblastoma multiforme (GBM) is the most common primary malignant brain tumor in adults. Although maximal safe surgical resection followed by concurrent chemoradiotherapy (CCRT) with temozolomide (TMZ) and adjuvant TMZ has been a standard treatment, the prognosis of GBM patients is still very poor. Specially, the median overall survival ranges from 14 to 16 months, and the 2-year survival rate is only 26-33% [1,2]. To improve this situation, the early and accurate diagnosis of postoperative progression has become very critical because it can directly influence the optimal therapy schemeselection associated with patient survival.
However, the pseudoprogression is a treatment-related change within 12 weeks after the completion of CCRT, including inflammation, radiation effects, ischemia and increased vascular permeabilityand contrast enhancement on MR imaging [3]. Both the true progression and pseudoprogression exhibit progressive enlargement and new enhancement within the radiation field. It is also difficult to differentiate them with conventional MRI sequences because pseudoprogression can mimic true progression in terms of tumor location, morphology, and enhancement patterns [4]. However, their treatments and prognosis are completely different [5]. Generally, pseudoprogressionshows better outcomes and overall survival without invasive treatment [2]. According to the Response Assessment in Neuro-Oncology (RANO) criteria [3], the current strategy to distinguish pseudoprogression from true progression heavily depends on continuous follow-up MRI examinations. Where, it may take several months to obtain a reliable diagnosis, resulting in the delay or inappropriate management of progressed GBM patients [6]. Moreover, studies by Ellington et al. [7] have shown that once tumor recurrence occurs, there is no consensus on its treatment standard. Then, even if the most aggressive treatment is adopted, it is expected that there will be no significant survival benefit. Therefore, it is crucial to develop an effective method to differentiate pseudoprogression from true progression as early as possible.
Although advanced MR imaging techniques, including diffusion-weighted imaging (DWI), perfusion-weighted imaging e.g. arterial spin labeling (ASL), dynamic contrast-enhanced MRI(DCE) anddynamic susceptibility contrast perfusion MRI (DSC) andmagnetic resonance spectroscopy (MRS), have been demonstrated to be promising in differentiating pseudoprogression from true progression, there are still limitations for them.First, the lesions were measured on the basis of a single slice region of interest(ROI)or the hot-spot method, leading to theincompleteassessment of tumors [8,9]. Second, the limited image information applied in these studies cannot fully address tumor heterogeneity. Third, excessive parameters and time-consuming post-processing limit their clinical applications [10,11]. Besides, advanced sequences highly depend on the performance of the scanner and are not available in all hospitals.Thus, it is urgent to develop a user-friendly protocol for the early and comprehensive differentiation ofpseudoprogression from true progression.
Recently, the term radiomics,byextracting a large number of quantitative image features combined with machine learning algorithms, radiomics can provide information that is difficult to perceive by visual inspection to guide clinical decision-making, has attracted increased attention in the medical field, especially in tumor research for diagnosis, staging and prognosis [12][13][14][15]. Theradiomics strategy hasalso been used to identify pseudoprogression and true progression [16][17][18]. However, most of them were largely focused on advanced MR techniques, andthe varied post-processing models, varied interpretation and uniform standards for evaluation restricted their clinical applications. In contrast, T 1 CE is widelyused in almost all hospitalsfor thediagnosis and follow-upof GBM patients. Thus, developing an effective T 1 CE based radiomics model to differentiate pseudoprogression and true progressionwill have great potential in clinic.
In this study, we aimed to evaluate the diagnostic power of T 1 CE imaging radiomics-based machinelearning in differentiating pseudoprogression from true progression inGBM patients after standard treatment.The diagnostic power of radiomics model was further compared with that of radiologists' assessment.

Patient population
This study was approved by our institutional review board, and the requirement for informed consent was waived based on its retrospective nature. One hundred thirty-one pathologically confirmed primary GBM patients were retrospectively enrolled from May 2014 to February 2017 in Tangdu hospital.
The inclusioncriteria were as follows: (1) GBM patients underwent gross total resection or subtotal resection of the lesion; (2) routine MRI was performed within 48 h after surgery, including T 1 -weighted imaging (T 1 WI) Keywords: Glioblastoma, MRI, Pseudoprogression, Radiomics, Texture feature, Machine learning and contrast-enhanced T 1 WI; (3) the patients underwent standard treatment (CCRT with TMZ and six cycles of adjuvant TMZ after surgery); (4) the patients underwent a second round of MR imaging within 2 months after CCRT with TMZ, and the third follow-up MRI examination was obtained at 6 months after CCRT [19]; (5) the patients did notreceivecorticosteroidtreatment3 days before each MRI examination; (6) the patients had new or enlarged enhancement within the radiation field on the second follow-up MR images; and (7) thepatientswere confirmed to havetrue progression or pseudoprogression through pathology after the second surgery or clinical radiologic follow-up.
Fifty-four patients were excluded for the following reasons: (1) absence of new or enlarged enhancement at the end of radiation therapy with concurrent TMZ (n = 15); (2) lack ofstandardized treatment schedules after surgery (n = 10); (3)poor image quality or motion artifacts (n = 11); and (4) lack ofcomplete clinical radiological follow-up or pathological evidence (n = 18).
Finally, 77 patients were included and confirmed to have true progression (n = 51) or pseudoprogression (n = 26). Thirteen patients with true progression and 2 patients with pseudoprogression were confirmed by pathology of the reoperation samples. The other 2 patients died of GBM-related complications within 9 months and were also classified intothe true progression group. The other patients with true progression (n = 36) or pseudoprogression (n = 24) according to the RANO criteria [3]. The details of the patient enrollment are shown in Fig. 1.

Image Acquisition
The MRI protocol was performed on a 3.0 T MRI scanner (MR750, GE Healthcare, and Milwaukee, Wisconsin, USA) with an 8-channel head coil (General Electric Medical System). Preoperative and the follow-up MR images were collected including axial T 1 -weighted imaging (T 1 WI), T 2 -weighted imaging (T 2 WI), fluid-attenuated inversion recovery (FLAIR) and T 1 -weighted contrastenhanced imaging (T 1 CE).

Segmentation of the volume of interest(VOI)
The research pipeline, including image preprocessing, feature extraction, feature selection and radiomics model building is depicted in Fig. 2.Two neuroradiologists (L.F.Y., with 12 years of experience and Y.Z.S., with 10 years of experiencein neuro-oncology imaging) independently reviewed all images. A third senior neuroradiologist (G.B.C., with 25 years of experience in brain tumor imaging) re-examined the images and determined the finalclassificationwhen inconsistencies existed between the two neuroradiologists. In assessing whether the lesion progressed after complete resection, the preoperative image features of the tumor would affect the results. Thus, the preoperative image features of the tumor were also observed and characterized based on the criteria outlined in Additional file 1: Table S1.
The VOIs were semi-automatically segmented by the two neuroradiologists(L.F.Y. and Y.Z.S.)using ITK-SNAP (version 3.6, http://www.itk-snap.org). The VOIs covering the enhanced lesion were drawn slice by slice on T 1 CE, avoiding the regions of macroscopicnecrosis, cystic, edema and non-tumor macrovessels, at the second follow-up MR imaging within 2 months after standard treatment [20].

Feature Extraction
A series of texture featureswere involved in this study, including 42 histogram features, 11 Gy level size zone matrix (GLSZM) texture features, 10 Haralick features, 144 Gy level co-occurrence matrix (GLCM) texture features and 180 run-length matrix (RLM) texture features of the original images. The after 25 times Gabor and Haarwavelettransforming. Then, a total of 9675 features were extracted from the T 1 CE images using Analysis-Kinetics (A.K., GE Healthcare) software.The aforementioned features were used here because they were found to be relevant for distinguishing glioma grades in our previous study [14].

Feature Selection
After normalization, the highly redundant and correlated features were subjected to a two-step feature selection procedure. First, highly correlated features were eliminated using Pearson correlation analysis, with anr threshold of 0.75. Then, a random forest (RF) classifier consisting of a number of decision trees was used to rankthe feature importance. Specially, each node in the decision trees is a condition on a single feature, designed to split the dataset into two and similar response values will end up in the same set. The measurement based on which the (locally) optimal condition is determinedis called impurity. When training a tree, how much each feature decreases this weighted impurity in the tree can be computed. Furthermore, for a forest, the impurity decrease ofeach feature can be averaged across the trees, and then used to rank the features, i.e. features importance. In our study, the Gini impurity decrease was used as the criterion to evaluate the feature importance for feature selection.

Radiomics Model Building
After feature ranking, the 50 most important features were fed into a conditional inference RF classifier for model fitting [21]. The synthetic minority oversampling technique (SMOTE) strategy was used to address the data imbalance issue [22].Five-fold cross validation method was employed for tuning the hyperparameter and performed 3 times to avoid bias and overfitting as much as possible. Then these results were averaged to get the final performance.
The accuracy, sensitivity and specificityof the receiver operating characteristic (ROC)were computed to evaluate the constructed radiomics model.

Radiologists'assessment
To compare the efficacies of radiologists' assessment and radiomics modelin differentiating pseudoprogression from true progression, the images were also evaluated by three junior neuroradiologists (Q.T., G.X. and Y.H., with 8, 7 and 7 years of experience in neuroradiology, respectively) using the second follow-up MR images when new or enlarged enhanced lesions were observed within the radiation field.The neuroradiologists were blinded to the clinical information but were aware that the tumors showedeither pseudoprogression or true progression, without knowing the exact category each patient fell in. Each readers independently assessed only the T 1 CE images and recorded a final diagnosis using a 4-point scale (1 = definite pseudoprogression; 2 = likely pseudoprogression; 3 = likely true progression; and 4 = definite true progression) [23].

Statistics
For comparisons of the differences in clinical characteristics between thepseudoprogression and true progression groups, Fisher's exact test or thechi-square test wasused for the categorical variables, and unpaired Student's t test was used for continuous variables. These were performed by using SPSS 20.0 software (SPSS Inc., Chicago, IL, USA). P value < 0.05 was considered to indicate statistical significance.
Radiomics model construction was performed using R version 3. 4. 2 (R Foundation for Statistical Computing). The 'RF' , 'caret' and 'unbalanced'R packages were used for feature selection and SMOTE, respectively. Thediagnostic performance of the radiomics model was assessedby using the accuracy, sensitivity, specificity.The samevalues of the three readers for differentiating pseudoprogression from true progression were also calculated and compared with the radiomicsmodel.

Patient Characteristics and Qualitative MR Assessment
The patient characteristics are summarized in Table 1.
In addition, the diagnostic powers of preoperative image features in differentiatingpseudoprogression from true progression were summarized in Additional file 1: Table S2. The side of the tumor exhibited statistically significant (P = 0.023), and the location of the tumor had a tendency towards statistical significancebetween-group difference (P = 0.053). Figures 3 and 4 demonstrate representative patients withpseudoprogression and true progression onT1CE imaging, respectively. The pseudoprogression case (Fig. 3), in the absence of more interventions, showed a strengthened extent of the lesion and a reduced degree of enhancement. The case of true progression (Fig. 4) showed a marked increase in the extent of the enhanced lesions, which was confirmed by secondary surgical pathology as tumor recurrence. Figure 5 depicts the relative importance of the top 50 featuresbased on the Gini index. In the present study, 92% (n = 46) of the key features in the radiomics model were wavelet features. Twenty-two of the top 50 texture features had significant differences between the true progression group and the pseudoprogression group (Table 2).

Quantitative MR Texture Analysis
These optimal features included1 GLSZM texture feature, 6 histogram texture features, 19 GLCM texture featuresand 24 RLM texture features. The details of the optimal feature subsets are provided in Additional file 1: Table S3.The RLM texture features accounted forthe highest proportion of the top 50 features, among which Short Run Emphasis_angle45_offset1_LHHL was the most relevant feature and was significantly lower in patients with true progression than in patients with pseudoprogression ( Table 2). The GLCM texture feature was the second most dominant featurecomputed from T 1 CE (Fig. 5)and was significantly higher in patients with pseudoprogression than in patients with true progression ( Table 2). The histogram feature and GLSZM texture feature were the least relevantof the top 50 features. Skew-ness_LHLH and low intensity small area emphasis were the fourth and ninth most relevant features (Fig. 5) and were significantly lower in patients with true progression than in patients with pseudoprogression (Table 2). Low intensity small area emphasisindicated that hypointense zones were more likely to be present inpseudoprogression patients. The above results indicated that lesions with a relatively homogenous appearance were associated with pseudoprogression.

Comparison of the diagnostic performance between theradiomicsmodeland the radiologists' assessment
The ROC curve in Fig. 6 indicated that the radiomics model hasbetter diagnostic performance than the radiologists' assessment.

Discussion
In this study, none of the pretreatment clinical characteristics showed significant difference between the two groups.In addition, according to the results of preoperative imaging characteristics analysis,only the side of the tumor was statistically significantand the location of the tumor had a tendency towards statistical significance between two groups (Additional file 1: Table S2). The  results may be related to the small sample size and data imbalance, we will observe the results in future research. The ability of quantitative radiomics features based on T 1 CE imagesto differentiate pseudoprogression from true progressionin patients with GBM after CCRTwas investigated in the current study.When combined with RF classifier, the radiomics model achieved relatively gooddiagnosis performance with higher ACC (72.78%) and sensitivity (78.36%) than radiologists' assessment.
Regarding the top 50 most important features selectedby using the Gini index as a metric,most of them were RLM (n = 24) and GLCM (n = 19) features. The RLM mainly reflects the roughness and directionality of the texture.The GLCM reflects the intensity of the spatial distribution [24].The histogram features (n = 6) and GLSZM texture feature (n = 1) were also played an important role in identifying pseudoprogression and true progression.The ninth important feature of low intensity small area emphasis indicated that hypointense zones were more likely to be present in pseudoprogression patients. Previous literature reports have shown that low intensity small area emphasis may reflect fibrinoid necrosis, oligodendroglial injury and glial cell hyperplasia [11,25]. The higher the valuewas the greater the probability of pseudoprogression, which appears as a low-signal region. On the contrast,recurrent GBM was characterized by vascular proliferation and a disrupted blood-brain barrier, leading to the high signal intensity in the T 1 CE image  caused by contrast agent leakage [11,26]. Above texture features mainly reflect the tumor heterogeneity and complexity of components based on voxel-based changes in grayscale [27]. Specially, the Haralick features were not in the top 50 features, whichprobablysuggested that these two groups of features were not effective in distinguishing pseudoprogression from true progression and needed to be verified in future research.
Moreover, it can be observed that, in our study, 92% (n = 46) of the key features in the radiomics model were Gabor filtered wavelet features.The use of high-dimensional feature helps to improve the performance of the model.This finding demonstrates thatthe wavelet features can provide more information about the tumor invisible to the eye, so as to better assess treatment response [28,29].
Previous studies have used low-dimensional features coupled witha few pieces of information from multiparametric histograms [16]orSVM classification based on DCE MRI to differentiate pseudoprogression from true progression [30]. Although these studies achieved good results in differentiating pseudoprogression from true progression inGBM patients with standard treatment, there were still certain disadvantages. First, the samples and quantitative features in previous studies were relatively small, especially the relatively small number of pseudoprogression patientswithout proper handling, which might have overshadowed their statistical results [16]. Second, previous studies were mostly based on advanced MR sequences that were of much equipment dependent and may hamper its application in some primary hospitals.
To the best of our knowledge, there is no published study in the literature comparing the radiomics model with radiologists' assessment for distinguishing pseudoprogression from true progression. In our study, the radiomics model demonstrated betterdiagnostic performance than the radiologists' assessment. It suggested that our radiomics model may have the potential to help clinicians make an earlier judgment for patients in whom a "wait and see" approach may be the most appropriate.

Study limitations
Several limitations of the current study should be addressed. First, the sample size was still small,so there may be a risk of overfitting. In order to solve the problem of small sample size and overfitting risk, we adopted the following methods: 1)25 times Gabor and wavelet transformations were performed on the features extracted from the original images. 2) five-fold cross validation was employed for tuning the hyperparameter and was performed 3 times to avoid bias and overfitting as much as possible.3) the SMOTE strategy was used to address the data imbalance issue, especially the sample size of pseudoprogression was relatively small. Moreover, Bum-Sup Jang et al. built a radiomics model by machine learning algorithm differentiating pseudoprogression from true progression with the total amount of sample they used was 78 cases [31]. In the future, a much larger dataset needs to be investigated to validate the robustness and reproducibility of the currently proposed radiomics model.Second, molecular alterations, such as isocitrate dehydrogenase (IDH) mutation and oxygen 6-methylguanine-DNA methyltransferase (MGMT) promoter methylation status, were not included in this study. The recently published 2016 WHO classification of brain tumors incorporated genetic parameters into the classical histopathological findings. These genetic alterations have   6 Graph shows receiver operating characteristic curve for radiomics model in differentiating pseudoprogression from true progression versus the radiologists' assessment in GBM patients after CCRT