Radiomic analysis of Gd-EOB-DTPA-enhanced MRI predicts Ki-67 expression in hepatocellular carcinoma

Background Nuclear protein Ki-67 reflects the status of cell proliferation and has been regarded as an attractive prognostic biomarker in hepatocellular carcinoma (HCC). The aim of this study was to investigate which radiomics model, derived from different sequences and phases of gadolinium-ethoxybenzyl-diethylenetriamine pentaacetic acid (Gd-EOB-DTPA)-enhanced MRI, best predicts Ki-67 expression in HCC, and then to validate the optimal model for preoperative prediction of Ki-67 expression in HCC. Methods This retrospective study included 151 patients with pathologically confirmed HCC (training cohort: n = 103; validation cohort: n = 48). Radiomics features were extracted from arterial phase (AP), portal venous phase (PVP), hepatobiliary phase (HBP), and T2-weighted (T2W) images. Logistic regression with least absolute shrinkage and selection operator (LASSO) regularization was used to select features and build a radiomics score (Rad-score). A final combined model including the optimal Rad-score and clinical risk factors was established. Receiver operating characteristic (ROC) curve analysis, the DeLong test and calibration curves were used to assess the predictive performance of the combined model. Decision curve analysis (DCA) was used to evaluate clinical utility. Results The AP radiomics model, whose decision curve lay above those of the HBP and T2W radiomics models and thus indicated more net benefit, gave better predictive performance than the HBP and T2W radiomics models. The combined model including AP Rad-score and serum alpha-fetoprotein (AFP) levels (AUC = 0.922 and 0.863 in the training and validation cohorts, respectively) outperformed the AP radiomics model alone (AUC = 0.873 and 0.813, respectively). The calibration curve of the combined model showed good agreement between predicted and actual probabilities. DCA of the validation cohort revealed that at threshold probabilities of 30–60%, the combined model added more net benefit than the AP radiomics model.
Conclusions A combined model including AP Rad-score and serum AFP levels based on enhanced MRI can preoperatively predict Ki-67 expression in HCC. Supplementary Information The online version contains supplementary material available at 10.1186/s12880-021-00633-0.

status of cell proliferation activity, which corresponds with tumor biological behavior, treatment efficacy and prognosis [4,5]. Previous studies have demonstrated that high Ki-67 expression is associated with poor overall survival (OS) [6][7][8][9][10], disease-free survival (DFS) [6,9,11], and relapse-free survival (RFS) [8,9,12]. In particular, Ki-67 has been proposed as an attractive therapeutic target in cancer because it is highly expressed in most malignant cells but rarely detected in normal cells, although Ki-67-targeted therapy has not yet been applied in clinical practice [5]. Accurate identification of the Ki-67 expression level is therefore crucial for prognosis and for treatment decision-making aimed at a satisfactory outcome. However, it is difficult to distinguish the subtle differences among HCCs with different Ki-67 expression on conventional imaging.
Radiomics, which extracts large numbers of advanced, quantitative, high-throughput features from medical images, has been used to develop diagnostic, predictive, and prognostic models [13,14]. Previous studies have reported that tumor characteristics at the cellular and genetic levels can be reflected in phenotypic imaging patterns and subsequently captured by radiomics signatures [15][16][17][18][19][20]. Gadolinium ethoxybenzyl diethylenetriamine pentaacetic acid (Gd-EOB-DTPA), which has characteristics of both a blood-pool agent and a hepatobiliary agent, is commonly used in clinical practice. Previous studies have applied texture analysis to Gd-EOB-DTPA-enhanced MRI to predict Ki-67 expression preoperatively in patients with HCC, showing that texture analysis outperformed subjective MRI features assessed by radiologists and achieved good predictive performance [21,22]. Although valuable, these studies did not compare the predictive performance of radiomics models derived from different sequences and phases of Gd-EOB-DTPA-enhanced MRI.
Thus, this study aimed to develop radiomics models derived from different sequences and phases of Gd-EOB-DTPA-enhanced MRI, to compare their predictive performance, and then to validate the optimal model for preoperative prediction of Ki-67 expression in patients with HCC.

Patients
This was a retrospective study for which ethical approval was obtained and the requirement for informed consent was waived. Between January 2013 and November 2019, patients who underwent Gd-EOB-DTPA-enhanced MRI before surgery or biopsy were consecutively enrolled according to the following inclusion and exclusion criteria. The inclusion criteria were: (1) pathologically confirmed HCC; (2) Gd-EOB-DTPA-enhanced MRI of the liver within 1 month before surgery or biopsy; (3) images without obvious artifacts; (4) when multiple lesions were present, the largest lesion with matched pathological and immunohistochemical diagnosis was selected. The exclusion criteria were: (1) previous treatment, such as anti-tumor therapy, radiofrequency ablation, or transcatheter arterial chemoembolization (TACE); (2) incomplete clinical or pathological information. All enrolled patients were randomly divided into training and validation cohorts at a ratio of 7:3.

Histopathological examination
Tumor tissue sections were stained using a monoclonal mouse anti-human Ki-67 antibody (Beijing Zhongshan Golden Bridge Biotechnology Company, Beijing, China). Ki-67 expression was evaluated as the proportion of Ki-67-positive cells. A cell was considered Ki-67 positive when its nucleus was stained brownish-yellow. Tumors were classified as having low Ki-67 expression (≤ 14% immunoreactive cells) or high Ki-67 expression (> 14% immunoreactive cells) according to previous studies [5,16]. Following a previous study, we dichotomized histologic grades into low-grade and high-grade tumors. Low-grade tumors comprised well differentiated, well-to-moderately differentiated, and moderately differentiated HCC; high-grade tumors comprised moderately-to-poorly differentiated, poorly differentiated, and undifferentiated HCC.
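The labelling rules above can be written down as a small sketch. The helper names and the string codes for differentiation categories are hypothetical (the source does not describe how grades were coded); only the 14% cutoff and the grade groupings come from the text.

```python
# Sketch of the Ki-67 and histologic-grade dichotomization described above.
KI67_CUTOFF_PERCENT = 14.0  # high expression if > 14% of cells are Ki-67 positive

# Hypothetical category codes; the study does not specify a coding scheme.
LOW_GRADE = {"well", "well-moderate", "moderate"}
HIGH_GRADE = {"moderate-poor", "poor", "undifferentiated"}


def ki67_index(positive_nuclei: int, total_nuclei: int) -> float:
    """Percentage of counted tumor cell nuclei that stained Ki-67 positive."""
    if total_nuclei <= 0:
        raise ValueError("total_nuclei must be positive")
    return 100.0 * positive_nuclei / total_nuclei


def ki67_group(index_percent: float) -> str:
    """Dichotomize at the cutoff: <= 14% is low, > 14% is high."""
    return "high" if index_percent > KI67_CUTOFF_PERCENT else "low"


def grade_group(differentiation: str) -> str:
    """Map a differentiation category onto the low-/high-grade dichotomy."""
    if differentiation in LOW_GRADE:
        return "low-grade"
    if differentiation in HIGH_GRADE:
        return "high-grade"
    raise ValueError(f"unknown differentiation category: {differentiation!r}")


print(ki67_group(ki67_index(30, 200)))  # 15% of nuclei positive -> high
```

Note that a tumor at exactly 14% falls into the low group, matching the "≤ 14%" definition.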

MRI protocol
The details of the MRI protocol and the sequences used in this study are presented in Additional file 1.

Tumor segmentation
Tumor segmentation was performed manually on arterial phase (AP), portal venous phase (PVP), hepatobiliary phase (HBP) and T2W images with 3D Slicer (http://www.slicer.org); a three-dimensional (3D) region of interest (ROI) covering the whole tumor was delineated along the tumor border. HBP or T2W images were segmented first, followed by AP and PVP images, because tumor margins were clearer on HBP or T2W images than on AP and PVP images. This delineation order was intended to mitigate software-related segmentation errors. Segmentation was performed independently by two radiologists (Y.Y., 10 years of liver imaging experience; Y.F., 8 years of liver imaging experience) in 30 randomly chosen patients to assess inter-observer reproducibility. The same radiologist (Y.F.) repeated the segmentation on another day to assess intra-observer reproducibility. The images of the remaining patients were segmented by one radiologist (Y.F.). Both radiologists were blinded to the clinical outcomes.

Preprocessing and radiomic features extraction
Before radiomic feature extraction, the images were preprocessed, including Laplacian of Gaussian (LoG) filtering, wavelet transformation, gray-level bin discretization and symmetrical radiomic matrix computation. Feature extraction was performed using the Slicer Radiomics extension, which incorporates the PyRadiomics library into 3D Slicer [23]. The extracted features included first-order statistics, shape features and texture features, the latter comprising the gray level co-occurrence matrix (GLCM), gray level size zone matrix (GLSZM), gray level run length matrix (GLRLM), gray level dependence matrix (GLDM) and neighboring gray tone dependence matrix (NGTDM). Among these, flatness and least axis length from the shape features were excluded based on their definitions, as discussed in the PyRadiomics documentation, and sum average was excluded because it is directly correlated with joint average [24]. In total, 1,300 radiomic features were extracted for each lesion.
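To make the bin discretization step concrete, the sketch below applies fixed-bin-width discretization, in the form given in the PyRadiomics documentation (bin index = floor(x/W) − floor(min(x)/W) + 1), to the voxels inside a 3D ROI and then computes first-order entropy from the resulting histogram. The bin width of 25 is the PyRadiomics default and is an assumption here, since the study's settings are not reported; this is an illustration, not a replacement for the library.

```python
import numpy as np


def discretize_fixed_bin_width(roi_values, bin_width=25.0):
    """Fixed-bin-width discretization: floor(x / W) - floor(min(x) / W) + 1."""
    x = np.asarray(roi_values, dtype=float)
    return (np.floor(x / bin_width) - np.floor(x.min() / bin_width) + 1).astype(int)


def first_order_entropy(roi_values, bin_width=25.0):
    """Shannon entropy (bits) of the discretized intensity histogram inside the ROI."""
    bins = discretize_fixed_bin_width(roi_values, bin_width)
    _, counts = np.unique(bins, return_counts=True)
    p = counts / counts.sum()
    eps = np.spacing(1)  # machine epsilon, avoids log2(0)
    return float(-np.sum(p * np.log2(p + eps)))


# Toy 3D image and whole-tumor mask (hypothetical values): only ROI voxels are used.
image = np.array([[[0, 10], [25, 49]], [[300, 300], [300, 300]]], dtype=float)
mask = np.array([[[1, 1], [1, 1]], [[0, 0], [0, 0]]], dtype=bool)
roi = image[mask]
print(discretize_fixed_bin_width(roi))  # [1 1 2 2]
print(round(first_order_entropy(roi), 6))  # 1.0
```

Because bin edges are anchored to multiples of the bin width, a fixed bin width keeps gray levels comparable across patients, which is why the choice of W affects every texture matrix downstream.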

Radiomic feature selection and model development
Least absolute shrinkage and selection operator (LASSO) logistic regression with 5-fold cross-validation was used to select the most predictive features in the training cohort. A Rad-score was calculated for each patient as a linear combination of the selected features weighted by their respective coefficients.
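The selection step can be sketched as follows. The study used the R package glmnet; here, as a swapped-in technique, the same L1-penalized logistic objective (mean logistic loss + λ‖β‖₁, intercept unpenalized) is minimized by proximal gradient descent (ISTA), and λ is chosen by 5-fold cross-validated binomial deviance. Assumptions: features are already z-scored, the candidate λ grid is hypothetical, and the lowest-deviance λ is used (the study does not state whether λ_min or a one-standard-error rule was applied). The data below are synthetic.

```python
import numpy as np


def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1: shrinks values toward zero, zeroing small ones."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def fit_lasso_logistic(X, y, lam, n_iter=2000):
    """Minimize mean logistic loss + lam * ||beta||_1 by ISTA; intercept is unpenalized."""
    n, p = X.shape
    Xa = np.hstack([np.ones((n, 1)), X])  # prepend intercept column
    step = 4.0 * n / np.linalg.norm(Xa, 2) ** 2  # 1 / Lipschitz constant of the gradient
    w = np.zeros(p + 1)
    for _ in range(n_iter):
        grad = Xa.T @ (_sigmoid(Xa @ w) - y) / n
        w = w - step * grad
        w[1:] = soft_threshold(w[1:], step * lam)  # L1 step on coefficients only
    return w[0], w[1:]


def cv_select_lambda(X, y, lambdas, k=5, seed=0):
    """Return the lambda with the lowest held-out binomial deviance over k folds."""
    n = len(y)
    folds = np.array_split(np.random.default_rng(seed).permutation(n), k)
    mean_dev = []
    for lam in lambdas:
        dev = 0.0
        for test_idx in folds:
            train_idx = np.setdiff1d(np.arange(n), test_idx)
            b0, beta = fit_lasso_logistic(X[train_idx], y[train_idx], lam)
            prob = np.clip(_sigmoid(b0 + X[test_idx] @ beta), 1e-12, 1 - 1e-12)
            yt = y[test_idx]
            dev += -2.0 * np.sum(yt * np.log(prob) + (1 - yt) * np.log(1 - prob))
        mean_dev.append(dev / n)
    return lambdas[int(np.argmin(mean_dev))]


def rad_score(X, b0, beta):
    """Rad-score: intercept plus coefficient-weighted sum of features (zero weights drop out)."""
    return b0 + X @ beta


# Synthetic example: 5 standardized features, only column 0 carries signal.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
y = (X[:, 0] + 0.5 * rng.standard_normal(200) > 0).astype(float)
lam = cv_select_lambda(X, y, lambdas=[0.01, 0.05, 0.1])
b0, beta = fit_lasso_logistic(X, y, lam)
scores = rad_score(X, b0, beta)
print(lam, np.round(beta, 3))
```

Features whose coefficients are shrunk exactly to zero are the ones LASSO discards; the non-zero ones, with their coefficients, form the Rad-score formula.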

Comparison of radiomics models in the training and validation cohorts
The models built in the training cohort were applied to the validation cohort. ROC curves, the DeLong test, calibration curves and decision curve analysis (DCA) were used to evaluate the diagnostic performance of the constructed models, and cutoff values were selected by maximizing the Youden index to determine the corresponding sensitivity and specificity.
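A minimal numpy sketch of three of these tools follows: the AUC as the Mann-Whitney probability that a high-Ki-67 case scores above a low-Ki-67 case, the Youden-index cutoff, and the DeLong test for two correlated AUCs measured on the same patients. The study used MedCalc for these analyses; the code below is an independent illustration, and the toy scores are hypothetical. A case is called positive when its score is at or above the cutoff (an assumption about the decision rule).

```python
import math
import numpy as np


def _placements(pos, neg):
    """DeLong components from the kernel psi = 1 if pos > neg, 0.5 if tied, else 0."""
    diff = pos[:, None] - neg[None, :]
    psi = (diff > 0).astype(float) + 0.5 * (diff == 0)
    return psi.mean(axis=1), psi.mean(axis=0)  # V10 (per positive), V01 (per negative)


def auc(scores, labels):
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    v10, _ = _placements(scores[labels], scores[~labels])
    return float(v10.mean())


def youden_cutoff(scores, labels):
    """Cutoff maximizing J = sensitivity + specificity - 1; returns (cutoff, sens, spec)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    best = (-np.inf, None, None, None)
    for t in np.unique(scores):
        pred = scores >= t
        sens = pred[labels].mean()
        spec = (~pred[~labels]).mean()
        if sens + spec - 1 > best[0]:
            best = (sens + spec - 1, float(t), float(sens), float(spec))
    return best[1:]


def delong_test(scores_a, scores_b, labels):
    """Two-sided DeLong test; returns (auc_a, auc_b, z, p)."""
    labels = np.asarray(labels, dtype=bool)
    v10s, v01s = [], []
    for s in (scores_a, scores_b):
        s = np.asarray(s, dtype=float)
        v10, v01 = _placements(s[labels], s[~labels])
        v10s.append(v10)
        v01s.append(v01)
    auc_a, auc_b = float(v10s[0].mean()), float(v10s[1].mean())
    m, n = int(labels.sum()), int((~labels).sum())
    cov = np.cov(np.vstack(v10s)) / m + np.cov(np.vstack(v01s)) / n
    var = cov[0, 0] + cov[1, 1] - 2.0 * cov[0, 1]
    if var <= 0:  # identical or degenerate score sets
        return auc_a, auc_b, 0.0, 1.0
    z = (auc_a - auc_b) / math.sqrt(var)
    return auc_a, auc_b, z, math.erfc(abs(z) / math.sqrt(2.0))


labels = [1, 1, 1, 0, 0, 0]  # 1 = high Ki-67 (toy data)
model_a = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
model_b = [0.6, 0.3, 0.7, 0.2, 0.5, 0.1]
print(round(auc(model_a, labels), 4))  # 0.8889
print(youden_cutoff(model_a, labels))
print(delong_test(model_a, model_b, labels))
```

The DeLong variance accounts for the fact that both models are scored on the same patients, which is why it, rather than an unpaired comparison, is appropriate for comparing radiomics models within one cohort.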

Combined model development and validation
To develop the combined model, we performed multivariable logistic regression analysis of clinical factors in the training cohort, including age, sex, hepatitis B, hepatitis C, cirrhosis, serum alanine aminotransferase (ALT) level, serum aspartate aminotransferase (AST) level, serum gamma-glutamyl transferase (GGT) level, and serum alpha-fetoprotein (AFP) level. Clinical factors reaching statistical significance (P < 0.05) were entered into the combined model together with the optimal Rad-score.
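The factor-selection step can be illustrated with an unpenalized multivariable logistic regression fitted by Newton-Raphson, keeping factors whose two-sided Wald P value is below 0.05. The study used the R package rms; this numpy version is a sketch, and the variable names (rad_score, afp, age) and synthetic data are hypothetical. Newton-Raphson does not converge if the outcome is perfectly separated by the predictors.

```python
import math
import numpy as np


def fit_logistic(X, y, max_iter=50, tol=1e-10):
    """Logistic regression by Newton-Raphson.

    Returns coefficients, standard errors and two-sided Wald P values (intercept first).
    """
    Xa = np.column_stack([np.ones(len(y)), X])
    w = np.zeros(Xa.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-Xa @ w))
        grad = Xa.T @ (y - p)  # score vector
        hess = Xa.T @ (Xa * (p * (1 - p))[:, None])  # Fisher information
        delta = np.linalg.solve(hess, grad)
        w = w + delta
        if np.max(np.abs(delta)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-Xa @ w))
    info = Xa.T @ (Xa * (p * (1 - p))[:, None])
    se = np.sqrt(np.diag(np.linalg.inv(info)))
    z = w / se
    pvals = np.array([math.erfc(abs(zi) / math.sqrt(2.0)) for zi in z])
    return w, se, pvals


def select_factors(names, pvals, alpha=0.05):
    """Keep candidate factors (intercept excluded) with Wald P < alpha."""
    return [name for name, pv in zip(names, pvals[1:]) if pv < alpha]


# Synthetic training data (hypothetical): outcome depends on columns 0 and 1 only.
rng = np.random.default_rng(1)
n = 300
X = rng.standard_normal((n, 3))
eta = -0.5 + 1.5 * X[:, 0] + 0.8 * X[:, 1]
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-eta))).astype(float)
w, se, pv = fit_logistic(X, y)
selected = select_factors(["rad_score", "afp", "age"], pv)
print(selected)
```

The fitted coefficients of the retained factors then define the combined model's linear predictor, analogous to the Rad-score.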
Calibration curves were used to assess the agreement between the predicted and observed probabilities of the combined model in both the training and validation cohorts. Decision curve analysis was conducted to determine the clinical usefulness of the combined model by quantifying net benefits at different threshold probabilities in the validation cohort.
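The quantities behind these two analyses can be sketched directly. Net benefit at threshold probability p_t is TP/n − (FP/n)·p_t/(1 − p_t), compared with "treat all" (prevalence − (1 − prevalence)·p_t/(1 − p_t)) and "treat none" (net benefit 0). For calibration, the sketch uses simple quantile grouping of predicted probabilities; this is a simplification of, not a reproduction of, the bootstrap-corrected curves produced by the rms package. The toy probabilities are hypothetical, and "treat if probability ≥ p_t" is an assumed decision rule.

```python
import numpy as np


def net_benefit(probs, labels, threshold):
    """Net benefit of 'classify as high Ki-67 if predicted probability >= threshold'."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    n = labels.size
    treat = probs >= threshold
    tp = np.sum(treat & labels)
    fp = np.sum(treat & ~labels)
    return tp / n - fp / n * threshold / (1.0 - threshold)


def net_benefit_treat_all(labels, threshold):
    """Reference strategy that classifies every patient as high Ki-67."""
    prevalence = np.mean(np.asarray(labels, dtype=float))
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


def calibration_table(probs, labels, n_bins=5):
    """(mean predicted probability, observed event rate) per quantile group."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=float)
    groups = np.array_split(np.argsort(probs), n_bins)
    return [(float(probs[g].mean()), float(labels[g].mean())) for g in groups]


probs = [0.9, 0.8, 0.6, 0.3, 0.2]  # toy predicted probabilities
labels = [1, 1, 0, 1, 0]
for t in np.arange(0.30, 0.61, 0.05):  # the 30-60% range discussed in the text
    print(round(t, 2), net_benefit(probs, labels, t), net_benefit_treat_all(labels, t))
print(calibration_table(probs, labels))
```

A model adds clinical value over a threshold range when its net benefit exceeds both reference strategies there, which is the comparison summarized by the decision curves.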

Statistical analysis
Continuous variables were described as median and interquartile range, and categorical variables as frequency and percentage. The D'Agostino-Pearson test was used to test the normality of the data. For continuous variables, the independent-samples t-test or the Mann-Whitney U test was used to compare clinical characteristics between the training and validation cohorts, and between the high and low Ki-67 expression groups within each cohort, while the Chi-squared test or Fisher exact test was used for categorical variables. Two-sided P values < 0.05 were considered statistically significant. The inter-observer and intra-observer reproducibility of the extracted features was assessed by the intra-class correlation coefficient (ICC). ICC ≥ 0.8, 0.5–0.79 and < 0.5 indicated high, moderate, and low consistency, respectively [25]. LASSO logistic regression and multivariable logistic regression analysis were performed to select radiomics features and clinical risk factors using the "glmnet" and "rms" packages in R software, version 3.0.1 (http://www.r-project.org). The calibration and decision curves were plotted using the "rms" and "rmda" packages. Other statistical analyses were performed using MedCalc software (Version 16.2.0, https://www.medcalc.org).
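The text does not state which ICC form was used. As an assumption, the sketch below computes ICC(2,1) (two-way random effects, absolute agreement, single rater, in the Shrout and Fleiss scheme) from a two-way ANOVA decomposition, for one radiomic feature measured on n lesions by k segmentations, and maps it onto the consistency categories given above.

```python
import numpy as np


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: shape (n_subjects, k_raters), e.g. one feature from k segmentations.
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_total = np.sum((x - grand) ** 2)
    ss_rows = k * np.sum((x.mean(axis=1) - grand) ** 2)  # between lesions
    ss_cols = n * np.sum((x.mean(axis=0) - grand) ** 2)  # between raters/sessions
    ss_err = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)


def consistency_level(icc):
    """Categories used in the text: >= 0.8 high, 0.5-0.79 moderate, < 0.5 low."""
    if icc >= 0.8:
        return "high"
    if icc >= 0.5:
        return "moderate"
    return "low"


# Rater 2 is systematically one unit higher (toy values).
feature = [[1, 2], [2, 3], [3, 4], [4, 5]]
print(round(icc_2_1(feature), 3))  # 0.769
print(consistency_level(icc_2_1(feature)))  # moderate
```

Under absolute agreement, a constant offset between segmentations lowers the ICC even though the rank order is perfectly preserved; a consistency-type ICC(3,1) would return 1.0 for the same toy data.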

Baseline characteristics
One hundred fifty-one patients were enrolled, including 103 patients in the training cohort and 48 patients in the validation cohort (Table 1). Baseline characteristics did not differ significantly between the training and validation cohorts. Among all 151 patients, high Ki-67 expression was pathologically diagnosed in 112 patients (74.2%) and low Ki-67 expression in 39 patients (25.8%). In both cohorts, serum AFP levels and tumor grade were significantly higher in the high Ki-67 expression group than in the low Ki-67 expression group, and low-grade tumors were more frequent in the low Ki-67 expression group. In the training cohort, more patients had hepatitis B in the high Ki-67 expression group than in the low Ki-67 expression group (Table 2).

Features selection and radiomics model development
No statistically significant inter-observer or intra-observer differences were found (P values ranged from 0.691 to 0.815 and from 0.755 to 0.891, respectively). For the AP, HBP, and T2W radiomics models, the 1,300 features were reduced to 12 (Fig. 1a, b), 6, and 12 potential predictors, respectively, in the 103 patients of the training cohort. For PVP images, no valuable features were selected by LASSO regression analysis. A Rad-score was calculated for each patient as a linear combination of the selected features weighted by their respective coefficients. These features are presented in the Rad-score calculation formulas (Additional file 2).

Comparison of predictive performance among radiomics models in training and validation cohorts
The AUC values, sensitivity, specificity, and accuracy of the AP, HBP, T2W, and combined AP and HBP radiomics models for predicting Ki-67 expression in the training and validation cohorts are shown in Table 3 and Fig. 2. The DeLong test showed no significant difference in AUC values among the AP, HBP, combined AP and HBP, and T2W radiomics models. DCA showed that the curve of the AP radiomics model was generally higher than those of the HBP and T2W radiomics models (Fig. 3), and the combined AP and HBP radiomics model did not provide significant additional benefit compared with the AP radiomics model alone (Fig. 4).

Combined model development and validation
Multivariable logistic regression analysis showed that only serum AFP level and AP Rad-score were associated with Ki-67 expression in the training cohort (P < 0.05). The combined model was therefore constructed with AP Rad-score and serum AFP level. It yielded an AUC of 0.922 (95% CI 0.852-0.965) in the training cohort and 0.863 (95% CI 0.733-0.945) in the validation cohort (Table 3; Fig. 2). The DeLong test showed [...]. The calibration curves showed good agreement between predicted and actual events in the training and validation cohorts (Fig. 5a, b). DCA of the validation cohort revealed that at threshold probabilities of 30-60%, the combined model added more net benefit than the AP radiomics model (Fig. 3).

Discussion
In this study, we compared the predictive performance of the AP, HBP, T2W, and combined AP and HBP radiomics models. [...] the combined radiomics model, respectively. The combined model includes AP Rad-score and serum AFP level. The decision curve of the AP radiomics model was generally higher than those of the HBP and T2W radiomics models. The decision curve showed that at threshold probabilities of 30-60%, the combined model added more net benefit than the AP radiomics model alone. AP images of enhanced MRI best demonstrate tumor neovascularity, which is important for tumor growth. Accordingly, the AP radiomics model added more net benefit in predicting Ki-67 expression in HCC than the HBP radiomics model. Although a previous study reported that the AP model of Gd-EOB-DTPA-enhanced MRI was inferior, possibly because artifacts affected the extraction and calculation of texture-based features [26], our study excluded patients with obvious artifacts caused by transient severe motion (TSM) [27,28].
Radiomics, which includes texture analysis and other features such as shape and intensity [29], is considered a potential bridge between medical imaging and personalized medicine [30]. In our study, the 41 features most relevant to Ki-67 expression were selected. Of these, 13 were first-order statistics and 28 were texture features, including gray level co-occurrence matrix (GLCM), gray level dependence matrix (GLDM), gray level run length matrix (GLRLM), gray level size zone matrix (GLSZM), and neighboring gray tone dependence matrix (NGTDM) features. Although other groups have recently published studies on the same topic, using radiomics models based on Gd-EOB-DTPA-enhanced MRI to predict Ki-67 expression in HCC [21,22], their methods differ from ours in several respects. In the study of Li et al. [21], only the single slice with the largest lesion area was delineated, and the predictive performance of the models was compared only by misclassification rate. In our study, all slices covering the whole tumor were delineated, and the predictive performance of the different models was compared by AUC values, calibration curves and DCA. In the study of Ye et al. [22], the sum of texture signatures derived from AP, PVP, pre-contrast T1W and T2W images was used to predict Ki-67 expression by multivariable logistic regression, and the predictive performance of radiomics models derived from different phases was not compared. Although the C-index (AUC) of the combined model of Ye et al. [22] (AUC = 0.936) was approximately equivalent to that of our combined model (AUC = 0.922 in the training cohort), Ye et al. incorporated texture signatures from multiple phases into one radiomics model, which is cumbersome in clinical practice.
Our study developed radiomics models derived from different sequences and phases, including T2W, AP, PVP, and HBP images, compared their predictive performance, and then validated the optimal model for preoperative prediction of Ki-67 expression in HCC; this model performed well and would be feasible in clinical practice. Moreover, neither of the previous studies included a validation cohort to assess whether their models were overfitted.
This study has several limitations. Firstly, the sample size was small relative to the number of candidate variables, especially in the low Ki-67 expression group, and the validation cohort came from the same single institution as the training cohort, which limits the generalizability of our findings to other institutions or settings. Secondly, although we compared the predictive performance of the AP, HBP, and T2W radiomics models of Gd-EOB-DTPA-enhanced MRI for predicting Ki-67 expression in HCC, we did not compare the AP radiomics model of Gd-EOB-DTPA-enhanced MRI with that of gadolinium-diethylenetriaminepentaacetic acid (Gd-DTPA)-enhanced MRI. Thirdly, there is currently no standardized Ki-67 expression threshold in HCC, and our choice of 14% as the cutoff value may be controversial. Finally, interpreting the complex associations between biologic processes and radiomics features remains an enormous challenge, although this approach is in line with the current trend toward precise and personalized medicine.

Conclusions
Our study established and validated a combined model including AP Rad-score and serum AFP level based on enhanced MRI, for predicting Ki-67 expression in HCC patients. It provides a new non-invasive approach for accurate diagnosis.