Predicting invasion in early-stage ground-glass opacity pulmonary adenocarcinoma: a radiomics-based machine learning approach

Abstract

Background

To design a classification method for pulmonary ground-glass nodules (GGNs) based on computed tomography (CT) radiomics and machine learning for predicting invasion in early-stage ground-glass opacity (GGO) pulmonary adenocarcinoma.

Methods

This retrospective study included patients with pulmonary GGNs who were histologically confirmed to have adenocarcinoma in situ (AIS), minimally invasive adenocarcinoma (MIA), or invasive adenocarcinoma (IAC) from 2020 to 2023. Lesions were manually segmented on the CT images of all patients, and 107 radiomic features were obtained for each patient. Classification models were developed using random forest (RF) and cross-validation, including three one-versus-others models and one three-class model. For each model, features were ranked by normalized Gini importance, and a minimal subset was selected with a cumulative importance exceeding 0.9. These selected features were then used to train the final models. The models’ performance metrics, including area under the curve (AUC), accuracy, sensitivity, and specificity, were computed. AUC and accuracy were compared to determine the final optimal method.

Results

The study comprised 193 patients (mean age 54 ± 11 years, 65 men), including 65 AIS, 54 MIA, and 74 IAC, divided into one training cohort (N = 154) and one test cohort (N = 39). The final three-class RF model outperformed three individual one-versus-others models in distinguishing each class from the other two. For the multiclass classification model, the AUC, accuracy, sensitivity, and specificity were 0.87, 0.79, 0.62, and 0.88 for AIS; 0.90, 0.79, 0.54, and 0.89 for MIA; and 0.87, 0.69, 0.73, and 0.67 for IAC, respectively.

Conclusions

A radiomics-based multiclass RF model could effectively differentiate the three subtypes of pulmonary GGN, which may support early diagnosis of GGO pulmonary adenocarcinoma.

Introduction

Pulmonary adenocarcinoma is the most common form of primary lung carcinoma, with pulmonary ground-glass nodules (GGNs) often observed in association [1, 2]. Computed tomography (CT) is the standard imaging method for GGNs, which show a hazy opacity that does not obscure the underlying bronchial structures or pulmonary vessels [3]. With advancements in CT technology, thinner slice thickness allows for easier detection of small pulmonary nodules. The most common pathological subtypes of pulmonary GGNs are adenocarcinoma in situ (AIS), minimally invasive adenocarcinoma (MIA), and invasive adenocarcinoma (IAC) [4,5,6,7]. The different pathological subtypes of pulmonary adenocarcinoma are closely associated with treatment principles, surgical methods, and prognosis. Typically, pre-invasive pulmonary adenocarcinoma undergoes limited wedge resection or segmental resection, while IAC usually requires lobectomy to reduce tumor recurrence [8, 9]. Therefore, accurate prediction of the various subtypes of early-stage pulmonary adenocarcinoma is crucial. However, the early diagnosis of ground-glass opacity (GGO) pulmonary adenocarcinoma remains a challenge in clinical situations due to its high histological complexity [10, 11]. Classification of GGNs is necessary for invasion prediction and ensuring the patient receives the most appropriate treatment.

Recently, there has been a surge in research on the diagnosis of pulmonary adenocarcinoma using CT and machine learning methods. Machine learning has demonstrated significant results in biomedical research and is now extensively used in related fields [12,13,14,15,16]. It plays a crucial role in medical imaging applications, including the detection and segmentation of lesions, as well as the diagnosis and assessment of diseases [17,18,19,20,21,22,23,24,25,26,27,28,29,30,31]. Amid these advancements, radiomics has emerged as a powerful tool, enabling the high-throughput extraction of large numbers of quantitative features from radiographic images [32, 33]. Studies have shown the promising potential of radiomics methods that use features such as attenuation, mass, kurtosis, and entropy for the evaluation of GGN invasion [34, 35], and several recent studies in pulmonary adenocarcinoma have underscored the utility of radiomics methods in GGN diagnosis, using traditional machine learning methods or deep learning methods [36,37,38,39,40,41,42]. However, these studies have primarily concentrated on distinguishing between benign and malignant, or invasive and noninvasive nodules, without specifically discriminating among the three subtypes of AIS, MIA, and IAC. In light of these studies, we consider that machine learning methods incorporating a wider selection of radiomics features are worth exploring for the classification of pulmonary GGNs.

Therefore, in the present study we collected CT data from 193 GGN patients and obtained 107 radiomics features for each patient. Three one-versus-others models and one three-class model were developed using random forest (RF) to classify pulmonary GGNs, and features were selected for each model according to feature weights. The final classification model was a multiclass model using 20 radiomics features, which could effectively predict invasion of early-stage GGNs.

The highlights of this study were as follows:

  • Integrating CT radiomics, random forest (RF) and feature selection based on Gini importance provides a promising solution for classification of AIS, MIA, and IAC.

  • A CT radiomics-based classification method using random forest (RF) was designed to predict invasion in early-stage GGNs (AUC 0.87, 95% CI: 0.70–0.98 for AIS; AUC 0.90, 95% CI: 0.78–0.99 for MIA; and AUC 0.87, 95% CI: 0.73–0.96 for IAC, respectively).

  • The proposed CT radiomics and RF-based approach offers a rapid, non-invasive method for early diagnosis of GGNs.

The article was structured as follows: The Materials and Methods section detailed the study participants, image acquisition, and radiomics analysis, including preprocessing, feature extraction and selection and model training processes, and statistical analyses. The Results section presented the feature selection results and the performance of the models. The Discussion section interpreted these findings, considering their implications and limitations. Finally, the Conclusion section summarized the key points and suggested potential areas for future research.

Materials and methods

Study participants

This retrospective study was approved by the institutional review board of Huizhou Third People’s Hospital in accordance with local ethics procedures with a waiver for informed consent [2023-KY-014-01]. A total of 16,956 patients with pulmonary GGN who were confirmed by surgical pathology at our hospital from January 2020 to December 2023 were identified, and 193 patients were included in the final analysis, comprising a training cohort (N = 154) of 52 AIS, 43 MIA and 59 IAC, and a test cohort (N = 39) of 13 AIS, 11 MIA and 15 IAC. The flowchart of inclusion and exclusion is shown in Fig. 1. The inclusion criteria were as follows: (a) pulmonary nodules with a maximum diameter ≥ 5 mm and ≤ 20 mm on baseline CT; (b) partial lobectomy or segmentectomy performed under thoracoscopy, with pathology confirming AIS, MIA or IAC; (c) pulmonary nodules appearing as pure ground-glass opacity or part-solid ground-glass nodules on imaging. The exclusion criteria were as follows: (a) multiple or diffuse lesions in the lungs; (b) poor image quality or significant artifacts present; (c) solid component of ground-glass nodules ≥ 50%; (d) history of prior anti-tumor therapy, or metastasis of malignant tumors from other sites to the lungs. All the diagnoses were performed by two experts in lung disease in the pathology department. Patient characteristics are presented in Table 1.

Fig. 1

Flowchart of patient inclusion and exclusion. Abbreviations: CT, computed tomography; AIS, adenocarcinoma in situ; MIA, minimally invasive adenocarcinoma; IAC, invasive adenocarcinoma cancer

Table 1 Demographic and clinical characteristics of the population

Image acquisition

All the participants were scanned on a Discovery CT750 HD scanner (GE Healthcare, Milwaukee, WI, USA) or an Optima CT680 scanner (GE Healthcare, Milwaukee, WI, USA). CT scanning and reconstruction details are presented in Supplemental Table 1. The CT scan was performed within 1 month prior to surgery, ensuring complete clinical and pathological data. The scan range covered from the thoracic inlet to the lung bases. CT scan parameters were as follows: tube voltage: 120 kV, tube current: auto 120–500 mA, pitch ratio: 0.984:1, detector collimation: 64 × 0.625 mm, gantry rotation time: 0.5 s/rotation, scan slice thickness: 5 mm, field of view: Large Body (50 cm aperture). The reconstruction slice thickness was 0.625 mm, with a reconstruction matrix of 512 × 512 and reconstruction algorithms of Stnd/Bone. The quality of CT images was checked by visual inspection, and images with significant motion artifacts or distortions were excluded. The pulmonary GGN lesions were manually delineated on the CT images by two experienced radiologists, who drew the region of interest (ROI) layer by layer along the lesion’s contour.

Radiomics analysis

The overall data analysis procedure of this study is shown in Supplemental Fig. 1. Before feature extraction, CT images and lesion ROIs of each patient underwent preprocessing. The voxel dimensions were resampled to 1 × 1 × 1 mm³ using a linear interpolation algorithm to standardize voxel spacing. Subsequently, Z-score normalization was applied to standardize the CT image intensities. Radiomics features were extracted using the Python package PyRadiomics [43]. A total of 107 features were calculated in 7 classes (18 based on first order statistics, 14 shape-based, 24 based on gray-level co-occurrence matrices (GLCM), 16 based on gray-level run length matrices (GLRLM), 16 based on gray-level size zone matrices (GLSZM), 14 based on gray level dependence matrices (GLDM), and 5 based on neighboring gray tone difference matrices (NGTDM)).
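As a rough illustration of the intensity standardization step, the sketch below applies Z-score normalization to a list of voxel intensities using only the Python standard library. The function name `zscore_normalize` and the toy Hounsfield-unit values are hypothetical. Whether the mean and standard deviation were computed over the whole image or only within the ROI is not stated in the text, so the input here is simply whichever voxel set is chosen.

```python
from statistics import fmean, pstdev


def zscore_normalize(intensities):
    """Return (x - mean) / std for each intensity (population std).

    Assumption: the caller decides which voxels are passed in
    (whole image or ROI only); the paper does not specify this.
    """
    mu = fmean(intensities)
    sigma = pstdev(intensities)
    if sigma == 0:
        # A constant image carries no contrast; map everything to 0.
        return [0.0 for _ in intensities]
    return [(x - mu) / sigma for x in intensities]


# Toy intensities in Hounsfield units (made up for illustration).
toy_voxels = [-700.0, -650.0, -600.0, -550.0, -500.0]
normalized = zscore_normalize(toy_voxels)
print(normalized)
```

After this step the chosen voxel set has zero mean and unit standard deviation, which keeps intensity-based features on a comparable scale across scans.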

After extraction, the features were first screened using the intraclass correlation coefficient (ICC). The ICC was calculated for each pair of features, and those with a value larger than 0.8 were removed. The remaining features were normalized to the range 0 to 1 before being used to train 4 models in total: 3 one-versus-others models and 1 three-class classification model. The predictive models were developed with the scikit-learn package using RF [44]. Five-fold cross-validation was used for hyperparameter optimization. For each trained model, the Gini importance was calculated for each feature. Specifically, for each feature, we computed its weighted average decrease in Gini impurity across all node splits in the trees, and the values were then normalized to obtain the relative feature weight. Subsequently, features were sorted in descending order of weight, and the smallest feature subset whose cumulative weight exceeded 0.9 was selected. The selected features of each model were used to train a new model for one-versus-others or three-class classification.
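The cumulative-weight selection rule can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: the helper name `select_by_cumulative_importance` is hypothetical, and the weights attached to the feature names below are made up. In scikit-learn, weights of this kind would typically be read from the `feature_importances_` attribute of a fitted `RandomForestClassifier`, which is already normalized to sum to 1.

```python
def select_by_cumulative_importance(importances, threshold=0.9):
    """Return the smallest top-ranked feature subset whose cumulative
    normalized importance exceeds `threshold`.

    `importances` maps feature name -> non-negative importance.
    """
    total = sum(importances.values())
    if total <= 0:
        raise ValueError("importances must contain a positive value")

    # Rank features from most to least important.
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)

    selected = []
    cumulative = 0.0
    for name, weight in ranked:
        selected.append(name)
        cumulative += weight / total  # normalize so weights sum to 1
        if cumulative > threshold:
            break
    return selected


# Illustrative weights only (not values from the study).
toy_importances = {
    "firstorder_Maximum": 0.40,
    "firstorder_Energy": 0.30,
    "shape_LeastAxisLength": 0.25,
    "glrlm_RunLengthNonUniformity": 0.05,
}
print(select_by_cumulative_importance(toy_importances))
```

With these toy weights the running totals are 0.40, 0.70 and 0.95, so the first three features are kept and the fourth is dropped. The retained subset is then used to refit the model, as described above.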

Statistical analyses

The Pandas and scikit-learn packages were used for statistical analyses. The ICC was calculated for each pair of features for dimension reduction. For each predictive model, diagnostic metrics including area under the curve (AUC), accuracy, sensitivity and specificity were calculated. AUC represents the ability of the model to distinguish between two classes, with a higher AUC indicating better performance. Sensitivity represents the model’s ability to correctly identify positive cases, and specificity refers to its ability to correctly identify negative cases. Diagnostic metrics of the three-class classification model were computed individually for each class. The final classification strategy was determined based on the superior model for each subtype. For AIS, MIA and IAC separately, we compared the accuracy of the three-class classification model and the individual one-versus-others model to determine which model was better at identifying each subtype. If the accuracies were equal, we further compared the AUC values to determine the superior model.
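To make the per-class evaluation of a three-class model concrete, the sketch below treats one subtype as positive and the other two as negative, then derives accuracy, sensitivity and specificity from the resulting counts. The function names and the six toy cases are hypothetical and are not data from the study.

```python
def one_vs_rest_metrics(y_true, y_pred, positive):
    """Accuracy, sensitivity and specificity for one class versus the rest."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1
        elif truth == positive:
            fn += 1
        elif pred == positive:
            fp += 1
        else:
            tn += 1
    n = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / n,
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


def overall_accuracy(y_true, y_pred):
    """Fraction of cases whose predicted subtype matches exactly."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Toy labels (made up for illustration).
y_true = ["AIS", "AIS", "MIA", "MIA", "IAC", "IAC"]
y_pred = ["AIS", "MIA", "MIA", "IAC", "IAC", "IAC"]

for subtype in ("AIS", "MIA", "IAC"):
    print(subtype, one_vs_rest_metrics(y_true, y_pred, subtype))
print("overall", overall_accuracy(y_true, y_pred))
```

Per-class accuracy counts a case as correct whenever it lands on the right side of the one-versus-rest split, whereas overall three-class accuracy requires the exact subtype. This is why the overall figure can be lower than every per-class accuracy.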

Results

Feature selection

The 107 radiomics features were screened by ICC to remove those with a value larger than 0.8. The pairwise correlations of the 29 retained features are shown as a heatmap (Fig. 2). Details of the features are presented in Supplemental Table 2. The features were then selected according to the feature weights of the RF models: the 20 most important features for the three-class model, 12 for AIS versus others, 17 for MIA versus others, and 13 for IAC versus others. The names and weights of the 10 most important features for training each new model are shown in Fig. 3. The complete selected feature list is presented in Supplemental Table 3.
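The pairwise redundancy screen can be sketched as a greedy filter. The example below is an assumption-laden stand-in rather than the authors' procedure: it uses the Pearson correlation coefficient in place of the pairwise ICC reported in the text, it keeps the first feature of any redundant pair (the text does not say which member is removed), and the feature names and values are made up.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant feature is treated as uncorrelated
    return cov / sqrt(var_x * var_y)


def drop_redundant(features, threshold=0.8):
    """Keep a feature only if its correlation with every already-kept
    feature does not exceed `threshold`.

    `features` maps feature name -> list of per-patient values;
    dict insertion order decides which member of a pair survives.
    """
    kept = []
    for name, values in features.items():
        if all(pearson(values, features[k]) <= threshold for k in kept):
            kept.append(name)
    return kept


# Toy feature table with five hypothetical patients.
toy_features = {
    "f1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "f2": [2.0, 4.0, 6.0, 8.0, 10.5],  # nearly proportional to f1
    "f3": [5.0, 1.0, 4.0, 2.0, 3.0],
}
print(drop_redundant(toy_features))
```

Here `f2` is dropped because it is almost perfectly correlated with `f1`, while `f3` is retained. Applying the threshold to the absolute value of the coefficient would also remove strongly negatively correlated pairs; the sketch follows the text literally and thresholds the signed value.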

Fig. 2

The correlation results of 29 screened radiomics features with intraclass correlation coefficient (ICC) lower than 0.8. Those features with ICC larger than 0.8 were excluded in screening. The rows and columns represent individual features, while each cell shows ICC between the corresponding pair of features. The color gradient indicates the strength and direction of correlation: red denotes a positive correlation, and blue represents a negative correlation

Fig. 3

The 10 most important feature names and weights selected for diagnosis models (a) three-class model (b) AIS versus others (c) MIA versus others (d) IAC versus others. Abbreviations: AIS, adenocarcinoma in situ; MIA, minimally invasive adenocarcinoma; IAC, invasive adenocarcinoma cancer

Model performance

The diagnostic metrics of all classification models are shown in Table 2, and Fig. 4 illustrates the corresponding receiver operating characteristic (ROC) curves. For the three-class model to distinguish AIS, the AUC, accuracy, sensitivity and specificity were 0.87, 0.79, 0.62 and 0.88, respectively. Although the accuracy was comparable to that of the AIS versus others model, the AUC slightly surpassed it. In the case of MIA classification, the three-class model achieved an AUC of 0.90, accuracy of 0.79, sensitivity of 0.54, and specificity of 0.89, outperforming the MIA versus others model in both accuracy and AUC. Similarly, for IAC classification, the three-class model exhibited an AUC of 0.87, accuracy of 0.69, sensitivity of 0.73, and specificity of 0.67. While the accuracy was similar to that of the IAC versus others model, the AUC was higher. The three-class model was selected as the final classification model, which simultaneously classified AIS, MIA and IAC with an accuracy of 0.64.
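A per-class AUC for a three-class model is commonly obtained by scoring one subtype against the other two. The sketch below uses the rank-based definition of AUC: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name `auc_one_vs_rest` and the toy scores are hypothetical, and the text does not state that the authors computed AUC this way.

```python
def auc_one_vs_rest(scores, labels, positive):
    """AUC for `positive` versus all other labels.

    `scores` holds the model's predicted probability of the positive
    class for each case; `labels` holds the true subtype of each case.
    """
    pos = [s for s, lab in zip(scores, labels) if lab == positive]
    neg = [s for s, lab in zip(scores, labels) if lab != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy predicted probabilities of MIA for five hypothetical cases.
labels = ["MIA", "MIA", "AIS", "IAC", "IAC"]
mia_scores = [0.8, 0.4, 0.5, 0.3, 0.1]
print(auc_one_vs_rest(mia_scores, labels, "MIA"))
```

In this toy example five of the six positive-negative pairs are ranked correctly. Because AUC depends only on how cases are ranked, it can differ from accuracy, which depends on the final class assignment.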

Table 2 Performance of RF classification models
Fig. 4

ROC curves of the final RF classification models: (a) the three-class model with AUC calculated for each class; (b) the 3 one-versus-others models with their AUCs. Abbreviations: ROC, receiver operating characteristic; RF, random forest; AUC, area under curve; AIS, adenocarcinoma in situ; MIA, minimally invasive adenocarcinoma; IAC, invasive adenocarcinoma cancer

Discussion

Our study designed and evaluated a radiomics-based machine learning method of pulmonary GGN classification to predict invasion in early-stage GGO pulmonary adenocarcinoma. In 193 patients with pulmonary GGN, the radiomics-based RF model was able to distinguish among the 3 subtypes, AIS, MIA and IAC. In addition, the performance of the three-class model surpassed that of the individual binary classification models, which suggested that simultaneous classification of the three subtypes using the same set of radiomics features was feasible.

Feature selection results revealed that first-order maximum, first-order energy and least axis length were associated with GGN progression, consistently appearing in selected features of the four models. First-order maximum and energy reflect the maximum intensity and uniformity of distribution within the ROI on CT images, and least axis length partially reflects the size of the ROI. Since on CT images, IAC nodules typically manifest as larger and contain more solid components, our results were consistent with established definitions and prior research findings [4,5,6,7, 45]. According to previous studies, the heterogeneity observed in malignant tumors results from various tissue structural changes such as uneven cell density distribution, hemorrhage, necrosis, and mucinous degeneration [46]. However, due to overlapping radiological features among pathological subtypes, distinguishing between AIS, IAC, and MIA remains a challenge. Visual assessment alone may not suffice to identify subtle changes in IAC or to differentiate between IAC and MIA. The high-order features identified in the feature selection results were primarily GLRLM features, such as Run Length Non-Uniformity, Long Run High Gray-Level Emphasis and Short Run High Gray-Level Emphasis. The GLRLM evaluates the discrete distribution of gray levels in an image or stack of images [47]. It describes the roughness or smoothness of the image, reflecting the heterogeneity of tumors. Previous studies have also reported that IAC tends to exhibit greater heterogeneity on CT images compared to pre-invasive lesions [48]. Therefore, our study results indicated that GLRLM features, as indicators of heterogeneity, are valuable for assessing the invasiveness of GGNs.

Over the last few years, radiomics based on CT data has been widely applied in the study of ground glass nodules [34, 35]. Radiomics has been demonstrated to unearth features and patterns within large volumes of imaging data, offering information of both phenotype and microenvironment of lesions [32, 33]. Furthermore, machine learning methods have also been applied in this field for the diagnosis or prediction of GGNs [36,37,38,39, 41, 49]. Shi et al. employed the minimum redundancy maximum relevance (mRMR) feature ranking method and the least absolute shrinkage and selection operator (LASSO) for feature selection, along with a multivariable logistic model, to construct a radiomics model for classifying the differentiation of noninvasive lesions and IACs, achieving an AUC performance of 0.805 [50]. Similarly, Zheng et al. utilized a comparable approach to Shi’s method to develop a radiomics model for distinguishing between IAC and non-IAC lesions, achieving an AUC performance of 0.79 in the test set [51]. Feng et al. applied LASSO and multivariate backward stepwise regression analysis to classify IAC and noninvasive lesions, resulting in a radiomics model with an AUC of 0.814 [52]. These studies have collectively demonstrated the discriminative value of CT radiomics features for distinguishing GGNs of different invasiveness. However, research specifically aimed at differentiating the three subtypes of GGO pulmonary adenocarcinoma is limited. More attention has been focused on discriminating between benign and malignant GGNs. Therefore, a systematic and objective differential diagnosis method must be urgently developed. Furthermore, the results of these studies are not entirely consistent.

In our study, we collected 107 radiomics features of 7 classes, including first order statistics, shape-based, GLCM, GLRLM, GLSZM, GLDM, and NGTDM. When combined with feature selection and machine learning methods, this approach yields straightforward and effective classification models. Random forest is an ensemble learning algorithm that combines multiple decision trees. Its results typically demonstrate higher accuracy and generalization performance, making it suitable for predicting disease risks and patient susceptibility [41, 53]. Compared to methods such as support vector machine, back propagation, and K-nearest neighbors, the RF method has a reduced risk of overfitting because it combines predictions from multiple trees, and it is more effective for high-dimensional data, such as radiomics data. Additionally, RF provides feature importance scores that help identify the features contributing most to model predictions. According to previous radiomics studies, among machine learning methods such as random forests, neural networks, linear regression, logistic regression, and least absolute shrinkage and selection operator, the random forest classification method had the highest prognostic performance [33]. Mayerhoefer et al. also noted in their article that while training convolutional neural networks on images for diagnosis outperforms feature selection methods, the limitation is that convolutional neural networks require large amounts of training data, making them unsuitable for limited datasets [54]. Several prior studies have focused solely on low-order features. For instance, Lee et al. discovered that characteristics such as smaller lesion size, smaller solid proportion, non-lobulated border, and non-spiculated margin were significant discriminators between IAC and preinvasive lesions [36]. Meng et al. pointed out in their study that assessing the invasiveness of GGN based on morphological categorical variables is not entirely reliable [55]. Some researchers have integrated whole-image features and clinical data alongside radiomics features in benign and malignant GGN classification efforts. However, these studies reported no significant improvement in model performance [56, 57]. Wu et al. also mentioned in their research that the addition of clinical features to the radiomics model did not yield a significant increase in accuracy but led to poor calibration [58]. Therefore, to obtain a more effective combination of radiomics features, we computed a wide range of higher-order texture parameters in our study, which can reflect the characteristics of lesion images across various dimensions to capture their heterogeneity more comprehensively [59].

Our results indicate that larger range of lesion features such as GLDM and GLRLM provide additional diagnostic and predictive value for GGN lesions. In addition, the performance of the three-class model surpassed that of the binary classification models, which may be attributed to several reasons. Firstly, the majority of features selected by the binary classification models were similar, and the high-weight features were also included in the features selected by the three-class model. Therefore, the three-class model could achieve classification performance at least comparable to three binary classification models. Secondly, apart from these similar features, the three-class model selected a few features that were not selected by the binary classification models, which might serve to enhance the model. Thirdly, treating two classes with different degrees of invasiveness as the same category in the binary classification models might lead to unsatisfactory classification results. For the three subtypes of GGN, it typically progresses from AIS to MIA, and then to IAC. In the MIA vs. others model, the two subtypes which have different malignancy levels, AIS and IAC, are combined into one class. This could confuse the model, making it difficult to recognize feature differences and determine classification boundaries, leading to reduced classification performance.

Based on the results, the reason for the misclassified cases was mainly the similarity of the images. In most of the misclassified cases, the nodules had similar imaging features, appearing as clustered low-density nodular shadows. Subtype classification of GGNs is inherently challenging in clinical situations, due to their histological complexity [10, 11]. The difficulty mainly lies in distinguishing between MIA and IAC, which often requires histopathological confirmation to reach a definitive conclusion, and it accounted for the majority of misclassified cases. This is consistent with previous related studies [60].

There were several limitations in our study. Firstly, it was a single-center retrospective study, which may introduce selection bias and limit the generalizability of our findings. The classification models remain to be validated with larger external datasets. Secondly, the manual delineation of GGN lesions could introduce subjective errors and variability. Therefore, using a CNN segmentation model, such as those employed in some studies [61, 62], for lesion segmentation might be worth exploring. However, the actual performance of these models also needs further validation. Thirdly, the sample size was relatively small, which could lead to instability in the performance of the models. Despite the limitations of the current data, our proposed method achieved AUCs of 0.87, 0.90 and 0.87 for the 3 classes AIS, MIA and IAC, underscoring its potential value in clinical practice. Additionally, we employed feature selection and cross-validation techniques to enhance the robustness of our models. Future work will include enlarging the sample size and reducing inter-observer variability in segmentation by automated or semi-automated methods.

Conclusions

Our study proposed a radiomics-based method for prediction of invasion in early-stage GGO pulmonary adenocarcinoma. The final method was a three-class classification RF model using 20 radiomics features. The selected features included low-order features, such as first-order maximum and first-order energy, and high-order features, including Run Length Non-Uniformity and Dependence Variance. The AUC of the model was 0.87 (95% CI: 0.70–0.98) for AIS, 0.90 (95% CI: 0.78–0.99) for MIA, and 0.87 (95% CI: 0.73–0.96) for IAC. Accuracy for multiclass classification of AIS/MIA/IAC was 0.64. The AUC for classification of IAC was comparable to or higher than those reported in studies that distinguished IAC from non-IAC lesions.

This study, conducted on a limited dataset, achieved acceptable predictive performance, demonstrating that integrating CT radiomics, RF, and Gini importance-based feature selection provides a promising solution for classifying AIS, MIA, and IAC. This approach addressed the shortcomings of previous studies that have predominantly focused on binary classification models with low-order texture features and have seldom explored the differentiation of these three subtypes.

The aim of this study was to design a straightforward pipeline for distinguishing among the three GGN subtypes to predict invasiveness. The proposed method and selected features enabled efficient classification of GGNs, offering a promising approach for quick, non-invasive early diagnosis of pulmonary adenocarcinoma with GGNs, making it potentially suitable for clinical application and capable of enhancing diagnostic efficiency.

Data availability

The datasets used or analyzed during the current study are available from the corresponding author on reasonable request.

Abbreviations

GGN:

Ground Glass Nodule

GGO:

Ground Glass Opacity

AIS:

Adenocarcinoma in Situ

MIA:

Minimally Invasive Adenocarcinoma

IAC:

Invasive Adenocarcinoma Cancer

RF:

Random Forest

AUC:

Area Under the Curve

CT:

Computed Tomography

ROI:

Region of Interest

GLCM:

Gray-Level Co-Occurrence Matrices

GLRLM:

Gray-Level Run Length Matrices

GLSZM:

Gray-Level Size Zone Matrices

GLDM:

Gray Level Dependence Matrices

NGTDM:

Neighboring Gray Tone Difference Matrices

ICC:

Intraclass Correlation Coefficient

References

  1. Kerr KM. Pulmonary adenocarcinomas: classification and reporting. Histopathology. 2009;54(1):12–27. https://doi.org/10.1111/j.1365-2559.2008.03176.x.

  2. Kobayashi Y, Mitsudomi T. Management of ground-glass opacities: should all pulmonary lesions with ground-glass opacity be surgically resected? Transl Lung Cancer Res. 2013;2(5):354–63. https://doi.org/10.3978/j.issn.2218-6751.2013.09.03.

  3. Hansell DM, Bankier AA, MacMahon H, McLoud TC, Müller NL, Remy J. Fleischner Society: glossary of terms for thoracic imaging. Radiology. 2008;246(3):697–722. https://doi.org/10.1148/radiol.2462070712.

  4. Russell PA, Barnett SA, Walkiewicz M, Wainer Z, Conron M, Wright GM, Gooi J, Knight S, Wynne R, Liew D, et al. Correlation of mutation status and survival with predominant histologic subtype according to the New IASLC/ATS/ERS Lung Adenocarcinoma classification in stage III (N2) patients. J Thorac Oncol. 2013;8(4):461–8. https://doi.org/10.1097/JTO.0b013e3182828fb8.

  5. Yoshizawa A, Motoi N, Riely GJ, Sima CS, Gerald WL, Kris MG, Park BJ, Rusch VW, Travis WD. Impact of proposed IASLC/ATS/ERS classification of lung adenocarcinoma: prognostic subgroups and implications for further revision of staging based on analysis of 514 stage I cases. Mod Pathol. 2011;24(5):653–64. https://doi.org/10.1038/modpathol.2010.232.

  6. Mansuet-Lupo A, Bobbio A, Blons H, Becht E, Ouakrim H, Didelot A, Charpentier M-C, Bain S, Marmey B, Bonjour P, et al. The new histologic classification of lung primary adenocarcinoma subtypes is a Reliable Prognostic marker and identifies tumors with different mutation status: the experience of a French cohort. Chest. 2014;146(3):633–43. https://doi.org/10.1378/chest.13-2499.

  7. Yanagawa N, Shiono S, Abiko M, Ogata S-y, Sato T, Tamura G. New IASLC/ATS/ERS classification and invasive tumor size are predictive of Disease Recurrence in Stage I Lung Adenocarcinoma. J Thorac Oncol. 2013;8(5):612–8. https://doi.org/10.1097/JTO.0b013e318287c3eb.

  8. Zhang Y, Ma X, Shen X, Wang S, Li Y, Hu H, Chen H. Surgery for pre- and minimally invasive lung adenocarcinoma. J Thorac Cardiovasc Surg. 2022;163(2):456–64. https://doi.org/10.1016/j.jtcvs.2020.11.151.

  9. Tsutani Y, Miyata Y, Nakayama H, Okumura S, Adachi S, Yoshimura M, Okada M. Appropriate sublobar resection choice for ground glass opacity-dominant clinical stage IA lung adenocarcinoma: wedge resection or segmentectomy. Chest. 2014;145(1):66–71. https://doi.org/10.1378/chest.13-1094.

  10. Park CM, Goo JM, Lee HJ, Kim KG, Kang M-J, Shin YH. Persistent pure ground-glass nodules in the lung: interscan variability of Semiautomated volume and attenuation measurements. Am J Roentgenol. 2010;195(6):W408–14. https://doi.org/10.2214/ajr.09.4157.

  11. Ko JP, Rusinek H, Jacobs EL, Babb JS, Betke M, McGuinness G, Naidich DP. Small pulmonary nodules: volume measurement at chest CT—Phantom Study. Radiology. 2003;228(3):864–70. https://doi.org/10.1148/radiol.2283020059.

  12. Chandrasekar V, Ansari MY, Singh AV, Uddin S, Prabhu KS, Dash S, Al Khodor S, Terranegra A, Avella M, Dakua SP. Investigating the use of machine learning models to understand the drugs permeability across placenta. IEEE Access. 2023;11:52726–39.

  13. Ansari MY, Chandrasekar V, Singh AV, Dakua SP. Re-routing drugs to blood brain barrier: a comprehensive analysis of machine learning approaches with fingerprint amalgamation and data balancing. IEEE Access. 2022;11:9890–906.

  14. Ansari MY, Qaraqe M, Charafeddine F, Serpedin E, Righetti R, Qaraqe K. Estimating age and gender from electrocardiogram signals: a comprehensive review of the past decade. Artif Intell Med. 2023;146:102690. https://doi.org/10.1016/j.artmed.2023.102690.

  15. Ansari MY, Qaraqe M. Mefood: a large-scale representative benchmark of quotidian foods for the middle east. IEEE Access. 2023;11:4589–601.

  16. Ansari MY, Qaraqe M, Righetti R, Serpedin E, Qaraqe K. Enhancing ECG-based heart age: impact of acquisition parameters and generalization strategies for varying signal morphologies and corruptions. Front Cardiovasc Med. 2024;11:1424585. https://doi.org/10.3389/fcvm.2024.1424585.

  17. Han Z, Jian M, Wang G-G. ConvUNeXt: an efficient convolution neural network for medical image segmentation. Knowl Based Syst. 2022;253:109512.

  18. Ansari MY, Mohanty S, Mathew SJ, Mishra S, Singh SS, Abinahed J, Al-Ansari A, Dakua SP. Towards developing a lightweight neural network for liver CT segmentation. Singapore: Springer Nature Singapore; 2023. pp. 27–35.

  19. Jafari M, Auer D, Francis S, Garibaldi J, Chen X. DRU-Net: An Efficient Deep Convolutional Neural Network for Medical Image Segmentation. In: 2020 IEEE 17th International Symposium on Biomedical Imaging (ISBI); 3–7 April 2020. pp. 1144–8.

  20. Ansari MY, Mangalote IAC, Masri D, Dakua SP. Neural Network-based Fast Liver Ultrasound Image Segmentation. In: 2023 International Joint Conference on Neural Networks (IJCNN); 18–23 June 2023. pp. 1–8.

  21. Xie Y, Zhang J, Shen C, Xia Y. CoTr: efficiently bridging CNN and Transformer for 3D medical image segmentation. In: Medical Image Computing and Computer Assisted Intervention – MICCAI 2021. Cham: Springer International Publishing; 2021. pp. 171–80.

  22. Ansari MY, Abdalla A, Ansari MY, Ansari MI, Malluhi B, Mohanty S, Mishra S, Singh SS, Abinahed J, Al-Ansari A, et al. Practical utility of liver segmentation methods in clinical surgeries and interventions. BMC Med Imaging. 2022;22(1):97. https://doi.org/10.1186/s12880-022-00825-2.

  23. Akhtar Y, Dakua SP, Abdalla A, Aboumarzouk OM, Ansari MY, Abinahed J, Elakkad MSM, Al-Ansari A. Risk Assessment of computer-aided Diagnostic Software for hepatic resection. IEEE Trans Radiation Plasma Med Sci. 2022;6(6):667–77. https://doi.org/10.1109/TRPMS.2021.3071148.

  24. Rai P, Ansari MY, Warfa M, Al-Hamar H, Abinahed J, Barah A, Dakua SP, Balakrishnan S. Efficacy of fusion imaging for immediate post-ablation assessment of malignant liver neoplasms: a systematic review. Cancer Med. 2023;12(13):14225–51. https://doi.org/10.1002/cam4.6089.

  25. Ansari MY, Mangalote IAC, Meher PK, Aboumarzouk O, Al-Ansari A, Halabi O, Dakua SP. Advancements in Deep Learning for B-Mode Ultrasound Segmentation: a Comprehensive Review. IEEE Trans Emerg Top Comput Intell. 2024;8(3):2126–49. https://doi.org/10.1109/TETCI.2024.3377676.

  26. Ansari MY, Qaraqe M, Righetti R, Serpedin E, Qaraqe K. Unveiling the future of breast cancer assessment: a critical review on generative adversarial networks in elastography ultrasound. Front Oncol. 2023;13:1282536. https://doi.org/10.3389/fonc.2023.1282536.

  27. Kumar Singh L, Khanna M, Singh R. A novel enhanced hybrid clinical decision support system for accurate breast cancer prediction. Measurement. 2023;221:113525. https://doi.org/10.1016/j.measurement.2023.113525.

  28. Singh LK, Pooja, Garg H, Khanna M. An Artificial Intelligence-Based Smart System for Early Glaucoma Recognition Using OCT Images. In: Management Association IR, editor. Research Anthology on Improving Medical Imaging Techniques for Analysis and Intervention. Hershey, PA, USA: IGI Global; 2023. pp. 1424–54.

  29. Singh LK, Garg H, Pooja. Automated Glaucoma Type Identification Using Machine Learning or Deep Learning Techniques. In: Verma OP, Roy S, Pandey SC, Mittal M, editors. Advancement of Machine Intelligence in Interactive Medical Image Analysis. Singapore: Springer Singapore; 2020. pp. 241–63.

  30. Singh LK, Khanna M, Garg H. Multimodal Biometric based on Fusion of Ridge features with Minutiae features and face features. Int J Inform Syst Model Des (IJISMD). 2020;11(1):37–57. https://doi.org/10.4018/IJISMD.2020010103.

  31. Singh LK, Khanna M, Thawkar S, Singh R. Nature-inspired computing and machine learning based classification approach for glaucoma in retinal fundus images. Multimedia Tools Appl. 2023;82(27):42851–99.

  32. Lambin P, Rios-Velazquez E, Leijenaar R, Carvalho S, van Stiphout RG, Granton P, Zegers CM, Gillies R, Boellard R, Dekker A, et al. Radiomics: extracting more information from medical images using advanced feature analysis. Eur J Cancer. 2012;48(4):441–6. https://doi.org/10.1016/j.ejca.2011.11.036.

  33. Rizzo S, Botta F, Raimondi S, Origgi D, Fanciullo C, Morganti AG, Bellomi M. Radiomics: the facts and the challenges of image analysis. Eur Radiol Experimental. 2018;2(1):36. https://doi.org/10.1186/s41747-018-0068-z.

  34. Yagi T, Yamazaki M, Ohashi R, Ogawa R, Ishikawa H, Yoshimura N, Tsuchida M, Ajioka Y, Aoyama H. HRCT texture analysis for pure or part-solid ground-glass nodules: distinguishability of adenocarcinoma in situ or minimally invasive adenocarcinoma from invasive adenocarcinoma. Jpn J Radiol. 2018;36(2):113–21. https://doi.org/10.1007/s11604-017-0711-2.

  35. Chae HD, Park CM, Park SJ, Lee SM, Kim KG, Goo JM. Computerized texture analysis of persistent part-solid ground-glass nodules: differentiation of preinvasive lesions from invasive pulmonary adenocarcinomas. Radiology. 2014;273(1):285–93. https://doi.org/10.1148/radiol.14132187.

  36. Lee SM, Park CM, Goo JM, Lee HJ, Wi JY, Kang CH. Invasive pulmonary adenocarcinomas versus preinvasive lesions appearing as ground-glass nodules: differentiation by using CT features. Radiology. 2013;268(1):265–73. https://doi.org/10.1148/radiol.13120949.

  37. She Y, Zhang L, Zhu H, Dai C, Xie D, Xie H, Zhang W, Zhao L, Zou L, Fei K, et al. The predictive value of CT-based radiomics in differentiating indolent from invasive lung adenocarcinoma in patients with pulmonary nodules. Eur Radiol. 2018;28(12):5121–8. https://doi.org/10.1007/s00330-018-5509-9.

  38. Hu X, Ye W, Li Z, Chen C, Cheng S, Lv X, Weng W, Li J, Weng Q, Pang P, et al. Non-invasive evaluation for benign and malignant subcentimeter pulmonary ground-glass nodules (≤ 1 cm) based on CT texture analysis. Br J Radiol. 2020;93(1114):20190762. https://doi.org/10.1259/bjr.20190762.

  39. Zhao W, Xu Y, Yang Z, Sun Y, Li C, Jin L, Gao P, He W, Wang P, Shi H, et al. Development and validation of a radiomics nomogram for identifying invasiveness of pulmonary adenocarcinomas appearing as subcentimeter ground-glass opacity nodules. Eur J Radiol. 2019;112:161–8. https://doi.org/10.1016/j.ejrad.2019.01.021.

  40. Hu X, Gong J, Zhou W, Li H, Wang S, Wei M, Peng W, Gu Y. Computer-aided diagnosis of ground glass pulmonary nodule by fusing deep learning and radiomics features. Phys Med Biol. 2021;66(6):065015. https://doi.org/10.1088/1361-6560/abe735.

  41. Mei X, Wang R, Yang W, Qian F, Ye X, Zhu L, Chen Q, Han B, Deyer T, Zeng J, et al. Predicting malignancy of pulmonary ground-glass nodules and their invasiveness by random forest. J Thorac Dis. 2018;10(1):458–63. https://doi.org/10.21037/jtd.2018.01.88.

  42. Sun Y, Li C, Jin L, Gao P, Zhao W, Ma W, Tan M, Wu W, Duan S, Shan Y, et al. Radiomics for lung adenocarcinoma manifesting as pure ground-glass nodules: invasive prediction. Eur Radiol. 2020;30(7):3650–9. https://doi.org/10.1007/s00330-020-06776-y.

  43. van Griethuysen JJM, Fedorov A, Parmar C, Hosny A, Aucoin N, Narayan V, Beets-Tan RGH, Fillion-Robin JC, Pieper S, Aerts H. Computational Radiomics System to Decode the Radiographic phenotype. Cancer Res. 2017;77(21):e104–7. https://doi.org/10.1158/0008-5472.Can-17-0339.

  44. Breiman L. Random forests. Mach Learn. 2001;45(1):5–32. https://doi.org/10.1023/A:1010933404324.

  45. Ohde Y, Nagai K, Yoshida J, Nishimura M, Takahashi K, Suzuki K, Takamochi K, Yokose T, Nishiwaki Y. The proportion of consolidation to ground-glass opacity on high resolution CT is a good predictor for distinguishing the population of non-invasive peripheral adenocarcinoma. Lung Cancer. 2003;42(3):303–10. https://doi.org/10.1016/j.lungcan.2003.07.001.

  46. Ganeshan B, Goh V, Mandeville HC, Ng QS, Hoskin PJ, Miles KA. Non-small cell lung cancer: histopathologic correlates for texture parameters at CT. Radiology. 2013;266(1):326–36. https://doi.org/10.1148/radiol.12112428.

  47. Galloway MM. Texture analysis using gray level run lengths. Comput Graphics Image Process. 1975;4(2):172–9. https://doi.org/10.1016/S0146-664X(75)80008-6.

  48. Ost DE, Gould MK. Decision making in patients with pulmonary nodules. Am J Respir Crit Care Med. 2012;185(4):363–72. https://doi.org/10.1164/rccm.201104-0679CI.

  49. Liang L, Zhang H, Lei H, Zhou H, Wu Y, Shen J. Diagnosis of Benign and Malignant Pulmonary Ground-Glass nodules using computed Tomography Radiomics parameters. Technol Cancer Res Treat. 2022;21:15330338221119748. https://doi.org/10.1177/15330338221119748.

  50. Shi L, Shi W, Peng X, Zhan Y, Zhou L, Wang Y, Feng M, Zhao J, Shan F, Liu L. Development and Validation a Nomogram Incorporating CT Radiomics Signatures and Radiological features for differentiating Invasive Adenocarcinoma from Adenocarcinoma in situ and minimally invasive adenocarcinoma presenting as ground-glass nodules measuring 5-10 mm in Diameter. Front Oncol. 2021;11:618677. https://doi.org/10.3389/fonc.2021.618677.

  51. Zheng H, Zhang H, Wang S, Xiao F, Liao M. Invasive prediction of Ground Glass Nodule based on clinical characteristics and Radiomics feature. Front Genet. 2021;12:783391. https://doi.org/10.3389/fgene.2021.783391.

  52. Feng H, Shi G, Xu Q, Ren J, Wang L, Cai X. Radiomics-based analysis of CT imaging for the preoperative prediction of invasiveness in pure ground-glass nodule lung adenocarcinomas. Insights Imaging. 2023;14(1):24. https://doi.org/10.1186/s13244-022-01363-9.

  53. Sarica A, Cerasa A, Quattrone A. Random Forest Algorithm for the Classification of Neuroimaging Data in Alzheimer’s Disease: a systematic review. Front Aging Neurosci. 2017;9:329. https://doi.org/10.3389/fnagi.2017.00329.

  54. Mayerhoefer ME, Materka A, Langs G, Häggström I, Szczypiński P, Gibbs P, Cook G. Introduction to Radiomics. J Nucl Med. 2020;61(4):488–95. https://doi.org/10.2967/jnumed.118.222893.

  55. Meng F, Guo Y, Li M, Lu X, Wang S, Zhang L, Zhang H. Radiomics nomogram: a noninvasive tool for preoperative evaluation of the invasiveness of pulmonary adenocarcinomas manifesting as ground-glass nodules. Transl Oncol. 2021;14(1):100936. https://doi.org/10.1016/j.tranon.2020.100936.

  56. Huang W, Deng H, Li Z, Xiong Z, Zhou T, Ge Y, Zhang J, Jing W, Geng Y, Wang X, et al. Baseline whole-lung CT features deriving from deep learning and radiomics: prediction of benign and malignant pulmonary ground-glass nodules. Front Oncol. 2023;13:1255007. https://doi.org/10.3389/fonc.2023.1255007.

  57. Zheng Y, Han X, Jia X, Ding C, Zhang K, Li H, Cao X, Zhang X, Zhang X, Shi H. Dual-energy CT-based radiomics for predicting invasiveness of lung adenocarcinoma appearing as ground-glass nodules. Front Oncol. 2023;13:1208758. https://doi.org/10.3389/fonc.2023.1208758.

  58. Wu G, Woodruff HC, Sanduleanu S, Refaee T, Jochems A, Leijenaar R, Gietema H, Shen J, Wang R, Xiong J, et al. Preoperative CT-based radiomics combined with intraoperative frozen section is predictive of invasive adenocarcinoma in pulmonary nodules: a multicenter study. Eur Radiol. 2020;30(5):2680–91. https://doi.org/10.1007/s00330-019-06597-8.

  59. Jia TY, Xiong JF, Li XY, Yu W, Xu ZY, Cai XW, Ma JC, Ren YC, Larsson R, Zhang J, et al. Identifying EGFR mutations in lung adenocarcinoma by noninvasive imaging using radiomics features and random forest modeling. Eur Radiol. 2019;29(9):4742–50. https://doi.org/10.1007/s00330-019-06024-y.

  60. Sakurai H, Nakagawa K, Watanabe S-i, Asamura H. Clinicopathologic features of resected subcentimeter lung cancer. Ann Thorac Surg. 2015;99(5):1731–8.

  61. Geng P, Tan Z, Wang Y, Jia W, Zhang Y, Yan H. STCNet: alternating CNN and improved transformer network for COVID-19 CT image segmentation. Biomed Signal Process Control. 2024;93:106205.

  62. Geng P, Lu J, Zhang Y, Ma S, Tang Z, Liu J. TC-Fuse: a transformers Fusing CNNs Network for Medical Image Segmentation. CMES-Computer Model Eng Sci. 2023;137(2):2001–23.

Acknowledgements

We thank Huizhou Science and Technology Bureau for supporting this study.

Funding

This work was supported by Huizhou Science and Technology Bureau [grant number 221014156941115] and Guangzhou Science and Technology Bureau [grant number 2024A03J1028].

Author information

Contributions

J.B. contributed to the study design and data analysis and was a major contributor in writing the manuscript. M.H. contributed to statistical analyses, interpretation, and figure visualization. S.T. and X.S. contributed to data acquisition, collation, and labeling of pulmonary nodules. Y.Y. contributed conceptual and financial support for the research and contributed to the study design. Y.L. provided valuable opinions on the study design and data analysis. M.W. provided significant revisions to the manuscript and was another major contributor in writing the manuscript. All authors reviewed the final manuscript.

Corresponding author

Correspondence to Junjie Bin.

Ethics declarations

Ethical approval and consent to participate

Ethical approval was obtained from the IRB of Huizhou Third People’s Hospital in accordance with local ethics procedures, with a waiver of informed consent [2023-KY-014-01].

Clinical trial number

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Bin, J., Wu, M., Huang, M. et al. Predicting invasion in early-stage ground-glass opacity pulmonary adenocarcinoma: a radiomics-based machine learning approach. BMC Med Imaging 24, 240 (2024). https://doi.org/10.1186/s12880-024-01421-2

  • DOI: https://doi.org/10.1186/s12880-024-01421-2

Keywords