Prenatal prediction of neonatal respiratory morbidity: a radiomics method based on imbalanced few-shot fetal lung ultrasound images

Background To develop a non-invasive method for the prenatal prediction of neonatal respiratory morbidity (NRM) by a novel radiomics method based on imbalanced few-shot fetal lung ultrasound images. Methods A total of 210 fetal lung ultrasound images were enrolled in this study, including 159 normal newborns and 51 NRM newborns. Fetal lungs were delineated as the region of interest (ROI), where radiomics features were designed and extracted. Integrating radiomics features selected and two clinical features, including gestational age and gestational diabetes mellitus, the prediction model was developed and evaluated. The modelling methods used were data augmentation, cost-sensitive learning, and ensemble learning. Furthermore, two methods, which embed data balancing into ensemble learning, were employed to address the problems of imbalance and few-shot simultaneously. Results Our model achieved sensitivity values of 0.82, specificity values of 0.84, balanced accuracy values of 0.83 and area under the curve values of 0.87 in the test set. The radiomics features extracted from the ROIs at different locations within the lung region achieved similar classification performance outcomes. Conclusion The feature set we designed can efficiently and robustly describe fetal lungs for NRM prediction. RUSBoost shows excellent performance compared to state-of-the-art classifiers on the imbalanced few-shot dataset. The diagnostic efficacy of the model we developed is similar to that of several previous reports of amniocentesis and can serve as a non-invasive, precise evaluation tool for NRM prediction.


Background
Neonatal respiratory morbidity (NRM), mainly including respiratory distress syndrome (RDS) and transient tachypnea of the newborn (TTN), is a leading cause of morbidity and mortality in the preterm and early term [1]. The morbidity of NRM is correlated with fetal lung maturity [2]. Newborns with NRM are born with respiratory distress and even apnoea, which may lead to multiple complications, or even death. Glucocorticoids are used to treat fetuses at high risk of NRM to promote fetal lung maturation and can significantly reduce morbidity and mortality. However, recent studies have shown that glucocorticoid treatment has some side effects, such as short-term fetal heart rate variability (HRV) and fetal movements [3]. An accurate prenatal prediction of NRM is essential to avoid the overuse of glucocorticoids in normal fetuses.
Amniocentesis is an effective method for the prenatal prediction of NRM by assessing fetal lung maturity [4]. However, it is an invasive detection method with complicated and time-consuming operations and no uniform threshold for the prediction. Currently, amniocentesis is rarely used to make prenatal predictions. Instead, gestational age (GA) is usually assessed to make the prediction. Fetuses assessed to be born at 28-36.6 weeks are regarded as having a high risk of NRM because of fetal lung immaturity and will be treated with glucocorticoids. There is a high rate of false positives in view of NRM morbidity, which will cause side effects in newborns. In this context, it is particularly important to develop an accurate and non-invasive method for the prenatal prediction of NRM.
Ultrasound is a non-radiation and non-invasive technology that is widely used in prenatal diagnosis. The use of fetal lung ultrasound images to predict NRM as alternative to amniocentesis has been considered a useful method in recent studies [5]. In a recent study, quantitative texture analysis of fetal lungs (quantusFLM) was used to predict NRM [6]. The study was based on the European population and no related study for Asian populations. Moreover, the feature set used in their study only includes textural features and GA. There is suggestive evidence that gestational diabetes mellitus (GDM) in pregnant women may have adverse effects on lung development [7,8]. On the other hand, due to low morbidity, NRM newborns, especially preterm and early-term newborns, are hard to obtain. The dataset for the study is usually imbalanced and few-shot. This phenomenon was not mentioned in their study. It is worth noting that imbalanced and few-shot datasets are common in clinical practice and will bring overfitting and bias, resulting in poor generalization for the classification model.
The purpose of this study was to develop a non-invasive method for the prenatal prediction of NRM based on the radiomics method with an imbalanced few-shot fetal lung ultrasound image dataset collected from Asian population. Fetal lungs were delineated as the region of interest (ROI), and radiomics features were designed and extracted from the ROI. Feature selection was performed to select representative radiomics features and combining with GA and GDM for modelling. The modelling method of data augmentation, cost-sensitive learning, ensemble learning, Random Under-Sampling with AdaBoost (RUS-Boost) [9] and Synthetic Minority Oversampling Technique (SMOTE) with AdaBoost (SMOTEBoost) [10] were used to address the problems of imbalance and few-shot. Finally, the diagnostic efficacy of the model we developed was found to be similar to that of previous reports of amniocentesis.

Workflow
The workflow for the entire study is summarized in Fig. 1. It can be divided into three parts: image acquisition and lung segmentation, feature extraction and selection, model building. First, for each acquired fetal lung ultrasound image, the ROI inside the fetal lung is delineated by one physician and confirmed by another physician. Then, 308 radiomics features are extracted in the ROI of each image. Feature selection is performed on these radiomics features to select the most valuable features. Finally, the selected radiomics features are combined with the clinical features as the input to the classifier. With building and comparing classification models with different methods, the best model is finally selected to predict NRM.

Patients
From July 2018 to August 2019, a total of 261 fetal lung ultrasound images from 261 singleton pregnant women with GAs ranging from 28.0 to 38.6 weeks were collected from Obstetrics and Gynecology Hospital Affiliated to Fudan University, Shanghai, China. The flowchart for the study population is shown in Fig. 2. Pregnant women who met the following criteria were enrolled in the study: (1) singleton pregnancy; (2) those with complete medical information who had undergone maternity examination and subsequent delivery in our hospital; (3) fetuses with no known congenital malformation or chromosomal abnormality; (4) those with no diabetes before pregnancy; and (5) those who had not been prescribed steroids before delivery. Finally, a total of 210 singleton pregnant women with 210 fetal lung ultrasound images were enrolled in our study and randomly divided into the training set and test set at a ratio of approximately 8:2. It is worth noting that we kept the same proportion of NRM and normal in both sets. The training set contains 167 images, of which 40 are NRM and 127 are normal. The test set contains 43 images, of which 11 are NRM and 32 are normal.
This study was approved by the Ethics Committee of Obstetrics and Gynecology Hospital Affiliated to Fudan University, Shanghai, China. All data were collected and used with the consent of the pregnant women.

Image acquisition and lung segmentation
All ultrasound images were obtained during routine prenatal ultrasound examinations within 72 h before delivery and performed by a radiologist with over 8 years of experience in obstetrics and gynaecology ultrasound imaging. The WS80A ultrasound system (Samsung, Korea) was used in this study for imaging. One scanner was used in this study: the Samsung CA1-7A curved array probe (frequency range 1.0-7.0 MHz, center frequency: 4.0 MHz).
Fetal lung ultrasound image acquisition was achieved using a transverse view of the fetal thorax at the level of the four-chamber view of the heart. The probe was adjusted to ensure that at least one of the lungs had no obvious acoustic shadowing from the fetal ribs. In order to obtain optimal image quality, the acquisition parameters, including depth, gain, frequency, time-gain compensation, and harmonics, were adjusted according to the relevant features of each pregnant woman and fetus. All the images were collected and stored in DICOM format (.dcm) for offline analysis. Figure 3 shows the manual delineation of the lung regions in the ultrasound images of a normal fetus and a fetus with NRM, respectively. All ROIs were selected in the homogeneous area inside the lung, with no vascular or rib shadows. It should be noted that the manual delineation of each fetal lung was delineated by one physician, which was reviewed and confirmed by another physician, both of whom were blinded to the medical histories of the pregnant women and neonatal outcomes.

Feature extraction and selection
The feature design is the basis for building a practical and generalizable classification model. For ultrasound fetal lung images, the feature set should reflect subtle texture Fig. 1 The Workflow of the entire study. Stage I: For each acquired fetal lung ultrasound image, the ROI inside the fetal lung is delineated by one physician and confirmed by another physician. Stage II: 308 radiomics features are extracted in the ROI of each image. Feature selection is performed on these radiomics features to select the most useful features. Stage III: the selected radiomics features are combined with the clinical features as the input to the classifier. With building and comparing classification models with different methods, the best model is finally selected to predict the risk value of NRM information in the ROI of the image and independent of the ROI's size and location to provide a robust description for clinical use. With the requirement for the feature set, a series of radiomics features were designed based on the image greyscale and texture, including 16 greyscale histogram features, 60 texture features, and 304 wavelet features.
Before feature extraction, the area inside the ROI where the feature extracted was min-max normalized into 0-255 to remove bias, scaling factors of the effect of different imaging parameters. To avoid the effect of outliers, we refer to Collewet's work [11] which calculates the maximum and minimum values after removing outliers for min-max normalization of images.
For the greyscale histogram features and texture features, we refer to the feature definitions as described by the Imaging Biomarker Standardization Initiative (IBSI) [12]. The bin-width is set to 1 to maintain detailed texture information. The 304 wavelet features were obtained by extracting 16 greyscale histogram features and 60 texture features separately on four components first-level decomposition (approximate, horizontal, vertical, and diagonal) of the original image's wavelet transform. We adopted the Daubechies wavelets 5 (db5) transform.
The extracted features have different value ranges, which will affect feature selection and modelling. In this study, we performed the min-max normalization on the extracted raw features to ensure effective selection and training in the following modelling process. Note that the maximum and minimum in the normalization are calculated from the training set and also used for normalization in validation and test sets. It is reasonable as the maximum and minimum of test samples are unseen in practice.
In addition, we used a priori clinical knowledge to improve the feature set's descriptive ability by adding two Fig. 2 The flowchart of the selection process of the study population. Pregnant women who met the following criteria were enrolled in the study: (1) singleton pregnancy; (2) those with complete medical information who had undergone maternity examination and subsequent delivery in our hospital; (3) fetuses with no known congenital malformation or chromosomal abnormality; (4) those with no diabetes before pregnancy; and (5) those who had not been prescribed steroids before delivery. Finally, a total of 210 singleton pregnant women with 210 fetal lung ultrasound images were enrolled in our study and randomly divided into the training set and test set at a ratio of approximately 8:2. It is worth noting that we kept the same proportion of NRM and normal in both sets. The training set contains 167 images, of which 40 are NRM and 127 are normal. The test set contains 43 images, of which 11 are NRM and 32 are normal clinical features, GA and GDM, with are readily available and strongly correlated with NRM in relevant studies. The summary of the feature set is listed in Table 1, and the details of the features are as follows.
The feature selection method was used to select the most useful radiomics features as inputs of the classification model. We ranked feature importance to selected features by permuting out-of-bag data feature of random forest trees. If a feature is influential, permuting its values would influence the model error testing with out-ofbag data. The more important a feature is, the greater its influence will be [20].

Model building
The class imbalance and small dataset will lead to overfitting and classification bias. In this study, we designed and evaluate performance of common methods on our imbalanced and few-shot dataset. The motivation of the comparison experiment is to compare the effectiveness of different modelling approaches on the imbalanced small dataset from both data and model perspectives.
To address the imbalance problem, we introduced a data balancing method, Adaptive Synthetic (ADASYN) [21]. ADASYN generates minority class pseudo-samples by linear interpolation to balance the dataset. Classifiers can then be trained on the balanced dataset without the effect of the class imbalance. This has been shown to be effective in some studies, but there is a lack of research on the small medical image datasets. We also introduced a classifier model cost-sensitive support vector machine (SVM) [22], which addresses the class imbalance problem by increasing the model's misclassification cost of the minority classes.
As for the problem of the low generalizability of modelling on small datasets, we introduced the Adaptive boosting (AdaBoost) [23], which improves the generalizability by combining weak base learners and bootstrap sampling with the AdaBoost algorithm.
Moreover, we introduced the RUSBoost and SMOTE-Boost, which are ensemble learning methods based on AdaBoost with undersampling and oversampling, respectively, addressing both low generalizability and imbalance problems simultaneously.
In our comparative experiments, cost-sensitive SVM, SMOTEBoost, RUSBoost were applied to the original imbalanced dataset. SVM and AdaBoost were applied to the original imbalanced dataset and the balanced data balanced with ADASYN, respectively, to test the effectiveness of the data balancing method.
All classifier parameters were tuned with bootstrap fivefold cross-validation, and the decision tree was

Statistical analysis
Descriptive statistics are summarized as the mean ± standard deviation (mean ± std). Univariate analyses were performed on each feature of the training set using the t-test for 380 continuous radiomics features and the χ 2 test for two categorical clinical features. A p value < 0.05 indicated a significant difference.
Since our data is class imbalanced, the metrics used to evaluate the model's classification performance should be sensitive to class imbalance. The metrics we introduced in this study are the balanced accuracy (bACC), the area under the receiver operating characteristic (ROC) curve (AUC), the sensitivity (SENS), the specificity (SPEC), the positive predictive value (PPV) and negative predictive value (NPV). All methods were performed with MAT-LAB R2019b (MathWorks, Inc., Natick, MA, USA). The image processing toolbox and machine learning toolbox were applied in feature extraction and model building. Table 1 The summary of the feature set we designed for predicting NRM (1) Clinical information: GA and GDM are strongly correlated with NRM [7,8]. GA was determined by the last menstrual period and verified by first-trimester dating ultrasound (crown-rump length). According to the presence of GDM during pregnancy, these pregnant women were divided into Yes and No groups (2) Greyscale histogram features: Describe the greyscale and histogram distribution of the ROI in fetal lung ultrasound images [13] (3) Textural features: Describe detailed, invisible greyscale changes and associations in fetal lung ultrasound images (a) ROI textural features: Describe the distribution of greyscale inside the ROI [14] (b) Grey-level co-occurrence matrix (GLCM) textural features: Describe the specified spatial linear relationship between the frequencies of two greyscale intensities inside the ROI [15] (c) Grey-level run-length matrix (GLRLM) textural features: Describe the roughness of the texture by calculating the run-length of the collinear image pixels of the same grey-level in a given direction inside the ROI [16,17] (d) Grey-level size zone matrix (GLSZM) textural features: Describe the uniformity of the small pixel population of the ROI [15,18] (e) Neighbourhood grey-tone difference matrix (NGTDM) textural features: Describe the difference between the greyscale of each image pixel and the greyscale of its neighbours inside the ROI [19] (4) Wavelet features: Describe information that is not directly reflected by the greyscale and textural features of the original image. Every fetal lung ultrasound image was decomposed into four components: approximate, horizontal, vertical, and diagonal by wavelet transform (first-level decomposition). Then, the 76 features mentioned above were extracted separately on each component. Finally, a total of 304 wavelet features were extracted Approximate, horizontal, vertical, and diagonal were decomposed from the image by wavelet transform (first-level decomposition) GA: gestational age, GDM: gestational diabetes mellitus, ROI: region of interest (fetal lung region), SD: standard deviation, GLCM: grey-level co-occurrence matrix, GLRLM: grey-level run-length matrix, GLSZM: grey-level size zone matrix, NGTDM: neighbourhood grey-tone difference matrix

Patient characteristics
A summary of the characteristics of the training set and test set is listed in Table 2. The imbalance ratio between the number of normal and NRM was close to 3:1. There is a significant difference (p value < 0.005) in both GA and GDM between NRM and normal controls, which is the statistical basis for using GA and GDM as clinical features. Moreover, there is a significant difference (p value < 0.0001) in birth weight between the two groups.

Univariate analysis and feature selection
Univariate analysis was performed on the training set.
The results show that 32 of all 380 radiomics features were highly correlated with NRM (p value < 0.05). The feature selection method was used to select the most useful features for modelling. The final 10 features with the highest feature's importance score were selected. The feature names and descriptive statistics of the 10 radiomics features selected are listed in Table 3. Figure 4 shows the box plots of the top 3 features with a  high correlation between the normal and NRM fetal lung ultrasound images of the 10 selected features. Although there are significant differences in the means, the standard deviations overlap, making the classification task difficult and requires a more powerful multivariate classification method.

Model construction and evaluation
The classification performance of different modelling methods is illustrated in Table 4. The inputs to the model were 2 clinical features and 10 radiomics features, as shown in Table 3.
On the original imbalanced dataset, the SVM has a severe class bias, testing with a SPEC of 1.00 but a SENS of only 0.36. The cost-sensitive SVM model obtains a small increase in SENS of 0.36-0.45 but is accompanied by a large decrease in SPEC of 1.00-0.84. The AdaBoost shows a better performance than the cost-sensitive SVM, while SPEC decreased by only 0.09.
Training the SVM and AdaBoost models on the balanced dataset resulted in a substantial increase in SENS compared to the results from the original imbalanced dataset, both reaching 0.73, but correspondingly, a substantial decrease in SPEC, from 1.00 to 0.78 and from 0.91 to 0.75, respectively.
The SMOTEBoost's SENS is equal to that of the Ada-Boost trained on the original imbalanced dataset, but its SPEC is only 0.88, lower than AdaBoost's 0.91. RUSBoost shows better classification performance than other methods, with a SENS of 0.72, a SPEC of 0.82, a bACC of 0.77, and an AUC of 0.83 by bootstrap validation in the training set. Moreover, the model has excellent classification performance with a SENS of 0.82, a SPEC of 0.84, a bACC of 0.83, and an AUC of 0.87, in the test set.

The effect of our feature set
The verification result of feature set effectiveness is illustrated in Table 5. In the test set, the model built with the feature of GA alone has a high SPEC of 0.97 and a low SENS of 0.45. For the combination of GA and GDM, there is an increase in SENS from 0.45 to 0.69, but SPEC decreases by 0.34. The best classification performance can be achieved with our designed feature set, including radiomics features, GA, and GDM.
Since most areas inside the fetal lung are homogeneous, the greyscale histogram features and texture features have the stability for small changes of the location or shape of the ROI in the homogeneous region. As a validation measure of the stability of the feature set, each image is additionally delineated with a square ROI in addition to the irregular ROI. The square ROI was outlined within the fetal lung region, as shown in Fig. 5. As illustrated in Table 5, the irregular ROI and square ROI achieved similar performance outcomes. There is only a difference of 0.04 in bACC, 0.02 in AUC, 0.09 in SENS, 0.02 in SPEC, 0.01 in PPV, and 0.03 in NPV on the test set. These results   Table 5 The classification performance of RUSBoost with different features on the original imbalanced few-shot dataset demonstrate our texture feature-based model has the stability for the shape and location of the ROI.
As shown in Fig. 6a and b, clinical models have severe class bias, leading to low sensitivity. In the model using GA, only 45% of the NRM samples were correctly diagnosed. In the model using GA and GDM, only 64% of the NRM samples were correctly diagnosed. Model using both clinical data and radiomics features achieves the best diagnostic performance, as illustrated in Fig. 6c and d. There are 82% and 91% of NRM samples correctly diagnosed, respectively, while less than 20% of Normal samples were misdiagnosed as NRM. Furthermore, in Fig. 6e, tests using different ROIs achieved similar classification performance and ROC curves. It is worth to be noted that the classifier using GA only or GA and GDM is biased towards the normal class, while its AUC is higher due to the imbalance of the dataset, which is the limitation of AUC in the classification performance evaluation of imbalanced datasets.

Discussion
Prenatal prediction and therapy for NRM are an effective way to improve the life quality of NRM newborns. There is a consensus to study non-invasive methods to predict NRM using fetal lung ultrasound images. However, there is no unified feature set for the prenatal prediction of NRM, and the dataset collected in medical practice is often imbalanced and few-shot. To tackle these challenges, our study focuses on the design of feature sets with a strong representation of fetal lung ultrasound images and effective classification modelling methods.

The feature set for predicting NRM
Considering that the fetal lung in the ultrasound image is homogeneous, we designed radiomics features based on the image greyscale and texture, which can avoid the influence of the ROI's size and location on feature extraction. For each fetus, 380 radiomics features were extracted from the fetal lung region of ultrasound images, and 10 of them were selected for modelling. The energy of horizontal, which characterizes the brightness in the horizontal direction of the wavelet transform, has a mean value of 1400 in normal fetal lungs, which is higher than 1200 in NRM fetal lungs. The high grey-level run emphasis of the normal fetal lung has a higher mean value of 298 than the NRM fetal lungs of 279, which means that the fetal lung region is more homogeneous in normal fetal lungs than NRM fetal lungs. For the long-run high grey-level emphasis of vertical feature, the mean value of the normal fetal lungs is 432, which is smaller than that of the NRM fetal lung of 462, which suggests that the fetal lung region is more delicate in normal fetal lungs than NRM fetal lungs. It can be concluded that the lung region of normal fetuses has a more delicate and homogeneous texture on the ultrasound image and is brighter than that of NRM fetuses. The features we selected were also stable. The radiomics features extracted from the irregular ROI and the square ROI achieved similar performance outcomes with the same modelling method (the difference was less than 0.09 for each measure), as shown in Table 5.
In addition to radiomics features, GA and GDM, two clinical features identified to be strongly correlated with NRM, were also added to the feature set. Newborns with a low GA have a significantly increased risk of NRM due to immature lungs, and GDM in pregnant women leads to delayed lung development in the fetus, increasing the risk of NRM. As shown in Table 5, with the addition of radiomics features, the SPEC and SENS were both significantly improved. In conclusion, the feature set designed in this study that includes radiomics features, GA, and GDM is more effective for NRM prediction and is not affected by the size or location of the ROI.

Model development
Imbalance and few-shot are inevitable in medical datasets, which pose many challenges for modelling. As shown in Table 4, there is a large class bias and poor classification performance on small imbalanced datasets using the conventional SVM. The methods of data augmentation, cost-sensitive learning, and ensemble learning are commonly used on imbalanced few-shot datasets. Here, these methods were performed and analysed to find the most effective modelling method.
The cost-sensitive SVM and AdaBoost show an improvement of 0.21 and 0.36 in SENS compared with the SVM in Table 4, but there is a decrease of 0.10 and 0.15 in SPEC in the training set. As for the cost-sensitive SVM, since there are few NRM samples, a higher cost is needed, which makes the compression of boundaries more severe, and the classifier tends to sacrifice multiple normal samples to ensure that one NRM sample is correct with a sharp decline in the generalization performance. The AdaBoost has a better performance than cost-sensitive SVM, with a SENS of 0.68 and a SPEC of 0.84. The ensemble learning method's lower overfitting allows it to exhibit a better generalization performance than the individual learner SVM or the cost-sensitive SVM.
Training on the balanced training set augmented with ADASYN, the SVM and AdaBoost does not show a significant improvement compared to training on the original imbalanced dataset, with an increase of 0.35 and 0.23 in SENS and a decrease of 0.25 and 0.26 in SPEC. For better illustration, we used t-SNE [24] to visualize the sample distribution of the original dataset and the balanced  Fig. 7, there is aliasing between normal and NRM samples, making it difficult to classify. By generating pseudo-samples around the minority class, ADASYN leads the classifier to draw more attention to the NRM samples. However, it also exacerbates aliasing and results in poor classification performance. The generated pseudo-samples also tend to introduce plenty of noise, especially when the aliasing of samples is terrible. The data augmentation method is not appropriate in our application.
The SENS of SMOTEBoost is still low because aliasing in the dataset makes SMOTE introducing considerable noise. RUSBoost shows better classification performance than other methods. It reaches a SENS of 0.72, a SPEC of 0.82, a bACC of 0.77, and an AUC of 0.83 in the training set and a SENS of 0.82, a SPEC of 0.84, a bACC of 0.83, and an AUC of 0.87 in the test set. RUSBoost can reduce overfitting and improve the classification model's generalization ability by combining weak base learners and bootstrap sampling with the AdaBoost algorithm. The input dataset of each learner is obtained by bootstrap undersampling, which enriches the sample distribution that the base learners have learned and reduce the effects of imbalance. The drawback of massive sample loss of undersampling in a small dataset is compensated by ensemble learning, while random undersampling ensures that the samples are real and avoids the noise that caused by data augmentation.

NRM prediction model
In this study, the non-invasive approach we proposed based on the Asian population utilizes a much smaller data set to establish similar prediction performance to previously reported methods. It makes it possible to safely and widely perform the NRM prenatal screening and intervention, which has an excellent prediction performance with a bACC of 0.83, an AUC of 0.87, a SENS of 0.82, a SPEC of 0.84, a PPV of 0.64, and an NPV of 0.93. A comparison of our method with some of the existing reported methods is illustrated in Table 6, which shows that our diagnostic performance approximates to that of invasive amniocentesis tests. Compared our study to Bonet' work [5], in which only fetal lung ultrasound images were used for NRM prediction, our method utilizes less than 1/2 of the training set size. There is 0.02 higher in bACC, 0.07 higher in SENS, 0.02 higher in NPV, the same PPV and only 0.04 lower in SPEC, with square ROI. Compared to quantusFLM [5] reported in a multicenter study, our study uses less than 1/4 of the training set size and is 0.05 higher in bACC, 0.17 higher in SENS, 0.12 higher in PPV, the same NPV and only 0.07 lower in SPEC, with square ROI. Our model based on the Asian population utilizes a much smaller data set to establish better prediction performance to previously reported methods.
Our model was built and tested in female and male fetuses with GAs ranging from 28.0 to 38.6 weeks. The experimental results show that our model has effective predictive performance in this scope. Moreover, our method has a degree of stability for the ROI's location and shape, allowing the model to be widely used.

Strengths and limitation
Our study has three strengths. First, to the best of our knowledge, this is the first study to incorporate GDM, GA, and radiomics features for NRM prenatal prediction. The diagnostic efficacy of the model we developed based on fetal lung ultrasound images in this study reached which are similar to those of many previous reports of amniocentesis [26][27][28]. Second, we developed a practical modelling approach to address the problems of imbalance and few-shot. RUSBoost shows excellent performance and generalization capabilities compared with the other methods used for comparison in this study. Third, we used radiomics features based on the image greyscale and texture for the prenatal prediction of NRM, whose performance is efficient and robust, without the influences of ROI selection results.
As a retrospective study, this study has some limitations that should be acknowledged. Clinical outcome of the fetuses depends on several clinical factors. In addition to GA and GDM, more clinical information could be studied for its correlation with fetal lung development and used for NRM prediction. A comparative study on the right and left lungs to verify the generalizability of the method between the right and left lungs is also needed. Furthermore, for applying the proposed method to a clinical application, a robust validation technique is required to demonstrate the stability of our model on the multicenter dataset from different machines and different operators. The applicable fetal population (different gestational week groups or sexes) is also needed to be investigated in our upcoming multicenter experiment.
In order to answer these questions and overcome these limitations, a multicenter study is underway. Additional fetal ultrasound images from multicenter will be included in our study for robust validation.

Conclusion
In conclusion, our results show that the radiomics features of the fetal lung can be used as an efficient and robust biomarker for NRM prediction. The diagnostic efficacy of the model based on fetal lung ultrasound images, which incorporates routinely available clinical characteristics GA and GDM and radiomics features, achieves a better clinical outcome, which might afford a non-invasive tool that is easy to implement in NRM prediction.

Funding
This research was supported by the National Natural Science Foundation of China (Grants 61871135, 81627804 and 81830058) and the Science and Technology Commission of Shanghai Municipality (Grants 20DZ1100104). The funding bodies played no role in the design of the study and collection, analysis, and interpretation of data and in writing the manuscript.

Availability of data and materials
The datasets generated and analysed during the current study are not publicly available due to the data being also a part of an ongoing study but are available from the corresponding author on reasonable request.