Application of preoperative ultrasound features combined with clinical factors in predicting HER2-positive subtype (non-luminal) breast cancer

Background Human epidermal growth factor receptor2+ subtype breast cancer has a high degree of malignancy and a poor prognosis. The aim of this study was to develop a prediction model for the human epidermal growth factor receptor2+ subtype (non-luminal) of breast cancer based on the clinical and ultrasound features associated with estrogen receptor, progesterone receptor, and human epidermal growth factor receptor2 status. Methods We collected clinical data and reviewed preoperative ultrasound images of breast cancers enrolled from September 2017 to August 2020. We divided the data into three groups as follows. Group I: estrogen receptor ±, Group II: progesterone receptor ±, and Group III: human epidermal growth factor receptor2 ±. Univariate and multivariate logistic regression analyses were used to identify the clinical and ultrasound features associated with the biomarkers in these groups. A model to predict the human epidermal growth factor receptor2+ subtype was then developed based on the results of the multivariate regression analyses, and its efficacy was evaluated using the area under the receiver operating characteristic curve, accuracy, sensitivity, and specificity. Results The human epidermal growth factor receptor2+ subtype accounted for 138 cases (11.8%) in the training set and 51 cases (10.1%) in the test set. In the multivariate regression analysis, age ≤ 50 years was an independent predictor of progesterone receptor + (p = 0.007), and posterior enhancement was a negative predictor of progesterone receptor + (p = 0.013) in Group II; palpable axillary lymph node, round shape, irregular shape, and calcifications were independent predictors of human epidermal growth factor receptor2 positivity in Group III (p = 0.001, p = 0.007, p = 0.010, p < 0.001, respectively). In Group I, shape was the only factor related to estrogen receptor status in the univariate analysis (p < 0.05).
The area under the receiver operating characteristic curve, accuracy, sensitivity, and specificity of the model to predict human epidermal growth factor receptor2+ subtype breast cancer were 0.697, 60.14%, 72.46%, and 58.49% in the training set and 0.725, 72.06%, 64.71%, and 72.89% in the test set, respectively. Conclusions Our study established a model to predict the human epidermal growth factor receptor2-positive subtype with moderate performance. The results also demonstrated that clinical and ultrasound features were significantly associated with biomarkers.


Introduction
Breast cancer is a highly heterogeneous tumor that has recently become the most common malignant tumor worldwide [1,2]. The 2013 St. Gallen Consensus classified breast cancer into five subtypes according to biomarker expression (estrogen receptor [ER], progesterone receptor [PR], human epidermal growth factor receptor-2 [HER2], and Ki67) evaluated using immunohistochemistry (IHC) [3]. Both the subtype and the biomarker expression of a breast cancer are important prognostic factors [4,5].
HER2+ breast cancer accounts for about 15-20% of all breast cancers [6]. HER2+ subtype (non-luminal) breast cancer is defined as ER-, PR-, HER2+, and has a high degree of malignancy and a poor prognosis, with a heterogeneous clinical and biological presentation. Hereafter, HER2+ subtype refers to HER2+ subtype (non-luminal). Chromosome 17 polyploidy and the spatial and temporal heterogeneity of tumors can lead to inaccurate assessment of HER2 status [7]. In addition, tumors with an HER2 score of 2+ on IHC require additional fluorescent in-situ hybridization (FISH) or chromogenic in-situ hybridization (CISH) testing to determine their status [7,8]. These factors adversely affect the diagnosis and treatment of HER2+ breast cancer. Currently, determining the subtype and biomarker status of a breast cancer requires preoperative core-needle biopsy or postoperative pathology, which is an invasive and time-consuming process. If this information could be obtained preoperatively and noninvasively, the treatment process would become more timely, effective and precise.
In Asian women, ultrasound (US) is the method of choice for screening breast lesions because these women have denser breast tissue and are relatively younger at the time of diagnosis [9][10][11][12]. Previous studies have found correlations between biomarkers (ER, PR, and HER2) and US features [13][14][15]. Xu et al. [15] found that a longest/shortest size ratio > 1, spiculated margin, and echo halo were related to ER and PR positivity. Additionally, Liu et al. [14] suggested that HER2 positivity was related to tumor blood supply and microcalcification. However, these studies included relatively few cases and correlated the relevant features directly with a single biomarker (e.g., ER ±, regardless of the status of PR and HER2). As a result, all three biomarkers may have influenced the US findings attributed to any one of them. Studying the relationship between US features and one biomarker while the other two are held in a fixed, known status may be a more feasible approach.
Previous studies [2,16,17] revealed a correlation between the HER2+ subtype and US features, such as posterior enhancement or calcifications. However, many studies on conventional US features and breast cancer subtypes have been limited to correlation exploration [16][17][18][19], and fewer studies have built predictive models. In contrast to previous studies [2,16,17] that directly correlated breast cancer subtypes with relevant features, we aimed to build a predictive model for the HER2+ subtype using relevant features of three biomarkers and evaluate its performance.
The purpose of this study was to evaluate the clinical and US features associated with each of ER, PR, and HER2 when the other two biomarkers were held in a fixed, known status, and then to develop a predictive model for the HER2+ subtype. To the best of our knowledge, this study is the first to identify clinical and US features associated with ER, PR, and HER2 status under this condition.

Study population
This study included patients who underwent preoperative breast US in projects funded by the National Natural Science Foundation from September 2017 to August 2020. Patients with the following characteristics were included: (1) evident lesions on preoperative US images, with multiple US images of the breast tumor; (2) breast cancer diagnosed by core needle biopsy or surgical pathology; and (3) no treatment prior to US. Patients with the following characteristics were excluded: (1) any treatment, such as radiotherapy, before US examination; (2) invisible or obscure lesions on the US examinations; or (3) incomplete clinical data.
This retrospective study was approved by the institutional ethics committee of our center. The requirement for informed consent was waived due to the retrospective nature of the study. We randomized the enrolled data into a training set (1169 cases) and a test set (501 cases) by a 7:3 ratio.
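The 7:3 random split described above can be illustrated with a minimal Python sketch. The study reports using random numbers in SPSS; the function name `split_cases`, the integer case IDs, and the seed below are hypothetical choices made only for illustration.

```python
import random

def split_cases(case_ids, n_train, seed=0):
    """Shuffle case IDs and return (training, test) lists."""
    rng = random.Random(seed)  # the seed is an arbitrary, illustrative choice
    shuffled = list(case_ids)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]

# 1670 enrolled cases, of which 1169 go to the training set (a 7:3 ratio).
case_ids = range(1670)
train_ids, test_ids = split_cases(case_ids, n_train=1169)
print(len(train_ids), len(test_ids))
```

Fixing the seed makes the split reproducible, which is useful when the same partition has to be reused across analyses.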
As mentioned previously, no article considered the possible interaction of biomarkers on the ultrasound feature
Keywords: Breast cancer, Estrogen receptor, Progesterone receptor, Human epidermal growth factor receptor-2, Ultrasound

Pathology and immunohistochemistry analysis
Data regarding age, body mass index (BMI), menopause, palpable axillary lymph node (ALN), breast cancer family history, US ALN, pathological type, histological grade, and ER, PR, and HER2 status were collected from the medical record system. ER or PR positivity was defined as ≥ 1% on IHC staining [3]. HER2 positivity was defined by any of the following test results: (1) IHC 3+, or complete and strong membrane staining of > 30% of invasive cancer cells; (2) FISH measurement of HER2/CEP17 ratio of > 2.2/2.0; and (3) CISH of a HER2 gene copy number of > 6.0 signals per nucleus [8].

US images assessment
Most breast US images were obtained using the Super-Sonica Aixplorer US scanner (SuperSonic Imagine S.A., Aix-en-Provence, France) equipped with a 7-15 MHz linear array transducer. Other breast US images were obtained using the Mindray Resona 5S US scanner (Shenzhen Mindray Bio-Medical Electronics Co., Ltd., Shenzhen, China) equipped with a 5-14 MHz linear array transducer. The imaging acquisition standards were as follows: 12 conventional US images were captured starting with the largest cross-section of the tumor at equal intervals in a 180° clockwise range. Suspicious breast lesions were measured at the maximal diameter on US images. The US features were assessed according to the Breast Imaging-Reporting and Data System [20,21], including shape, orientation, margins, boundary, echo pattern, calcification, and posterior acoustic features. Additionally, vascularity was assessed according to Adler's index (0, I, II, or III) [22]. All US images were reviewed by two US specialists who were blinded to the patients' pathological results. If the two reviewers disagreed, a consensus was reached after their discussion.

Statistical analysis
Data analysis was performed using SPSS version 20 (IBM Corp, Armonk, NY, USA). The data were randomly divided into the training and test sets using random numbers. Normality was assessed using the Kolmogorov-Smirnov test; normally distributed data were expressed as mean ± standard deviation, and non-normally distributed data as median with interquartile range. Categorical variables were analyzed using the chi-square or Fisher's exact test. Multivariate logistic regression analysis (backward stepwise) was used to study the association between clinical and US features and the three biomarkers. The model was built using R software (version 4.1.0, R Foundation for Statistical Computing, Vienna, Austria) based on the independent predictors from the multivariate regression analysis, and the diagnostic efficacy of the model was evaluated using the area under the receiver operating characteristic curve (AUC), accuracy (ACC), sensitivity (SENS) and specificity (SPEC). Statistical significance was set at P < 0.05.
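To make the four evaluation measures concrete, the following Python sketch computes AUC, ACC, SENS and SPEC from predicted probabilities. It is not the R code used in the study: the function names, the toy scores and labels, and the cutoff of 0.13 are all hypothetical, and the AUC is computed with the standard pairwise (Mann-Whitney) definition.

```python
def auc_pairwise(scores, labels):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def metrics_at_cutoff(scores, labels, cutoff):
    """Return (ACC, SENS, SPEC) when scores >= cutoff are called positive."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= cutoff and y == 1)
    fn = sum(1 for s, y in pairs if s < cutoff and y == 1)
    tn = sum(1 for s, y in pairs if s < cutoff and y == 0)
    fp = sum(1 for s, y in pairs if s >= cutoff and y == 0)
    acc = (tp + tn) / len(pairs)
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return acc, sens, spec

# Toy data (hypothetical): 1 = HER2+ subtype, 0 = any other subtype.
labels = [0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.05, 0.10, 0.12, 0.20, 0.30, 0.08, 0.15, 0.40]

auc = auc_pairwise(scores, labels)
acc, sens, spec = metrics_at_cutoff(scores, labels, cutoff=0.13)
print(auc, acc, sens, spec)
```

AUC does not depend on any cutoff, whereas ACC, SENS and SPEC change with the chosen threshold, which is why a cutoff is reported alongside them.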

Clinical and pathological characteristics
The study enrolled 1670 breast cancer cases (1662 female patients) with a mean age of 53.1 years (range 22-95 years) and mean tumor size of 21.0 mm (range 6-68 mm). The training set had a mean age of 53 years (range 22-95 years) and mean tumor size of 20 mm (range 5-68 mm), while the test set had a mean age of 53 years (range 28-87 years) and mean tumor size of 20 mm (range 6-56 mm). The two sets were not statistically different at the baseline of clinical and US features (excluding calcifications). All details are shown in Table 1 and Fig. 1.

Relationship among biomarkers, clinical and US features in group I-III from the training set
The following groups were formed as previously described. Group I: ER+ vs ER− (PR and HER2 negative); Group II: PR+ vs PR− (ER+, HER2−); and Group III: HER2+ vs HER2− (ER and PR negative). In Group I, shape was the only factor related to ER status in the univariate analysis (p < 0.05). Oval shape was more frequent in ER− tumors; additionally, older patients (age > 50 years) were more likely to have ER+ tumors (Table 2). No multivariate regression analysis was performed because there was only one significant factor in the univariate analysis.
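The group definitions can be read as simple membership rules, sketched below in Python. This is an interpretation of the definitions as stated, not the authors' code; the function name `assign_groups` and the boolean encoding of receptor status are assumptions. Read literally, the rules allow one case to fall into more than one group.

```python
def assign_groups(er, pr, her2):
    """Return the comparison groups a case belongs to.

    er, pr, her2 are booleans (True = positive). Each group fixes two
    biomarkers and leaves the third free to vary.
    """
    groups = []
    if not pr and not her2:   # Group I: ER+ vs ER-, with PR- and HER2-
        groups.append("I")
    if er and not her2:       # Group II: PR+ vs PR-, with ER+ and HER2-
        groups.append("II")
    if not er and not pr:     # Group III: HER2+ vs HER2-, with ER- and PR-
        groups.append("III")
    return groups

print(assign_groups(er=False, pr=False, her2=True))   # HER2+ subtype (non-luminal)
print(assign_groups(er=False, pr=False, her2=False))  # triple-negative case
print(assign_groups(er=True, pr=True, her2=True))     # in none of the groups
```

Under this reading, the HER2+ subtype (ER−, PR−, HER2+) appears only in Group III, while a triple-negative case serves as a comparator in both Group I and Group III.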
In Group II, age, menopause status and posterior acoustic features were related to PR status in the univariate analysis (p < 0.05). Age ≤ 50 years was an independent predictor of PR+ (OR 2.204, 95% CI 1.238-3.924, p = 0.007); that is, the odds of PR+ in younger patients were 2.204 times those in older patients. Additionally, posterior enhancement was a negative predictor of PR+ (OR 0.418, 95% CI 0.211-0.830, p = 0.013); that is, the odds of PR+ in tumors with posterior enhancement were 0.418 times those in tumors with no change in posterior echogenicity (Table 3).
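The relationship between an odds ratio, its confidence interval and its p-value can be checked with a short Python sketch. It assumes that the reported intervals are Wald intervals on the log-odds scale, which the text does not state; the function name `wald_from_or` is hypothetical.

```python
import math

Z_95 = 1.959964  # two-sided 95% quantile of the standard normal distribution

def wald_from_or(odds_ratio, ci_low, ci_high):
    """Recover coefficient, standard error, z and two-sided p from an OR
    and its 95% CI, assuming a Wald interval: exp(beta +/- Z_95 * se)."""
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_95)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return beta, se, z, p

# Values reported for Group II.
_, _, _, p_age = wald_from_or(2.204, 1.238, 3.924)        # age <= 50 years
_, _, _, p_enhance = wald_from_or(0.418, 0.211, 0.830)    # posterior enhancement
print(p_age, p_enhance)
```

Under this assumption the recovered p-values come out close to the reported 0.007 and 0.013. An odds ratio below 1 corresponds to a negative coefficient, which is why posterior enhancement is described as a negative predictor.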
All details are illustrated in Tables 2, 3 and 4.

Performance of prediction model for the HER2+ subtype (non-luminal) in training and test sets
The HER2+ subtype accounted for 138 cases (11.8%) in the training set and 51 cases (10.1%) in the test set. The efficacy of the model in the training set was AUC 0.697, ACC 60.14%, SENS 72.46%, SPEC 58.49% (Fig. 2). The efficacy of the model in the test set was AUC 0.725, ACC 72.06%, SENS 64.71%, SPEC 72.89%; and the best cutoff was 0.1321628 (Fig. 3).
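As a worked check of how ACC, SENS and SPEC relate to one another, the Python sketch below recomputes them from confusion-matrix counts. The counts are not reported in the study; they are inferred here as the integers consistent with the reported set sizes (1169 and 501), the numbers of HER2+ subtype cases (138 and 51) and the reported percentages, and should be treated as an assumption.

```python
def rates(tp, fn, tn, fp):
    """Return ACC, SENS and SPEC as percentages from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "ACC": 100 * (tp + tn) / total,
        "SENS": 100 * tp / (tp + fn),
        "SPEC": 100 * tn / (tn + fp),
    }

# Inferred counts (assumption, not reported by the study).
training = rates(tp=100, fn=38, tn=603, fp=428)  # 138 positives of 1169 cases
test = rates(tp=33, fn=18, tn=328, fp=122)       # 51 positives of 501 cases

for name, result in (("training", training), ("test", test)):
    print(name, {k: round(v, 2) for k, v in result.items()})
```

Because the HER2+ subtype makes up only about a tenth of each set, ACC is dominated by SPEC, which explains why accuracy rises in the test set even though sensitivity falls.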

Discussion
Preoperative prediction of biomarkers and breast cancer subtypes can improve the efficiency of the treatment process. The relationship between clinical and US features and one biomarker, when the other two of ER, PR, and HER2 are held in the same status, has remained unknown. After the matching analysis, the results of this study suggested that, regarding the US features of breast cancers, PR status was mainly reflected by the posterior acoustic features, and HER2 status by the tumor shape and presence of calcifications. None of the US features were found to be independent predictors of ER status when both PR and HER2 were known to be negative. This study established a predictive model with moderate diagnostic power for predicting the HER2+ subtype. Biomarker status and molecular subtypes play an important role in clinical management, outcome and prognosis. For example, ER+/HER2− breast cancer has a better prognosis and requires only endocrine therapy, while the HER2+ subtype requires chemotherapy. The relationship between US features and biomarkers of breast cancer was investigated to better distinguish molecular subtypes before surgery. In previous studies of ER+/ER− breast cancer (regardless of PR and HER2 status), Kim et al. [23] suggested that hypoechoic and complex echo patterns were significantly related to ER and PR negativity, and Xu et al. [15] showed an association between ER and PR positivity and echo halo. In contrast to these studies [15,23], we found no US feature to be an independent predictor of ER status in Group I. Although tumor shape was not established as an independent predictor of ER status, ER− tumors still showed a much higher percentage of oval shapes than ER+ tumors (14.4% vs 1.6% in Group I, 9.9% vs 4.1% in the training set). Contrary to a previous study [24], this study suggested that age was not related to ER status in Group I.
Nevertheless, older patients more frequently had ER+ tumors. Although the age structure of the included cases in this study was generally consistent with the study by Zhu et al. [24] (age < 50 years: 39.5% vs 44.8%), the proportions of ER+ tumors overall and among younger patients (age < 50 years) in the training set were lower in our study (71.5% vs 78.4% and 74.5% vs 80.3%, respectively). This may be the reason for the different results. In Group II, younger age was an independent predictor of PR+ (p = 0.007), consistent with the study by Zhu et al. [24]. In addition, this study suggested that the odds of PR+ in younger patients were approximately twice those in older patients. Tumors with higher histological grade have faster cell proliferation, increased cell and necrotic components, less fibrous tissue, and increased tissue structure uniformity [25]. Given these characteristics, sound waves can penetrate the tumor without excessive reflection or attenuation, producing posterior echo enhancement. However, previous studies [23,24,[26][27][28] have suggested that PR+ breast cancers were more frequently associated with low degrees of malignancy and low histological grades, and Xu et al. [15] indicated directly that internal necrosis was related to PR negativity. In the training set, 67.6% (507/739) of PR+ lesions were histological grade I/II. Therefore, it is reasonable that posterior enhancement was negatively correlated with PR+; our findings indicated that the odds of PR+ in tumors with posterior enhancement were about 0.4 times those in tumors with no change in posterior echogenicity. HER2 receptors are located in the cell membrane and are involved in the transmission of signals that control normal cell growth and differentiation [29,30]. HER2 overexpression plays a vital role in tumor transformation and tumorigenesis [29]. In Group III, palpable ALN, shape (round, irregular), and calcifications were independent predictors of HER2+.
In previous studies, the presence of calcifications on US or mammography was related to HER2+ [23,31], mainly manifesting as pleomorphic and branching calcifications on mammography [31]. Several studies [15,23] suggested that tumor shape was not related to HER2 status; in contrast, our findings suggested that round and irregular tumors were more than two and seven times as likely, respectively, to be HER2 positive as oval tumors. This association may be apparent only in studies that explore the features related to HER2 status in controlled groups. The HER2+ subtype is prone to ALN metastasis (approximately 60% [32]), so it is understandable that our study found palpable ALN to be significantly associated with HER2+.
The HER2+ subtype has a high degree of malignancy and the main pathological type is invasive ductal carcinoma. This study attempted to predict the HER2+ subtype on the basis of the independent predictors of three biomarkers (i.e. age, palpable ALN, posterior acoustic features, calcifications, and shape). Some studies have also shown that the HER2+ subtype was associated with posterior acoustic features, calcifications and age [2,16,17]. The diagnostic efficacy of the model in this study was AUC 0.697 in the training set and AUC 0.725 in the test set. To the best of our knowledge, among models predicting breast cancer subtypes from conventional US features, only the study by Zhang et al. [2] has addressed the HER2+ subtype, reporting 87.9% accuracy with an ensemble decision method based on clinical and US features. Although the present model performed less well, the two models defined the HER2+ subtype differently: the former used a 10% cutoff for ER and PR positivity, whereas we used the currently widely used cutoff of 1%. Therefore, the results of this study may be more in line with the current clinical situation. In addition, the moderate diagnostic efficiency of our model suggests that predicting breast cancer subtypes from the features related to individual biomarkers is feasible, and provides an alternative modeling idea for predicting subtypes.
Fig. 3 The receiver operating characteristic curve of the predictive model for the HER2 positive subtype in the test set. This figure shows that the model with combined clinical and ultrasound features had moderate predictive power for HER2+ subtype, with an AUC of 0.725 in the test set. HER2 human epidermal growth factor receptor-2, AUC area under the receiver operating characteristic curve
This study has certain limitations. First, it was a retrospective study and the US diagnosis was subjective. However, all enrolled cases met uniform imaging standards and had multiple US images to ensure maximum integrity of the US features of breast lesions. Additionally, two radiologists independently reviewed the US images, which reduced the subjectivity to a certain extent. Second, despite the considerable size of the data set, the data were obtained from a single center. Therefore, data from breast cancer patients at other centers are needed to increase data objectivity. Third, no new image analysis methods, such as radiomics or deep learning, were applied. The US images in this study were derived from funded projects and follow unified image acquisition standards. Thus, they are suitable for radiomics analysis to study the relationship between radiomics features and breast cancer subtypes or biomarkers. Although some research in this area [33,34] has been conducted, it could still be improved. It is also worth noting that data are often affected by uncertainty or inaccuracy. Therefore, a fuzzy prediction technique such as that proposed by M Cacciola et al. [35] may be needed. We will include this as part of our future research.

Conclusions
Our research suggested that PR status was related to posterior acoustic features, and HER2 status to shape and calcifications. These findings may help non-invasively predict the HER2+ subtype and the status of the biomarkers, and provide an alternative modeling idea for predicting subtypes. Future studies on the correlation between imaging features and the expression status of ER, PR, and HER2 could consider the influence of the biomarkers on each other and adapt their analytical approach accordingly. In summary, the results could help in formulating an initial impression and treatment plan prior to surgery.