Three ultrasound devices were chosen for evaluation based on the study team's ability to obtain the device, prior FDA approval for gynecologic imaging, weight less than 2 lb, and cost less than $10,000, criteria that make each device a reasonable option for POC-US in a low-resource setting. No ultrasound company was involved in the design or analysis of the study or had input into the results, scoring, or publication. For details of the selected test devices, see “Appendix 1”.
Ethical approval was obtained from the University of Rochester Medical Center institutional review board (Rochester, NY), and research support was obtained from the University of Rochester Department of Obstetrics and Gynecology through an intradepartmental grant funding application. A total of 60 subjects were prospectively enrolled in a sequential convenience sample. Subjects were women previously scheduled by their primary gynecologist for a pelvic ultrasound for any reason (new complaint or follow-up of known pathology) at a University-affiliated clinic specializing in OBGYN ultrasound. Women were enrolled sequentially from October 1 to 31, 2017 on a volunteer basis, with small monetary compensation provided. Women were excluded if they were younger than 18 years at the time of the study, were non-English speaking, lacked legal competence to make medical decisions, were currently pregnant, or were currently incarcerated. Informed consent was obtained, and subjects underwent a transabdominal-only assessment of the GYN organs and pelvis, irrespective of bladder filling, in accordance with the AIUM (American Institute of Ultrasound in Medicine) Practice Parameter for the Performance of an Ultrasound Examination of the Female Pelvis [16]. Ultrasound was performed by a practitioner trained in gynecologic ultrasound (MT). All images were labeled with an anonymous study ID and stored in a secure online database.
Overview of study design
Part 1: Comparison of three POC-US devices
Because this was a pilot study, the sample size was not powered for statistical significance. The first twenty sequentially enrolled subjects underwent clinician-performed pelvic POC-US with each of the three units (GE Vscan (Vscan); Sonosite Iviz + L38v linear probe (Iviz); Philips Lumify C5-2 curvilinear probe + Samsung Galaxy Tab S2 9.7 with mobile app (Lumify)) in rotating order, followed by diagnostic pelvic ultrasound (Dx-US) using one of the following machines: Philips IU22 (Koninklijke Philips N.V, Amsterdam, Netherlands), Medison Accuvix 20 (Samsung Medison, Seoul, South Korea), or Voluson E10 (GE Healthcare Ultrasound, Milwaukee, WI, USA) with a 4–5 MHz abdominal probe. Clinician-performed POC-US was performed by a single author (MT), a third-year resident in Obstetrics and Gynecology, with supervision by an RDMS-certified sonographer and co-investigator (KS). POC-US was performed prior to reference imaging; therefore, reference test results were not available to investigators at the time of POC-US. The Dx-US was then performed, in accordance with AIUM practice parameters for an ultrasound examination of the female pelvis [16], by a different OBGYN sonographer (RDMS) who was blinded to any findings on the POC-US.
Test images from each POC-US device were compared with Dx-US images in a blinded fashion by two independent sonographer reviewers. For each test image, reviewers completed a qualitative assessment of POC-US image quality and of correlation between the POC-US image and the Dx-US image, using a non-validated 5-point Likert scale (1 = poor to 5 = excellent), with a score of 0 recorded when the structure was not seen. The following gynecologic structures were scored: uterus, endometrium, cervix, left ovary, right ovary, cul-de-sac, and pathology of interest. A gynecologic structure was considered satisfactorily assessed by a POC-US device if the mean sonographer review score was ≥ 3 for image quality and ≥ 3 for correlation. Likert scale responses were collapsed into dichotomous neutral/positive (score 3, 4 or 5) or negative (score 1 or 2) categories to minimize ambiguity and clarify the intent of the responder, and thereby better capture trends in the data. Neutral responses were aggregated with positive responses to reduce response bias related to survey satisficing and to keep the focus on the lack of a negative score as the primary outcome of interest [17]. The aggregate score for each device was calculated by totaling the number of scans with qualitative assessment scores ≥ 3 for all structures. Two-tailed Fisher’s exact tests were used to compare the proportions of scans with assessment scores ≥ 3 between devices to inform superiority of a single POC-US device.
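The scoring rule above can be illustrated with a minimal sketch. The function names and example scores are hypothetical, and treating a score of 0 (structure not seen) as not meeting the threshold is an assumption of this sketch, since the text only assigns scores 1 and 2 to the negative category.

```python
from statistics import mean

THRESHOLD = 3  # scores of 3, 4 or 5 are grouped as neutral/positive


def is_neutral_or_positive(score):
    """Dichotomize one Likert score.

    Assumption: a score of 0 (structure not seen) falls below the
    threshold and is therefore grouped with the negative responses.
    """
    return score >= THRESHOLD


def is_satisfactory(quality_scores, correlation_scores):
    """A structure counts as satisfactorily assessed when the mean
    reviewer score is >= 3 for image quality AND >= 3 for correlation."""
    return (mean(quality_scores) >= THRESHOLD
            and mean(correlation_scores) >= THRESHOLD)


# Hypothetical scores from the two reviewers for one structure on one scan.
print(is_satisfactory(quality_scores=[4, 3], correlation_scores=[3, 3]))
```

Requiring both means to reach the threshold mirrors the "≥ 3 for image quality and ≥ 3 for correlation" wording; a scan with good image quality but poor correlation with Dx-US would not count.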
Quantitative measurements of gynecologic structures by each POC-US device were compared with measurements obtained by Dx-US for uterine volume, endometrial thickness, left and right ovary mean diameter, and pathologic structure mean diameter. For both POC-US and Dx-US, uterine volume was calculated by measuring the maximum length (excluding the cervical component), anteroposterior and transverse diameters of the uterine corpus, and using the formula for the volume of a prolate ellipsoid: V = 0.52 × (L × AP × T) [18]. Endometrial thickness was measured at the thickest part of the endometrium perpendicular to its longitudinal plane in the anteroposterior diameter from echogenic to echogenic border. Ovaries were measured in 3 dimensions (longitudinal, transverse, and anteroposterior diameters) on views obtained in 2 orthogonal planes [16]. Pathologic structures included simple or complex adnexal cysts and myomas. Simple ovarian cysts were defined as thin- and smooth-walled, round or oval, anechoic spaces > 3 cm with no flow on color Doppler US [19]. Paraovarian and paratubal cysts were considered together with ovarian cysts. Simple ovarian cysts were measured in 3 dimensions (longitudinal, transverse, and anteroposterior diameters) on views obtained in 2 orthogonal planes. Cysts of any size that contained septations or solid or mixed cystic/solid components were considered complex ovarian cysts and were measured as described above. Myomas were measured in three perpendicular diameters [18]. Endometrial pathology was defined as thickened endometrium > 4 mm in a post-menopausal subject [20], fluid-filled endometrium, or IUD in situ. Endometrial measurements > 4 mm in a pre-menopausal subject were not considered pathology. The absolute mean difference in measurements was defined as the absolute difference between the POC-US and Dx-US measurements.
P < 0.05 for the absolute mean difference was considered a significant “disagreement” in measurements, and one-way analysis of variance was performed to inform superiority of a single POC-US device.
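As an illustration of the measurement definitions above, the following sketch computes uterine volume with the prolate ellipsoid formula and the absolute difference between a POC-US and a Dx-US measurement. The function names and the example dimensions are hypothetical, and centimeters are assumed as the unit; the coefficient 0.52 approximates π/6.

```python
def uterine_volume(length, anteroposterior, transverse):
    """Prolate ellipsoid volume, V = 0.52 x (L x AP x T).

    Assumption: all three diameters are in centimeters, giving cm^3.
    """
    return 0.52 * (length * anteroposterior * transverse)


def absolute_difference(poc_measurement, dx_measurement):
    """Absolute difference between a POC-US and a Dx-US measurement."""
    return abs(poc_measurement - dx_measurement)


# Hypothetical uterine corpus dimensions for one subject (cm).
poc_volume = uterine_volume(8.0, 4.0, 5.0)
dx_volume = uterine_volume(8.2, 4.1, 5.0)
print(absolute_difference(poc_volume, dx_volume))
```

Averaging these per-subject absolute differences for each device would give the absolute mean difference that is then compared across devices.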
A single best POC-US device was then selected based on a combination of (1) highest aggregate score for image quality ≥ 3, (2) highest aggregate score for image correlation ≥ 3, (3) lowest number of “disagreements” in quantitative measurements of individual structures (as defined above), and (4) suitability for field use based on total continuous scan time on one battery charge and ease of use.
Part 2: Diagnostic evaluation of selected POC-US
The selected device then underwent further testing with additional prospective enrollment of forty subjects in a sequential convenience sample. Because this was a pilot study, the sample size was not powered for significance, though a larger sample size of 40 was chosen to allow for inclusion of subjects with a variety of gynecologic pathologies, each of which has a low baseline prevalence. Clinician-performed POC-US with the selected device was performed by a single blinded operator (MT) and Dx-US by an RDMS. POC-US was again performed prior to reference imaging; therefore, results of reference imaging were not available to the POC-US operator at the time of image acquisition or interpretation. Similarly, results of POC-US were unknown to the RDMS at the time of reference imaging. Interpretation of test images for diagnosis was performed by the clinician POC-US operator (MT) at the time of image acquisition. Interpretation of reference imaging for diagnosis was performed by board-certified maternal–fetal medicine specialists who were blinded to the results of POC-US. At this research institution, maternal–fetal medicine specialists are also certified by AIUM for interpreting gynecologic ultrasound studies. All indeterminate or missing results from the POC-US were considered false-negative (in those with pathology present on reference imaging) or false-positive (in those with pathology absent on reference imaging).
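The handling of indeterminate or missing POC-US results can be sketched as follows. This is an illustrative example only: the function names, the use of `None` to represent an indeterminate or missing result, and the example data are assumptions of the sketch, and the accuracy measures use their standard 2 × 2 table definitions.

```python
from collections import Counter


def classify(poc_result, reference_positive):
    """Assign one subject to a cell of the 2 x 2 table.

    poc_result: True (pathology seen), False (no pathology seen), or
    None (indeterminate or missing; a representation assumed here).
    Indeterminate results always count against the device.
    """
    if poc_result is None:
        return "FN" if reference_positive else "FP"
    if poc_result:
        return "TP" if reference_positive else "FP"
    return "FN" if reference_positive else "TN"


def ratio(numerator, denominator):
    """Return numerator / denominator, or None when the denominator is 0."""
    return numerator / denominator if denominator else None


def accuracy_measures(pairs):
    """pairs: iterable of (poc_result, reference_positive) tuples."""
    counts = Counter(classify(poc, ref) for poc, ref in pairs)
    tp, fp, fn, tn = (counts[cell] for cell in ("TP", "FP", "FN", "TN"))
    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Hypothetical subjects: (POC-US result, pathology present on Dx-US).
subjects = [(True, True), (True, True), (None, True),
            (False, False), (None, False)]
print(accuracy_measures(subjects))
```

Counting every indeterminate scan as an error is a conservative choice: it can only lower the estimated sensitivity and specificity of the POC-US device.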
Field study procedures
A pilot field study of POC-US in a low-resource setting was performed in the remote and mountainous Borgne community on the north coast of Haiti, west of Cap Haitian. The region is accessible only by rough footpaths. The majority of the population lives in extreme poverty and relies on subsistence agriculture, fishing and trade of crops at local/regional markets. There is a low level of educational attainment and limited access to clean water or sanitation and no current access to diagnostic imaging services.
For this prospective field study, a total of 20 subjects were enrolled in a sequential convenience sample. Because this was a pilot study, the sample size was not powered for statistical significance. Subjects were women presenting to a mobile health clinic for any complaint of possible gynecologic etiology. A prior assessment of perceived women’s health needs in the mobile clinics revealed a need for improved general gynecologic care for benign and routine pathologies, most commonly abnormal uterine bleeding and pelvic pain. Women were approached and enrolled in January 2018 with the help of a Haitian Creole translator. Women were excluded if they were younger than 18 years at the time of the study or currently pregnant. Subjects did not receive compensation or incentives for participation. After patient consent, clinician-performed POC-US was performed and interpreted by a single author (MT) with supervision by a co-investigator (NW).
Because there is no access to imaging capabilities in the region of the field study, no reference imaging was performed for comparison to POC-US imaging for subjects in the field study.
Statistical analysis
Demographic characteristics of the two groups were compared with chi-square tests for categorical variables. The distribution of continuous demographic data was assessed using the Shapiro–Wilk test, and non-parametric data were then compared with the Mann–Whitney U test.
Two-tailed Fisher’s exact tests were used to compare the proportions of scans with assessment scores ≥ 3 between the three test devices to inform superiority of a single POC-US device. The absolute mean difference in measurements of gynecologic structures (defined as the absolute difference between the POC-US and Dx-US measurements) was compared with one-way analysis of variance to inform superiority of a single POC-US device.
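For readers unfamiliar with the test, a two-tailed Fisher’s exact test on a 2 × 2 table can be sketched with the standard library alone. This is not the SPSS procedure used in the study; the function name and the example counts are hypothetical, and the two-tailed P value is taken as the sum of the probabilities of all tables (with the same margins) that are no more likely than the observed one, which is one common convention.

```python
from math import comb


def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher's exact test for the table [[a, b], [c, d]].

    Rows could be two devices; columns could be scans scoring >= 3
    versus scans scoring < 3 (a hypothetical layout).
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total_tables = comb(row1 + row2, col1)

    def probability(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    observed = probability(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # A small relative tolerance guards against floating-point ties.
    p_value = sum(probability(x) for x in range(lowest, highest + 1)
                  if probability(x) <= observed * (1 + 1e-7))
    return min(1.0, p_value)


# Hypothetical counts: device A, 18 of 20 scans >= 3; device B, 12 of 20.
print(fisher_exact_two_tailed(18, 2, 12, 8))
```

With three devices, the comparison would be repeated for each pair of devices, each pair giving its own 2 × 2 table.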
Concordance of diagnoses between POC-US and Dx-US was assessed with Cohen’s unweighted kappa coefficient with P value < 0.05 defining agreement of nominal variables exceeding that expected under the null hypothesis. For analysis, subjects with more than one diagnosis were included in multiple categories.
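A minimal sketch of Cohen’s unweighted kappa for paired nominal diagnoses is shown below. The function name and the diagnosis labels are hypothetical; the sketch computes only the coefficient, not the associated P value, and is not the SPSS routine used in the study.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Unweighted kappa: (observed - expected) / (1 - expected)."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be paired and non-empty")
    n = len(ratings_a)
    # Proportion of subjects on whom the two methods agree.
    observed = sum(x == y for x, y in zip(ratings_a, ratings_b)) / n
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    # Agreement expected by chance from the marginal frequencies.
    expected = sum(counts_a[label] * counts_b[label]
                   for label in set(counts_a) | set(counts_b)) / n ** 2
    if expected == 1:
        raise ValueError("kappa is undefined when only one category is used")
    return (observed - expected) / (1 - expected)


# Hypothetical paired diagnoses (POC-US versus Dx-US) for four subjects.
poc_us = ["cyst", "cyst", "normal", "normal"]
dx_us = ["cyst", "normal", "normal", "normal"]
print(cohens_kappa(poc_us, dx_us))
```

Because kappa subtracts chance agreement, it can be considerably lower than the raw percentage of matching diagnoses when one category dominates.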
Continuous variables were compared by linear regression, with a value close to 1 representing high agreement between device measurements. Bland–Altman plots were constructed as a visual representation of the mean difference between single paired measurements by the two methods. The limits of agreement are indicated by the dotted lines and were calculated as the interval of two standard deviations of the measurement differences on either side of the mean difference (solid line). Acceptable limits of agreement were not established a priori. Agreement in measurements was defined as 95% of data points (differences between the two measurements) falling within the 95% limits of agreement. Sensitivity, specificity, positive predictive value and negative predictive value were also calculated from 2 × 2 contingency tables for (1) the dichotomous outcome of presence or absence of pathology and (2) individual diagnoses. All statistical analyses were performed using IBM SPSS Statistics 21.
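The Bland–Altman quantities described above can be sketched as follows. The function name and the example measurements are hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption; the multiplier of 2 follows the "two standard deviations" wording in the text.

```python
from statistics import mean, stdev


def bland_altman(poc_values, dx_values, multiplier=2.0):
    """Return the mean difference, limits of agreement, and the
    fraction of paired differences lying within those limits."""
    differences = [poc - dx for poc, dx in zip(poc_values, dx_values)]
    bias = mean(differences)       # solid line on the plot
    spread = stdev(differences)    # sample SD (assumption)
    lower = bias - multiplier * spread   # dotted lines on the plot
    upper = bias + multiplier * spread
    inside = sum(lower <= diff <= upper for diff in differences)
    return {
        "bias": bias,
        "lower": lower,
        "upper": upper,
        "fraction_inside": inside / len(differences),
    }


# Hypothetical paired measurements (same unit for both methods).
poc_us = [1.0, 2.0, 3.0, 4.0]
dx_us = [1.0, 1.0, 3.0, 3.0]
result = bland_altman(poc_us, dx_us)
# Agreement as defined in the text: at least 95% of points inside.
print(result, result["fraction_inside"] >= 0.95)
```

Note that the limits are derived from the data themselves, so without a pre-specified acceptable range they describe the spread of the differences rather than whether that spread is clinically acceptable.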