Study cohort
Institutional Review Board (IRB) approval (HUM00098656) was obtained before this research project began. Patient informed consent was not required because this was a retrospective investigation. This study included 77 patients who presented to the UMHS Department of Radiology for CT imaging for the evaluation of abdominal blunt force trauma between 01/01/2009 and 08/30/2014, as well as those whose CT scans were ordered by the Emergency Department. In total, the 77 CT scans comprised 8072 axial CT slices. The average patient age was 41.43 years, with a range of 18–88 years. Of the 77 patients included in this investigation, 34 had evidence of liver trauma and 43 had no evidence of liver parenchymal disruption on contrast-enhanced CT.
All CT scans were acquired in the axial plane using either GE Medical Systems (LightSpeed VCT or Discovery CT750 HD models) or SIEMENS (Emotion 16 model). Trauma protocol CT scans often include both an arterial and portal venous phase to evaluate for both arterial (e.g., aortic) and solid organ injuries. Only the portal venous phase was utilized in this study as this phase is optimal for the detection of hepatic parenchymal injuries.
To generate ground truth for all 77 patients, livers were manually annotated by outlining the margins of the liver itself. Next, any liver lacerations or hematomas were manually annotated for the 34 CT scans with visible liver parenchymal disruption. Each CT scan was manually annotated slice by slice to generate binary masks (i.e., ground truth) for injury and organ. All annotations were verified by a fellowship-trained abdominal radiologist with 5 years of post-training experience (EBS).
Experimental design
The study design for liver trauma segmentation and severity assessment is shown in Fig. 1. First, deep learning-based models were developed to segment both liver organ and trauma regions. Then, to assess the severity of the liver trauma, the automatically segmented regions were processed to measure liver disruption volume and, accordingly, calculate the proportion of the liver tissue affected by those injuries (i.e., LDI).
Liver segmentation
Fig. 2 demonstrates a high-level overview of the proposed liver segmentation method.
Given a contrast-enhanced CT scan, we first employed a U-net model [28] to generate the initial liver mask (see Additional file 1: Method Section for the specifications of the U-net model). U-net, introduced by Ronneberger et al. [28], is the most widely used deep convolutional neural network model for biomedical image segmentation tasks. In the proposed model, data augmentation was performed by rotating, re-scaling, and translating the images to enhance the training dataset. Next, the post-processing module transformed the volumetric masks from the U-net model into the final segmentation map. To that end, the initial mask was filtered using 3D Gaussian kernel smoothing to achieve spatial coherency and to smooth the binary mask contours according to the neighboring pixels. Finally, morphological operations were used to remove small, sparse regions; fill the holes in axial planes; and exclude any region that was not connected to the largest 3D connected component.
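The post-processing steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `postprocess_liver_mask`, the `sigma` and `threshold` values, and the (z, y, x) axis order are all assumptions, and keeping only the largest 3D connected component is used here to cover both the removal of small, sparse regions and the exclusion of disconnected regions.

```python
import numpy as np
from scipy import ndimage


def postprocess_liver_mask(prob, sigma=1.0, threshold=0.5):
    """Clean a stacked U-net output volume with axes (z, y, x).

    sigma and threshold are illustrative values, not taken from the paper.
    """
    # 3D Gaussian smoothing so each voxel reflects its neighbors.
    smoothed = ndimage.gaussian_filter(prob.astype(float), sigma=sigma)
    mask = smoothed > threshold

    # Label 3D connected components and keep only the largest one.
    labels, n_components = ndimage.label(mask)
    if n_components == 0:
        return np.zeros(mask.shape, dtype=bool)
    sizes = np.bincount(labels.ravel())
    sizes[0] = 0  # ignore the background label
    mask = labels == sizes.argmax()

    # Fill holes slice by slice in the axial plane.
    for z in range(mask.shape[0]):
        mask[z] = ndimage.binary_fill_holes(mask[z])
    return mask
```

In practice the smoothing width and threshold would need to be tuned on held-out data, since they trade boundary smoothness against loss of thin liver segments.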
Liver disruption segmentation
As shown in Fig. 3, a second U-net backbone model was trained to segment the liver trauma regions (see Additional file 1: Method Section for the specifications of the U-net model). The post-processing module comprises the volumetric reconstruction of the U-net output, during which human domain knowledge regarding the location and intensity distribution of liver trauma was integrated into the model. It is noteworthy that the domain knowledge about location and intensity of liver trauma is incorporated into the pipeline during the model development phase. While testing the model, this information is used to automatically post-process the initial segmented region.
Considering that trauma regions are within the liver parenchyma, if more than 50% of the initial segmented trauma mask fell outside the segmented liver, the region would be excluded.
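A minimal sketch of this location rule is given below. It assumes the rule is applied separately to each 3D connected component of the initial trauma mask, which the text does not state explicitly; the function name `drop_regions_outside_liver` and the parameter `max_outside_fraction` are hypothetical.

```python
import numpy as np
from scipy import ndimage


def drop_regions_outside_liver(trauma_mask, liver_mask, max_outside_fraction=0.5):
    """Remove trauma components lying mostly outside the segmented liver."""
    trauma_mask = trauma_mask.astype(bool)
    liver_mask = liver_mask.astype(bool)

    # Assumption: each 3D connected component is judged on its own.
    labels, n_components = ndimage.label(trauma_mask)
    kept = np.zeros(trauma_mask.shape, dtype=bool)
    for k in range(1, n_components + 1):
        region = labels == k
        n_outside = np.count_nonzero(region & ~liver_mask)
        outside_fraction = n_outside / np.count_nonzero(region)
        # Exclude the component if more than 50% of it is outside the liver.
        if outside_fraction <= max_outside_fraction:
            kept |= region
    return kept
```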
Pre-existing conditions such as fatty liver (Fig. 4b) or congestive hepatopathy (Fig. 4c) alter the appearance of non-trauma liver parenchyma on CT scans [29,30,31]. In theory, these pre-existing conditions could cause the U-net model to falsely detect trauma because the parenchyma already shows low attenuation at baseline (Fig. 4a). To exclude regions falsely segmented as trauma (e.g., part of the normal liver parenchyma), two intensity distributions were generated, corresponding to: (1) pixels of the CT image segmented as the liver, and (2) pixels of the CT image segmented as liver trauma (Fig. 4). Next, the means of these two distributions were compared using a two-sample t-test. If the test statistic value was less than a fixed threshold, we concluded that the two intensity distributions came from the same texture and thus that the segmented trauma region was part of the non-trauma liver parenchyma. Correspondingly, these false positive components were excluded from the segmentation.
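The intensity check can be illustrated with the sketch below. Several details are assumptions rather than facts from the paper: the threshold value `t_threshold`, the use of the absolute value of the statistic, and the unequal-variance (Welch) form of the test are all illustrative choices, and the function name `is_false_positive` is hypothetical.

```python
import numpy as np
from scipy import stats


def is_false_positive(ct, liver_mask, trauma_mask, t_threshold=5.0):
    """Flag a trauma region whose intensities match the liver parenchyma.

    t_threshold is an illustrative value; the paper only states that a
    fixed threshold was used.
    """
    liver_values = ct[liver_mask.astype(bool)]
    trauma_values = ct[trauma_mask.astype(bool)]
    if liver_values.size < 2 or trauma_values.size < 2:
        return False  # too few samples to compare the two distributions

    # Two-sample t-test on the means (Welch variant, assumed here).
    t_stat, _ = stats.ttest_ind(liver_values, trauma_values, equal_var=False)

    # A small statistic suggests both samples come from the same texture.
    return bool(abs(t_stat) < t_threshold)
```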
Next, the 3D Chan-Vese active contour model (ACM) [32] was used to iteratively evolve the boundary of the initial segmentation according to local intensity and spatial coherence. The energy function \(F(s_1, s_2, S)\) was defined as
$$\begin{aligned} F(s_{1} ,s_{2} ,S) & = \mu \cdot A(S) + \nu \cdot V(S) \\ & + \lambda_1 \int_{\mathrm{inside}(S)\ } |I(x,y,z) - s_1|^2 \,dx\,dy\,dz \\ & + \lambda_2 \int_{\mathrm{outside}(S)} |I(x,y,z) - s_2|^2 \,dx\,dy\,dz \\ \end{aligned}$$
(1)
where \(S\) is the current surface, and \(s_1\) and \(s_2\) correspond to the average intensities inside and outside the surface \(S\), respectively. \(I(x, y, z)\) denotes the intensity value of the pixel at coordinate \((x, y, z)\). Moreover, \(A(\cdot)\) and \(V(\cdot)\) calculate the area and volume of a surface, respectively. In Eq. (1), the parameters \(\mu\), \(\nu\), \(\lambda_1\), and \(\lambda_2\) are constants. Following the Chan-Vese paper, \(\lambda_1\) and \(\lambda_2\) were set to 1. The parameter \(\mu\), which specifies the degree of smoothness of the segmented region, was set to 0.1 based on prior work on an independent medical image processing problem [33]. Finally, the parameter \(\nu\) controls the contraction bias, which specifies the tendency of the active contour to grow outward. This parameter was determined using a grid search. To evolve the contour, a Sparse-Field level-set method similar to the one proposed by Whitaker et al. [34] was applied at each iteration. After each iteration, the mask was modified to exclude the added pixels that fell outside the automatically segmented liver. Finally, morphological operations were applied to remove small, sparse regions and fill the holes in the axial plane. The effect of the post-processing step is analyzed in the Results section.
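To make the terms of Eq. (1) concrete, the sketch below evaluates the energy for a given binary mask. It only scores a candidate surface and does not perform the Sparse-Field level-set evolution. It assumes unit isotropic voxels, approximates the surface area by counting voxel faces between inside and outside, and uses a hypothetical function name `chan_vese_energy`; the default `nu=0.0` is a placeholder, since the paper determined this parameter by grid search.

```python
import numpy as np


def chan_vese_energy(image, mask, mu=0.1, nu=0.0, lam1=1.0, lam2=1.0):
    """Evaluate the Chan-Vese energy of Eq. (1) for a binary 3D mask."""
    mask = mask.astype(bool)
    inside, outside = image[mask], image[~mask]

    # s1 and s2: average intensities inside and outside the surface.
    s1 = inside.mean() if inside.size else 0.0
    s2 = outside.mean() if outside.size else 0.0

    # Area term: number of voxel faces separating inside from outside
    # (assumes unit voxel spacing; faces on the volume border are ignored).
    m = mask.astype(np.int8)
    area = sum(np.abs(np.diff(m, axis=ax)).sum() for ax in range(m.ndim))

    # Volume term: number of voxels enclosed by the surface.
    volume = np.count_nonzero(mask)

    # Data terms: squared deviation from the regional mean intensities.
    fit_inside = ((inside - s1) ** 2).sum()
    fit_outside = ((outside - s2) ** 2).sum()
    return mu * area + nu * volume + lam1 * fit_inside + lam2 * fit_outside
```

A mask that matches a homogeneous region exactly has zero data terms, so its energy reduces to the regularization terms; misplacing the mask raises the energy, which is what drives the contour toward intensity boundaries.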
Liver disruption involvement measurement
After segmenting both liver and trauma regions (when present) as binary masks, the volumes were estimated according to respective pixel maps and the unit pixel volumes (i.e., number of pixels from the binary mask × unit pixel volume). The unit pixel volume was calculated using slice spacing and pixel spacing values extracted from CT scan metadata.
LDI was then estimated as
$${\text{LDI}}(\% ) = \frac{{\hat{V} ({\text{trauma}})}}{{\hat{V} ({\text{liver}})}} \times 100$$
(2)
where \(\hat{V}(\cdot)\) corresponds to the estimated volume of a segmented region. For patients with no detected traumatic injury to the liver, the trauma region volume was set to zero.
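The volume and LDI computation can be sketched as follows, taking LDI as the trauma volume divided by the liver volume, in percent. The function names and the argument layout (in-plane pixel spacing as a pair, slice spacing as a scalar, both in millimeters) are assumptions for illustration; reading these values from the CT metadata is left out.

```python
import numpy as np


def mask_volume_ml(mask, pixel_spacing_mm, slice_spacing_mm):
    """Volume of a binary mask: pixel count times unit pixel volume, in mL."""
    unit_volume_mm3 = pixel_spacing_mm[0] * pixel_spacing_mm[1] * slice_spacing_mm
    return np.count_nonzero(mask) * unit_volume_mm3 / 1000.0


def ldi_percent(trauma_mask, liver_mask, pixel_spacing_mm, slice_spacing_mm):
    """Percentage of the segmented liver volume occupied by trauma."""
    v_liver = mask_volume_ml(liver_mask, pixel_spacing_mm, slice_spacing_mm)
    if v_liver == 0:
        raise ValueError("liver mask is empty")
    # An all-zero trauma mask gives a trauma volume of zero, hence LDI = 0.
    v_trauma = mask_volume_ml(trauma_mask, pixel_spacing_mm, slice_spacing_mm)
    return 100.0 * v_trauma / v_liver
```

Because both volumes share the same unit pixel volume within one scan, the ratio depends only on the pixel counts; the spacing matters when absolute volumes are reported.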
Statistical analysis
A comprehensive evaluation of the segmentation model’s performance was performed on the validation sets using the Dice similarity coefficient, recall, precision, Relative Volume Difference (RVD), and Volumetric Overlap Error (VOE). The RVD and VOE error measures were calculated according to the definitions of Heimann et al. [35] as
$$\begin{aligned} {\text{RVD}} & = \frac{|S| - |GT|}{{|GT|}} \\ {\text{VOE}} & = 1 - \frac{|S\,\cap\,GT|}{|S\,\cup\,GT|}, \\ \end{aligned}$$
(3)
where GT and S correspond to the ground truth and segmented masks, respectively, while \(|\cdot|\) computes the number of pixels in the corresponding mask.
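A minimal sketch of these overlap measures on binary masks is shown below. VOE is computed as one minus the ratio of intersection to union, and the Dice, precision, and recall formulas are the standard ones. The function name `segmentation_metrics` is hypothetical, and the sketch assumes both masks are non-empty.

```python
import numpy as np


def segmentation_metrics(seg, gt):
    """Overlap measures between a segmented mask and its ground truth."""
    seg, gt = seg.astype(bool), gt.astype(bool)
    n_seg = np.count_nonzero(seg)
    n_gt = np.count_nonzero(gt)
    n_inter = np.count_nonzero(seg & gt)
    n_union = np.count_nonzero(seg | gt)
    return {
        "dice": 2.0 * n_inter / (n_seg + n_gt),
        "precision": n_inter / n_seg,  # fraction of S that is correct
        "recall": n_inter / n_gt,      # fraction of GT that is recovered
        "rvd": (n_seg - n_gt) / n_gt,  # positive means over-segmentation
        "voe": 1.0 - n_inter / n_union,
    }
```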
To measure the variability in the LDI estimates, linear regression analysis was performed in which the computed and reference LDI measurements were plotted against each other. The linear regression relation between the two measures was then calculated. Moreover, to better perceive the algorithm’s agreement with the ground truth and potential systematic errors, a Bland-Altman analysis was employed [36, 37].
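The two agreement analyses can be sketched as follows. The function name `agreement_summary` is hypothetical, and the 1.96 standard deviation limits of agreement are the conventional Bland-Altman choice, assumed here rather than taken from the paper.

```python
import numpy as np


def agreement_summary(computed, reference):
    """Linear regression and Bland-Altman statistics for paired LDI values."""
    computed = np.asarray(computed, dtype=float)
    reference = np.asarray(reference, dtype=float)

    # Least-squares line: computed = slope * reference + intercept.
    slope, intercept = np.polyfit(reference, computed, 1)
    r = np.corrcoef(reference, computed)[0, 1]

    # Bland-Altman: mean difference (bias) and limits of agreement.
    diff = computed - reference
    bias = diff.mean()
    sd = diff.std(ddof=1)
    return {
        "slope": slope,
        "intercept": intercept,
        "r": r,
        "bias": bias,
        "limits_of_agreement": (bias - 1.96 * sd, bias + 1.96 * sd),
    }
```

A non-zero bias with narrow limits of agreement would point to a systematic over- or under-estimation, whereas wide limits indicate case-to-case variability that the regression slope alone can hide.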