Using a clinical dataset from real scoliotic patients is important for this study because the topology of the spine on MR images varies considerably between normal and scoliotic patients, which adds a significant challenge to the segmentation process. In this context, manual segmentation is commonly accepted as the gold standard. In addition to the Dice Similarity Coefficient (DSC) used to test the validation criteria, the calculation of volume is also part of the evaluation. The similarity coefficient has the advantage of taking into account the spatial dependency, which is not the case when reporting volumes only. Conversely, although geometrically intuitive, the DSC carries no information about the type of segmentation error, namely whether over- or under-segmentation occurs. By taking both metrics (DSC and volume) into account, the current study provides a comprehensive quantitative evaluation of the automatic segmentation applied to a clinical dataset composed of 27 intervertebral disks from nine scoliotic patients.
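The complementarity of the two metrics can be illustrated with a minimal sketch. It assumes that each segmentation is available as a set of voxel indices; the function names, the voxel representation and the toy masks are illustrative assumptions, not the implementation or data used in the study. Because both masks share the same voxel size, the voxel volume cancels in the relative difference, so voxel counts are sufficient.

```python
from typing import Set, Tuple

Voxel = Tuple[int, int, int]  # (x, y, z) index of a voxel labelled as disk


def dice(a: Set[Voxel], b: Set[Voxel]) -> float:
    """DSC = 2|A n B| / (|A| + |B|); returns 1.0 when both masks are empty."""
    total = len(a) + len(b)
    if total == 0:
        return 1.0
    return 2.0 * len(a & b) / total


def signed_volume_difference(auto: Set[Voxel], manual: Set[Voxel]) -> float:
    """(V_auto - V_manual) / V_manual; negative means under-segmentation."""
    if not manual:
        raise ValueError("manual mask is empty")
    return (len(auto) - len(manual)) / len(manual)


def block(nx: int, ny: int, nz: int) -> Set[Voxel]:
    """Hypothetical toy mask: a box of nx * ny * nz voxels at the origin."""
    return {(x, y, z) for x in range(nx) for y in range(ny) for z in range(nz)}


# Toy masks (not study data): one result lies inside the manual mask,
# the other one contains it.
manual = block(4, 4, 4)  # 64 voxels
under = block(4, 4, 2)   # 32 voxels, subset of manual
over = block(4, 4, 8)    # 128 voxels, superset of manual

print(dice(under, manual), signed_volume_difference(under, manual))
print(dice(over, manual), signed_volume_difference(over, manual))
```

In this toy case both results have the same DSC (2/3), while the signed volume difference is -0.5 for the under-segmented mask and +1.0 for the over-segmented one, which is the information the DSC alone does not provide.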
Comparing the automatic segmentation with the manual segmentation, we find that the proposed algorithm yields spatial volumes that are similar to the gold standard: the average 3D DSC values of 0.79 for the 3D MEDIC and 0.75 for the Spin Echo (Table 2) are higher than the 0.7 threshold for good segmentation performance.
No other study on segmentation based on region detection for the 3D reconstruction of intervertebral disks of scoliotic patients exists. However, Michopoulou et al. proposed an automatic segmentation of intervertebral disks based on a priori shape information and the fuzzy c-means algorithm. Their segmentation procedure is applied to the mid-sagittal image to evaluate whether the disk is degenerated, and they evaluated its accuracy using a 2D DSC value on that image. Hence, we can partly compare our results with theirs. In Figure 6 of the current study, the 2D DSC value is 0.9 for the Spin Echo at the mid-sagittal image (image 7) and 0.88 for the 3D MEDIC at the mid-sagittal level. This is comparable to the results obtained by Michopoulou et al. on degenerated disks: 0.88 for the elastic-Atlas-RFCM method, 0.84 for the Atlas-FCM method and 0.87 for the Atlas-RFCM method.
For the 3D MEDIC and Spin Echo MR images, the results reveal that the reconstructed 3D volumes of intervertebral disks are systematically underestimated (mean discrepancy of 22.5%) compared to the volumes obtained with manual segmentation. For the 3D FISP, there is no trend toward over- or under-segmentation, but there is a mean discrepancy of 30% between the automatic and the manual volumes. Indeed, there is less consistency from slice to slice for the 3D FISP images because the automatic segmentation algorithm has trouble with the blurred boundaries of intervertebral disks often found in the 3D FISP sequences. The three clinical experts who performed the manual segmentation all agreed that the intervertebral disks were harder to delineate in the 3D FISP sequences because of the blurred contours (due to variation of pixel intensities along the boundaries). Hence, even in the manually identified volumes, there is less consistency from slice to slice than in the two other types of MR images.
The volume underestimation resulting from the automatic segmentation algorithm applied to 3D MEDIC and Spin Echo images occurs more in the lateral slices than in the mid-sagittal slices. Superimposition of volumes in space and 2D evaluation of the DSC (see Tables 3 and 4) show higher 2D DSC results in the mid-sagittal slices than in the lateral slices for all patients, meaning that the differences between the volumes lie mainly in the lateral regions of the disks. For the 3D FISP (Table 5), there is no specific region of volume under- or over-estimation since the results for the 2D DSC vary as much in the mid-sagittal slices as in the lateral slices.
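The slice-by-slice comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions: segmentations are sets of voxel indices, the first index is taken to be the sagittal slice number, and the function name and toy masks are hypothetical rather than taken from the study.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

Voxel = Tuple[int, int, int]  # (slice, row, column); slice axis is an assumption


def per_slice_dice(auto: Set[Voxel], manual: Set[Voxel], axis: int = 0) -> Dict[int, float]:
    """2D DSC for every slice along `axis` that contains at least one labelled voxel."""
    auto_slices = defaultdict(set)
    manual_slices = defaultdict(set)
    for mask, slices in ((auto, auto_slices), (manual, manual_slices)):
        for voxel in mask:
            in_plane = voxel[:axis] + voxel[axis + 1:]  # drop the slice coordinate
            slices[voxel[axis]].add(in_plane)

    scores = {}
    for index in sorted(set(auto_slices) | set(manual_slices)):
        a = auto_slices.get(index, set())
        m = manual_slices.get(index, set())
        # The denominator is never zero: the slice is labelled in at least one mask.
        scores[index] = 2.0 * len(a & m) / (len(a) + len(m))
    return scores


# Toy masks (not study data): five sagittal slices, slice 2 is mid-sagittal.
manual = {(x, y, z) for x in range(5) for y in range(4) for z in range(4)}
heights = {0: 2, 1: 3, 2: 4, 3: 3, 4: 2}  # automatic result shrinks laterally
auto = {(x, y, z) for x, h in heights.items() for y in range(4) for z in range(h)}

scores = per_slice_dice(auto, manual)
for index, score in scores.items():
    print(index, round(score, 3))
```

In this toy example the 2D DSC is highest in the middle slice and decreases toward the outer slices, which is the pattern reported for the 3D MEDIC and Spin Echo sequences.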
For surgeons, the underestimation of the volume of anatomical structures is viewed as a margin of safety in a computer assistance system. Indeed, by reasonably underestimating the working volume (e.g. the intervertebral disk), surgeons will have more confidence in the 3D model, since they will know that as long as their surgical tools remain inside the 3D model there is no risk of injuring critical anatomical structures (e.g. the spinal cord). For example, in spinal release before instrumentation of a scoliotic patient, the intervertebral disk must be partially removed, and delicate anatomical structures surrounding the disk, such as the spinal canal and the aorta, must not be injured during the procedure. These structures are located on the anterior left side of the disk (for the aorta) and on the posterior side (for the spinal canal). The distance between the manual and the automatic segmentations in the sagittal slices spanning the spinal canal is 3.4 mm (±1.5 mm) for the Spin Echo and 1.8 mm (±0.8 mm) for the 3D MEDIC. The greater underestimation of the disk for the Spin Echo sequence can be explained by the fact that, for half of the patients, the Spin Echo sequence produced images in which some pixels were brighter in the nucleus than in the annulus, which misled the automatic segmentation process into detecting the nucleus boundary as the external disk boundary. A modification of the parameters of the Spin Echo sequence would eliminate this discrepancy between the distances obtained for the 3D MEDIC and Spin Echo sequences. Hence, for a disk resection application, a mean underestimation distance of 1.8 mm in the mid-sagittal planes compared to manually segmented contours gives an adequate margin of safety.
The variability associated with the use of automatic segmentation is lower than the variability associated with manual segmentation performed by different users. This is true for both the 3D MEDIC and Spin Echo MR sequences, therefore making the use of the automatic segmentation method clinically feasible. Hence, this study addresses an important issue concerning the use of computer assistance in a clinical environment. Indeed, for an automatic segmentation algorithm to be acceptable, the variability of the 3D model on which the computer assistance system relies should be equal to or lower than the variability of an equivalent 3D model obtained from manual segmentation.
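The acceptance criterion stated above can be written as a simple check. The sketch below is only one possible reading: it assumes that variability is summarised by the coefficient of variation of repeated volume measurements of the same disk, which is an assumption of this example and not necessarily the measure used in the study. The function names and the numerical values are hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence


def coefficient_of_variation(values: Sequence[float]) -> float:
    """Sample standard deviation divided by the mean (assumed variability measure)."""
    if len(values) < 2:
        raise ValueError("at least two measurements are required")
    m = mean(values)
    if m == 0:
        raise ValueError("mean volume is zero")
    return stdev(values) / m


def is_acceptable(auto_volumes: Sequence[float], manual_volumes: Sequence[float]) -> bool:
    """True when automatic variability is equal to or lower than manual variability."""
    return coefficient_of_variation(auto_volumes) <= coefficient_of_variation(manual_volumes)


# Hypothetical volumes of one disk (arbitrary units, not study data).
manual_volumes = [10.0, 11.0, 12.0]  # e.g. three different observers
auto_volumes = [9.0, 9.1, 8.9]       # e.g. repeated automatic segmentations

print(is_acceptable(auto_volumes, manual_volumes))
```

A relative measure is used here so that the comparison is not biased by the systematic volume underestimation of the automatic method; an absolute standard deviation could be substituted if that is the quantity of clinical interest.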
One of the limitations of the study is that, for the three MRI sequences, the Field Of View (FOV) encompasses only five to seven vertebral levels. It is well known that scoliotic patients often have a double curvature (one in the thoracic region and one in the lumbar region of the spine). With such a small FOV, it is not possible to image both curvatures at the same time. This limitation also means that, in the robustness evaluation, the effect of the position of the disk relative to the spinal region (thoracic or lumbar) has not been considered. Because thoracic disks are smaller than lumbar disks, the behavior of the automatic segmentation algorithm may vary across vertebral levels. In the current study, the spinal curves included in the MRI were mainly in the lumbar and lower thoracic regions.
However, the robustness study does include an evaluation of the effects of five important factors. Results show that the proposed automatic segmentation algorithm is robust, in light of the fact that the results for the 3D DSC are not affected by the severity of the spinal deformity, the position of the disk relative to the apex or the inter-patient MR intensity variation. On the other hand, the type of MR acquisition sequence is important and could substantially affect the results of the automatic segmentation. Considering that the mean 3D DSC value is significantly lower for the 3D FISP than for the other two sequences (Table 2), the 3D FISP MR acquisition protocol is not recommended for good performance of the proposed automatic segmentation method.
The recommended MR acquisition protocols for the proposed intervertebral disk automatic segmentation method are thus the Spin Echo and 3D MEDIC MR sequences. There is no statistical difference between the 3D DSC results for these two protocols. The choice between 3D MEDIC and Spin Echo will depend on the clinician. The acquisition time of the 3D MEDIC sequence is 2.5 times longer than that of the Spin Echo. A longer acquisition time means less reproducible results because patients are more prone to move during the acquisition. Depending on the clinician and on the application, one might decide to use Spin Echo even if some interpolation is required between slices in order to reconstruct in 3D, because an acquisition time of only 12 minutes is more feasible and is more likely to yield non-blurred images for all patients.