Automated vs manual delineations of regions of interest: a comparison in commercially available perfusion MRI software

Background In perfusion magnetic resonance imaging, manual delineation of regions of interest is clinically less favorable than an automated approach would be, because of rater bias and time-intensive operator input. The goal of our study was to compare the performance of the two approaches. Methods Using Stroketool, PMA and Perfscape/Neuroscape, perfusion maps of cerebral blood flow, mean transit time and Tmax were created for 145 patients with acute ischemic stroke. Volumes of hypoperfused tissue were calculated using both a manual and an automated protocol, and the results were compared between methods. Results The median difference between the automatically and manually derived volumes was up to 210 ml in Perfscape/Neuroscape, 123 ml in PMA and 135 ml in Stroketool. Correlation coefficients between perfusion volumes and radiological and clinical outcome were much lower for the automatic volumes than for the manually derived ones. Conclusions The agreement between the two methods was very poor, with automated use producing falsely exaggerated volumes of hypoperfused tissue. Software improvements are necessary to enable highly automated protocols to credibly assess perfusion deficits.


Background
The increased use of the perfusion imaging (PI)-diffusion-weighted imaging (DWI) mismatch hypothesis [1] in studies of acute ischemic stroke (AIS) and in clinical practice [1,2] is placing growing demands on the software packages developed for volumetric calculation of hypoperfusion. Typically these programs offer the possibility to delineate a region of interest (ROI) manually but also, to a varying extent, automatically. Manual delineation is subject to rater bias and requires time-intensive operator input. A mostly automated procedure, if accurate, would therefore be preferred in clinical practice. The goal of our study was to compare the automated approach with the manual approach across a number of different software packages.

Methods
The study design and cohort characteristics have been previously reported [3]. Briefly, using Stroketool (Digital Image Solutions, Germany, http://www.digitalimagesolutions.de), PMA (v3.2.0.4, ASIST, Japan, http://asist.umin.jp/index-e.htm) and Perfscape/Neuroscape (Olea Medical SAS, France, http://www.olea-medical.com), perfusion maps of cerebral blood flow (CBF), mean transit time (MTT) and Tmax were calculated for 145 patients imaged within 24 hours of AIS. The inclusion criteria were: clinically and radiologically confirmed AIS, hypoperfusion on the initial PI examination (assessed by the attending neuroradiologist) and availability of a follow-up MRI scan. For each parameter map, three thresholds were applied. The Tmax thresholds were 4, 6 and 8 seconds of delay [4] and the MTT thresholds were 5, 6 and 8 seconds. As no uniform CBF scale was available, the three CBF thresholds differed across software. Both MTT and CBF thresholds were chosen empirically using a random sample of acute stroke patients. ROI volumes were calculated using a manual and an automated protocol. In the automated protocol, once the thresholds had been applied, no further post-processing was done. Maps created in PMA and Stroketool also underwent a second post-processing step in SPM8 (Wellcome Trust Centre for Neuroimaging, UK) to remove the scalp and spaces filled with cerebrospinal fluid (CSF). This was not necessary for Perfscape/Neuroscape because of the program's built-in filtering. In the manual protocol, a human rater excluded from the thresholded maps those areas unlikely to reflect credible hypoperfusion.
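The thresholding step described above can be illustrated with a minimal sketch. It is not the implementation used by any of the three programs. The function name, the flattened toy Tmax values, the voxel size and the rater mask are all hypothetical, and treating a voxel as hypoperfused when its value is strictly greater than the threshold is an assumption.

```python
def thresholded_volume_ml(values, threshold, voxel_volume_ml, keep=None):
    """Volume (ml) of voxels whose map value exceeds `threshold`.

    values: flat sequence of per-voxel map values (e.g. Tmax in seconds).
    keep:   optional flat sequence of booleans; False marks voxels that are
            excluded, e.g. by a CSF/scalp mask or by a human rater
            (hypothetical input, not a format used by the software).
    """
    if keep is None:
        keep = [True] * len(values)
    if len(keep) != len(values):
        raise ValueError("values and keep must have the same length")
    n_voxels = sum(1 for v, k in zip(values, keep) if k and v > threshold)
    return n_voxels * voxel_volume_ml


# Toy Tmax map in seconds, flattened; illustrative values, not patient data.
tmax = [0.5, 7.2, 9.1, 6.5, 12.0, 3.0, 8.4, 6.1]
# Hypothetical rater decision: the last two voxels are judged to be artifact.
rater_keep = [True, True, True, True, True, True, False, False]
# Assumed voxel size of 2 x 2 x 4 mm = 16 mm^3 = 0.016 ml.
voxel_ml = 0.016

thresholds_s = (4, 6, 8)  # Tmax thresholds named in the text
automated = {t: thresholded_volume_ml(tmax, t, voxel_ml) for t in thresholds_s}
manual = {t: thresholded_volume_ml(tmax, t, voxel_ml, rater_keep) for t in thresholds_s}

for t in thresholds_s:
    print(f"Tmax > {t} s: automated {automated[t]:.3f} ml, manual {manual[t]:.3f} ml")
```

Because the manual protocol only removes voxels from the thresholded map, the manual volume in this sketch can never exceed the automated one for the same threshold.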
Radiological outcome was defined as the final lesion volume on follow-up FLAIR images. Clinical outcome was defined as the National Institutes of Health Stroke Scale (NIHSS) score at the time of hospital discharge. All statistical analyses were performed in PASW Statistics 18. Correlations were assessed using Spearman's rank correlation coefficient.
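For readers unfamiliar with the rank correlation used here, the following is a minimal sketch of Spearman's coefficient, computed as the Pearson correlation of average ranks so that ties are handled. It is written for illustration only and is not the PASW routine used in the study. The helper names and the toy volumes are hypothetical.

```python
def average_ranks(xs):
    """1-based ranks of xs; tied values share the mean of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def spearman_rho(x, y):
    """Spearman's rank correlation as Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy values in ml; illustrative only, not study data.
perfusion_volume = [10, 50, 30, 80]
final_lesion_volume = [2, 20, 5, 60]
rho = spearman_rho(perfusion_volume, final_lesion_volume)
print(f"rho = {rho:.2f}")
```

The coefficient depends only on the ordering of the values, which makes it a reasonable choice for strongly skewed volume data such as lesion volumes.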

Results
The median final lesion volume was 6.55 ml (IQR 0.8-31.6 ml). The median difference between the automatically and manually derived volumes was up to 210 ml (between 90% and 386%) in Perfscape/Neuroscape, 123 ml (between 192% and 1415%) in PMA and 135 ml (between 357% and 815%) in Stroketool (Table 1). Bland-Altman plots for agreement of methods are shown in Figure 1. All correlation coefficients between volumes of perfusion deficit and radiological and clinical outcome were considerably lower for the automatic volumes than for the manually derived ones. The top-performing automated map was Tmax in all three programs. With additional filtering applied to the automated volumes calculated in PMA and Stroketool, the median difference between the automatically and manually derived volumes dropped to a maximum of 64.91 ml (between 44% and 238%) and 67.28 ml (between 33% and 369%), respectively (Table 1).
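The agreement statistics reported above can be sketched as follows. The text does not state how the limits of agreement were computed, so the conventional Bland-Altman definition (mean difference plus or minus 1.96 standard deviations of the differences) is assumed here. The function name and the toy volumes are hypothetical.

```python
from statistics import mean, median, stdev


def bland_altman(auto_ml, manual_ml):
    """Paired differences (automated minus manual) and agreement statistics.

    Assumes the conventional limits of agreement: bias +/- 1.96 * SD,
    with SD the sample standard deviation of the paired differences.
    """
    if len(auto_ml) != len(manual_ml):
        raise ValueError("paired inputs must have the same length")
    diffs = [a - m for a, m in zip(auto_ml, manual_ml)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return {
        "median_diff": median(diffs),
        "bias": bias,
        "lower_loa": bias - 1.96 * sd,
        "upper_loa": bias + 1.96 * sd,
    }


# Toy paired volumes in ml; illustrative only, not study data.
automated_ml = [120, 200, 90, 310]
manual_ml = [20, 60, 10, 150]
stats = bland_altman(automated_ml, manual_ml)
print(stats)
```

A positive bias in this sketch means that the automated volumes are systematically larger than the manual ones.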

Discussion
The median volume of the perfusion deficit varied greatly with the map and threshold, and even across software for the same map and threshold. This discrepancy could in part be explained by the different choice of arterial input functions (AIFs), the use of different deconvolution techniques, different implementations of the same calculation algorithm, and differences in motion correction across programs [5][6][7]. Our group had already conducted a study on a group of patients with no ischemia, using the same three software packages, and observed numerous artifacts, typically in the cortex proximal to the skull and in infratentorial cerebral and cerebellar tissue [8]. Programs without implemented CSF filtering also suffered from artifacts in the ventricles, eyeballs and scalp [8]. As expected, the same artifacts were present in our current patient cohort. This explains the weaker correlation coefficients and the overshoot of the automated delineations compared with the manual ones. Although most of the values on the Bland-Altman difference plots fall within the limits of agreement (Figure 1), these limits are much too broad with regard to the cohort's median lesion volumes and the fact that, depending on location, even a lesion of a few ml can be clinically significant. Additional CSF filtering greatly reduced the differences between the automatic and the manual volumes, pointing to a need for implemented filtering.

Conclusion
In conclusion, automated use of the programs evaluated here, in their current form, would lead to falsely exaggerated volumes of hypoperfused tissue in patients with AIS. However, a number of improvements, such as algorithms that judge perfusion asymmetry between hemispheres or allow selection of the vessel territory of expected hypoperfusion, could help automated protocols to credibly assess perfusion deficits.