Application of mean-shift clustering to Blood oxygen level dependent functional MRI activation detection
© Ai et al.; licensee BioMed Central Ltd. 2014
Received: 6 September 2013
Accepted: 28 January 2014
Published: 4 February 2014
Functional magnetic resonance imaging (fMRI) analysis is commonly done with cross-correlation analysis (CCA) and the General Linear Model (GLM). Both CCA and GLM techniques, however, typically perform calculations on a per-voxel basis and do not consider relationships neighboring voxels may have. Clustered voxel analyses have then been developed to improve fMRI signal detections by taking advantages of relationships of neighboring voxels. Mean-shift clustering (MSC) is another technique which takes into account properties of neighboring voxels and can be considered for enhancing fMRI activation detection.
This study examines the adoption of MSC to fMRI analysis. MSC was applied to a Statistical Parameter Image generated with the CCA technique on both simulated and real fMRI data. The MSC technique was then compared with CCA and CCA plus cluster analysis. A range of kernel sizes were used to examine how the technique behaves.
Receiver Operating Characteristic curves shows an improvement over CCA and Cluster analysis. False positive rates are lower with the proposed technique. MSC allows the use of a low intensity threshold and also does not require the use of a cluster size threshold, which improves detection of weak activations and highly focused activations.
The proposed technique shows improved activation detection for both simulated and real Blood Oxygen Level Dependent fMRI data. More detailed studies are required to further develop the proposed technique.
KeywordsMean-shift fMRI BOLD Clustering
Functional Magnetic Resonance Imaging (fMRI) is a technique that is used to identify regions of activations in the brain. Techniques based on cross-correlation analysis (CCA) and the General Linear Model (GLM) is commonly used for fMRI data analysis [1–7], however these techniques are not without drawbacks. Both techniques typically perform its calculations on a per voxel basis. This means that each calculation does not take into consideration any relationship that neighboring voxels may have with each other. This has the effect of lowering the sensitivity of the technique when looking for activations. This is especially true in low contrast to noise ratio (CNR) situations.
There has been interest in enhancing fMRI data analysis with cluster size tests. Various techniques have been examined with that intention in mind. Cluster analysis (based on random field theory) is commonly used to help isolate activations [8–10]. K-means clustering  is a method where observations are partitioned into "k" number of clusters where each observation belongs to the cluster with the closest mean. Fuzzy clustering  is similar to k-means clustering, except that fuzzy clustering takes into consideration that a single observation can belong to more than one cluster. Both K-means and Fuzzy clustering have been examined for improving fMRI data analysis [13–16]. Mean-shift clustering (MSC) is another technique to consider for the same purpose.
The MSC technique was first introduced by Fukunaga et al.  for examining pattern recognition, but the technique was mostly unexplored until more recently [18, 19]. The technique has found uses in image processing and vision tasks. Image segmentation has also been explored with this technique on brain images . MSC revolves around a density estimation that is done on a predetermined feature space. Intuitively, the technique works by calculating the mean shift vector, then shifting the kernel as dictated by the mean shift vector. This process is repeated as appropriate until convergence at which time a cluster in the feature space can be identified. By selectively choosing the features used for the feature space, it would be possible to incorporate characteristics of the data that normally would not be part of the analysis with CCA and GLM based techniques. MSC also offers some other advantages with regards to implementation. The technique does not require assumptions to be made about noise distribution. Compared to typical cluster analysis, no hard cut-off in cluster size is required with MSC. Since MSC is based on density estimation of a feature space, it does not make any assumption on the shape of the clusters either as well as allowing different feature spaces to be used to incorporate different characteristics of the data into analysis. These advantages with MSC may allow higher sensitivity when detecting activations for improved results.
To the best knowledge of the authors, MSC has not been closely explored in the application of fMRI activation analysis. As a first step, this study investigates a straight forward application of MSC to fMRI data analysis.
The proposed MSC technique was evaluated using both simulated and real fMRI data. The data analysis was performed using Cross Correlation Analysis (CCA) to generate a statistical parametric image (SPI). The mean-shift clustering was then applied to a feature space constructed using selected characteristics of the SPI. Comparisons were made among CCA, CCA plus cluster analysis (CCA + CA), and CCA + MSC to examine the application of MSC.
The simulated data was designed to emulate fMRI data using one hundred images, 128 × 128 matrix size, and with a block design of two and a half off/on cycles (20 images per off or on cycle). Activations of various sizes (20 × 20, 10 × 10, 2 × 2 voxels) were inserted onto the data for analysis. Gaussian noise at different CNRs (0.20, 0.40, 0.06, and 0.80) was generated and inserted into the simulated data.
Ten subjects (5 females, 5 males, age 22–32) gave informed written consent with the approval of the University of Iowa’s (USA) Institutional Review Board. All subjects reported that they were right-handed, not using medications at the time of scanning, healthy, and had no history of any mental or psychiatric conditions. All ten subjects were scanned at the University of Iowa's Medical Education and Research Facility.
Blood Oxygen Level Dependent (BOLD) fMRI data were acquired on a Siemens 3 T Trio scanner (Siemens Medical Solutions, Erlangen, Germany). A gradient echo EPI pulse sequence was used with the following parameters: TR = 2000 ms, flip angle = 90 degrees, TE = 30 ms, matrix = 64 × 64, FOV = 220 mm, slice thickness = 5 mm with 20% gap, 180 images per run. Each scanning session was composed of seven six-minute runs, though only the first run was used for the purpose of this study. A T1 anatomical scan was also performed with the following parameters: TR = 1590 ms, flip angle = 10 degrees, TE = 3.39 ms, matrix = 128 × 128, FOV = 220 mm, slice thickness = 2 mm.
Unilateral electrical stimulation was delivered to the subject’s right median nerve using a Grass S48 stimulator (Grass Technologies, West Warwick, Rhode Island, USA). The stimulation voltage used was 15 volts above the motor threshold, which was individually defined as the minimum voltage required to obtain a thumb twitch. The delivered stimulations were square wave pulses with 0.2 ms duration. A block design of four and a half off/on cycles (40 seconds off, 40 seconds on) with a randomized inter-stimulation interval (ISI) between 1.0-2.0 seconds was used. A randomized ISI was used to reduce any effect that expecting a stimulation occurring with a fixed inter stimulation interval might have on the resulting BOLD signal. The volunteers were asked to passively feel the stimulation, stay still, stay awake, and not actively perform anything else for the duration of the scan.
Mean shift clustering
MSC is based on density estimation of a predetermined multi-modal feature space of image characteristics. Previously used feature spaces, such as perceived color [21, 22], are generally not applicable to fMRI analysis since color is not a feature that would be of interest. For this study, a feature space of the estimated Z values of the SPI and the mean voxel values surrounding a voxel (eight neighboring voxels in 2D and twenty six neighboring voxels in 3D) was used as they can be features of interest and incorporating them in the analysis may help with activation detection. The estimated Z values were used because it relates directly to statistical significance, and the mean voxel value of the surrounding voxels were used to take into consideration neighboring effects. The image features are mapped into a point in a multi-dimension space. The density is calculated within a defined kernel on the feature space. The kernel is moved based on the density gradient in the feature space until the local maximum is found. Points in the feature space associated with the same local maximum are considered to belong to the same cluster, and the calculation is repeated until all points are assigned to a cluster.
is the mean shift vector where g is the kernel, h is the kernel size, x is the mean estimate inside the kernel, and xi is the element inside the kernel. The mean shift vector, m(x), defines how the kernel will move along the density gradient towards the local maximum which corresponds with dense regions in the feature space. This calculation is performed at each data point, shifted by m(x) along the density gradient, and repeated until convergence is reached when local maximum is found. This procedure allows the mean shift clustering technique to identify such locations without having to estimate the probability density function of the associated data. Points associated with the same local maximum belong to the same cluster.
Within the mean-shift vector equation, the parameter that likely has the largest effect on the analysis is the kernel size, h, as differences in kernel sizes can change the density estimates calculated which the MSC technique is based on. While adaptive techniques do exist, a range of kernel sizes will be used to examine how the technique will behave.
The general approach to the proposed MSC analysis method is applying MSC to a feature space constructed using selected characteristics of the SPI generated using CCA. CCA was chosen over GLM because (1) which technique used to generate the SPI is less important for the purposes of this study; (2) the CCA technique allows easy control over the significance level while it is more difficult to do so with GLM.
The real fMRI images were processed using Analysis of Functional NeuroImages (AFNI)  and custom Matlab software. As part of the CCA analysis, three-dimensional motion correction was performed to minimize motion effects. All images were normalized to Talairach space (http://www.bic.mni.mcgill.ca). Constant, linear, and quadratic trends were removed. To investigate the effect of a Gaussian filter on activation detection with MSC, no Gaussian filter and a Gaussian filter with full width half maximum (FWHM) of 4 mm was applied. SPIs were generated for individual subjects.
The proposed method (CCA + MSC) was compared with typical CCA and CCA plus cluster analysis (CCA + CA) procedure using the same simulated data. Activations of sizes 20 × 20, 10 × 10, and 2 × 2 were inserted onto the data to identify how the techniques behave with different sized activations. The total area of all test patterns were maintained to be the same by varying the number of inserted activations. The 2 × 2 activation size can be considered to be a simulation for highly focused activations. Gaussian white noise was generated and inserted into the simulated data at several CNRs (0.20, 0.40, 0.60, 0.80) for examination on how the technique reacts to noise. No additional smoothing filter was applied to the simulated data.
The proposed technique was assessed based on sensitivity and specificity and compared with the performances of the aforementioned techniques (CCA, CCA + CA, CCA + MSC) on simulated data. True positive rate comparisons were used to examine how the different kernel sizes affect the outcomes at various CNRs, and false positive rate comparisons were used to examine the amount of noise that appear in the results of each technique. Since simulated data was used, the ground truth is known, so identifying true and false positive rates is a simple task by comparing detected activations with true activations. Receiver operating characteristic (ROC) curves were drawn to allow a direct comparison of performance between the techniques.
The real fMRI data from ten subjects was analyzed for evaluating CCA, CCA + CA, and CCA + MSC. The three techniques are evaluated using both individual fMRI data and averaged fMRI data while controlling significance level at p = 0.01. Different statistical thresholds were determined and applied in order to set the significance level to p = 0.01 for comparison purposes. The threshold for CCA was calculated to be z = 4.8 for the filtered and z = 4.9 for the unfiltered (Bonferroni corrected). The CCA + CA used a threshold of z = 2.6 which was calculated based on a study performed by Xiong et al.  with a cluster of threshold of 6 voxels for FWHM = 4 mm and 4 voxels for the unfiltered dataset. The CCA + MSC used a threshold of z = 2 for both filtered and unfiltered data. The z threshold for CCA + MSC was selected based on the results of simulations to achieve an approximate significance level of p = 0.01 (see Results section).
True positive rate comparison
False positive rate comparison
ROC comparison of different kernel sizes
The relative performance of CCA + MSC and CCA + CA shows a rather complex relationship. In the 20 × 20 and 10 × 10 cases at kernel size = 0.20, CCA + MSC is better than CCA + CA when the false positive rates range from 0.01 and 0.05. It is inferior to CCA + CA when the false positive rate is below 0.01. Considering that significance levels of p = 0.05 and p = 0.01 are commonly used for activation detection, CCA + MSC should show improvement over CCA + CA in a practical situation. In the 2 × 2 case, CCA + MSC appears to be better than CCA + CA up to a false positive rate of about 0.01, then becomes worse until about 0.02, and the two techniques perform similarly after that.
CCA vs CCA + MSC vs CCA + CA on real data
Activation volume and average Z-scores for averaged real fMRI data
FWHM = 0 mm
FWHM = 4 mm
CCA + CA
CCA + MSC 0.05
CCA + MSC 0.10
CCA + MSC 0.15
CCA + MSC 0.20
Activation volume and average Z-score for individual subjects on real fMRI data
FWHM = 0 mm
FWHM = 4 mm
CCA + CA
CCA + MSC
CCA + CA
CCA + MSC
In this study, the adoption of MSC into fMRI analysis was examined by comparing it to CCA and CCA + CA. The ROC curves (Figure 4) indicate that the proposed MSC technique show an improvement over CCA and an improvement over CCA + CA when the false positive rate is above 0.01 in most cases. The false positive rate comparisons (Figure 2) showed significant improvement over CCA which indicates that CCA + MSC controls noise very well. This allows a lower statistical threshold to be used when identifying activations (Figure 3). Another potential benefit of MSC is for highly focused activation detection since no cluster threshold is applied.
The performance of CCA + MSC depends on the kernel size being used (Figure 1), but determining an optimal range of kernel sizes is not a trivial issue. If the kernel size used is too large, no activations would be detected. Conversely, if the kernel size used is too small, the proposed technique does not show an improvement when compared to CCA and CCA + CA. A proper kernel size needs to be used in order to see an improvement over CCA and CCA + CA. When tested with simulated data, the ROC curves and true positive comparison indicate that a kernel size of roughly 0.20 should show an improvement over CCA (Figure 4), but no activations were detected with real fMRI data if that kernel size is used. Real fMRI data showed activations at kernel size of 0.05 and 0.10 (Figure 5). The difference in optimal kernel sizes between the real and simulated data may be explained by the different noise characteristics of the data. The simulated data is the ideal situation with only Gaussian white noise. Real fMRI data will have multiple noise types such as movement artifacts, physiological noise, noise from the MRI machine itself, etc. It is likely that the optimal kernel size depends on the structure of noise in the data and needs optimized for each individual data set. Future studies are required to examine this issue in more detail.
Significant improvement is seen with CCA + MSC over CCA on the simulated data in the ROC curves in the 20 × 20 and 10 × 10 cases at a kernel size of 0.20 (Figure 4B and E), but this improvement is sudden and is concentrated in the region of false positive rates up to 0.02. While this type of improvement is typically unexpected, it is consistent with Figure 2 where false positive rates are shown to be well controlled (lower than 0.02) at a large range of z thresholds for all cases except the 2 × 2 case. This explains why the data is concentrated in the region of false positive rate up to 0.02.
The false positive rate comparison for the 2 × 2 test pattern showed a curve that is similar to the CCA curve, which does not follow the 10 × 10 and 20 × 20 cases (Figure 2). This may be due to the 2 × 2 case having many more neighboring voxels adjacent the test pattern than the other cases. Four 20 × 20 test patterns were inserted into simulated fMRI images which results in 320 neighboring voxels. To maintain the same total area of the test pattern, four hundred 2 × 2 activations were used, resulting in 3200 neighboring voxels. The 2 × 2 test image has ten times the number of voxels that are directly adjacent the inserted activations when compared to the 20 × 20 test image. It is more likely for falsely activated voxels to be detected by cluster analysis techniques if it is attached to the test pattern than when isolated. More falsely activated voxels are expected to be detected for the 2 × 2 test pattern, thus increasing the false positive rate. Regardless, the proposed technique still shows improvement when compared to CCA in the 2 × 2 case.
CCA + MSC can show an improvement over CCA + CA at a false positive rate of greater than about 0.01 with a kernel size of 0.20 (Figure 4), but this improvement is not seen in the 2 × 2 case where CCA + CA was either superior or about the same as CCA + MSC. A cluster threshold of 4 voxels was used to set the significance value at p = 0.01, which also happens to be the exact size of the simulated activations in the 2 × 2 case. If the activation size was three voxels or a different significance level was used and the voxel cluster threshold was increased, the activations would have been removed by the cluster analysis. The CCA + MSC does not have a cluster threshold and has the potential of better detecting highly focused activations.
The selection of image characteristics used in the feature space should be examined in future studies as the feature space will likely have a large effect on the results as well as the range of acceptable kernel sizes. The image characteristics selected for the featured space used in this study was the estimated Z values found in the SPI and the mean voxel values surrounding a voxel. While there are many methods of constructing a feature space, the feature space used did show an improvement over CCA in the simulated and real fMRI data at the same significance level, it is unknown what image features or what combination of image feature will produce the best feature space for fMRI analysis. The feature space used in this study does not incorporate temporal features in the data or positional features for example. There may be other types and combinations of image features that can be used, and it would certainly be an area of further examination.
The proposed technique has the limitation that the significance levels cannot be easily theoretically calculated (or at least we have not been able to come up with a method of doing so). This presents a particular drawback when doing comparisons. Significance levels can be approximated using simulations as was done in this study, but still may present some challenges when very accurate comparisons are required.
The experiment performed in this study examines the application of MSC to CCA in fMRI activation detection. The results show that an improvement with CCA + MSC can be seen over the typical CCA and CCA + CA analysis technique. The proposed technique maintains a low false positive rate which allows the use of lower statistical thresholds while controlling for noise and helps activation detection in low CNR situations. This also helps in detecting small highly focused activations especially considering that CCA + MSC does not require the application of a cluster size threshold, which is required by most cluster analysis techniques. By nature, CCA + MSC also has the ability to incorporate different image characteristics into a feature space for analysis. These benefits can help to improve activation detection in fMRI data. However, studies in the optimization in kernel size and feature space are needed to further develop the proposed technique. Despite the aforementioned limitations, the proposed technique shows promise in improving fMRI activation detection.
Blood oxygen level dependent
Cross correlation analysis
Contrast to noise ratio
Functional magnetic resonance imaging
General linear model
Mean shift clustering
Receiver operating characteristic
Statistical parametric image.
Funding for this study was provided by National Institutes of Health (R21 MH 082187–01, R01 DC004290-11), the National Natural Science Foundation of China (81371640), and the International Science and Technology Cooperation Plan of Suzhou, China (SH201210). None of these organizations had a further role in study design, data collection, data analysis, data interpretation, the writing of this manuscript, or the decision to submit the manuscript for publication.
- Bandettini P, Jesmanowicz A, Wong E, Hyde J: Processing strategies for time-course data sets in functional MRI of the human brain. Magn Reson Med. 1993, 30 (2): 161-173. 10.1002/mrm.1910300204.View ArticlePubMedGoogle Scholar
- Boynton G, Engel S, Glover G, Heeger D: Linear systems analysis of functional magnetic resonance imaging in human. J Neurosci. 1996, 16 (13): 4207-4221.PubMedGoogle Scholar
- Bullmore E, Brammer M, Williams SCR, Rabe-Hesketh S, Janot N, David A, Mellers J, Howard R, Sham P: Statistical methods of estimation and inference for functional MR image analysis. Magn Reson Med. 1996, 35 (2): 261-277. 10.1002/mrm.1910350219.View ArticlePubMedGoogle Scholar
- Calhoun V, Adali T, McGinty V, Pekar J, Watson T, Pearlson G, fMRI Activation in a Visual-Perception Task: Network of Areas Detected Using the General Linear Model and Independent Components Analysis. NeuroImage. 2001, 14 (5): 1080-1088. 10.1006/nimg.2001.0921.View ArticlePubMedGoogle Scholar
- Cohen M: Parametric Analysis of fMRI Data Using Linear Systems Methods. NeuroImage. 1997, 6 (2): 93-103. 10.1006/nimg.1997.0278.View ArticlePubMedGoogle Scholar
- Friston K, Holmes A, Worsley K, Poline J, Frith C, Frackowaik R: Statistical parametric maps in functional imaging: a general linear approach. Hum Brain Mapp. 2004, 2 (4): 189-210.View ArticleGoogle Scholar
- Penny W, Friston K: Mixtures of general linear models for functional neuroimaging. IEEE Trans Med Imag. 2003, 22 (4): 504-514. 10.1109/TMI.2003.809140.View ArticleGoogle Scholar
- Foreman S, Cohen J, Fitzgerald M, Eddy W, Mintun M, Noll D: Improved assessment of significant activation in functional magnetic resonance imaging (fMRI): use of a cluster-size threshold. Magn Res Med. 1995, 33 (5): 636-647. 10.1002/mrm.1910330508.View ArticleGoogle Scholar
- Worsley K, Evans A, Marrett S, Neelin P: Three-dimensional statistical analysis for CBF activation studies in human brain. J Cereb Blood Flow Metab. 1992, 12 (6): 900-918. 10.1038/jcbfm.1992.127.View ArticlePubMedGoogle Scholar
- Xiong J, Gao J, Lancaster J, Fox P: Clustered Pixel Analysis for Functional MRI Activation Studies of the Human Brain. Hum Brain Mapp. 1995, 4 (4): 287-301.View ArticleGoogle Scholar
- MacQueen J: Some Methods for Classification and Analysis of Multivariate Observatoins. Proceedings of 5th Berkeley Symposium on Mathematical Statistics and Probability. 1967, Berkeley, USA: University of California Press, 281-297.Google Scholar
- Bezdek J, Ehrlich R, Full W: FCM: The fuzzy C-Means Clustering Algorithm. Comput Geosci. 1984, 10 (2): 191-203.View ArticleGoogle Scholar
- Baumgartner R, Scarth G, Teichtmeister C, Somorjai R, Moser E: Fuzzy Clustering of Gradient-Echo Functional MRI in the Human Visual Cortex Part I: reproducibility. J. Magn Reson Imag. 1997, 7 (6): 1094-1101. 10.1002/jmri.1880070623.View ArticleGoogle Scholar
- Baumgartner R, Windischberger C, Moser E: Quantification in functional magnetic resonance imaging: fuzzy clustering vs correlation analysis. Magn Reson Imag. 1998, 16 (2): 115-125. 10.1016/S0730-725X(97)00277-4.View ArticleGoogle Scholar
- Moser E, Diemling M, Baumgartner R: Fuzzy clustering of gradient-echo functional MRI in the human visual cortex. Part II: quantification. J Magn Reson Imag. 1997, 7 (6): 1102-1108. 10.1002/jmri.1880070624.View ArticleGoogle Scholar
- Singh M, Patel P, Khosla D, Kim T: Segmentation of functional MRI by K-Means Clustering. IEEE Trans Nucl Sci. 1996, 43 (3): 2030-2036. 10.1109/23.507264.View ArticleGoogle Scholar
- Fukunaga K, Hostetler L: The Estimation of the Gradient of a Density Function, with Applications in Pattern Recognition. IEEE Trans Inf Theory. 1975, 21 (1): 32-40. 10.1109/TIT.1975.1055330.View ArticleGoogle Scholar
- Cheng Y: Mean shift, Mode seeking, and Clustering. IEEE Trans Pattern Anal Mach Intell. 1995, 17 (8): 790-799. 10.1109/34.400568.View ArticleGoogle Scholar
- Dorin C, Peter M: Mean shift: a robust approach towards feature space analysis. IEEE Trans Pattern Anal Mach Intell. 2002, 24 (5): 603-619. 10.1109/34.1000236.View ArticleGoogle Scholar
- Mayer A, Greenspan H: An Adaptive Mean-Shift Framework for MRI Brain Segmentation. IEEE Trans Med Imag. 2009, 28 (8): 1238-1250.View ArticleGoogle Scholar
- Connolly C: The relationship between colour metrics and the appearance of three-dimensional coloured objects. Color Res Appl. 1996, 21: 331-337. 10.1002/(SICI)1520-6378(199610)21:5<331::AID-COL2>3.0.CO;2-Z.View ArticleGoogle Scholar
- Wyszecki G, Stilles W: Color Science: Concepts and Methods, Quantitative Data, and Formulae. 1982, New York: J WileyGoogle Scholar
- Parzen E: On Estimation of a Probability Density Function and Mode. Ann Math Statist. 1962, 33 (3): 1065-1076. 10.1214/aoms/1177704472.View ArticleGoogle Scholar
- Cox R: AFNI: Software for Analysis and Visualition of Functional Magnetic Resonance Neuroimages. Comput Biomed Res. 1996, 29 (3): 169-173.View ArticleGoogle Scholar
- The pre-publication history for this paper can be accessed here:http://www.biomedcentral.com/1471-2342/14/6/prepub
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.