Skip to main content

Hahn-PCNN-CNN: an end-to-end multi-modal brain medical image fusion framework useful for clinical diagnosis



In medical diagnosis of brain, the role of multi-modal medical image fusion is becoming more prominent. Among them, there is no lack of filtering layered fusion and newly emerging deep learning algorithms. The former has a fast fusion speed but the fusion image texture is blurred; the latter has a better fusion effect but requires higher machine computing capabilities. Therefore, how to find a balanced algorithm in terms of image quality, speed and computing power is still the focus of all scholars.


We built an end-to-end Hahn-PCNN-CNN. The network is composed of feature extraction module, feature fusion module and image reconstruction module. We selected 8000 multi-modal brain medical images downloaded from the Harvard Medical School website to train the feature extraction layer and image reconstruction layer to enhance the network’s ability to reconstruct brain medical images. In the feature fusion module, we use the moments of the feature map combined with the pulse-coupled neural network to reduce the information loss caused by convolution in the previous fusion module and save time.


We choose eight sets of registered multi-modal brain medical images in four diease to verify our model. The anatomical structure images are from MRI and the functional metabolism images are SPECT and 18F-FDG. At the same time, we also selected eight representative fusion models as comparative experiments. In terms of objective quality evaluation, we select six evaluation metrics in five categories to evaluate our model.


The fusion image obtained by our model can retain the effective information in source images to the greatest extent. In terms of image fusion evaluation metrics, our model is superior to other comparison algorithms. In terms of time computational efficiency, our model also performs well. In terms of robustness, our model is very stable and can be generalized to multi-modal image fusion of other organs.

Peer Review reports


Deep learning technology is currently revolutionizing medical diagnostic services. Convolutional networks are fusing or surpassing human operators in multi-modal brain medical image fusion and are increasingly proposed as an aid to human medical decision-making. Multi-modal brain medical images use different sensors to image the head to show the anatomy and metabolism of the head [1,2,3]. Among them, Computed Tomography (CT) and Magnetic Resonance Imaging (MRI) display the structural information of organs with high spatial resolution. They are called structural images. Positron Emission Tomography (PET) and Functional Magnetic Resonance Imaging (fMRI) images provide information about the function of organs. They are called functional images. Each unimodal image has its own characteristics. CT images are used to clearly show bone structure information, while MRI images are good at showing the physiological details of soft tissues. PET images can be used to quantitatively and dynamically detect human metabolites or drugs, while fMRI images are good at detecting changes in blood flow in the magnetic field of brain cells to help confirm the diagnosis. For example, glioma is the most common brain tumor [4,5,6], accounting for 80% of all malignant brain tumors [7]. The symptoms are not only related to their metabolism in the functional image, but also have an important relationship with their position in the brain. Therefore, it is necessary to perform multi-mode medical image fusion to gather all the features of multi-source images into an image with high contrast and resolution [8,9,10,11,12]. The obtained fusion image not only helps doctors make a more favorable diagnosis of patients, but also reduces the uncertainty of medical images generated by multiple sensors [13,14,15].

We will explain with a detailed example below.

Fig. 1
figure 1

Multi-modal image of a brain metastasis of a bronchial cancer

As shown in Fig. 1, brain images showed a large mass with edema around it. The mass and edema pressure forced the midline to shift and adjacent midbrain structures were compressed. On the Proton Density (PD) and T2-weighted (T2) images of Magnetic Resonance (MR), the large area of the left temporal area showed high signal intensity. On the contrast-enhanced image, the lesion contains cystic components. A narrower sulcus in the left cerebral hemisphere indicates severe swelling of the left cerebral hemisphere. In addition, according to perfusion Single-Photon Emission Computed Tomography (SPECT) imaging, the blood flow in the lesion area is very low. It can be seen that the fused image can reduce the diagnosis scope and eliminate interference information better than the unfused source image.

At this stage, the mainstream algorithms that prevail in multi-modal brain medical images are generally divided into two categories, namely filtering hierarchical fusion algorithms and deep learning algorithms. Among them, Dual-Tree Complex Wavelet Transform (DTCWT) [16] algorithm and Nonsubsampled Contourlet (NSCT) algorithm [17] are representative filtering hierarchical fusion algorithms. In deep learning, Coupled Neural Network (CNN) [18] algorithm and Laplacian Pyramid Sparse Representation (LPSR) algorithm [19] are more typical. Of course, there are Guided Filtering Fusion (GFF) algorithm [20], Internal Generative Mechanism (IGM) algorithm [21], Visual Saliency Map and Weighted Least Square (VSMWLS) algorithm [22]and Laplacian Re-Decomposition (LRD) algorithm [23] in between are also algorithms with good performance.

Although the above eight algorithms have achieved good fusion results, they also have their own shortcomings. The traditional algorithm pyramid decomposition wavelet transform fusion algorithm has low time complexity, however, the overall brightness of the fusion image is dark and some areas have inexplicable shadows, which shows that their details are mishandled; the image fidelity and color saturation of the convolutional neural network fusion are very good, which shows its ability to extract features. Of course, the finer the feature extraction, the higher the time complexity. At the same time, the more feature extraction, the easier it is to cause artifacts in the fusion image. In order to better express the tissue metabolism, the remaining algorithm enhances the brightness of the fusion image without changing the color information, resulting in low image contrast and affecting the presentation of the image structure. Based on the problems of the above algorithms, we propose a fusion model. The model selects the convolutional neural network to construct the feature extraction module and the image reconstruction module. In the feature fusion module, Hahn moments are used to guide the potential block to activate Pulse-Coupled Neural Network (PCNN) to realize feature map fusion. Convolutional neural networks are currently the best technology in the field of feature extraction and image reconstruction. PCNN is a global fusion algorithm that can retain more detailed information, and its signal form and processing principles are more in line with the physiological basis of the human visual nervous system . In solving the blurring of image structure, Hahn moments can represent the shape of the image well. As for how to avoid fusion artifacts, we introduce the Hahn moment energy of the block.


Related work

Hahn moment

The moment invariants of the image are good at describing the characteristics of the image. It is not only simple, but also immune to interference from light, noise and geometric distortion. Among these moments, the Hahn moment is widely used because it has a more general meaning. The Hahn moment is a kernel function of the orthogonal polynomial of Hahn. The Hahn polynomial of order N can be expressed as

$$\begin{aligned} h_n^{(\alpha ,\beta )}\left( x \right)= & {} \frac{{{B_n}}}{{\rho \left( x \right) }}{\nabla ^n}\left[ {{\rho _n}\left( x \right) } \right] \end{aligned}$$
$$\begin{aligned} {\nabla ^n}f\left( x \right)= & {} \sum \limits _{k = 0}^n {{{( - 1)}^k}\frac{{n!}}{{k!(n - k)!}}f(x - k)} \end{aligned}$$

where \(x,n = 0,1,2, \ldots N - 1\), N is a given positive integer. \(\alpha > - 1\), \(\beta > - 1\), \({B_n}(x) = \frac{{{{( - 1)}^n}}}{{n!}}\), \(\rho (x) = \frac{{\Gamma (N + \alpha - x)\Gamma (\beta + 1 + x)}}{{\Gamma (x + 1)\Gamma (N - x)}}\) and \({\rho _n}(x) = \frac{{\Gamma (N + \alpha - x)\Gamma (n + \beta + 1 + x)}}{{\Gamma (x + 1)\Gamma (N - n - x)}}\).

In order to reduce the oscillation in the calculation of higher moments and avoid the divergence of polynomial values, the polynomial is regularized. The regularized Hahn polynomial is defined as

$$\begin{aligned} {\overline{h}} _n^{(\alpha ,\beta )}(x) = h_n^{(\alpha ,\beta )}(x)\sqrt{\frac{{\rho (x)}}{{d_n^2}}} \end{aligned}$$

where \(x,n = 0,1,2, \ldots N - 1\) and \(d_n^2 = \frac{{\Gamma (\alpha + n + 1)\Gamma (\beta + n + 1)\Gamma (\alpha + \beta + n + 1 + N)}}{{(\alpha + \beta + 2n + 1)n!(N - n - 1)!\Gamma (\alpha + \beta + n + 1)}}\). Hahn moments of order \((n + m)\) in terms of weighted Hahn polynomials, for an image \((M \times N)\) with intensity function f(xy) is defined as

$$\begin{aligned} {H_{mn}} = \sum \limits _{x = 0}^{N - 1} {\sum \limits _{y = 0}^{M - 1} {{\overline{h}} _m^{(\alpha ,\beta )}(x)} } {\overline{h}} _n^{(\alpha ,\beta )}(y)f(x,y) \end{aligned}$$

where \(x,y \in [0,N - 1]\). The parameters N and M are substituted with N-1 and M-1 respectively to match the pixel points of an image.

Simplified PCNN

In view of the fact that many parameters of the original PCNN require feedback iteration to adjust, which is both inconvenient and time complexity. In this article, we use a simplified PCNN neuron model [24]. This model is loved by most scholars, so it is widely used. Its mathematical model is described as follows:

$$\begin{aligned} {F_{ij}}(n)= & {} {S_{ij}} \end{aligned}$$
$$\begin{aligned} {L_{ij}}(n)= & {} \sum {{W_{ijkl}}{Y_{ij}}} \end{aligned}$$
$$\begin{aligned} {U_{ij}}(n)= {F_{ij}}(1 + \beta {L_{ij}}(n)) \end{aligned}$$
$${T_{ij}}(n)= {e^{ - {\alpha _T}}}{T_{ij}}(n - 1) + {V_T}{Y_{ij}}(n - 1)$$
$${Y_{ij}}(n)= \left\{ {\begin{array}{*{20}{c}} {1,{U_{ij}}(n) \ge {\theta _{ij}}(n)}\\ {0,{U_{ij}}(n) < {\theta _{ij}}(n)} \end{array}} \right.$$

In simplified PCNN, the input neuron in the F channel is the pixel gray value of the external image, and the input neuron in the L channel only considers the output of the internal neuron affected by the eight neighborhoods. This operation greatly reduces the time complexity.

When the \(P \times Q\) image is input into PCNN, PCNN becomes a network composed of \(P \times Q\) neurons. The gray value of each input pixel is each stimulation signal \({S_{ij}}\). When there are pixels with similar gray values in the neighborhood of the inner matrix W and M, the pulse output generated by stimulating a pixel will cause the stimulation of the corresponding neuron in the neighborhood of the pixel with similar gray value, resulting in an edge including edge, Pulse output sequence of texture and area information. The binary image produced by it is a fusion image, so the parameters of PCNN affect the result of image fusion.

The framework of our multi-modal brain medical image fusion

In terms of image processing, the convolutional layer can extract features, which are usually more meaningful and valuable than the features obtained by traditional feature extractors. In addition, the convolutional layer also acts as a weighted average to generate output images. We design a multi-modal brain medical image fusion framework based on CNN with Hahn moment, hereinafter referred to as Hahn-PCNN-CNN. Hahn-PCNN-CNN consists of feature extraction module, feature fusion module and image reconstruction module. As shown in Fig. 2, the feature extraction module is employed to extract image information features from MR-T1 and Positron Emission Tomography labeling Glucose with Positron Nuclide 18F (18F-FDG). Secondly, the convolution feature of source images is fused through the feature fusion module. Finally, the image reconstruction module is used to generate the fused image.

Fig. 2
figure 2

Our multi-modal brain medical image fusion model

Feature extraction module

In the feature extraction module, we intend to use three convolutional layers to extract image features from source images. Feature extraction is a key step in the transform domain image fusion algorithm. For CNN, the stochastic initialization convolution kernel training regression model has been faced with instability problem and it is difficult to train. The best approach is to take advantage of model migration in migration learning. Therefore, we plan to use the first convolutional layer of GoogLeNet, which is pre-trained on ImageNet, as the First Convolutional Layer (CONV1) of our network , where it contains 64 convolution kernels of \(7 \times 7\) size, which can extract many effective features of the image.

Although CONV1 has a wide acceptance range, more graphics functions can be used. Therefore, not all image features must be displayed in the fusion. The key to multi-mode brain medical image fusion is the fusion of effective information, that is, while giving clear edges to metabolic information, as much as possible to ensure that the texture of the structural image is clear, so as to help doctors make a correct diagnosis. Therefore, the Second Convolutional Layer (CONV2) and the Third Convolutional Layer (CONV3) are added to filter the features obtained by CONV1 and finally obtain a feature map that can be fused.

If the image is undersampled, part of the image information will be lost. Therefore, in the feature selection module, we do not want to lose the information of the image. Adjust the value of the corresponding stride and padding for the kernel of each convolution layer. Details of each convolutional layer of feature extraction module are shown in the Table 1.

Table 1 Details of each convolutional layer of feature extraction module

Feature fusion module

We input the feature maps that are obtained from CONV3 into the image fusion module. Firstly, divide them into blocks and calculate the Hahn moment of each block. Then, inspired by the expansion coefficient factor, we defined the potential of the image patch. Finally, we use the potential of the image block as the external stimulus of PCNN to achieve the fusion of feature maps.

The size of a brain medical image I(x,y) is \(M \times N\). It will be divided into blocks of the same size; the block size is \(D \times D\). n order to get a block of integers, we choose to fill with the number 0 so that M and N are equal. After the fusion is completed, the filled elements are removed to obtain the fused image. All the divided blocks will form a set called \(\{ C_{ij}^I\}\), \(i \in \left\{ {1,2,3, \ldots ,M/D} \right\}\), \(j \in \left\{ {1,2,3, \ldots ,N/D} \right\}\). Hahn moments of image blocks of \(\{ C_{ij}^I\}\) can be expressed as

$$\begin{aligned} {H_{ij}} = \left[ {\begin{array}{*{20}{c}} {{H_{11}}}&{}{{H_{12}}}&{} \cdots &{}{{H_{1n}}}\\ {{H_{21}}}&{}{{H_{22}}}&{} \cdots &{}{{H_{2n}}}\\ \vdots &{} \vdots &{} \ddots &{} \vdots \\ {{H_{m1}}}&{}{{H_{m2}}}&{} \cdots &{}{{H_{mn}}} \end{array}} \right] \end{aligned}$$

where \(m,n \in \left\{ {1,2,3, \ldots ,D} \right\}\), The Potential Energy (PE) of Hahn moments values is computed by

$$\begin{aligned} P{E_{uv}} = \sum \limits _{u = 1}^m {\sum \limits _{v = 1}^n {\frac{{\left| {{H_{uv}} - {H_{11}}} \right| }}{{\sqrt{{{\left( {u - 1} \right) }^2} + {{\left( {v - 1} \right) }^2}} }}} } \end{aligned}$$

where \(u \in \left\{ {1,2, \ldots ,m} \right\}\), \(v \in \left\{ {1,2, \ldots ,n} \right\}\). Its structure is based on the principle of linear expansion coefficient in physics.

In image fusion, how to make full use of the biological features of PCNN and combine PCNN attributes and image characteristics to determine relevant parameters has always been a research hotspot in PCNN adaptive image fusion. PCNN has a strong adaptive link strength and will not change due to changes in the image. Humans have always liked to judge the clarity of the image based on the visually significant area. Whether it is a functional image or a structural image, they all have a prominent area to express information and an inconspicuous black background. And we tend to pay more attention to the former. After all, the visually significant areas are clear. Visual saliency detection through image complexity function is introduced into multi-modal brain medical image fusion. In view of its good performance, it can detect the saliency of medical images well. In this article, the visually significant areas detected by the Complexity-Weighted Saliency (CWS) detection [25] model are used as the link strength of the corresponding neurons. When the link strength beta and the external stimulus PE are determined, they are both input into the pulse coupled neural network. The feature map fusion starts according to the following formula:

$${T^A}= PCNN(PE_{uv}^A)$$
$${T^B}= PCNN(PE_{uv}^B)$$
$$\begin{aligned} F= & {} \left\{ {\begin{array}{*{20}{c}} {\max (A(i,j),B(i,j))\mathrm{{ }}{T^A} = {T^B}}\\ {A(i,j){T^A} > {T^B}}\\ {B(i,j){T^A} < {T^B}} \end{array}} \right. \end{aligned}$$

where A represents the structure image and B represents the functional image. F represents the fused images. \(PE_{uv}^A\) and \(PE_{uv}^B\) are external stimuli of PCNN. The linking strength \({\beta ^A}\) and \({\beta ^B}\) are calculated by the CWS model. The firing times matrix \({T^A}\) and \({T^B}\) are determined by \({\beta ^A}\) and \({\beta ^B}\), respectively.

Image reconstruction module

Since our feature extraction module is composed of three convolutional layers, we continue to use the convolutional layer to reconstruct it to get our final fused image after the feature map is fused. In the image reconstruction module, we choose two convolutional layers, namely CONV4 and CONV5. The convolution kernel of CONV4 is 3 × 3, stride and padding are both set to 1. The convolution kernel of CONV5 is 1 × 1, stride and padding are both set to 0. The kernel number of CONV4 is the same as CONV3. Since CONV5 reconstructs feature maps into a three-channel output. Thereby, the kernel number of CONV5 is 3. Details of each convolutional layer of the image reconstruction module are shown in the Table 2.

Table 2 Details of each convolutional layer of image reconstruction module

Loss function

All researchers who use neural networks for image fusion will face the problem of information loss. Therefore, how to reduce information loss is particularly important. The loss function that determines the amount of information loss has become the focus of research. In the case of ground-truth image, Mean Square Error (MSE) is generally used as a loss function and the objective function is to reduce the distortion of images. However, multi-modal brain medical image fusion has two characteristics that make it unable to use MSE as a loss function. One is that there is no reference image for brain medical images and the other is that brain medical images are multi-modal. Therefore, the fusion image should retain the texture, edge and metabolic activity information in the source image as much as possible, not simply inherit all the information in the source image.

In the article, we plan to use cross_entropy loss function, Multiscale-Structural Similarity (MS-SSIM) loss function and Total Variation (TV) loss function. Cross_entropy loss function can reduce the loss in the pixel model pixel by pixel; MS-SSIM loss function can reduce the loss of brightness, contrast and structural information in the reconstruction; TV loss function can make our images smoother while eliminating noise in the source image. The total loss function formula is as follows.

$$\begin{aligned} {L_{total}} = {L_{cross\_entropy}} + {L_{MS - SSIM}} + {L_{TV}} \end{aligned}$$

Cross_entropy loss function can be described as

$$\begin{aligned} {L_{cross\_entropy}} = - \sum {I\log (O) + (1 - I)\log (1 - O)} \end{aligned}$$

MS-SSIM loss function can be described as

$$\begin{aligned} {L_{MS - SSIM}} = 1 - MS - SSIM(I,O) \end{aligned}$$

TV loss function can be described as

$$\begin{aligned} {L_{TV}} = \sum \limits _{i,j} {(({{(F(i,j - 1) - F(i,j))}^2} + {{(F(i + 1,j) - F(i,j))}^2}} ) \end{aligned}$$

where I represents the input image and O represents the output image. \(SSIM( \cdot )\) represents multi scale structural similarity operation. F(ij) is the pixel value at point (ij).

Fig. 3
figure 3

The evolution of images through feature extraction part and image reconstruction part

As shown in Fig. 3, our images reduce the loss of much information through the feature extraction layer and the image reconstruction layer.


We use the brain medical images provided by the Harvard Medical School website as the data set. The data set of this website consists of normal brain images, stroke images, brain tumor images and brain images of other diseases. In order to enhance the robustness of our model, we selected 8000 images from the normal and abnormal images on the website as the training images of our model. All images have a resolution of \(256 \times 256\) and are saved in the training set in Portable Network Graphics (PNG) format. The data set contains both CT and MRI, which are good at expressing structure, as well as PET and SPECT, which are professional at expressing metabolic information.

Qualitative and quantitative analysis methods

Qualitative methods are to combine the knowledge of medical nuclear imaging to qualitatively analyze our fusion images. The quality of brain image fusion is judged by the texture, edge and color of the image; Quantitative methods are to use several representative image fusion evaluation metrics to evaluate the quality of brain image fusion. Evaluation metrics are generally divided into five categories. There are evaluation metrics based on information theory, mainly including information entropy, mutual information, edge mutual information and normalized mutual information; there are evaluation metrics based on structural similarity, mainly including SSIM [26], MS_SSIM, MSE; there are evaluation metrics based on image features, Mainly include Spatial Frequency (SF), Standard Deviation (SD), Average Gradient (AG); there are evaluation metrics based on source image and generated image, mainly including Correlation Coefficient (CC) and \(Q_{AB}^F\) [27]; there are evaluation metrics based on human visual perception, mainly including Visual Information Fidelity (VIF) [28]. In this article, we used six metrics of Cross Entropy (CE), Feature Mutual Information (FMI) [29], SSIM, SF, VIF and \(Q_{AB}^F\) to evaluate the quality of our model and comparison algorithm.


Experiment details and result analysis

We selected multi-modal medical images of four diseases from the Harvard Medical School website to verify the performance of our model. The four diseases are Glioma, Huntington’s disease, Metastatic bronchogenic carcinoma and Alzheimer’s disease. For structural images, we generally choose MR-T2 images that are good at expressing lesion information; For functional images, we choose SPECT-Tl images, SPECT-Tc images and FDG-PET images. Our model is implemented in the PyTorch framework and trained and tested on a platform with Intel Core i7-8700k CPU and NVIDIA RTX 2080 GPU. In this experiment, we plan to adopt eight representative image fusion algorithms that are mentioned in Background.


The source images in Figs. 4 and 5 are from a 51-year-old female. She has right hemiplegia and hemianopia due to glioma. In the MR-T2 image, the astrocytoma is located in the left parietal lobe; in the SPECT-T1 image, its metabolism is abnormal. Ideally, the fused image should contain both clear astroglioma texture and edges, as well as its metabolism. From Figs. 4 and 5, it can be seen that the fused images obtained based on GFF, IGM and VSMWLS algorithms have better structural information, however, the metabolic state of the tissue can not be obtained. The contrast of the fused image obtained by DTCWT and NSCT algorithms is too low, which causes the image to darken and is not suitable for observation. The fused image obtained based on the CNN algorithm has good structural information and metabolic status, however, the texture and edges of astrogliomas are poor. The fused image obtained by the LPSR algorithm is seriously distorted. Although the fusion image obtained based on the LRD algorithm retains the color information and structural information of the source image well, it is seriously blurred in the diseased tissue, which is not conducive to the diagnosis of the doctor. In general, whether it is normal tissue or glioma, the fused image obtained by our algorithm is the best in preserving structural information and metabolism. Tables 3 and 4 show the quantitative analysis of all fusion algorithms. Obviously, our algorithm performs well in terms of feature mutual information, spatial frequency and cross entropy. Among other metrics, it is also slightly better than other algorithms.

Table 3 The objective evaluation scores about group 1 fused images
Table 4 The objective evaluation scores about group 2 fused images
Fig. 4
figure 4

The first set of fused MRI-SPECT images from 9 methods on glioma

Fig. 5
figure 5

The second set of fused MRI-SPECT images from 9 methods on glioma

Huntington’s disease

The source images in Figs. 6 and 7 are from a 70-year-old woman who was clinically diagnosed as Huntington’s disease. In the MR-T2 image, the caudate nucleus has a tendency to shrink ; in the SPECT-Tc image, we can see the metabolism of the caudate nucleus in the shrinking process. Therefore, the fusion of the above two images has far-reaching significance. The structure information of the fusion image obtained by GFF, IGM and VSMWLS algorithms is well preserved, but the metabolism can hardly be expressed; the color information of the fusion image obtained based on the LPSR algorithm is seriously distorted; the fusion image obtained based on the LRD algorithm contains the source image. However, the detail information is seriously lost; the fused image obtained by DTCWT and NSCT algorithms can also obtain the source image information, however, the visual fidelity is somewhat poor and the low contrast leads to dark shadows in the caudate body; the fused image obtained by the GFF algorithm has a deviation in the expression of metabolic information at the edge of the caudate nucleus; our model performs well in retaining overall structural information and the expression of metabolic conditions in and around the caudate nucleus. The evaluation results shown in Tables 5 and 6 can prove that our model performs far better than other algorithms on the metric of structural similarity; on other metrics, the model is also quite satisfactory and stable.

Table 5 The objective evaluation scores about group 3 fused images
Table 6 The objective evaluation scores about group 4 fused images
Fig. 6
figure 6

The first set of fused MRI-SPECT images from 9 methods on huntington’s disease

Fig. 7
figure 7

The second set of fused MRI-SPECT images from 9 methods on huntington’s disease

Metastatic bronchogenic carcinoma

The source images in Figs. 8 and 9 are from a 42-year-old patient with brain metastases from bronchial cancer. In the MR-T2 image, most of the left temporal area showed high signal intensity. The narrow sulcus in the left hemisphere is caused by severe swelling of the left hemisphere; in the SPECT-Tc image, the blood flow in the lesion area is very low. Ideally, the fused image can not only maintain high signal strength and the metabolic status of the diseased area, but also eliminate interference information. Algorithms like IGM, CNN and VSMWLS, the overall information of the fused image is well preserved, but the metabolism of the lesion is almost absent. The fused image obtained based on the GFF algorithm shows inexplicable black blocks at the lesion; the fused image obtained by DTCWT and NSCT algorithms retain the metabolism of the tissue, but the edge structure is blurred in details; the color is distorted during the fusion process based on the LPSR algorithm; the fused image obtained based on the LRD algorithm can present a better metabolic status, but the overall texture information is still unclear. The fused image obtained by our model performs well in anatomical information retention, metabolic information acquisition and visual perception. In Tables 7 and 8, we found that our algorithm excels in cross entropy, structural similarity and spatial frequency and is better than other algorithms in other metrics.

Table 7 The objective evaluation scores about group 5 fused images
Table 8 The objective evaluation scores about group 6 fused images
Fig. 8
figure 8

The first set of fused MRI-SPECT images from 9 methods on metastatic bronchogenic carcinoma

Fig. 9
figure 9

The second set of fused MRI-SPECT images from 9 methods on metastatic bronchogenic carcinoma

Mild Alzheimer’s disease

The source images in Figs. 10 and 11 are from a 70-year-old man. He suffers from memory loss and is always depressed, which is a manifestation of mild Alzheimer’s disease. In the MR-T2 image, the global hemispheric sulcus is enlarged and is more prominent in the parietal lobe; in PET-FDG images, local brain metabolism is abnormal, especially in the anterior and posterior regions of the parietal wall of the temporal lobe. Changes have also occurred on both sides of the temporal lobe. Ideally, the above-mentioned information is included in the fused image and at the same time, interference information items are eliminated. The color distortion of the fused image obtained based on GFF and LPSR algorithms will affect the expression of metabolism; the brightness of the fused image obtained by IGM and VSMWLS algorithms is too high, resulting in unclear structural information; the brightness of the fused image obtained based on DTCWT and NSCT algorithms is too dark, resulting in unclear edges of the caudate nucleus; the fused image obtained by CNN and LRD algorithms retain the effective information of the source image, but the color saturation is poor; the fused image obtained based on the Hahn-PCNN-CNN algorithm realizes the perfect fusion of the source image, both in terms of structural information and metabolism. At the same time, it also performs well in terms of chroma and saturation and the image is easy to observe. Tables 9 and 10 show the objective performance of all algorithms. It can be seen that the Hahn-PCNN-CNN algorithm is superior to other algorithms in all metrics.

Table 9 The objective evaluation scores about group 7 fused images
Table 10 The objective evaluation scores about group 8 fused images
Fig. 10
figure 10

The first set of fused MRI-PET images from 9 methods on mild Alzheimer’s disease

Fig. 11
figure 11

The second set of fused MRI-PET images from 9 methods on mild Alzheimer’s disease

Comparison and analysis with IFCNN

Inspired by Image Fusion Framework based on Convolutional Neural Network (IFCNN), we proposed our model. In multi-modal brain medical image fusion, IFCNN can not fuse images containing metabolic information of organs and tissues and only performs image fusion on CT and MRI. Therefore, the functional image with color information can not be directly fused in IFCNN, however, our model can do it. In the process of model training, we added color images to the training set so that our model can fully process this type of image. In order to prove that our model is more suitable for multi-modal brain medical image fusion than IFCNN, we designed a set of experiments to convert functional images with colors to three representative color spaces that are RGB, HSV and YUV. Then we compare their fusion results with ours.

Fig. 12
figure 12

The second group of source images and fused images by IFCNN in different color space and Hahn-PCNN-CNN

In Fig. 12, we can clearly find that the final images of the three color spaces obtained by IFCNN are not as good as our model. In HSV color space and YUV color space, the fused images have poor texture and blur and it is difficult to provide valuable information to doctors. In RGB color space, the fused image has better texture information preservation, but it lags far behind the fused image generated by our model in terms of visual fidelity. Therefore, in the multi-modal brain medical image fusion, our model is far superior to IFCNN.

Time complexity analysis

In this part, we focus on the time complexity of our model. Time complexity is an important metric that measures the efficiency of a model in this field. Generally, time complexity is directly proportional to the fusion effect. That is, the more time it takes, the better the result of the fusion. However, this is not the result we want. Our goal is to obtain a better fusion result in a short time through our fusion algorithm. In the article, we conducted a total of eight sets of multi-modal brain medical image fusion experiments.We calculate the average time loss of eight sets of images under different algorithms and put it as the average time loss of the corresponding algorithm. Figure 13 shows the average time loss of all algorithms.

Fig. 13
figure 13

The time complexity of different types of multi-modal medical images

Figure 13 shows the average time loss (in seconds) of all algorithms. Among the above methods, DTCWT showed the smallest average execution time, while IGM showed the highest execution time. Considering our proposed Hahn-PCNN-CNN algorithm, compared with NSCT, LPSR, CNN, VSMWLS, LRD, its average execution time is shorter; compared with GFF, its average execution time is longer. This is because our model uses the block potential of Hahn moment blocks, thus avoiding the possibility of image artifacts and enhancing visualization. Therefore, the algorithm execution time will increase, resulting in an average time consumption higher than the average time consumption of DTCWT and GFF. It can be seen that the focus of future work is how to minimize the average time loss of the model.

Statistical analysis of the results

We used a non-parametric test method to verify the performance of our model. Among the many non-parametric testing methods, Friedman’s test can make full use of all the information in the relevant samples. Others such as Kruskal-Wallis test and the popular median method are also correct to judge whether multiple related samples come from a population with significant differences in size, but this approach is like using a non-parametric method to test two independent samples. The information about the differences between different individuals in the sample is ignored, thereby reducing the power of the test. Therefore, We used the non-parametric Friedman’s test and the post-hoc Nemenyi test to analyze how the analyzed methods differ from each other. Tukey’s distribution is introduced to calculate the Critical Difference (CD). If the difference between the algorithm levels is greater than the CD, the difference is considered significant [30]. We use the values to calculate the level of the average data fusion method from Tables 3, 4, 5, 6, 7, 8, 9 and 10. The resulting grade is shown in Fig. 14.

Fig. 14
figure 14

The time complexity of different types of multi-modal medical images

In Fig. 14, we find that the level difference between all comparison algorithms and our algorithm is greater than the value of CD. Therefore, our algorithm has obvious statistical advantages compared with other comparison algorithms.

Tri-modal fusion

Brain medical images are often different from other types of images. Since the imaging comes from multiple sensors, each sensor has a different focus. CT images focus on skill tissues; MRI images have better imaging effect on brain tissues; PET images help doctors observe the metabolism state of tissues. Therefore, it is very challenging to extend the traditional fusion method of two-mode medical images to three or more images while avoiding the overlap of information and the blurring of key textures. Such research is of great value. Most fusion methods are designed for image fusion of two modes. If you want to achieve tri-modal image fusion, you can only manually select two images for fusion and then merge the fusion result with the third image to get the final result. This is not the best choice because the fusion algorithm assumes that the image prior may not appear in the intermediate fusion results. The previous neural network methods all fuse two images from the training set and continuous fusion will result in multiple false images.

Our fusion model just overcomes shortcomings of the above fusion method. There will be no continuous fusion of multiple false images, nor will it affect the effect of later fusion image because of different selection order in the early stage. Please note that the addition of modalities will not affect the processing time of our neural network because feature maps are extracted in parallel. Figure 15 shows us such an example. In the final fusion image, there are not only the eye-catching skull from CT, but also the clear brain tissue from MRI and metabolic information from PET around it.

Fig. 15
figure 15

Tri-modal fusion of magnetic resonance imaging, computed tomography, and single-photon emission computed tomography slices. Best viewed on screen


Current research shows that the image obtained by our model retains the texture information of the anatomical image and the color information of the functional image to the greatest extent, while maintaining the contrast of the image during fusion. On the six representative evaluation metrics, the performance of our model is significantly better than the other eight algorithms. In addition, the diversity of images enhances the robustness of our algorithm. Since our algorithm is a lightweight and high-quality algorithm, it has broad application prospects in intelligent medicine. If it is extended to other image data sets, the performance of the model can be improved.

Availability of data and materials

The datasets used and/or analyzed during the current study are available from Experimental images in Fig. 1 are downloaded from; Experimental images in Fig. 2 are downloaded from the 85th coronal slice from; Experimental images in Fig. 3 are downloaded from the 42th sagittal slice from; Experimental images in Fig. 4 are downloaded from; Experimental images in Fig. 5 are downloaded from; Experimental images in Fig. 6 are downloaded from; Experimental images in Fig. 7 are downloaded from; Experimental images in Fig. 8 are downloaded from; Experimental images in Fig. 9 are downloaded from; Experimental images in Fig. 10 are downloaded from; Experimental images in Fig. 11 are downloaded from; Experimental images in Fig. 12 are downloaded from; Experimental images in Fig. 15 are downloaded from



Computed tomography


Magnetic resonance imaging


Positron emission tomography


Functional magnetic resonance imaging


Pulse-coupled neural network


Cerebral blood flow


Positron emission tomography labeling glucose with positron nuclide 18F


Coupled neural network


Proton density




Portable network graphics


Structural similarity


Spatial frequency


Standard deviation


Average gradient


Correlation coefficient


Visual information fidelity


Cross entropy


Future mutual information


Mean square error


Multi-scale structural similarity


Dual-tree complex wavelet transform


Nonsubsampled Contourlet


Guided filtering fusion


Laplacian pyramid sparse representation


Internal generative mechanism


Visual saliency map and weighted least square


Laplacian re-decomposition


Single-photon emission computed tomography


  1. Lim KY, Mandava R. A multi-phase semi-automatic approach for multisequence brain tumor image segmentation. Expert Syst Appl. 2018;112:288–300.

    Article  Google Scholar 

  2. Langen KJ, Galldiks N, Hattingen E, Shah NJ. Advances in neuro-oncology imaging. Nat Rev Neurol. 2017;13:279–89.

    Article  Google Scholar 

  3. Nesselroth D, Klang E, Soffer S, Druskin E, Zimlichman E. Yield of head CT for acute findings in patients presenting to the emergency department. Clin Imaging. 2021;73:1–5.

    Article  Google Scholar 

  4. Sauwen N, Acou M, Sima D, Veraart J, Maes F, Himmelreich U, et al. Semi-automated brain tumor segmentation on multi-parametric mri using regularized non-negative matrix factorization. BMC Med Imaging. 2017;17(1):29.

    Article  Google Scholar 

  5. Sun R, Wang K, Guo L, Yang C, Chen J, Ti Y, et al. A potential field segmentation based method for tumor segmentation on multi-parametric mri of glioma cancer patients. BMC Med Imaging. 2019;19(1):48.

    Article  Google Scholar 

  6. Wu Y, Zhao Z, Wu W, Lin Y, Wang M. Automatic glioma segmentation based on adaptive superpixel. BMC Med Imaging. 2019;19(1):1–14.

    Article  Google Scholar 

  7. Goodenberger M, Jenkins R. Genetics of adult glioma. Cancer Genet. 2012;205(12):613–21.

    Article  CAS  Google Scholar 

  8. Singh S, Gupta D, Anand R, Kumar V. Nonsubsampled shearlet based CT and MR medical image fusion using biologically inspired spiking neural network. Biomed Signal Process Control. 2015;18:91–101.

    Article  Google Scholar 

  9. Shahdoosti H, Mehrabi A. Multimodal image fusion using sparse representation classification in tetrolet domain. Digit Signal Process. 2018;79:9–22.

    Article  Google Scholar 

  10. Shahdoosti H, Mehrabi A. MRI and PET image fusion using structure tensor and dual ripplet-II transform. Multimed Tools Appl. 2018;77:22649–70.

    Article  Google Scholar 

  11. Li S, Kang X, Fang L, Hu J, Yin H. Pixel-level image fusion: a survey of the state of the art. Inf Fusion. 2016;33:100–12.

    Article  Google Scholar 

  12. Wang Q, Li S, Qin H, Hao A. Robust multi-modal medical image fusion via anisotropic heat diffusion guided low-rank structural analysis. Inf Fusion. 2015;26:103–21.

    Article  Google Scholar 

  13. Tanaka H, Hayashi S, Ohtakara K, Hoshi H, Iida T. Usefulness of ct-mri fusion in radiotherapy planning for localized prostate cancer. J Radiat Res. 2011;6:66.

    Google Scholar 

  14. Yoshino M, Nakatomi T, Kin T, Saito T, Shono N, Nomura S, et al. Usefulness of high-resolution 3d multifusion medical imaging for preoperative planning in patients with posterior fossa hemangioblastoma: technical note. J Neurosurg. 2016;1:66.

    Google Scholar 

  15. Bhatnagar G, Wu QJ, Liu Z. A new contrast based multimodal medical image fusion framework. Neurocomputing. 2015;157:143–52.

    Article  Google Scholar 

  16. Yu B, Jia B, Ding L, Cai Z, Wu Q. Hybrid dual-tree complex wavelet transform and support vector machine for digital multi-focus image fusion. Neurocomputing. 2016;182:1–9.

    Article  Google Scholar 

  17. Zhang Q, Guo B. Multifocus image fusion using the nonsubsampled contourlet transform. Signal Process. 2009;89(7):1334–46.

    Article  Google Scholar 

  18. Liu Y, Chen X, Cheng J, Peng H. A medical image fusion method based on convolutional neural networks. In: 2017 20th international conference on information fusion (Fusion). IEEE; 2017. p. 1–7

  19. Liu Y, Liu S, Wang Z. A general framework for image fusion based on multi-scale transform and sparse representation. Inf Fusion. 2015;24:147–64.

    Article  Google Scholar 

  20. Li S, Kang X, Hu J. Image fusion with guided filtering. IEEE Trans Image Process. 2013;22:2864–75.

    Article  Google Scholar 

  21. Zhang X, Li X, Feng Y, Zhao H, Liu Z. Image fusion with internal generative mechanism. Expert Syst Appl. 2015;42(5):2382–91.

    Article  Google Scholar 

  22. Ma J, Zhou Z, Wang B, Zong H. Infrared and visible image fusion based on visual saliency map and weighted least square optimization. Infrar Phys Technol. 2017;82:8–171.

    Article  CAS  Google Scholar 

  23. Li X, Guo X, Han P, Wang X, Li H, Luo T. Laplacian redecomposition for multimodal medical image fusion. IEEE Trans Instrum Meas. 2020;69(9):6880–90.

    Article  Google Scholar 

  24. Wang Z, Ma Y, Cheng F, Yang L. Review of pulse-coupled neural network. Image Vision Comput. 2010;28:5–13.

    Article  CAS  Google Scholar 

  25. Liu M, Gu K, Zhai G, Callet, PL. Visual saliency detection via image complexity feature. In: 2016 IEEE international conference on image processing (ICIP). IEEE; 2016. p. 2777–81.

  26. Ma J, Yu W, Chen C, Liang P, Jiang J. Image quality assessment: from error measurement to structural similarity. IEEE Trans Image Process. 2004;13(4):600–12.

    Article  Google Scholar 

  27. Piella G, Heijmans H. A new quality metric for image fusion. In: 2003 international conference on image processing (ICIP). IEEE; 2003. p. 173–6.

  28. Han Y, Cai Y, Cao Y, Xu X. A new image fusion performance metric based on visual information fidelity. Inf Fusion. 2013;14(2):127–35.

    Article  Google Scholar 

  29. Haghighat M, Razian MA. Fast-FMI: non-reference image fusion metric. In: 2014 IEEE 8th international conference on application of information and communication technologies (AICT). IEEE; 2014. p. 1–3.

  30. Demsar J. Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res. 2006;7:1–30.

    Google Scholar 

Download references


Not applicable.


This research was funded by the National Key Research and Development Project of China under Grant 2019YFC0409105, by the National Natural Science Foundation of China under Grant 61801190, by the Nature Science Foundation of Jilin Province under Grant 20180101055JC, by the Industrial Technology Research and Development Funds of Jilin Province under Grant 2019C054-3, by the “Thirteenth Five-Year Plan” Scientific Research Planning Project of Education Department of Jilin Province (JKH20200678KJ, JJKH20200997KJ).

Author information

Authors and Affiliations



K.G., X.L. and X.H conceived the study and design. K.G. and X.H made a formal analysis of data. K.G., X.L. and X.H verified the design. K.G. and J.L are responsible for collecting data. J.L. is responsible for software implementation. K.G. and J.L analyzed and interpreted the data and drafted the manuscript. K.G., X.L. and X.H. performed critical revision of the manuscript. K.G. and J.L are responsible for the visualization of the model. T.F. supervised the study. T.F., X.L. and X.H. provided funding support for this research. All authors read and approved the final manuscript.

Author information

Kai Guo received the B.S. degree in computer science and technology from Jilin University, in 2015, where he is currently pursuing the Ph.D. degree with the College of Computer Science and Technology. His research interests include machine learning, image processing, especially for image fusion and image segmentation.

Xiongfei Li received the B.S. degree in computer software in 1985 from Nanjing University, the M.S. degree in computer software in 1988 from the Chinese academy of sciences, the PhD degree in communication and information system in 2002 from Jilin University. Since 1988, he has been a member of the faculty of the computer science and technology at Jilin University, Changchun, China. He is a professor of computer software and theory at Jilin University. He has authored more than 60 research papers. His research interests include data mining, intelligent network, image processing and analysis.

Xiaohan Hu received her B.M. degree in Medical tests(Clinically Oriented) from Southern Medical University(The Former First Military Medical University) in 2008, M.M. degree in Clinical Medicine from Jilin University in 2012 and received her M.D. degree in Diagnostic and Interventional Radiology from Johann-Wolfgang-Goethe-Universität Frankfurt am Main (Goethe University Frankfurt) in 2015. Now, she works as a lecturer and a doctor at Radiology Department,The First Hospital of Jilin University. Her research interests include medical image data analysis and disease diagnosis. She has authored more than 10 research papers.

Jichen Liu is studying for B.S. degree at College of Software Jilin University. His research interests include machine learning, deep learning , image fusion and image segmentation.

Tiehu Fan received the B.S. degree in computer science and technology from Jilin University, in 2003, and the Ph.D. degree in computer science and technology from Jilin University, in 2010. He is currently an Associate Professor with the College of Instrumentation and Electrical Engineering, Jilin University. He has authored more than 10 research articles. His research interests include Intelligent Control and Integrated computational intelligence.

Corresponding author

Correspondence to Xiaohan Hu.

Ethics declarations

Ethics approval and consent to participate

The experimental protocol was established according to the ethical guidelines of the Helsinki Declaration and was approved by the Human Ethics Committee of the First Hospital of Jilin University. Informed consent was obtained from all participants.

Competing interests

The authors declare that they have no competing interests.

Consent for publication

Consent for publication was obtained from every individual whose data are included in this manuscript.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Guo, K., Li, X., Hu, X. et al. Hahn-PCNN-CNN: an end-to-end multi-modal brain medical image fusion framework useful for clinical diagnosis. BMC Med Imaging 21, 111 (2021).

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: