DMF-Net: a deep multi-level semantic fusion network for high-resolution chest CT and X-ray image de-noising

Medical images such as CT and X-ray scans are widely used for the detection of several chest infections and lung diseases. However, these images are susceptible to different types of noise, which is hard to remove due to its complex distribution. The presence of such noise significantly deteriorates image quality and degrades diagnostic performance. Hence, an effective de-noising technique is essential to remove noise from chest CT and X-ray images prior to further processing. Deep learning methods, mainly CNNs, have shown tremendous progress on de-noising tasks. However, existing CNN-based models estimate the noise from the final layers, which may not carry adequate details of the image. To tackle this issue, this paper proposes a deep multi-level semantic fusion network, called DMF-Net, for the removal of noise from chest CT and X-ray images. DMF-Net mainly comprises a dilated convolutional feature extraction block, a cascaded feature learning block (CFLB) and a noise fusion block (NFB), followed by a prominent feature extraction block. The CFLB cascades the features from different levels (convolutional layers), which are then fed to the NFB to attain a correct noise prediction. Finally, the Prominent Feature Extraction Block (PFEB) produces the clean image. To validate the proposed de-noising technique, separate and mixed datasets containing high-resolution CT and X-ray images with specific and blind noise are used. Experimental results indicate the effectiveness of DMF-Net compared to other state-of-the-art methods in terms of peak signal-to-noise ratio (PSNR) and structural similarity measurement (SSIM), while drastically cutting down the required processing power.


Introduction
X-ray and Computed Tomography (CT) are two distinct imaging techniques used frequently in the medical field for inspection of the human lungs. X-ray is a commonly used and widely available imaging technique, while a CT scan is similar to MRI in that it produces high-quality images of body organs. The two types of scan produce images in different ways: a CT scan takes pictures of body organs from all angles, enhancing its accuracy, while X-rays use electromagnetic waves that pass through the patient's body, producing black-and-white images of the internal structure. In radiology, CT scans and X-rays are frequently used for diagnostic purposes.
The quality of CT and X-ray images can deteriorate for a variety of reasons, including the following. Blurred image: these medical images are blurred due to incorrect protocol parameters and patient movement [1]. Patient movement is caused by various factors, including an increase in heart rate, respiration, fluctuations in the number of pixels while scanning a uniform material, and the patient's unwillingness to cooperate. Blurring rises in direct proportion to the degree of movement.
Field of View (FOV): when the resolution is changed to capture a smaller or larger area, the image quality also degrades.
Artefact: artefacts are incidental, supplementary structures that appear alongside the main image, such as metal artefacts and beam hardening; faulty equipment, accidental damage, or inappropriate input can also produce artefacts.
When the picture quality is affected by various amounts of noise, the accuracy of models for classification, detection, segmentation, and registration is greatly reduced [2][3][4]. As a result, it is important to build a model for removing noise from medical images such as CT and X-rays as part of the pre-processing [5, 6].
The models used to denoise images can be categorized into two types: (1) traditional models and (2) deep models.
(1) Traditional models: commonly used traditional methods are linear smoothing, median filtering, Wiener filtering, anisotropic diffusion, and wavelet-based methods.
i. Linear smoothing: here a noisy image y is convolved with a Gaussian filter k to clean up the image.
It can also be done in the Fourier domain: here, capital letters stand for the Fourier transforms of their counterparts (for example, Y = F(y), where F is the Fourier transform) and ⊙ stands for the element-wise product, so the smoothed spectrum is K ⊙ Y. In the Fourier domain, K is also Gaussian.
ii. Median filtering: median filtering can be used instead of linear smoothing. The idea is to process the image one pixel at a time, replacing each pixel with the median value of a group of neighbouring pixels. The method can thus also be seen as a filter, although the filter is not linear.
iii. Wiener filter: linear prediction, signal restoration, and channel equalization are just a few examples of the many uses of Wiener filters. This technique works for both additive and multiplicative noise.
iv. Anisotropic diffusion: anisotropic diffusion, an iterative method based on smoothing, can be used to remove noise from images. The strategy attempts to meet the following conditions: (a) object borders must be kept intact, and (b) noise must be effectively filtered out in areas of high similarity. The approach is so named because its mathematical underpinnings resemble the heat diffusion equation and because its smoothing, or diffusion, is applied in discrete regions rather than globally.
v. Wavelet-based method: here, an image is transformed into the wavelet domain, the wavelet coefficients are shrunk to suppress noise, and the inverse wavelet transform is then used to obtain the denoised image.
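To make the first two traditional methods concrete, the sketch below implements linear (Gaussian) smoothing and median filtering from scratch in NumPy. The function names and parameter choices are illustrative, not from the paper; a practical system would use an optimized library implementation.

```python
import numpy as np

def gaussian_kernel(size=5, sigma=1.0):
    # Separable 1-D Gaussian kernel, normalised to sum to 1.
    ax = np.arange(size) - size // 2
    k = np.exp(-ax**2 / (2 * sigma**2))
    return k / k.sum()

def smooth(image, sigma=1.0, size=5):
    # Linear smoothing: convolve rows, then columns, with the Gaussian kernel.
    k = gaussian_kernel(size, sigma)
    out = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, image)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, out)

def median_filter(image, size=3):
    # Non-linear filtering: replace each pixel by the median of its
    # size x size neighbourhood (edge-padded).
    pad = size // 2
    padded = np.pad(image, pad, mode="edge")
    out = np.empty_like(image, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.median(padded[i:i + size, j:j + size])
    return out
```

Note the contrast the text describes: the Gaussian filter spreads an impulse (blurring it), while the median filter removes an isolated impulse entirely, which is why it is preferred for salt-and-pepper noise.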
(2) Deep models: deep learning was first introduced for image de-noising in 1989 by Chiang and Sullivan. Their neural network used a blur function and additive noise to obtain a clean image, relying on weight values to remove complex noise [7]. To cut down the high computational cost, a feedforward network was suggested to achieve a trade-off between de-noising performance and efficiency [8]. After that, further optimization techniques were employed to speed up the convergence of the network and improve de-noising performance [9]. Novel network designs, obtained by increasing the depth or modifying the activation function, also proved competitive at removing noise [10]. However, these models required parameters to be set manually, which was resolved with gradient descent [11, 12]. For these reasons, convolutional neural networks (CNNs) were proposed, dealing with vanishing gradients and activation functions such as sigmoid [13] and tanh [14], but they needed a computationally effective platform. The ImageNet challenge that started in 2012 brought different pre-trained models such as AlexNet, VGGNet and MobileNet [15][16][17], and such models have been used for image de-noising since 2015 [18, 19]. Image de-noising methods mostly use the mathematical model y = x + η to recover the clean image x, where η is additive noise with standard deviation σ and y is the noisy image. Many publications used this formulation as the bedrock of their models [20].
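The additive model y = x + η can be made concrete in a few lines of NumPy; the array shape and noise level below are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Clean image x in [0, 1]; additive Gaussian noise eta with standard deviation sigma.
x = rng.random((128, 128))
sigma = 25 / 255.0                    # noise level 25 on the usual 8-bit scale
eta = rng.normal(0.0, sigma, x.shape)
y = x + eta                           # y = x + eta: the noisy observation

# A denoiser that predicts the residual eta recovers the clean image as y - eta_hat;
# with a perfect prediction eta_hat = eta, the recovery is exact.
x_recovered = y - eta
```

This residual view, recovering x by subtracting a predicted η from y, is exactly the formulation DMF-Net builds on in the Methodology section.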

Related work
Different CNN models have been designed for object detection and for retrieval of a clear image, either by modifying pre-existing networks or by designing networks with different plug-ins to improve the results [24, 25]. Most of these CNNs have been designed with the aim of improving efficiency and accuracy w.r.t. de-noising. Dabov et al. [20] proposed a model with enhanced sparsity that converts 2D arrays to 3D arrays, achieving a significant improvement with a specially designed Wiener filter. Zhang et al. [26] proposed the weighted nuclear norm minimization (WNNM) technique to take advantage of nonlocal self-similarity. Buades et al. [27] put emphasis on nonlocal means (NLM) to focus on the structural preservation of the image. Portilla et al. [28] used a statistical model with a Bayesian estimator to eliminate Gaussian noise. Chen et al. [29] proposed a residual encoder-decoder convolutional network (RED-CNN) with patch-based training for CT images. Gondara et al. [30] developed an autoencoder with convolutional layers to deal with heterogeneous data at low computational complexity. Kang et al. [22] designed an algorithm with directional wavelets to address photon starvation in CT images. Zheng et al. [31] enhanced the spatial adaptivity of NLM with an element-wise fractal dimension. Duan et al. [32] proposed a new second-order total generalised variation (TGV) decay model to get rid of speckle noise. Yang et al. [33] developed tensor-based adaptive control for principal component analysis with a searching window for image de-noising. Zohair et al. [34] proposed a phase-preserving approach that obtained a better peak signal-to-noise ratio (PSNR). Trinh et al. [35] formulated quadratic programming on weighted image blocks, resulting in better performance. Chen et al. [36] introduced a deep boosting framework integrating various CNNs to generate the features required for noise removal; increasing the number of training samples also showed better noise suppression. As experimented with a generative adversarial network (GAN) [37], a discriminative network has been used to increase the number of samples for training purposes.

Methodology
This section contains an extensive description of DMF-Net. The noisy X-ray and CT scans are produced by adding a particular noise value to the clean image, described as Y = X + η, and used as the source of data for DMF-Net, where X is the clean image, Y is the input noisy image and η is the external noise. The basic goal is for DMF-Net to learn from the noisy data and then reproduce a clean, noise-free image.

Feature dilation
The architecture with dilated convolutions [38] expands the receptive field, which increases accuracy with no resolution loss. Integrating multi-scale information with sub-sampling, as in image classification models, reduces resolution; dilation instead expands the area of image data covered without pooling. The main aim is to cover wider pixel information while convolving for the output feature at the same computational cost. The dilation factor d determines the result of convolving with different values of d, so the same kernel parameters can be used to retrieve more information. With d = 1 the kernel maps onto the input at its original size, but from d = 2 onwards pixels are skipped while mapping onto the input. For a d-dilated convolution the operator *_d is used; the familiar discrete convolution is simply the 1-dilated convolution. Let f_0, f_1, ..., f_n : Z² → R be feature maps and k_0, k_1, ..., k_n be 3 × 3 filters applied with increasing dilation. The ordinary convolution is defined as (F * k)(p) = Σ_{s+t=p} F(s) k(t), and the d-dilated convolution as (F *_d k)(p) = Σ_{s+d·t=p} F(s) k(t), where d is the dilation factor. During the convolution operation, the dilation factor determines how many pixels are skipped: with dilation d, d − 1 pixels of the input receptive field are skipped between successive kernel taps. Fig. 1 shows how the convolution is performed, with pixels skipped according to different dilation factors.
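A minimal 1-D NumPy sketch (the function name is ours, not from the paper) makes the pixel-skipping behaviour visible: an impulse convolved with a 3-tap kernel touches 3 adjacent samples at d = 1, but spreads over a 5-sample receptive field at d = 2, at the same computational cost.

```python
import numpy as np

def dilated_conv1d(f, k, d=1):
    # d-dilated convolution: (f *_d k)(p) = sum_t f(p - d*t) k(t),
    # with the kernel centred on p. d = 1 is the ordinary discrete convolution.
    n, m = len(f), len(k)
    out = np.zeros(n)
    for p in range(n):
        for t in range(m):
            s = p - d * (t - m // 2)   # input sample hit by kernel tap t
            if 0 <= s < n:
                out[p] += f[s] * k[t]
    return out

f = np.zeros(11); f[5] = 1.0           # unit impulse at position 5
k = np.array([1.0, 2.0, 3.0])          # 3-tap kernel

o1 = dilated_conv1d(f, k, d=1)         # response spans positions 4..6
o2 = dilated_conv1d(f, k, d=2)         # response spans positions 3, 5, 7
```

With d = 2, every other input sample is skipped between taps (d − 1 = 1 pixel), which is exactly the widening Fig. 1 illustrates in 2-D.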

Proposed DMF-Net architecture
The proposed DMF-Net architecture in Fig. 2, with the layer description in Table 1, contains three parts: the dilated feature extraction block (DFEB), the cascaded feature block (CFB) and the prominent feature extraction block (PFEB). The noisy images are the input to the DFEB, which has sub-modules of convolution, convolution with batch normalization, and dilated convolution with batch normalization. The CFB performs inner-layer cascading by adding the output features from the low, mid and high levels; its aim is to give all levels an equal chance in the extraction of the noisy feature. These features are concatenated to enhance the features responsible for the noise. DMF-Net is designed with a total of 26 convolution layers. The DFEB consists of dilated 3 × 3 convolution with the ReLU [39] activation operation, together with batch normalization (BN) [40] and the ReLU activation operation (blue box).
ReLU adds non-linearity to the network, and the batch normalization performed in different layers improves training with a better learning rate. The features from layers 14, 15 and 16 are taken through convolutions, on the grounds that features from these different layers may enhance the noisy information. The three features from these layers are concatenated and passed through a noise fusion block (NFB), which contains a tan-hyperbolic function with convolution and ReLU. Finally, the result is convolved to give the final net extracted noise: the NFB refines the fused noise into the final net noise feature (NNF). This NNF is subtracted from the noisy image, which then passes through the prominent feature extraction block (PFEB) to give a clear image.
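The cascade-and-fuse idea can be sketched in NumPy as follows. This is a simplified illustration under our own assumptions (random feature maps standing in for the layer-14/15/16 outputs, and a 1 × 1 convolution approximated by a per-pixel weighted sum with tanh), not the actual DMF-Net implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical feature maps from low-, mid- and high-level layers
# (channels x height x width), standing in for the CFB outputs.
low  = rng.normal(size=(64, 32, 32))
mid  = rng.normal(size=(64, 32, 32))
high = rng.normal(size=(64, 32, 32))

# Cascaded feature learning: concatenate the three levels along channels...
fused = np.concatenate([low, mid, high], axis=0)          # (192, 32, 32)

# ...then the noise fusion block squeezes them to a single noise map.
# A 1x1 convolution is sketched as a per-pixel weighted channel sum + tanh.
w = rng.normal(scale=0.01, size=fused.shape[0])
noise_map = np.tanh(np.tensordot(w, fused, axes=(0, 0)))  # (32, 32)

# The clean estimate is the noisy input minus the predicted net noise (NNF).
noisy = rng.normal(size=(32, 32))
clean_estimate = noisy - noise_map
```

The tanh keeps the fused noise prediction bounded, and the final subtraction is the residual-learning step described in the text.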

Training strategy
A mixed-data training approach for deep network construction is proposed here. The network predicts the noise from the noisy input N = I_noisy, giving the anticipated noise-free image
I_P = I_noisy − f_DMF-Net(N),
and the loss is computed as the squared difference between the original noise-free image I and the predicted image:
Loss(L) = ‖I − I_P‖².
After the loss is calculated, optimization is performed using the Adam optimizer [41] during training. The parameters are optimized according to the noise features extracted during training, and a dynamic learning rate across the epochs helps with faster convergence, since the gradient of the loss, which reflects the variation between the original and predicted noise, is propagated back to the parameters.
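A minimal sketch of this loss, assuming (as in the text) that the network output is the predicted noise map; the function name is ours.

```python
import numpy as np

def dmf_loss(clean, noisy, predicted_noise):
    # L = || I - I_P ||^2, where I_P = I_noisy - f(N) is the predicted
    # clean image obtained by subtracting the predicted noise.
    predicted_clean = noisy - predicted_noise
    return np.sum((clean - predicted_clean) ** 2)

# With a perfect noise prediction, the loss vanishes.
clean = np.array([[0.5, 0.25], [0.75, 0.5]])
eta = np.full_like(clean, 0.25)
noisy = clean + eta
perfect = dmf_loss(clean, noisy, eta)
```

In training, this scalar is what Adam minimizes with respect to the network parameters.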

Details about the data as well as the implementation
The data folder contains CT image files as well as X-ray image files. 1647 CT files are collected from kaggle.com [42, 43] and GitHub.com [44], each a grayscale lung CT slice of size 128 × 128. The data folder is divided into two folders, CT-training and CT-testing. Files are chosen randomly from the training folder, which contains 1352 files, with a 90:10 training : validation split, and 295 files are used for testing. 1550 lung X-ray files are collected from kaggle.com [45] and distributed into two folders, X-Ray-training with 1292 files and X-Ray-testing with 258 files. Files from the X-Ray-training folder are likewise chosen randomly with a 90:10 training : validation split.

Evaluation metrics
The peak signal-to-noise ratio (PSNR) and the structural similarity index (SSIM) are employed as assessment criteria in this study. The mathematical formulation of PSNR is
PSNR = 10 log₁₀(MAX² / MSE),   (8)
where MAX is the maximum possible pixel value of the noise-free image and MSE is the mean squared error.
The mean squared error is MSE = (1/(p·q)) Σᵢ Σⱼ (I(i, j) − I′(i, j))², where I(i, j) is the pixel data of the noise-free image, I′(i, j) represents the pixel data of the denoised image predicted by the model, p is the total number of rows of image pixels and q is the total number of columns. SSIM measures the structural difference between similar images; by itself it does not indicate which image is the original and which is the denoised one. To observe the performance, DMF-Net is trained with specific noise and blind noise. Specific and blind noise are normalized with noise levels N_sp (15, 20, 25) and N_bl (0 to 55) for creating the noisy X-ray and CT images. Table 2 summarises the results, with PSNR and SSIM values such as (27.53, 0.8020), as shown in Fig. 7(c) and (d). The input CT and X-ray images with the output denoised images are shown in Fig. 8.
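PSNR as defined above can be computed directly; this is a minimal NumPy sketch with an illustrative helper name, using a [0, 1] pixel range.

```python
import numpy as np

def psnr(clean, denoised, max_val=1.0):
    # PSNR = 10 log10(MAX^2 / MSE), with MSE averaged over all p x q pixels.
    mse = np.mean((clean.astype(float) - denoised.astype(float)) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

# A uniform error of 0.1 on a [0, 1] image gives MSE = 0.01, i.e. 20 dB.
clean = np.zeros((4, 4))
denoised = np.full((4, 4), 0.1)
value = psnr(clean, denoised)   # -> 20.0
```

For 8-bit images, max_val would be 255; higher PSNR means the denoised image is closer to the reference.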

State-of-the-art comparison
The proposed DMF-Net is compared with other de-noising models: BM3D [20], WNNM [26], NLM [27], BLS-GSM [28], RED-CNN [29], autoencoder [30], CNN-Wavelets [22], non-local means [31], frequency-domain FFT [32], adaptive tensor with PCA [33], coefficient-driven variation [46], phase preserving [34], optimal weight [35] and wavelet with sparse [30]. The experimental results on specified and blind noise levels are shown in Table 3 to verify the proposed model's performance. The above methods are evaluated at three noise levels: 15, 20 and 25. At higher noise levels, our model outperformed the BM3D, WNNM, NLM and BLS-GSM techniques in terms of PSNR and SSIM. Our model was also evaluated on different medical images, X-ray and CT, as in Table 2; its performance is better on the two types of images taken separately or mixed, under both blind and specific noise. All results have been compared against the existing models, and the proposed model shows improved performance over all of them.

Conclusion
Here the model is trained and tested separately on CT images, X-ray images and mixed images, with specific and blind noise. Fig. 9(a) and (b) show the results when the model is trained and validated with specific noise (σ = 15, 20 and 25), and Fig. 9(c) and (d) show the results when the model is trained with blind noise and validated at noise levels σ = 15, 20 and 25. The network was trained for 60 epochs with a dynamically decaying learning rate. It was observed that, in the case of blind noise, mixed images perform better than individual images with and without augmentation, and that good results are also obtained for specific noise, using peak signal-to-noise ratio (PSNR) and structural similarity index measurement (SSIM) as evaluation metrics. As a consequence, the proposed model can learn features taken from mixed images and satisfactorily denoise both CT and X-ray images. Situations where sufficient data is not readily available in advance for training, or where existing data must be modified to account for novel patterns, are ideal applications for real-time machine learning. The saved state of the trained DMF-Net can be deployed there as an event-driven model to provide real-time generation of clean images with the demonstrated levels of precision.

Fig. 3
Fig. 3 PSNR and SSIM w.r.t. epoch using DMF-Net without augmentation, with specific noise in (a, b) and blind noise in (c, d), for noise levels (σ) 15, 20 and 25.
The data set contains both CT and X-ray image data for training. The basic goal of mixed training is for the model to learn a number of desirable properties. The features change drastically when a clinical image is corrupted with a certain type of noise; because of this diversity in noisy images, the model is trained here with mixed clinical images, and the results are promising. DMF-Net is also trained separately with CT images, augmented CT images, X-ray images, and augmented X-ray images, with normally distributed specific noise levels (15, 20 and 25, respectively) and blind noise. The model is trained for up to 60 epochs with a dynamic learning rate for smooth convergence: α = 0.001 up to epoch 20, 0.0001 for 20 < epoch ≤ 40, 0.00001 for 40 < epoch ≤ 50, and 0.000001 for 50 < epoch ≤ 60. The contributions of low-, mid- and high-level aspects were properly considered by integrating the features from the last three layers to determine the noise; the features from the last three layers are added to obtain the net feature f(L).
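The stepped decay above can be written as a small helper (the function name is ours; the epoch boundaries follow the schedule stated in the text):

```python
def learning_rate(epoch):
    # Stepped learning-rate decay over 60 epochs:
    # 1e-3 up to epoch 20, then 1e-4, 1e-5 and 1e-6 in the later stages.
    if epoch <= 20:
        return 1e-3
    if epoch <= 40:
        return 1e-4
    if epoch <= 50:
        return 1e-5
    return 1e-6
```

Such a schedule would typically be applied by resetting the optimizer's learning rate at the start of each epoch.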

Fig. 4
Fig. 4 Original CT image with noisy (N_sp = 15, 20, 25 and N_bl = 0 to 55) and denoised (cleaned) images produced by DMF-Net, where (a) and (c) show specific noise without and with augmentation, and (b) and (d) show blind noise without and with augmentation; the red circle and green box mark the inter-lung area and the left lung, respectively, in the original image.

Fig. 5
Fig. 5 PSNR and SSIM w.r.t. epoch using DMF-Net without augmentation, with specific noise in (a, b) and blind noise in (c, d), for noise levels 15, 20 and 25.

Fig. 6
Fig. 6 Original X-ray image with noisy (N_sp = 15, 20, 25 and N_bl = 0 to 55) and denoised (cleaned) images produced by DMF-Net, where (a) and (c) show specific noise without and with augmentation, and (b) and (d) show blind noise without and with augmentation; the red box and green box mark the left and right lung areas, respectively, in the original image.

Fig. 7 PSNR and SSIM w.r.t. epoch using DMF-Net without augmentation, with specific noise in (a, b) and blind noise in (c, d), for noise levels 15, 20 and 25.

Fig. 8
Fig. 8 Original CT and X-ray images with noisy (N_sp = 15, 20, 25 and N_bl = 0 to 55) and denoised (cleaned) images produced by DMF-Net, where the same images as in Figs. 4 and 6 are shown with specific and blind noise at different noise values.
The resulting denoised images are displayed in Fig. 4 for the different noise types, and particular areas have been selected for comparison with the original image.
De-noising X-ray images where DMF-Net is trained with X-ray images. In this section, DMF-Net is trained with the 128 × 128 X-ray image files in the X-Ray-training folder (1292 files) with specific (N_sp) and blind (N_bl) noise, observing PSNR and SSIM. Model performance is also observed on augmented images generated by randomly flipping left-right and up-down and rotating by 90°, 180° and 270°, producing 2348 image files. The model is trained for 60 epochs. On a test set of 258 images, the trained model achieved average PSNR and SSIM of [(30.94, 0.8848), (29.72, 0.8560) and (28.72, 0.8282)] for N_sp = 15, 20 and 25, respectively, and [(28.59, 0.8445), (27.72, 0.8185), (26.94, 0.7926)] for N_bl = 0 to 55, as shown in Fig. 5(a) and (b). With augmentation, after 60 epochs the model achieves average PSNR and SSIM of [(31.03, 0.8896), (29.95, 0.8637) and (29.19, 0.8442)] for N_sp = 15, 20 and 25, respectively, and [(28.92, 0.8525), (28.01, 0.8240), (27.19, 0.7954)] for N_bl = 0 to 55, as shown in Fig. 5(c) and (d); some resulting images are shown in Fig. 6 for visual comparison of particular areas.

Fig. 9
Fig. 9 PSNR and SSIM w.r.t. input CT and X-ray, where (a, b) show the results when the model is trained and validated with specific noise (σ = 15, 20 and 25), and (c, d) show the results when the model is trained with blind noise and validated at noise levels σ = 15, 20 and 25.

Table 1
Detailed description of layers in DMF-Net

Table 2
Using CT and X-ray images individually and combined, PSNR and SSIM calculated at different noise levels, i.e. specific noise (N_sp) and blind noise (N_bl)

Table 3
Multiple methods were used to acquire PSNR and SSIM measurements at varying CT and X-Ray noise levels (blind and specified noise levels); the best results are highlighted in bold

De-noising both CT and X-ray images where DMF-Net is trained with mixed images
In this section, DMF-Net is trained with a mixed data set of 128 × 128 X-ray and CT images.
In this work, we developed DMF-Net with a strategy of training the network for de-noising CT and X-ray images. The network is designed with a dilated feature extraction block (DFEB), a cascaded feature block (CFB) and a prominent feature extraction block (PFEB). The model consists of a total of 26 convolutional blocks: 18 convolution+BN+ReLU (green box), 3 dilated convolution+ReLU (violet box), 4 convolution (orange box), 1 convolution+ReLU (navy blue box) and a Tanh function. The features of the last layers in the cascaded feature block are added with the intention of considering the low-, mid- and high-level features for evaluating the net noise. The clean image is extracted from the prominent feature extraction block after several convolutions.