Given a noisy image, v, the goal is to find an image u that ideally preserves all important features in the image while reducing the unwanted noise. Since this is an ill-posed inverse problem, additional information needs to be introduced to regularize the problem. The denoised image can then be found by minimizing a cost function of the form:
$$ E_{\lambda}(u)= \frac{1}{2} ||u-v||_{2}^{2} + \lambda R(u) $$
(1)
The first term in \(E_{\lambda }(u)\) reflects the requirement that the denoised image should be close to the noisy image, and the second term is the regularization. In the context of the denoising problem formulated as (1), a proper regularization is necessary because without a regularizing term the noisy image itself is the only solution of the problem; i.e., if λ=0 then u=v is the only minimizer of \(E_{\lambda }(u)\). A famous example of variational denoising methods is the Rudin-Osher-Fatemi (ROF) model, which has the following form [23]:
$$ E_{\lambda}(u)= \frac{1}{2} ||u-v||_{2}^{2} + \lambda \int_{\Omega} |\nabla u| $$
(2)
where Ω denotes the image domain and λ is the regularization parameter. The second term, where ∇u is the gradient of u, is the regularization term. The choice of the \(\ell _{2}\)-norm in the first term stems from the assumption of additive Gaussian noise. For the case of the Poisson distribution considered in this study, this term has to be modified.
Let u(i,j) and v(i,j) denote the pixels of the discretized versions of u and v. For an arbitrary pixel location, the probability mass function of a Poisson-distributed variable gives:
$$ P(v(i,j);u(i,j))= \frac{e^{-u(i,j)} u(i,j)^{v(i,j)}}{v(i,j)!} $$
(3)
Assuming the pixel values are independent, for the entire image we have:
$$ P(v|u)= \prod\limits_{i,j} \frac{e^{-u(i,j)} u(i,j)^{v(i,j)}}{v(i,j)!} $$
(4)
We ignore the denominator, which is independent of u. Since we want to find a functional to minimize, we consider the negative logarithm of the numerator:
$$ -\log P(v|u) \propto \sum\limits_{i,j} \left( u(i,j)- v(i,j) \log u(i,j) \right) $$
(5)
With this modified fidelity term, the new energy functional to be minimized will have the following form [24]:
$$ E_{\lambda}(u)= \frac{1}{2} \int_{\Omega} \left(u-v \log u \right) + \lambda \int_{\Omega} |\nabla u| $$
(6)
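For concreteness, a discretized version of the objective in (6) can be evaluated as in the following sketch; the small constant added inside the logarithm and the forward-difference approximation of |∇u| are assumptions of this illustration rather than part of the model:

```python
import numpy as np

def poisson_tv_energy(u, v, lam, eps=1e-12):
    """Discrete sketch of the energy in (6): Poisson fidelity plus total variation.

    The eps guard on the logarithm and the forward-difference, replicated-boundary
    discretization of the gradient magnitude are illustrative assumptions.
    """
    fidelity = 0.5 * np.sum(u - v * np.log(u + eps))
    gx = np.diff(u, axis=0, append=u[-1:, :])   # forward difference along rows
    gy = np.diff(u, axis=1, append=u[:, -1:])   # forward difference along columns
    tv = np.sum(np.sqrt(gx**2 + gy**2))
    return fidelity + lam * tv
```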
Now, compare the optimality conditions for the two models (obtained from the Euler-Lagrange equation):
$$ {}\left\{ \begin{array}{ll} (u - v) + \lambda \, p =0 & \quad \text{For the ROF model in (2)}\\ (u - v) + (\lambda u) \, p =0 & \quad \text{For the new model in (6)} \end{array} \right. $$
(7)
where p is a sub-gradient of \(\int _{\Omega } |\nabla u|\). The only difference between the two equations is the dependence of the regularization parameter on u in the new model. The new model suggests that stronger smoothing must be applied where the signal has larger values. This agrees with our expectation, since under the Poisson distribution the noise level is proportional to the signal intensity.
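For clarity, the second condition in (7) can be obtained by differentiating the fidelity term of (6) with respect to u and then multiplying the resulting equation by u; the constant factor that appears can be absorbed into λ:

$$ \frac{1}{2}\left(1-\frac{v}{u}\right) + \lambda \, p = 0 \quad \Longrightarrow \quad (u-v) + (2\lambda u)\, p = 0 $$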
Minimization of the new functional in (6) is not a straightforward problem. One approach is to first replace |∇u| with \(\sqrt {|\nabla u|^{2}+\epsilon }\) for a small ε>0 and then to apply a gradient descent iteration [24]. Another approach, suggested in [25], is to use a Taylor expansion of the data fidelity term and to minimize the resulting approximate model. In this paper, we follow an algorithm that was developed for the original ROF model [26]. However, we denoise each sinogram pixel separately by minimizing \(E_{\lambda }(u)\) in a small neighborhood around that pixel, with a regularization parameter inspired by the new model.
Total variation denoising has proved to be very effective when applied to piecewise-constant images. On more complex images, however, it can lead to undesired effects. Most importantly, on images with piecewise-linear features, i.e., regions with a smooth change in intensity, ROF's original model leads to staircase artifacts. This is because, by design, ROF's model favors piecewise-constant solutions. We believe that this can be a major problem when applying TV denoising to sinogram images because, even if the imaged object is piecewise-constant, its projections can be very far from piecewise-constant. This is easy to visualize: a feature with uniform intensity in the imaged object will have a piecewise-constant projection in the sinogram only if different rays from the x-ray source to the detector plane cut the same length through that feature, which is a very unlikely scenario. Hence, the projections are very likely to contain features of higher orders of regularity (i.e., piecewise-linear, piecewise-quadratic, etc.) that would suffer from a staircase effect when treated with the ROF model. A heavily researched approach to reducing the staircase effect is to replace the \(\ell _{1}\) norm of the gradient with the \(\ell _{1}\) norm of higher-order differential operators [27, 28]. A less sophisticated approach, but one with a trivial implementation, is to perform the total variation minimization locally. This approach has also been shown to alleviate the staircase effect [29]. Moreover, with a local minimization strategy, if the size of the neighborhood considered in the minimization is small enough, one can safely assume that the signal intensity and noise level are approximately constant. Therefore, a solution based on ROF's original model will be a good approximation to the solution of the new model based on Poisson noise. This way, we can utilize efficient existing algorithms for the ROF model while avoiding staircase artifacts.
Since our approach is based on Chambolle's well-known algorithm [26], we briefly describe it here. If we denote by X and Y the spaces of the image u and of its gradient ∇u, respectively, then an alternative definition of the total variation of u is:
$$ \sum\limits_{i,j} |\nabla u|_{i,j}= \text{sup} \left\{ \langle p,\nabla u \rangle_{Y} : p \in Y, |p_{i,j}| \leq 1 \right\} $$
(8)
Chambolle introduced the discrete divergence operator as the dual of the gradient operator, i.e., \(\langle p,\nabla u \rangle _{Y}= \langle -\text{div}\, p,u \rangle _{X}\). In the discrete image domain:
$$ \left(\text{div} \, p\right)_{i,j}= \left(p_{i,j}^{1}-p_{i-1,j}^{1}\right)+ \left(p_{i,j}^{2}-p_{i,j-1}^{2}\right) $$
(9)
Because of the duality of the gradient and divergence operators, total variation can also be written as:
$$ {}\sum\limits_{i,j} |\nabla u|_{i,j}= \mathop{\text{sup}}_{z \in K} \, \langle z,u \rangle_{X} \; \; \; \; K= \{\text{div }p: p \in Y, \, |p_{i,j}| \leq 1 \} $$
(10)
The minimizer of the energy functional in (2) is then obtained by projecting v onto the set λK:
$$ u= \, v- \pi_{\lambda K}(v) $$
(11)
which is equivalent to minimizing the Euclidean distance between v and λ div p; this can be achieved via the following iteration for computing p:
$$ p^{0}=0; \,\,\, p_{i,j}^{n+1}= \frac{p_{i,j}^{n}+\tau \left(\nabla(\text{div} \, p^{n} -v/\lambda)\right)_{i,j}}{1+\tau |\left(\nabla(\text{div} \, p^{n} -v/\lambda)\right)_{i,j}|} $$
(12)
where τ>0 is the step size. For a small enough step size, τ≤1/8, the algorithm is guaranteed to converge [26].
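For illustration, the projection iteration in (12), together with the discrete divergence in (9), can be sketched in NumPy as follows. This is a minimal sketch, not the implementation of [26]; the replicated-boundary differences and the fixed iteration count are simplifying assumptions:

```python
import numpy as np

def grad(u):
    """Discrete gradient with forward differences and replicated boundary."""
    gx = np.zeros_like(u); gx[:-1, :] = u[1:, :] - u[:-1, :]
    gy = np.zeros_like(u); gy[:, :-1] = u[:, 1:] - u[:, :-1]
    return np.stack((gx, gy))

def div(p):
    """Discrete divergence, the negative adjoint of grad, as in (9)."""
    px, py = p
    dx = np.zeros_like(px)
    dx[0, :] = px[0, :]; dx[1:-1, :] = px[1:-1, :] - px[:-2, :]; dx[-1, :] = -px[-2, :]
    dy = np.zeros_like(py)
    dy[:, 0] = py[:, 0]; dy[:, 1:-1] = py[:, 1:-1] - py[:, :-2]; dy[:, -1] = -py[:, -2]
    return dx + dy

def chambolle_rof(v, lam, tau=0.125, n_iter=100):
    """Sketch of Chambolle's projection algorithm for the ROF model (2), following (11)-(12)."""
    p = np.zeros((2,) + v.shape)             # dual variable p = (p^1, p^2)
    for _ in range(n_iter):
        g = grad(div(p) - v / lam)           # gradient of (div p^n - v/lambda)
        norm = np.sqrt((g**2).sum(axis=0))   # |(...)_{i,j}| in (12)
        p = (p + tau * g) / (1.0 + tau * norm)
    return v - lam * div(p)                  # u = v - pi_{lambda K}(v), as in (11)
```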
Instead of a global solution, we minimize the energy functional (6) in a small neighborhood of each pixel. To this end, let us denote by ω the set of indices that define the desired neighborhood around the current pixel. For the square neighborhood of size (2m+1)×(2m+1) pixels used in this study, \(\omega =\left \lbrace (i,j): \, -m \leq i,j \leq m \right \rbrace \). We also consider a normalized Gaussian weighting function on this neighborhood:
$$ W(i,j)= \exp \left(-\frac{(i^{2}+j^{2})}{h^{2}}\right) $$
(13)
The local problem will then become that of minimizing the following functional:
$$ E_{\lambda,W}(u')= \frac{1}{2} ||u'-v_{\omega}||_{W}^{2} + \lambda' \int_{\omega} |\nabla u'| $$
(14)
where \(||\cdot||_{W}^{2}\) denotes the weighted norm with weights W and \(v_{\omega }\) denotes the noisy image restricted to the window ω around the current pixel. Note that here \(u'\) is a function defined only on the neighborhood ω. The solution of this local optimization problem is similar to Chambolle's algorithm described above [29]. The only difference is in the update formula for p:
$$ p^{0}=0; \,\,\, p_{i,j}^{n+1}= \frac{p_{i,j}^{n}+\tau \left(\nabla\left(D^{-1}\text{div} \, p^{n} -v/\lambda'\right)\right)_{i,j}}{1+\tau |\left(\nabla\left(D^{-1} \text{div} \, p^{n} -v/\lambda'\right)\right)_{i,j}|} $$
(15)
where D is a diagonal matrix whose diagonal elements are the values of W.
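Since D is diagonal with the values of W, the term \(D^{-1}\text{div}\,p^{n}\) reduces to an element-wise division by the weights. A minimal sketch of the local solver, reusing grad and div from the snippet above, might look as follows; the final recovery of the denoised patch is written by analogy with (11) and is an assumption of this illustration, not the authors' implementation:

```python
import numpy as np

def local_weighted_rof(patch, lam_local, W, tau=0.125, n_iter=50):
    """Illustrative sketch of solving the local problem (14) with the iteration (15).

    patch is the noisy image restricted to the window, W holds the Gaussian weights
    of (13), and lam_local is the local regularization parameter. Because D is
    diagonal with the values of W, D^{-1} div p becomes an element-wise division.
    """
    p = np.zeros((2,) + patch.shape)
    for _ in range(n_iter):
        g = grad(div(p) / W - patch / lam_local)   # grad and div as defined earlier
        norm = np.sqrt((g**2).sum(axis=0))
        p = (p + tau * g) / (1.0 + tau * norm)
    # Denoised patch, written by analogy with (11); this recovery step is an assumption.
    return patch - lam_local * div(p) / W
```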
The regularization parameter, \(\lambda '\), must be chosen according to (7). The simplest approach is to set \(\lambda '= \lambda \, v(i,j)\), where λ is a global regularization parameter and v(i,j) is the value of the current pixel in the noisy image. Since v(i,j) is noisy, a better choice is to use a weighted local average as the estimate of the intensity of the true image at the current pixel (note that the maximum-likelihood estimate of the mean of a Poisson process from a set of observations is the arithmetic mean of the observations). Therefore, we suggest the following choice for the local regularization parameter:
$$ {} \begin{aligned} \lambda'&= \lambda \, \frac{\sum_{-a \leq i',j' \leq a} W'\left(i',j'\right)v\left(i-i',j-j'\right)}{\sum_{-a \leq i',j' \leq a}W'\left(i',j'\right)}\quad \text{where}\\ W'(i,j)&= \exp \left(-\frac{\left(i^{2}+j^{2}\right)}{h'^{2}}\right) \end{aligned} $$
(16)
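As a concrete reading of (16), assuming an interior pixel so that the full window fits inside the image, the local parameter can be computed as in the following sketch (the defaults a=4 and h′=2a follow the values reported below):

```python
import numpy as np

def local_lambda(v, i, j, lam, a=4, h_prime=8):
    """Sketch of the local regularization parameter in (16).

    Scales the global parameter lam by a Gaussian-weighted average of the noisy
    pixels in the (2a+1)x(2a+1) window around (i, j). Assumes (i, j) lies far
    enough from the image border for the full window to fit.
    """
    ii, jj = np.mgrid[-a:a + 1, -a:a + 1]
    w = np.exp(-(ii**2 + jj**2) / h_prime**2)        # W'(i', j') from (16)
    window = v[i - a:i + a + 1, j - a:j + a + 1]
    return lam * (w * window).sum() / w.sum()
```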
There are several parameters in the proposed algorithm. The global regularization parameter λ controls the strength of the denoising; it should be set based on the desired level of smoothing. The parameter m sets the size of the neighborhood considered around each pixel, which in this study was chosen to be a square window of size (2m+1)×(2m+1). Numerical experiments in [29] have shown that in total variation denoising, the influence map of a pixel is usually limited to a radius of approximately 10 pixels for typical values of the regularization parameter. Therefore, a good value for m is around 10, which is the value we used for all experiments reported in this paper. The width of the Gaussian weighting function W is adjusted through h; we used h=2m, which we found empirically to work well. Similarly, a and \(h'\) in (16) determine the size of the window and the weights used for determining the local regularization parameter. These have to be set based on the noise level in the image; larger values should be chosen when the noise is stronger. In this study, we used a=4 and \(h'=2a\).
A simple implementation of the proposed algorithm can be computationally intensive because it involves solving a minimization problem, albeit a very small one, for every individual pixel in the sinogram. This would be a major drawback because a big advantage of sinogram denoising methods, compared to iterative image reconstruction methods, is their shorter computational time. To reduce the computational time, after minimizing the local cost function (14) around the current pixel, we replace the values of all pixels in the window of size (2a+1)×(2a+1) around the current pixel, instead of just the center pixel, and then shift the window by 2a+1 pixels. Our extensive numerical experiments with simulated and real projections showed that with this approach the results are almost identical to the case where only one pixel is denoised at a time. This is the approach we followed in all experiments reported in this paper.
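A hedged sketch of this block-wise strategy, combining the snippets above, is given below; the loop structure, the untouched border pixels, and the iteration count are illustrative assumptions, not the authors' code:

```python
import numpy as np

def denoise_sinogram(v, lam, m=10, a=4, h=None, n_iter=50):
    """Block-wise local TV denoising of a sinogram, as described in the text.

    For each window center, the local weighted problem (14) is solved on a
    (2m+1)x(2m+1) patch with the local parameter from (16); the central
    (2a+1)x(2a+1) block of the solution is kept and the window is shifted by
    2a+1 pixels. Border pixels are left unchanged in this simplified sketch.
    """
    if h is None:
        h = 2 * m                                   # h = 2m, as reported above
    vf = np.asarray(v, dtype=float)
    u = vf.copy()
    ii, jj = np.mgrid[-m:m + 1, -m:m + 1]
    W = np.exp(-(ii**2 + jj**2) / h**2)             # Gaussian weights of (13)
    step = 2 * a + 1
    rows, cols = vf.shape
    for ci in range(m, rows - m, step):
        for cj in range(m, cols - m, step):
            patch = vf[ci - m:ci + m + 1, cj - m:cj + m + 1]
            lam_loc = local_lambda(vf, ci, cj, lam, a, 2 * a)        # from the sketch after (16)
            d = local_weighted_rof(patch, lam_loc, W, n_iter=n_iter) # from the sketch after (15)
            u[ci - a:ci + a + 1, cj - a:cj + a + 1] = d[m - a:m + a + 1, m - a:m + a + 1]
    return u
```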