### Experimental protocol and image acquisition

The algorithm was tested in an experimental protocol developed at the Department of Images for Diagnosis and Therapy of the National Cancer Institute of Milan and carried out September through November 2004. We examined 19 consecutive patients of median age 45 years (range 37–67) with suspected primary or metastatic liver parenchyma lesions following US or CT examination. All patients underwent 1.5 T MR imaging (Siemens Vision, Erlangen). VIBE T1-weighted sequences (TR 5.2 ms, TE 2.6 ms, flip angle 20°, slice thickness 1.5 mm) were acquired (axial orientation) before and 45s after contrast agent injection. Acquisition parameters were optimized to best enhance the portal system; 70 to 112 slices per volume were acquired depending on liver size. Breath-holds were at exhale and lasted 21–26s. Gd-DTPA and Gd-BOPTA were used as contrast agents: 0.2 mmol/kg was injected in a single bolus at 1.5–2 ml/s followed by 20 ml physiologic saline (0.09% NaCl) at the same rate to accelerate contrast dissemination to the central venous system.

### Image preprocessing

To improve image quality before registration, including reducing image noise and liver segmentation, the following pre-processing steps were applied.

Volumetric acquisitions have lower signal-to-noise ratio than 2D acquisitions [29]. Superimposed noise may reduce the efficiency of registration [27] and image noise is amplified by volume subtraction thereby reducing the possibilities for structure detection and tissue characterization. Classical noise filtering techniques introduce blurring [30] and reduce spatial resolution and are therefore not suitable. To attenuate noise while preserving image details, an adaptive filter designed for brain MR images was used [30]. Briefly, the filter works as follows: for each pixel (*x*,*y*), a limited homogeneous region enclosing the pixel (template) was searched. The filter output *T*(*x*,*y*) was calculated using the following adaptive 2D formula:

T(x,y)=\frac{{\sigma}_{k}^{2}(x,y)I(x,y)+{\sigma}_{n}^{2}{m}_{j}}{{\sigma}_{k}^{2}(x,y)+{\sigma}_{n}^{2}},

where

(*x*,*y*) = max(0,{\sigma}_{i}^{2}(*x*,*y*) - {\sigma}_{n}^{2})

and where *m*
_{
j
}is local mean, {\sigma}_{i}^{2}(*x*,*y*) is the image variance (computed on the template) and {\sigma}_{n}^{2} is noise variance (defined *a priori*). The filter implies a tradeoff between smoothing efficiency and preservation of discontinuities.

In accordance with previous suggestion [15–17], only liver voxels were considered in the registration algorithm, since other abdominal organs may have motion relative to the liver [31] and thereby confound realignment. The liver is usually segmented to exclude uninteresting structures from the registration. We performed manual liver segmentation [15]: after selecting the slice with maximum area of liver, an irregular polygon (*mask*) was inserted manually to define the liver outline and exclude other tissues. The mask was then applied to all other slices in the volume. Upper and lower slices, where liver formed a small fraction of the total mask area, were excluded.

### Image registration

Liver motion was modeled by a rigid 3D translation to compensate for cranio-caudal movements and by a 2D local non-rigid registration to compensate for deformation within the slice plane.

Rigid registration results in rough realignment of the two volumes, producing slice-to-slice correspondence between the pre- and post-contrast datasets, while non-rigid registration locally refines the rigid realignment.

#### Rigid registration

Rigid 3D translation was achieved using normalized mutual information (NMI) as the intensity-based similarity measure [25]. NMI is defined as:

NMI=\frac{H\phantom{\rule{0.5em}{0ex}}(T({I}_{1}))+H({I}_{2})}{H(T\phantom{\rule{0.5em}{0ex}}({I}_{1}),{I}_{2})}\phantom{\rule{0.5em}{0ex}}\left(1\right)

where *I*
_{1} is the pre-contrast image, *I*
_{2} the post-contrast image, *T* is the estimated translation and *T*(*I*
_{2}) is *I*
_{2} corrected for the motion field; *H*(*I*
_{1}) and *H*(*T*(*I*
_{2})) are the entropies of the images *I*
_{1} and *T*(*I*
_{2}) respectively, and *H*(*I*
_{1},*T*(*I*
_{2})) is their joint entropy. In order to reduce computation time, only translations were considered. With the patient supine, rotations are negligible and anyway mainly occur around the z-axis [33]. Any rotations around this axis can be corrected in the subsequent 2D non-rigid registration step.

#### Non-rigid registration

The 2D non-rigid registration method relies on an adaptation of the algorithm of Magarey [26, 27] originally applied for video coding. The algorithm is applied to each images pair of pre- and post-contrast volumes (Figure 1). There are four main steps in the algorithm.

#### (a) Image analysis

A CDWT pyramid is applied to the pre-contrast image (*I*
_{1}) and the post-contrast image (*I*
_{2}). The CDWT analyses each image *I*(**n**) using banks of *wavelet filters* Ψ(**n**) and *scaling filters* Φ(**n**) whose outputs are downsampled.

In particular, we have:

{D}^{(q,m)}(n)={\displaystyle \sum _{k}I(k){\psi}^{(q,m)}}({2}^{m}n-k)\phantom{\rule{0.5em}{0ex}}\left(2\right)

{I}^{(p,m)}(n)={\displaystyle \sum _{k}I(k){\phi}^{(p,m)}}({2}^{m}n-k)\phantom{\rule{0.5em}{0ex}}\left(3\right)

where Ψ(**n**) and Φ(**n**) are complex valued Gabor-like filters; *p* = 1,2 and *q* = 1, ..., 6, so that at each level there are eight complex outputs. *I*
^{(1,m) }and *I*
^{(2,m) }are images of lower resolution corresponding to inputs at the next level *m* + 1. *D*
^{(q,m) }are the downsampled bandpass subimages of *I*. Each wavelet filter, and its corresponding subimage, has a characteristic orientation specified by the spatial frequency **Ω**
_{
q,m
}. In the spatial frequency domain, the six filters cover the first and the second quadrants which contain non redundant information for image analysis.

#### (b) Multiresolution

At each level *m* of decomposition, the CDWT produces 6 oriented bandpass subimages, with resolution 1/2^{m}times that of the original image. The algorithm starts estimation at the coarsest level *m*
^{max}, using the bandpass subimages to produce one motion estimate at each pixel. The field density is therefore {2}^{{m}^{\mathrm{max}\phantom{\rule{0.5em}{0ex}}}}, i.e. one vector per block of {2}^{{m}^{\mathrm{max}\phantom{\rule{0.5em}{0ex}}}} × {2}^{{m}^{\mathrm{max}\phantom{\rule{0.5em}{0ex}}}} input pixels. After interpolating by 2 in each direction, the estimation field is used as input to estimate at the next finer level, along with the 12 bandpass subimages at that level.

#### (c) Registration criterion

The principal part of the estimate of the motion field (MF) is the figure of merit based on the CDWT coefficients, known as the *subband squared difference* (*SSD*) and defined at each level as follows:

SS{D}^{m}(n,f)={\displaystyle \sum _{q=1}^{6}\frac{{\left|{D}_{1}^{(q,m)}(n+f)-{D}_{2}^{(q,m)}(n)\right|}^{2}}{{P}^{(q,m)}}}\phantom{\rule{0.5em}{0ex}}\left(4\right)

where *D*
_{1}
^{(q,m)}(**n**+**f**) and *D*
_{2}
^{(q,m)}(**n**) are the CDWT coefficients at level *m* as obtained by decomposition of images *I*
_{1} and *I*
_{2}, respectively, and where *P*
^{(q,m) }is the energy of each wavelet filter. The *SSD* is a local measure, computed for each pixel **n**. It can be shown that for a small offset, a 2D quadratic model fits the *SSD* expression,

*SSD*
^{m}(**n**,**f**) ≈ \frac{1}{2}(**f** - **f**
_{0})^{T}
**κ**(**f** - **f**
_{0}) + *δ* (5)

where the minimum value **f**
_{0} is the motion estimate at the pixel **n**. The surface parameters {**f**
_{0}, **κ**, *δ*} (the minimum location, the *curvature matrix* and minimum height of the surface, respectively), are computed directly from the coefficients *D*
_{1}
^{(q,m)}(**n**+**f**), and *D*
_{2}
^{(q,m)}(**n**) and the spatial frequency **Ω**
^{q,m}. It turns out that **f**
_{0} depends heavily on the phase of the two complex coefficient vectors, while the magnitudes have comparatively little influence. This property renders the algorithm insensitive to global perturbations such as level shifts and intensity scaling.

Note that it is possible to obtain **f**
_{0} from equation (5) by analytical calculation, providing a very efficient way of computing **f** over a real interval of values, and avoiding the need for a time-consuming search over a discrete set of candidate solutions.

#### (d) Course to fine refinement

*SSD* is computed at each decomposition level. After estimation at the coarsest level (*m* = *m*
^{max}) motion estimates and other surface parameters used as starting values (using bilinear interpolation to double the field density) at the next finer level (*m* = *m*
^{max} - 1). Hence the surfaces SS{D}^{{m}^{\mathrm{max}\phantom{\rule{0.5em}{0ex}}}-1} are added to those from level *m*
^{max} to give cumulative *SD* surfaces CS{D}^{{m}^{\mathrm{max}\phantom{\rule{0.5em}{0ex}}}-1}. The *CSD* surfaces are then used as inputs to the next finer level *m*
^{max} - 2, and the procedure is repeated until the required level of detail, given by *m*
^{min} is reached. Finally the pre-contrast image is warped using the obtained motion field. A bilinear interpolation is used for non-integer locations.