If you have trouble viewing this, try the pdf of this post. You can download the code used to produce the figures in this post.

My last post showed that detection performance is determined by the signal to noise ratio (SNR). I derived a formula for the SNR with multispectral measurements, which depends on the covariance of the A-space data. This post shows how to compute the A-space covariance from the x-ray data noise and the effective attenuation coefficient matrix. In general, the covariance depends on the type of estimator used, so instead I will use the Cramér-Rao lower bound (CRLB), which is the minimum covariance for any unbiased estimator. This gives a general result independent of the specific estimator implementation. I will show that the constant covariance CRLB is sufficiently accurate for our purposes. These results will allow us to compute signal to noise ratios for limited energy resolution measurements that are directly comparable to the Tapiovaara-Wagner[3] optimal SNR with complete energy information.

In a previous post I showed that we could use a linearized model for the transmitted x-ray data

(1) $\delta\mathbf{L} = \mathbf{M}\,\delta\mathbf{A} + \mathbf{w}$

where $\mathbf{L} = -\log(I/I_0)$ is the negative logarithm of the transmitted flux $I$ normalized by $I_0$, the flux with no object in the beam, $\mathbf{M} = \partial\mathbf{L}/\partial\mathbf{A}$ is the gradient of the measurements in A-space, and $\mathbf{w}$ is a zero-mean multinormal random variable. Eq. 1↑ is linearized about an operating point $(\mathbf{L}_o, \mathbf{A}_o)$ so $\delta\mathbf{L} = \mathbf{L} - \mathbf{L}_o$ and $\delta\mathbf{A} = \mathbf{A} - \mathbf{A}_o$. To understand the estimator, I will start by assuming scalar data. The linearized model is then (I will drop the $\delta$ to simplify notation, so $\delta L \rightarrow L$ and $\delta A \rightarrow A$, and the model is assumed to hold around the operating point)

(2) $L = MA + w$

where $w \sim N(0, \sigma_L^2)$. Since there is only one measurement, a reasonable way to proceed is to solve the deterministic part of (2↑) with the measurement, so $\hat{A} = L/M$. The variance of the estimate will then be $var(\hat{A}) = var(w/M) = \sigma_L^2/M^2$.

This estimator is actually the maximum likelihood estimator (MLE). To see this, introduce the log-likelihood function $\mathcal{L}$, which is the logarithm of the probability density function evaluated with the actual measurement. The log-likelihood is a function of the data and the unknown parameter, $\mathcal{L} = \mathcal{L}(x;A)$. The maximum likelihood estimator chooses the value of $A$ that maximizes the log-likelihood, $\hat{A} = \arg\max_A \mathcal{L}(x;A)$. With scalar normal data, the log-likelihood is

(3) $\mathcal{L}(L;A) = -\dfrac{(L - MA)^2}{2\sigma_L^2} + constant$

Notice this reaches its maximum value when the first term is zero. Solving $L - MA = 0$, the estimator is $\hat{A}_{MLE} = L/M$, which is the solution of the non-random part of the model with the measurement.

The error in the estimate is inversely related to the curvature $\rho$ of $\mathcal{L}$ at the maximum: the more sharply peaked the likelihood, the smaller the error. The curvature is the negative of the second derivative evaluated at $A = \hat{A}$

$\rho = -\left[\dfrac{\partial^2\mathcal{L}}{\partial A^2}\right]_{\hat{A}}$

Differentiating (3↑) twice,

$\rho = \dfrac{M^2}{\sigma_L^2}$

Notice that we can express the variance of the estimate as the negative inverse of the second derivative

$var(\hat{A}) = \dfrac{\sigma_L^2}{M^2} = \dfrac{1}{\rho} = -\dfrac{1}{\left[\dfrac{\partial^2\mathcal{L}}{\partial A^2}\right]_{\hat{A}}}$

In general, the second derivative may depend on the data, so we would use the expected value of the second derivative

$var(\hat{A}) = -\dfrac{1}{\left\langle\left[\dfrac{\partial^2\mathcal{L}}{\partial A^2}\right]_{\hat{A}}\right\rangle}$

Appendix 3A of Kay[2] proves that this is the minimum variance for any unbiased estimator, called the Cramér-Rao lower bound (CRLB).

The MLE is the value of the parameter where the derivative is equal to zero. The first derivative of $\mathcal{L}$ is

(4) $\dfrac{\partial\mathcal{L}}{\partial A} = \dfrac{M}{\sigma_L^2}(L - MA) = \dfrac{M^2}{\sigma_L^2}\left(\dfrac{L}{M} - A\right) = I\,(g(L) - A)$

With this factoring, the maximum likelihood estimate is $\hat{A}_{MLE} = g(L) = L/M$. Notice that the variance of the estimate, from the discussion in the first paragraph of this section, is $var(\hat{A}) = \sigma_L^2/M^2 = 1/I$. The proof in Appendix 3A of Kay[2] shows that this is a general result: if we can factor the derivative as in (4↑), then the inverse of the leading factor, $I$, is the CRLB.
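This scalar result is easy to check with a short simulation. The sketch below is mine, not the post's downloadable code, and the values of $M$, $A$, and $\sigma_L$ are hypothetical:

```python
import random
import statistics

# Scalar linearized model L = M*A + w with w ~ N(0, sigma_L^2).
# MLE: A_hat = L/M.  CRLB: var(A_hat) = sigma_L^2 / M^2 = 1/I.
M = 2.5         # hypothetical effective attenuation coefficient
A_true = 1.2    # hypothetical A-space operating point value
sigma_L = 0.05  # standard deviation of the log measurement noise

random.seed(0)
estimates = [(M * A_true + random.gauss(0.0, sigma_L)) / M
             for _ in range(200_000)]

crlb = sigma_L ** 2 / M ** 2
sample_var = statistics.variance(estimates)
print(f"CRLB = {crlb:.3e}, sample variance = {sample_var:.3e}")
```

The MLE is unbiased here, so the sample variance approaches the CRLB as the number of trials grows.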

We can parallel this derivation with vector data by following the two-step approach used in my previous post to derive the optimal detector: first solve the problem assuming “white” equal-variance data, then use the whitening transform to convert the general case to the simplified one. If the covariance of the measured data is $\mathbf{C}_L = \sigma_L^2\mathbf{I}$, where $\mathbf{I}$ is the identity matrix, then the log-likelihood function is

$\mathcal{L}(\mathbf{L};\mathbf{A}) = -\dfrac{1}{2\sigma_L^2}(\mathbf{L} - \mathbf{M}\mathbf{A})^T(\mathbf{L} - \mathbf{M}\mathbf{A}) + constant$

Expanding the product and gathering common terms, using the fact that $\mathcal{L}$ is a scalar so we can take the transpose of any term without affecting the result,

$\mathcal{L}(\mathbf{L};\mathbf{A}) = -\dfrac{1}{2\sigma_L^2}\left[\mathbf{L}^T\mathbf{L} - 2\mathbf{L}^T\mathbf{M}\mathbf{A} + \mathbf{A}^T(\mathbf{M}^T\mathbf{M})\mathbf{A}\right] + constant$

I will use the general formulas for matrix derivatives from my post:

$\dfrac{\partial\,\mathbf{b}^T\mathbf{x}}{\partial\mathbf{x}} = \dfrac{\partial\,\mathbf{x}^T\mathbf{b}}{\partial\mathbf{x}} = \mathbf{b}$

(5) $\dfrac{\partial\,\mathbf{x}^T\mathbf{B}\mathbf{x}}{\partial\mathbf{x}} = (\mathbf{B} + \mathbf{B}^T)\mathbf{x} = 2\mathbf{B}\mathbf{x} \quad \text{if } \mathbf{B} \text{ symmetric}$

where $\mathbf{b}$ is a constant vector, $\mathbf{B}$ is a constant matrix, and $\mathbf{x}$ is the vector of variables. Since $\mathbf{M}^T\mathbf{M}$ is symmetric, the derivative of $\mathcal{L}$ is

(6) $\dfrac{\partial\mathcal{L}}{\partial\mathbf{A}} = \dfrac{1}{\sigma_L^2}\left[\mathbf{M}^T\mathbf{L} - (\mathbf{M}^T\mathbf{M})\mathbf{A}\right] = \dfrac{\mathbf{M}^T\mathbf{M}}{\sigma_L^2}\left[(\mathbf{M}^T\mathbf{M})^{-1}\mathbf{M}^T\mathbf{L} - \mathbf{A}\right]$

The MLE is derived by setting the derivative (6↑) equal to zero, so

$\hat{\mathbf{A}}_{MLE} = (\mathbf{M}^T\mathbf{M})^{-1}\mathbf{M}^T\mathbf{L}$

and by analogy to the scalar case, the covariance of the estimate is the inverse of the leading factor of (6↑)

$cov(\hat{\mathbf{A}}_{MLE}) = \sigma_L^2\,(\mathbf{M}^T\mathbf{M})^{-1}$

We can use these results for the case with a general covariance matrix $\mathbf{C}_L$ by using the whitening transform $\mathbf{V}$, defined so that $\mathbf{C}_L^{-1} = \mathbf{V}^T\mathbf{V}$ and $\mathbf{C}_L = \mathbf{V}^{-1}\mathbf{V}^{-T}$. Transforming the measurement data so $\mathbf{L}' = \mathbf{V}\mathbf{L}$, the transformed linear model is

$\mathbf{L}' = \mathbf{V}(\mathbf{M}\mathbf{A} + \mathbf{w}) = \mathbf{M}'\mathbf{A} + \mathbf{w}'$

where $\mathbf{M}' = \mathbf{V}\mathbf{M}$ and $\mathbf{w}' = \mathbf{V}\mathbf{w}$. The covariance of $\mathbf{w}'$ is the identity matrix

$cov(\mathbf{w}') = \langle(\mathbf{V}\mathbf{w})(\mathbf{V}\mathbf{w})^T\rangle = \mathbf{V}\mathbf{C}_L\mathbf{V}^T = \mathbf{V}\mathbf{V}^{-1}\mathbf{V}^{-T}\mathbf{V}^T = \mathbf{I}$

so we can apply the white case results. Using the MLE from the white case and noting that $\mathbf{M}'^T\mathbf{M}' = \mathbf{M}^T\mathbf{V}^T\mathbf{V}\mathbf{M} = \mathbf{M}^T\mathbf{C}_L^{-1}\mathbf{M}$,

(7) $\hat{\mathbf{A}}_{MLE} = (\mathbf{M}^T\mathbf{C}_L^{-1}\mathbf{M})^{-1}\mathbf{M}^T\mathbf{V}^T\mathbf{V}\mathbf{L} = (\mathbf{M}^T\mathbf{C}_L^{-1}\mathbf{M})^{-1}\mathbf{M}^T\mathbf{C}_L^{-1}\mathbf{L}$

Similarly, the covariance of the estimate is

(8) $cov(\hat{\mathbf{A}}_{MLE}) = (\mathbf{M}'^T\mathbf{M}')^{-1} = (\mathbf{M}^T\mathbf{C}_L^{-1}\mathbf{M})^{-1}$
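For a diagonal $\mathbf{C}_L$ the whitening transform is simply $\mathbf{V} = diag(\sqrt{N_k})$, so Eq. 7↑ can be checked numerically. The sketch below is mine, not the post's code, and the values of $\mathbf{M}$, the counts, and the measurement are hypothetical; it computes the MLE both by the direct formula and by whitening followed by ordinary least squares:

```python
import math

# Check Eq. 7: for C_L = diag(1/N_k) the whitening transform is
# V = diag(sqrt(N_k)), and the whitened least-squares solution should
# equal the direct formula (M^T C_L^-1 M)^-1 M^T C_L^-1 L.
# All numerical values are hypothetical.

def mat2_mult(P, Q):
    """Product of two 2x2 matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat2_vec(P, v):
    """2x2 matrix times a length-2 vector."""
    return [P[i][0] * v[0] + P[i][1] * v[1] for i in range(2)]

def mat2_inv(P):
    """Inverse of a 2x2 matrix."""
    det = P[0][0] * P[1][1] - P[0][1] * P[1][0]
    return [[P[1][1] / det, -P[0][1] / det],
            [-P[1][0] / det, P[0][0] / det]]

def transpose(P):
    return [[P[j][i] for j in range(2)] for i in range(2)]

M = [[3.0, 1.5], [2.0, 2.5]]        # hypothetical attenuation matrix
N = [1.0e5, 2.0e5]                  # hypothetical mean photon counts
L = [1.1, 0.9]                      # hypothetical log measurement

Cinv = [[N[0], 0.0], [0.0, N[1]]]   # C_L = diag(1/N_k) => C_L^-1 = diag(N_k)

# Direct formula, Eq. 7: (M^T C^-1 M)^-1 M^T C^-1 L
MtCinv = mat2_mult(transpose(M), Cinv)
A_direct = mat2_vec(mat2_inv(mat2_mult(MtCinv, M)), mat2_vec(MtCinv, L))

# Whitened version: M' = V M, L' = V L, then ordinary least squares
V = [[math.sqrt(N[0]), 0.0], [0.0, math.sqrt(N[1])]]
Mp = mat2_mult(V, M)
Lp = mat2_vec(V, L)
A_white = mat2_vec(mat2_inv(mat2_mult(transpose(Mp), Mp)),
                   mat2_vec(transpose(Mp), Lp))

print(A_direct, A_white)  # the two estimates agree
```

With a square, invertible $\mathbf{M}$ both routes reduce to solving $\mathbf{M}\mathbf{A} = \mathbf{L}$, which is also a useful sanity check.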

We can see how these formulas work with an example. Suppose we have two-spectrum data, such as from two x-ray tube voltages or two-bin PHA. In this case the $\mathbf{M}$ matrix is

$\mathbf{M} = \begin{bmatrix} M_{11} & M_{12} \\ M_{21} & M_{22} \end{bmatrix}$

and the covariance is

$\mathbf{C}_L = \begin{bmatrix} 1/N_1 & 0 \\ 0 & 1/N_2 \end{bmatrix}$

where *N*_{1} and *N*_{2} are the mean values of the photon numbers of the measurements.

Substituting these into (7↑), the MLE is

(9) $\hat{\mathbf{A}}_{MLE} = \dfrac{1}{M_{11}M_{22} - M_{12}M_{21}}\begin{bmatrix} M_{22}L_1 - M_{12}L_2 \\ -M_{21}L_1 + M_{11}L_2 \end{bmatrix}$

Notice the MLE is just the solution of the linear system $\mathbf{M}\mathbf{A} = \mathbf{L}$. For example, we can write the solution for $\hat{A}_1$ using determinants and cofactors as in elementary algebra

$\hat{A}_1 = \dfrac{\begin{vmatrix} L_1 & M_{12} \\ L_2 & M_{22} \end{vmatrix}}{|\mathbf{M}|} = \dfrac{M_{22}L_1 - M_{12}L_2}{|\mathbf{M}|}$

where $|\ |$ is the determinant.
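A quick numerical check of Eq. 9↑, with hypothetical values for $\mathbf{M}$ and $\mathbf{L}$ (not taken from the post's code): solving by Cramer's rule and substituting back reproduces the measurements.

```python
# Two-spectrum MLE (Eq. 9) as the solution of M A = L by Cramer's rule.
# The M and L values are hypothetical.
M11, M12, M21, M22 = 3.0, 1.5, 2.0, 2.5
L1, L2 = 1.1, 0.9

detM = M11 * M22 - M12 * M21
A1 = (M22 * L1 - M12 * L2) / detM    # first component of A_hat
A2 = (-M21 * L1 + M11 * L2) / detM   # second component of A_hat

# Substituting the estimate back into the model reproduces the data
print(M11 * A1 + M12 * A2, M21 * A1 + M22 * A2)  # -> L1, L2 up to rounding
```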

The covariance of the estimate is

(10) $cov(\hat{\mathbf{A}}_{MLE}) = \dfrac{1}{|\mathbf{M}|^2}\begin{bmatrix} \dfrac{M_{22}^2}{N_1} + \dfrac{M_{12}^2}{N_2} & -\left(\dfrac{M_{21}M_{22}}{N_1} + \dfrac{M_{11}M_{12}}{N_2}\right) \\ -\left(\dfrac{M_{21}M_{22}}{N_1} + \dfrac{M_{11}M_{12}}{N_2}\right) & \dfrac{M_{21}^2}{N_1} + \dfrac{M_{11}^2}{N_2} \end{bmatrix}$

These results parallel those I derived in my dissertation and my 1976 paper[1]. The covariance (10↑) is Eqs. 14 and 15 of my paper and the MLE (9↑) with two spectra is just the solution of the deterministic equations with the measured data as I showed in my paper.
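The entries of Eq. 10↑ can be verified against the matrix form of Eq. 8↑. This sketch uses hypothetical values and plain scalar arithmetic (it is not the post's Matlab code):

```python
# Check the explicit two-spectrum covariance (Eq. 10) against the matrix
# form (M^T C_L^-1 M)^-1 of Eq. 8.  All values are hypothetical.
M11, M12, M21, M22 = 3.0, 1.5, 2.0, 2.5
N1, N2 = 1.0e5, 2.0e5
detM = M11 * M22 - M12 * M21

# Elementwise formula, Eq. 10
c11 = (M22**2 / N1 + M12**2 / N2) / detM**2
c12 = -(M21 * M22 / N1 + M11 * M12 / N2) / detM**2
c22 = (M21**2 / N1 + M11**2 / N2) / detM**2

# Matrix form: Fisher matrix F = M^T diag(N1, N2) M, covariance = F^-1
F11 = N1 * M11**2 + N2 * M21**2
F12 = N1 * M11 * M12 + N2 * M21 * M22
F22 = N1 * M12**2 + N2 * M22**2
detF = F11 * F22 - F12**2
k11, k12, k22 = F22 / detF, -F12 / detF, F11 / detF

print((c11, c12, c22))
print((k11, k12, k22))  # should match the elementwise values
```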

The matrix multiplications can get tedious so in my next post I will describe how to use the free symbolic mathematics program Maxima to carry them out. I am including Maxima scripts for some of the derivations in the code for this post.

We know that with x-ray data the covariance is not constant. Indeed, the variance increases strongly with object thickness. So an immediate question is: what is the difference between the CRLB computed with a constant covariance and the general result that takes the changes into account? To answer this, I did a simulation that computes the error incurred by using the constant covariance formula as a function of object thickness.

Appendix 3.C of Kay[2] derives the inverse of the CRLB, the Fisher information matrix $\mathbf{I}(\mathbf{A})$, for the general multinormal case where the covariance can change. The result (translated to my notation) is

(11) $[\mathbf{I}(\mathbf{A})]_{ij} = \left[\dfrac{\partial\mathbf{L}(\mathbf{A})}{\partial A_i}\right]^T \mathbf{C}^{-1}(\mathbf{A}) \left[\dfrac{\partial\mathbf{L}(\mathbf{A})}{\partial A_j}\right] + \dfrac{1}{2}\,tr\left[\mathbf{C}^{-1}(\mathbf{A})\dfrac{\partial\mathbf{C}}{\partial A_i}\mathbf{C}^{-1}(\mathbf{A})\dfrac{\partial\mathbf{C}}{\partial A_j}\right]$

Recall from my derivation of the linearized model that $\partial\mathbf{L}(\mathbf{A})/\partial\mathbf{A} = \mathbf{M}$, so this is

$[\mathbf{I}(\mathbf{A})]_{ij} = \mathbf{M}(:,i)^T\,\mathbf{C}^{-1}\,\mathbf{M}(:,j) + \dfrac{1}{2}\,tr\left[\mathbf{C}^{-1}(\mathbf{A})\dfrac{\partial\mathbf{C}}{\partial A_i}\mathbf{C}^{-1}(\mathbf{A})\dfrac{\partial\mathbf{C}}{\partial A_j}\right]$

where the notation $\mathbf{M}(:,k)$ denotes the $k$th column of $\mathbf{M}$. Comparing this with Eq. 8↑, the first term is the inverse of the constant covariance CRLB and the second term measures the effect of changes in $\mathbf{C}$ as a function of $\mathbf{A}$.

We can see that the constant covariance term in (11↑) will be much larger than the second term as follows. With Poisson data, the variance of the logarithm is approximately the inverse of the expected number of counts and, since the PHA bin counts are independent, the covariance and its inverse are

$\mathbf{C} = diag\left(\dfrac{1}{N_1}, \ldots, \dfrac{1}{N_{nbins}}\right), \qquad \mathbf{C}^{-1} = diag\left(N_1, \ldots, N_{nbins}\right)$

where *N*_{k} is the number of counts in bin *k* and *nbins* is the number of bins. The first term in (11↑) is therefore proportional to the number of photons while the second term is essentially the product of **C** and its inverse and is approximately constant. Since in medical imaging the number of counts is usually large, we would expect that the constant covariance CRLB is much larger than the derivative of the covariance term.

This is shown by the code for this post. The code has Matlab functions to compute the general and the constant covariance CRLB for points along a diagonal line through A-space. In order to compute the general CRLB, we need formulas for the derivatives of the covariance in (11↑). Since $\mathbf{C}$ is diagonal, we only need the derivative of each diagonal element. With $C_{kk} = 1/N_k$, where $N_k = N_{0,k}\,e^{-L_k}$, the chain rule gives

$\dfrac{\partial C_{kk}}{\partial A_i} = \dfrac{1}{N_k}\dfrac{\partial L_k}{\partial A_i} = \dfrac{M_{ki}}{N_k}$

which can be computed at each point from $\mathbf{M}$ and $N_k$.
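To see the relative size of the two terms of Eq. 11↑ concretely: with $C_{kk} = 1/N_k$ and $\partial C_{kk}/\partial A_i = M_{ki}/N_k$, the trace term reduces to $(1/2)\sum_k M_{ki}M_{kj}$, while the first term is $\sum_k N_k M_{ki}M_{kj}$, so their ratio is on the order of $1/(2N)$. A sketch with hypothetical values (not the post's Matlab code):

```python
# Compare the two terms of the general CRLB information (Eq. 11) for a
# diagonal Poisson covariance C_kk = 1/N_k.  With dC_kk/dA_i = M_ki/N_k,
# term 1 is sum_k N_k*M_ki*M_kj and the trace term is (1/2)*sum_k M_ki*M_kj.
# The matrix M and the bin counts N are hypothetical.
M = [[3.0, 1.5], [2.0, 2.5]]   # rows: energy bins, columns: A components
N = [1.0e5, 2.0e5]             # mean counts in each bin

def info_terms(i, j):
    term1 = sum(N[k] * M[k][i] * M[k][j] for k in range(2))
    term2 = 0.5 * sum(M[k][i] * M[k][j] for k in range(2))
    return term1, term2

for i in range(2):
    for j in range(2):
        term1, term2 = info_terms(i, j)
        print(i, j, term2 / term1)  # ratio on the order of 1/(2*N)
```

For medical imaging count rates the ratio is a few parts per million, consistent with the argument above.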

The path through A-space is shown in the left panel of Fig. 1↓. The right panel shows the norm of the difference of the constant covariance and general CRLB matrices divided by the norm of the general CRLB as a function of distance along the line.

Fig. 1↑ shows that the errors in the CRLB using the constant covariance formula Eq. 8↑ are less than a few percent, although, as expected, they become larger for thicker objects where the number of transmitted photons is smaller. We can therefore use the simpler formula Eq. 8↑ to compute the CRLB of the A-space data without incurring a significant error. Substituting in the formula from my last post, the overall $SNR^2$ is

$SNR^2 = \delta\mathbf{A}^T\mathbf{C}_A^{-1}\delta\mathbf{A} = \delta\mathbf{A}^T\mathbf{M}^T\mathbf{C}_L^{-1}\mathbf{M}\,\delta\mathbf{A}$
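The $SNR^2$ formula is a single quadratic form. A minimal sketch for a two-spectrum system, with hypothetical values for $\mathbf{M}$, the counts, and the signal $\delta\mathbf{A}$:

```python
# SNR^2 = dA^T M^T C_L^-1 M dA for a two-spectrum system with
# C_L = diag(1/N_1, 1/N_2).  M, the counts, and dA are hypothetical.
M = [[3.0, 1.5], [2.0, 2.5]]
N = [1.0e5, 2.0e5]
dA = [0.01, 0.02]               # A-space signal to be detected

# Form M*dA, then weight each spectrum's squared term by N_k = 1/var(L_k)
MdA = [M[k][0] * dA[0] + M[k][1] * dA[1] for k in range(2)]
snr2 = sum(N[k] * MdA[k] ** 2 for k in range(2))
print(snr2)
```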

In the next posts, I will discuss computing the covariance of the measurements for different types of detectors. These require the derivation of the effective attenuation coefficient matrix **M** and the covariance of the log data **C**_{L}. Both of these depend on the properties of the detector and the energy resolution of the measurements. The derivations sometimes involve complex matrix computations, so I double-check them using Maxima and also Monte Carlo simulations.

Last edited Sep 24, 2013

© 2012 by Aprend Technology and Robert E. Alvarez

Linking is allowed but reposting or mirroring is expressly forbidden.

[1] R. E. Alvarez and A. Macovski: “Energy-selective reconstructions in X-ray computerized tomography”, *Phys. Med. Biol.*, vol. 21, pp. 733–44, 1976.

[2] S. M. Kay: *Fundamentals of Statistical Signal Processing, Volume I: Estimation Theory*. Prentice Hall PTR, 1993.

[3] M. Tapiovaara and R. Wagner: “SNR and DQE analysis of broad spectrum X-ray imaging”, *Phys. Med. Biol.*, vol. 30, pp. 519–529, 1985.