
4.4 Estimators for the Gaussian Distribution

For a sample of n points, all taken from the same Gaussian distribution, the likelihood function is

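$$ L(\mu, \sigma^2) = \prod_{i=1}^{n} \frac{1}{\sigma\sqrt{2\pi}} \exp\left[ -\frac{(x_i - \mu)^2}{2\sigma^2} \right] \tag{45} $$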

Once again, taking the logarithm,

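$$ L^{*} = \ln L = -\frac{n}{2} \ln\left(2\pi\sigma^2\right) - \frac{1}{2} \sum_{i=1}^{n} \frac{(x_i - \mu)^2}{\sigma^2} \tag{46} $$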

Taking the derivatives with respect to µ and σ² and setting them to 0, we then have

$$ \frac{\partial L^{*}}{\partial \mu} = \sum_{i=1}^{n} \frac{x_i - \mu}{\sigma^2} = 0 \tag{47} $$

and

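$$ \frac{\partial L^{*}}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2} \sum_{i=1}^{n} \frac{(x_i - \mu)^2}{\sigma^4} = 0 \tag{48} $$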

Solving (47) first yields

$$ \hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} x_i = \bar{x} \tag{49} $$

The best estimate of the theoretical mean for a Gaussian is thus the sample mean, which again comes as no great surprise. From the general result in (43), the uncertainty on the estimator is thus

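$$ \sigma^2(\bar{x}) = \frac{\sigma^2}{n} \quad\Longrightarrow\quad \sigma(\bar{x}) = \frac{\sigma}{\sqrt{n}} \tag{50} $$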

This is usually referred to as the standard error of the mean. Note that the error falls off as 1/√n with the sample size, as one would expect: as n increases, the estimate x̄ becomes more and more precise. When only one measurement is made, n = 1, σ(x̄) reduces to σ. For a measuring device, σ thus represents the precision of the instrument.
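As a small illustration of (49) and (50), the following sketch computes the sample mean and its standard error for a set of repeated readings. It assumes the single-measurement precision σ is already known (for instance from the instrument specification); the function name and the readings are hypothetical.

```python
import math


def mean_and_standard_error(samples, sigma):
    """Return (x_bar, sigma / sqrt(n)) for a known single-measurement sigma."""
    n = len(samples)
    if n == 0:
        raise ValueError("at least one measurement is required")
    x_bar = math.fsum(samples) / n        # sample mean, Eq. (49)
    return x_bar, sigma / math.sqrt(n)    # standard error of the mean, Eq. (50)


# Hypothetical readings from an instrument with assumed precision sigma = 0.3
readings = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.0]
x_bar, err = mean_and_standard_error(readings, sigma=0.3)
print(f"mean = {x_bar:.3f} +/- {err:.3f}")
```

With nine readings the error on the mean is one third of the single-measurement precision; with a single reading the function simply returns σ.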

For the moment, however, σ is still unknown. Solving (48) for σ² yields the estimator

$$ \hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2 = s^2 \tag{51} $$

where we have replaced µ by its solution in (49). This, of course, is just the sample variance.
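The two solutions can be checked numerically. The sketch below evaluates the Gaussian log-likelihood for a small, made-up data set and confirms that moving either parameter away from the sample mean or the sample variance lowers it. The helper names and the data are illustrative assumptions, not part of the original derivation.

```python
import math


def log_likelihood(samples, mu, var):
    """Gaussian log-likelihood ln L for mean mu and variance var."""
    n = len(samples)
    ss = sum((x - mu) ** 2 for x in samples)
    return -0.5 * n * math.log(2.0 * math.pi * var) - ss / (2.0 * var)


def ml_estimates(samples):
    """Maximum-likelihood estimates: sample mean and sample variance (divisor n)."""
    n = len(samples)
    mu_hat = sum(samples) / n
    var_hat = sum((x - mu_hat) ** 2 for x in samples) / n
    return mu_hat, var_hat


samples = [4.9, 5.3, 5.1, 4.7, 5.0]  # hypothetical measurements
mu_hat, var_hat = ml_estimates(samples)
best = log_likelihood(samples, mu_hat, var_hat)

# Shift one parameter at a time and record the change in ln L
shifts = [(0.05, 0.0), (-0.05, 0.0), (0.0, 0.01), (0.0, -0.01)]
changes = [log_likelihood(samples, mu_hat + d_mu, var_hat + d_var) - best
           for d_mu, d_var in shifts]
for (d_mu, d_var), change in zip(shifts, changes):
    print(f"d_mu = {d_mu:+.2f}, d_var = {d_var:+.2f}: change in ln L = {change:+.5f}")
```

Every printed change should be negative, as expected at a maximum.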

For finite values of n, however, the sample variance turns out to be a biased estimator; that is, the expectation value of s² does not equal the true value, but differs from it by a constant factor. It is not hard to show, in fact, that E[s²] = σ² - σ² / n = (n - 1) σ² / n. Thus for n very large, s² approaches the true variance as desired; however, for small n, σ² is underestimated by s². The reason is quite simple: for small samples, the occurrence of large values far from the mean is rare, so the sample variance tends to be weighted more towards smaller values. For practical use, a somewhat better estimate is therefore obtained by multiplying (51) by the factor n / (n - 1),

$$ \hat{\sigma}^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 \tag{52} $$
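The expectation value of s² quoted above can be verified in two lines. The following is a sketch of the argument, assuming independent measurements with mean µ and variance σ², and using the variance of the sample mean from (50):

$$
\begin{aligned}
s^2 &= \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i-\mu)^2 - (\bar{x}-\mu)^2, \\
E[s^2] &= \sigma^2 - E\left[(\bar{x}-\mu)^2\right] = \sigma^2 - \frac{\sigma^2}{n} = \frac{n-1}{n}\,\sigma^2 .
\end{aligned}
$$

The deficit σ² / n is exactly the variance of x̄ about µ: measuring the deviations from the fitted mean rather than the true mean always makes them look slightly too small.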

Equation (52) is unbiased; however, it is no longer the best estimate, in the sense that its average deviation from the true value is somewhat greater than that for (51). The difference is small, however, so that (52) still provides a good estimate. Equation (52), then, is the recommended formula for estimating the variance. Note that, unlike the mean, it is impossible to estimate the standard deviation from one measurement because of the (n - 1) term in the denominator. This makes sense, of course, as it quite obviously requires more than one point to determine a dispersion!
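The trade-off between bias and average deviation can be seen in a small simulation. The sketch below draws many Gaussian samples of size n and records, for both (51) and (52), the average estimate and the mean squared deviation from the true variance. Reading "average deviation" as the mean squared error is an assumption made here, and the function names, sample size, and seed are arbitrary illustrative choices.

```python
import random


def variance_estimators(samples):
    """Return (biased, unbiased) variance estimates, Eqs. (51) and (52)."""
    n = len(samples)
    x_bar = sum(samples) / n
    ss = sum((x - x_bar) ** 2 for x in samples)
    return ss / n, ss / (n - 1)


def compare_estimators(n=5, sigma=1.0, trials=20000, seed=0):
    """Monte Carlo estimate of the mean and mean squared error of each estimator."""
    rng = random.Random(seed)
    true_var = sigma ** 2
    names = ("biased", "unbiased")
    sums = {name: 0.0 for name in names}       # running sum of estimates
    sq_errors = {name: 0.0 for name in names}  # running sum of squared errors
    for _ in range(trials):
        samples = [rng.gauss(0.0, sigma) for _ in range(n)]
        for name, estimate in zip(names, variance_estimators(samples)):
            sums[name] += estimate
            sq_errors[name] += (estimate - true_var) ** 2
    return {name: (sums[name] / trials, sq_errors[name] / trials) for name in names}


result = compare_estimators()
for name, (mean_estimate, mse) in result.items():
    print(f"{name:>8}: mean estimate = {mean_estimate:.3f}, mean squared error = {mse:.3f}")
```

For n = 5 and σ = 1 the biased estimator should average close to (n - 1) σ² / n = 0.8 and the unbiased one close to 1, while the mean squared error of the biased estimator should come out smaller.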

The variance of σ̂² in (52) may also be shown to be

$$ \sigma^2(\hat{\sigma}^2) = \frac{2\sigma^4}{n-1} \tag{53} $$

and the standard deviation of σ̂ is then

$$ \sigma(\hat{\sigma}) \simeq \frac{\sigma}{\sqrt{2(n-1)}} \tag{54} $$
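In practice these results are used to quote an uncertainty on the estimated dispersion itself. The sketch below applies (52), together with the standard Gaussian results σ²(σ̂²) = 2σ⁴ / (n - 1) and σ(σ̂) ≈ σ / √(2(n - 1)), to a set of hypothetical readings. Because the true σ is unknown, the code substitutes σ̂ for σ in the error formulas; this plug-in step is an assumption of the example, as are the function and key names.

```python
import math


def dispersion_with_errors(samples):
    """Estimate the variance and standard deviation, with approximate errors."""
    n = len(samples)
    if n < 2:
        raise ValueError("at least two measurements are needed to estimate a dispersion")
    x_bar = math.fsum(samples) / n
    var_hat = math.fsum((x - x_bar) ** 2 for x in samples) / (n - 1)  # Eq. (52)
    sigma_hat = math.sqrt(var_hat)
    # Plug-in assumption: the unknown sigma is replaced by sigma_hat below
    var_err = var_hat * math.sqrt(2.0 / (n - 1))        # square root of Eq. (53)
    sigma_err = sigma_hat / math.sqrt(2.0 * (n - 1))    # Eq. (54)
    return {"variance": var_hat, "variance_error": var_err,
            "sigma": sigma_hat, "sigma_error": sigma_err}


readings = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.0]  # hypothetical data
summary = dispersion_with_errors(readings)
print(f"variance = {summary['variance']:.4f} +/- {summary['variance_error']:.4f}")
print(f"sigma    = {summary['sigma']:.4f} +/- {summary['sigma_error']:.4f}")
```

With only nine points the relative error on σ̂ is 1 / √16 = 25%, which shows how slowly the dispersion estimate converges compared with the mean.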
