Statistics and the Treatment of Experimental Data

4.2 The Maximum Likelihood Method

The method of maximum likelihood is applicable only if the form of the theoretical distribution from which the sample is taken is known. For most measurements in physics, this is either the Gaussian or the Poisson distribution. To be more general, however, suppose we have a sample of n independent observations x₁, x₂, ..., x_n from a theoretical distribution f(x | θ), where θ is the parameter to be estimated. The method then consists of calculating the likelihood function,

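$$L(\theta \mid x) = f(x_1 \mid \theta)\, f(x_2 \mid \theta) \cdots f(x_n \mid \theta) = \prod_{i=1}^{n} f(x_i \mid \theta) \tag{29}$$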

which can be recognized as the probability for observing the sequence of values x₁, x₂, ..., x_n. The principle now states that this probability is a maximum for the observed values. Thus, the parameter θ must be such that L is a maximum. If L is a regular function, θ can be found by solving the equation,

$$\frac{dL}{d\theta} = 0. \tag{30}$$

If there is more than one parameter, then the partial derivatives of L with respect to each parameter must be taken to obtain a system of equations. Depending on the form of L, it may also be easier to maximize the logarithm of L rather than L itself. Solving the equation

$$\frac{d(\ln L)}{d\theta} = 0 \tag{31}$$

then yields results equivalent to (30). The solution, θ̂, is known as the maximum likelihood estimator for the parameter θ. In order to distinguish the estimated value from the true value, we have used a caret over the parameter to signify it as the estimator.
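As a concrete illustration of (30) and (31), the following minimal Python sketch assumes an exponential distribution f(t | τ) = exp(−t/τ)/τ, which is chosen here for illustration and is not taken from the text. The sample is simulated, and the function names, the seed and the search interval are hypothetical choices. The sketch compares a numerical maximization of ln L with the root of d(ln L)/dτ = 0, which for this particular model is the sample mean.

```python
import math
import random

def log_likelihood(tau, sample):
    """ln L(tau) = sum_i ln f(t_i | tau) for the assumed model exp(-t/tau)/tau."""
    return sum(-math.log(tau) - t / tau for t in sample)

def maximize(func, lo, hi, tol=1e-6):
    """Golden-section search for the maximum of a unimodal function on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if func(c) > func(d):
            b = d  # the maximum lies in [a, d]
        else:
            a = c  # the maximum lies in [c, b]
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return 0.5 * (a + b)

random.seed(1)
true_tau = 2.0  # assumed "true" parameter, used only to generate toy data
sample = [random.expovariate(1.0 / true_tau) for _ in range(500)]

# Numerical maximum of ln L over an assumed search interval.
tau_numeric = maximize(lambda tau: log_likelihood(tau, sample), 0.1, 10.0)

# Analytic root of d(ln L)/d(tau) = -n/tau + sum(t)/tau**2 = 0.
tau_analytic = sum(sample) / len(sample)

print(f"numerical maximum of ln L: {tau_numeric:.4f}")
print(f"analytic solution (mean):  {tau_analytic:.4f}")
```

Working with ln L turns the product in (29) into a sum, which is both easier to differentiate and numerically safer than multiplying many small probabilities.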

It should be realized now that θ̂ is also a random variable, since it is a function of the x_i. If a second sample is taken, θ̂ will have a different value, and so on. The estimator is thus also described by a probability distribution. This leads us to the second half of the estimation problem: what is the error on θ̂? This is given by the standard deviation of the estimator distribution. We can calculate this from the likelihood function if we recall that L is just the probability for observing the sampled values x₁, x₂, ..., x_n. Since these values are used to calculate θ̂, L is related to the distribution for θ̂. Using (9), the variance is then

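$$\sigma^2(\hat{\theta}) = \int (\hat{\theta} - \theta)^2 \, L(\theta \mid x) \, dx_1 \, dx_2 \cdots dx_n. \tag{32}$$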

This is a general formula but, unfortunately, an analytic result can be obtained only in a few simple cases. An easier, though only approximate, method, which works in the limit of large numbers, is to calculate the inverse second derivative of the log-likelihood function evaluated at the maximum,

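$$\sigma^2(\hat{\theta}) \simeq \left. \left( -\frac{\partial^2 \ln L}{\partial \theta^2} \right)^{-1} \right|_{\theta = \hat{\theta}} \tag{33}$$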
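The approximation (33) can be checked numerically. The sketch below is an illustration under stated assumptions, not a general recipe: it again assumes an exponential model f(t | τ) = exp(−t/τ)/τ with simulated data, estimates the second derivative of ln L by a central finite difference (the step size is an arbitrary choice), and takes the variance as the inverse of minus that derivative. For comparison it also computes the spread of the estimator over many independent simulated samples, which is the quantity the variance is meant to describe.

```python
import math
import random
import statistics

def log_likelihood(tau, sample):
    """ln L(tau) for the assumed model f(t | tau) = exp(-t / tau) / tau."""
    return sum(-math.log(tau) - t / tau for t in sample)

def second_derivative(func, x, h):
    """Central finite-difference estimate of the second derivative of func at x."""
    return (func(x + h) - 2.0 * func(x) + func(x - h)) / (h * h)

random.seed(2)
true_tau = 2.0  # assumed true parameter, used only to simulate data
n = 200
sample = [random.expovariate(1.0 / true_tau) for _ in range(n)]
tau_hat = sum(sample) / n  # maximum likelihood estimate for this model

# Approximate error from the curvature of ln L at its maximum.
curvature = second_derivative(
    lambda tau: log_likelihood(tau, sample), tau_hat, 1e-3 * tau_hat
)
sigma_curvature = math.sqrt(-1.0 / curvature)

# For this model the second derivative at the maximum is -n / tau_hat**2.
sigma_closed_form = tau_hat / math.sqrt(n)

# Spread of the estimator over many independent simulated samples.
estimates = []
for _ in range(2000):
    resample = [random.expovariate(1.0 / true_tau) for _ in range(n)]
    estimates.append(sum(resample) / n)
sigma_monte_carlo = statistics.stdev(estimates)

print(f"error from curvature of ln L: {sigma_curvature:.4f}")
print(f"closed form tau_hat/sqrt(n):  {sigma_closed_form:.4f}")
print(f"spread over repeated samples: {sigma_monte_carlo:.4f}")
```

The first two numbers should agree closely, since they differ only by the finite-difference error. The third agrees only approximately, because the curvature is evaluated at the estimate from a single sample.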

If there is more than one parameter, the matrix of the second derivatives must be formed, i.e.,

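$$U_{ij} = -\frac{\partial^2 \ln L}{\partial \theta_i \, \partial \theta_j}. \tag{34}$$

The diagonal elements of the inverse matrix then give the approximate variances,

$$\sigma^2(\hat{\theta}_i) \simeq (U^{-1})_{ii}. \tag{35}$$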
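A two-parameter case can be sketched in the same way. The example below assumes a Gaussian model with parameters μ and σ and simulated data. The closed-form position of the maximum is used as a shortcut rather than derived, and the finite-difference step sizes are arbitrary choices. It builds the matrix of negative second derivatives of ln L numerically, inverts it, and reads the approximate variances off the diagonal.

```python
import math
import random

def log_likelihood(mu, sigma, sample):
    """ln L for an assumed Gaussian with mean mu and standard deviation sigma."""
    n = len(sample)
    return (-n * math.log(sigma) - 0.5 * n * math.log(2.0 * math.pi)
            - sum((x - mu) ** 2 for x in sample) / (2.0 * sigma ** 2))

def second_derivative_matrix(func, point, steps):
    """Matrix of -d^2 func / (d theta_i d theta_j) by central finite differences."""
    k = len(point)
    matrix = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            def shifted(di, dj):
                p = list(point)
                p[i] += di * steps[i]
                p[j] += dj * steps[j]
                return func(*p)
            d2 = (shifted(1, 1) - shifted(1, -1) - shifted(-1, 1)
                  + shifted(-1, -1)) / (4.0 * steps[i] * steps[j])
            matrix[i][j] = -d2
    return matrix

def invert_2x2(m):
    """Inverse of a 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

random.seed(4)
n = 400
sample = [random.gauss(10.0, 1.5) for _ in range(n)]  # toy data, assumed parameters

# Position of the maximum of ln L for the Gaussian model (closed form, used as a shortcut).
mu_hat = sum(sample) / n
sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in sample) / n)

step = 1e-3 * sigma_hat
U = second_derivative_matrix(
    lambda mu, sigma: log_likelihood(mu, sigma, sample),
    [mu_hat, sigma_hat], [step, step]
)
covariance = invert_2x2(U)
var_mu, var_sigma = covariance[0][0], covariance[1][1]

print(f"error on mu_hat:    {math.sqrt(var_mu):.4f}")
print(f"error on sigma_hat: {math.sqrt(var_sigma):.4f}")
```

For this model the results should reproduce σ̂²/n and σ̂²/(2n) to within the finite-difference error, and the off-diagonal elements vanish at the maximum, so the two estimates are uncorrelated in this approximation.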

A technical point which must be noted is that we have assumed that the mean value of θ̂ is the theoretical θ. This is a desirable, but not essential, property for an estimator, guaranteed by the maximum likelihood method only for infinite n. Estimators which have this property are said to be unbiased. We will see one example in the following sections in which this is not the case. Equation (32), nevertheless, remains valid for all θ̂, since the error desired is the deviation from the true mean irrespective of the bias.
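The dependence of the bias on n can be made visible with a small simulation. The sketch below assumes an exponential distribution written in terms of its rate, f(t | λ) = λ exp(−λt), which is an illustrative choice and not the example referred to in the text. Setting d(ln L)/dλ = 0 for this model gives λ̂ = n / Σt_i, whose mean value is nλ/(n − 1) rather than λ. The sample sizes, number of trials and seed are arbitrary.

```python
import random
import statistics

def rate_mle(sample):
    """Maximum likelihood estimate of the rate for f(t | lam) = lam * exp(-lam * t)."""
    return len(sample) / sum(sample)

def mean_of_estimator(n, true_rate, trials, rng):
    """Average of the estimator over many independent simulated samples of size n."""
    estimates = [
        rate_mle([rng.expovariate(true_rate) for _ in range(n)])
        for _ in range(trials)
    ]
    return statistics.mean(estimates)

rng = random.Random(3)
true_rate = 1.0  # assumed true parameter, used only to simulate data
results = {}
for n in (5, 20, 200):
    results[n] = mean_of_estimator(n, true_rate, 5000, rng)
    expected = n * true_rate / (n - 1)
    print(f"n = {n:3d}: mean of estimator = {results[n]:.3f}, "
          f"n/(n-1) * true rate = {expected:.3f}")
```

The average of the estimator lies noticeably above the true rate for small n and approaches it as n grows, which is the sense in which the absence of bias is guaranteed only in the limit of infinite n.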

Another useful property of maximum likelihood estimators is invariance under transformations. If u = f(θ), then the best estimate of u can be shown to be û = f(θ̂).
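The invariance property can be illustrated numerically. The sketch below assumes an exponential model parametrized either by its mean τ or by its rate λ = 1/τ. Both parametrizations, the simulated data and the search intervals are illustrative assumptions. The likelihood is maximized separately in each parameter, and the two results are compared.

```python
import math
import random

def maximize(func, lo, hi, tol=1e-6):
    """Golden-section search for the maximum of a unimodal function on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if func(c) > func(d):
            b = d
        else:
            a = c
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return 0.5 * (a + b)

def log_likelihood_tau(tau, sample):
    """ln L in terms of the mean: f(t | tau) = exp(-t / tau) / tau."""
    return sum(-math.log(tau) - t / tau for t in sample)

def log_likelihood_rate(lam, sample):
    """ln L in terms of the rate: f(t | lam) = lam * exp(-lam * t)."""
    return sum(math.log(lam) - lam * t for t in sample)

random.seed(5)
sample = [random.expovariate(0.5) for _ in range(300)]  # toy data, assumed rate 0.5

tau_hat = maximize(lambda tau: log_likelihood_tau(tau, sample), 0.1, 10.0)
rate_hat = maximize(lambda lam: log_likelihood_rate(lam, sample), 0.01, 10.0)

print(f"tau_hat     = {tau_hat:.5f}")
print(f"rate_hat    = {rate_hat:.5f}")
print(f"1 / tau_hat = {1.0 / tau_hat:.5f}")
```

Maximizing directly in λ gives the same value as transforming the estimate of τ, up to the tolerance of the numerical search.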

Let us illustrate the method now by applying it to the Poisson and Gaussian distributions.