3.2 The Maximum-Likelihood (ML) Method
The maximum-likelihood method also has a long history: it was derived by Bernoulli in 1776 and by Gauss around 1821, and worked out in detail by Fisher in 1912.
Consider the probability density function $F(x;\theta)$, where $x$ is a random variable and $\theta$ is a single parameter characterizing the known form of $F$. We want to estimate $\theta$. Let $x_1, x_2, \ldots, x_N$ be a random sample of size $N$, the $x_i$ independent and drawn from the same population. Then the so-called ``likelihood function'' is the joint probability density function
\[
L(\theta) = \prod_{i=1}^{N} F(x_i;\theta) .
\]
This is the probability, given $\theta$, of obtaining the observed set of results. The maximum-likelihood estimator (MLE) of $\theta$ is $\hat\theta$, that value of $\theta$ which maximizes $L(\theta)$ for all variations of $\theta$, i.e.
\[
\left. \frac{\partial L(\theta)}{\partial \theta} \right|_{\theta = \hat\theta} = 0 .
\]
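As a concrete illustration of this definition, consider an exponential pdf, for which the MLE has a closed form. This is a minimal sketch with assumed values (the true rate, sample size and random seed are illustrative, not from the text):

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative sample (assumed values): N draws from the exponential pdf
# F(x; lam) = lam * exp(-lam * x)
lam_true, N = 2.0, 1000
x = rng.exponential(scale=1.0 / lam_true, size=N)

# lnL(lam) = N ln(lam) - lam * sum(x); setting dlnL/dlam = 0 gives the
# closed-form MLE lam_hat = N / sum(x) = 1 / mean(x)
lam_hat = 1.0 / x.mean()

# Numerical cross-check: lnL on a fine grid peaks at (nearly) the same value
lams = np.linspace(0.5, 4.0, 3501)
lnL = N * np.log(lams) - lams * x.sum()
lam_grid = lams[np.argmax(lnL)]
print(lam_hat, lam_grid)
```

The grid maximum and the analytic estimator agree to within the grid spacing, as the definition requires.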
Note that traditionally it is $\ln L$ that is maximized; since the logarithm is monotonic, the maximum occurs at the same $\hat\theta$, and the product becomes a more tractable sum. It should be pointed out that the maximum of the function cannot always be determined by this method - finding the ML time of occurrence of a singular observed event is a case in point.
The MLE is a statistic with many highly desirable properties - it is efficient, usually unbiased, it has minimum variance, and it is asymptotically Normally distributed.
If the residuals are Normally distributed, then minimizing the sum
of squares (Section 3.1) is the
maximum-likelihood method.
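This equivalence is easy to verify numerically. The sketch below (all values assumed for illustration) fits a constant level $a$ to data with Normal residuals of known scatter, and shows that the grid value minimizing the sum of squares is exactly the one maximizing the Gaussian log-likelihood:

```python
import numpy as np

rng = np.random.default_rng(7)

# Illustrative data (assumed values): a constant level plus Normal residuals
a_true, sigma = 3.0, 0.5
y = a_true + rng.normal(scale=sigma, size=200)

a_grid = np.linspace(2.0, 4.0, 2001)

# Sum of squared residuals for each trial level a
ssq = ((y[None, :] - a_grid[:, None]) ** 2).sum(axis=1)

# Gaussian log-likelihood with sigma known: lnL = const - ssq / (2 sigma**2),
# a strictly decreasing function of ssq, so the two criteria must agree
lnL = -ssq / (2.0 * sigma ** 2)

a_ls = a_grid[np.argmin(ssq)]
a_ml = a_grid[np.argmax(lnL)]
print(a_ls, a_ml)
```

Because $\ln L$ differs from the (negative) sum of squares only by a constant factor and offset, the two estimates coincide by construction.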
By way of example, Jauncey (1967) showed that ML was an excellent way of estimating the slope of the number-flux-density relation for extragalactic radio sources, and this particular application has made the technique familiar to astronomers. The source count is assumed to be of the power-law form
\[
N(>S) = k S^{-\gamma} ,
\]
where $N$ is the number of sources on a particular patch of sky with flux densities greater than $S$, $k$ is a constant and $\gamma$ is the exponent, or slope in the $\log N$--$\log S$ plane, which we wish to estimate. If we consider $M$ sources with flux densities $S_i$ in the range $S_0$ to $S_{\rm max}$, then a straightforward application of the ML procedure above yields the following likelihood function:
\[
L(\gamma) = \prod_{i=1}^{M} \frac{\gamma\, x_i^{-(\gamma+1)}}{1 - b^{-\gamma}} ,
\]
where
\[
x_i = S_i/S_0
\]
and
\[
b = S_{\rm max}/S_0 .
\]
Differentiation of $\ln L$ with respect to $\gamma$ then yields the equation from which $\hat\gamma$, the MLE of $\gamma$, is obtained:
\[
\frac{M}{\hat\gamma} - \frac{M b^{-\hat\gamma} \ln b}{1 - b^{-\hat\gamma}} = \sum_{i=1}^{M} \ln x_i .
\]
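The condition $d\ln L/d\gamma = 0$ is transcendental in $\gamma$, but a simple bisection solves it. In this sketch, $M$, $b$ and the data statistic $\sum \ln x_i$ are illustrative stand-ins, not values from the text:

```python
import numpy as np

# Illustrative stand-in values (not from the text): M sources, dynamic
# range b = Smax/S0, and the observed statistic sum(ln x_i)
M, b = 500, 100.0
sum_ln_x = 330.0

def mle_equation(gamma):
    """d(lnL)/d(gamma); its root is the MLE gamma_hat."""
    return (M / gamma
            - M * b ** -gamma * np.log(b) / (1.0 - b ** -gamma)
            - sum_ln_x)

# Bisection between brackets where the derivative changes sign
lo, hi = 0.1, 5.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if mle_equation(lo) * mle_equation(mid) <= 0.0:
        hi = mid
    else:
        lo = mid
gamma_hat = 0.5 * (lo + hi)
print(f"gamma_hat = {gamma_hat:.3f}")
```

Any standard root-finder would serve equally well; bisection is used here only to keep the sketch self-contained.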
However, with a computer handy it is simplest to forget the differentiation and to evaluate $L(\gamma)$ over a wide range of $\gamma$ at small intervals $\Delta\gamma$. The maximum of $L$ yields $\hat\gamma$, while a good estimate of the standard deviation in $\hat\gamma$ is obtained from the two values of $\gamma$ at which $L$ has dropped by the factor $e^{1/2}$ from its maximum, the factor $e^{1/2}$ arising because the asymptotic distribution of $L$ is Gaussian. For large $M$ and $b$ (Jauncey 1967),
\[
\sigma_{\hat\gamma} \approx \frac{\hat\gamma}{\sqrt{M}} .
\]
This application of ML makes optimum use of the data in that the
sources are not grouped and the loss of power that always results from
binning is avoided.
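The grid-evaluation procedure described above - compute $\ln L(\gamma)$ on a fine grid, take the peak as $\hat\gamma$, and read off $\pm 1\sigma$ where $\ln L$ has fallen by $1/2$ - can be sketched as follows. The flux densities are synthetic, drawn from an assumed slope; all numerical values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(42)

# Assumed illustrative values: true slope, sample size, dynamic range b = Smax/S0
gamma_true, M, b = 1.5, 500, 100.0

# Draw M flux ratios x = S/S0 from the truncated power-law pdf
# p(x) = gamma * x**-(gamma + 1) / (1 - b**-gamma), 1 <= x <= b,
# by inverting its cumulative distribution
u = rng.uniform(size=M)
x = (1.0 - u * (1.0 - b ** -gamma_true)) ** (-1.0 / gamma_true)

# Evaluate ln L on a fine grid of gamma instead of differentiating
gammas = np.arange(0.5, 3.0, 0.001)
lnL = (M * np.log(gammas)
       - M * np.log(1.0 - b ** -gammas)
       - (gammas + 1.0) * np.log(x).sum())

i_max = np.argmax(lnL)
gamma_hat = gammas[i_max]

# +/- 1 sigma: the two gammas where L drops by the factor e**0.5,
# i.e. where lnL has fallen by 1/2 from its maximum
inside = lnL >= lnL[i_max] - 0.5
sigma = 0.5 * (gammas[inside][-1] - gammas[inside][0])

print(f"gamma_hat = {gamma_hat:.3f} +/- {sigma:.3f}")
```

Note that no binning of the sources is performed at any stage; each $x_i$ enters the likelihood individually.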
To return to general considerations: after the ML estimate has been obtained, it is essential to perform a final check - does the MLE model fit the data reasonably? If it does not, then either the data are erroneous (if the model is known to be right), or else the adopted or assumed model is wrong. There are many ways of carrying out such a check; two of these, the chi-square test and the Kolmogorov-Smirnov test, are described in Section 4.2.
The ML procedure may be generalized to obtain simultaneous MLEs of several parameters $\theta_1, \theta_2, \ldots, \theta_k$:
\[
\frac{\partial \ln L(\theta_1, \theta_2, \ldots, \theta_k)}{\partial \theta_i} = 0 , \qquad i = 1, 2, \ldots, k ,
\]
and the solution of these simultaneous equations yields the MLEs $\hat\theta_i$.
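As a two-parameter illustration, the Normal pdf yields simultaneous equations for $\mu$ and $\sigma$ with closed-form solutions. The sketch below (sample values assumed) checks them against a brute-force grid maximization of $\ln L$:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative sample (assumed values): N draws from a Normal population
data = rng.normal(loc=4.0, scale=2.0, size=1000)
N = data.size

# Solving dlnL/dmu = 0 and dlnL/dsigma = 0 simultaneously for a Normal pdf
# gives closed-form MLEs: the sample mean, and the rms deviation about it
mu_hat = data.mean()
sigma_hat = np.sqrt(((data - mu_hat) ** 2).mean())  # 1/N, not 1/(N-1)

# Cross-check by brute-force maximization of lnL over a (mu, sigma) grid,
# using sum(x - mu)**2 = S2 - 2*mu*S1 + N*mu**2 to avoid large arrays
S1, S2 = data.sum(), (data ** 2).sum()
mus = np.linspace(3.0, 5.0, 201)
sigmas = np.linspace(1.0, 3.0, 201)
MU, SIG = np.meshgrid(mus, sigmas)
lnL = -N * np.log(SIG) - (S2 - 2 * MU * S1 + N * MU ** 2) / (2 * SIG ** 2)
i, j = np.unravel_index(np.argmax(lnL), lnL.shape)
print(mu_hat, sigma_hat, MU[i, j], SIG[i, j])
```

The grid maximum lands on the grid point nearest the analytic estimators; note also that the ML variance estimator carries the $1/N$ factor, a reminder that MLEs are only usually unbiased.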
It is important to emphasize the ML principle, which is that when
confronted with the choice of hypotheses, choose that which maximizes
L, i.e. the one giving the highest probability to the observed
event. This sounds reasonable, but in fact the proofs of certain
theorems (see e.g.
Martin 1971)
concerning the ``goodness'' of MLEs are
required to justify the procedure.