**3.2 The Maximum-Likelihood (ML) Method**

The maximum-likelihood method also has a long history: it was derived by Bernoulli in 1776 and by Gauss around 1821, and worked out in detail by Fisher in 1912.

Consider the probability density function *F*(*x*; θ), where *x* is a
random variable and θ is a single parameter characterizing the known
form of *F*. We want to estimate θ. Let *x*_{1}, *x*_{2}, . . . , *x*_{N}
be a random sample of size *N*, and form the likelihood function

$$
L(\theta) = \prod_{i=1}^{N} F(x_i;\theta).
$$

This is the probability, given θ, of obtaining the *observed* set of
results. The maximum-likelihood estimator (MLE) of θ is θ̂, that
value of θ that maximizes *L*(θ) over all variations
of θ, i.e.

$$
\left.\frac{\partial L(\theta)}{\partial\theta}\right|_{\theta=\hat\theta} = 0.
$$

Note that traditionally it is ln *L*, rather than *L* itself, that is
maximized. It should be pointed out that the maximum cannot always be
determined by this method; finding the ML estimate of the time of
occurrence of a single observed event is a case in point.

The MLE is a statistic with many highly desirable properties: it is efficient, usually unbiased, it has minimum variance, and it is asymptotically Normally distributed.

If the residuals are Normally distributed, then minimizing the sum of squares (Section 3.1) is the maximum-likelihood method.
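A one-line check makes this explicit. Writing μ_{i}(θ) for the model prediction at the *i*th data point (our notation), and taking *N* independent Normal residuals of common variance σ², the log-likelihood is

$$
\ln L(\theta) = -\frac{1}{2\sigma^2}\sum_{i=1}^{N}\big[x_i - \mu_i(\theta)\big]^2 + \text{constant},
$$

so that maximizing *L* over θ is precisely minimizing the sum of squared residuals.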

By way of example, Jauncey (1967) showed that ML was an excellent way of estimating the slope of the number-flux-density relation for extragalactic radio sources, and this particular application has made the technique familiar to astronomers. The source count is assumed to be of the power-law form

$$
N(>S) = k\,S^{-\gamma},
$$

where *N* is the number of sources on a particular patch of sky with
flux densities greater than *S*, *k* is a constant and γ is the exponent,
or slope in the log *N*-log *S* plane, which we wish to
estimate. If we consider *M* sources with flux densities *S*_{i} in the
range *S*_{0} to *S*_{max}, then a straightforward application of the ML
procedure above yields the following likelihood function:

$$
L(\gamma) = \prod_{i=1}^{M}\frac{\gamma\, x_i^{-(\gamma+1)}}{1 - b^{-\gamma}},
$$

where

$$
x_i = S_i/S_0
$$

and

$$
b = S_{\max}/S_0 .
$$

Differentiation of ln *L* with respect to γ then yields the equation from which γ̂, the MLE of γ, is obtained:

$$
\frac{M}{\hat\gamma} \;-\; \sum_{i=1}^{M}\ln x_i \;-\; \frac{M\, b^{-\hat\gamma}\ln b}{1 - b^{-\hat\gamma}} \;=\; 0 .
$$
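As a sketch of this differentiation route, the transcendental equation can be solved numerically by bisection: the truncated power law is a one-parameter exponential family, so the derivative of ln *L* decreases monotonically in γ and has a single root. The sample below is simulated, and the values of γ, *b* and *M* are arbitrary illustrative choices, not taken from Jauncey's data.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical example values: true slope, dynamic range b = Smax/S0, sample size
gamma_true, b, M = 1.8, 50.0, 1000

# Simulate normalized fluxes x = S/S0 from the truncated power law by
# inverting its CDF: F(x) = (1 - x**-gamma) / (1 - b**-gamma), x in [1, b]
u = rng.uniform(size=M)
x = (1.0 - u * (1.0 - b**-gamma_true)) ** (-1.0 / gamma_true)
sum_ln_x = np.log(x).sum()

def score(g):
    """d ln L / d gamma for the truncated power-law likelihood."""
    return M / g - sum_ln_x - M * b**-g * np.log(b) / (1.0 - b**-g)

# score(g) is monotonically decreasing, so bisect for its root
lo, hi = 0.1, 10.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if score(mid) > 0:
        lo = mid
    else:
        hi = mid
gamma_hat = 0.5 * (lo + hi)
print(f"gamma_hat = {gamma_hat:.3f}  (true value {gamma_true})")
```

With *M* = 1000 sources the recovered slope should land within a few per cent of the input value.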

However, with a computer handy it is simplest to forget the
differentiation and to evaluate *L*(γ) over a wide range of γ, in
small steps of γ. The maximum of *L* yields γ̂, while a good estimate
of the standard deviation in γ̂ is obtained from the two values of γ
that cause *L* to drop by a factor of e^{1/2} from its maximum, the
factor e^{1/2} arising because the asymptotic distribution of *L* is
Gaussian. For large *M* and *b* (Jauncey 1967),

$$
\sigma_{\hat\gamma} \simeq \frac{\hat\gamma}{\sqrt{M}} .
$$
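The grid-evaluation procedure, together with the e^{1/2} error estimate, can be sketched as follows. The sample is again simulated, and γ, *b* and *M* are invented for illustration; dropping *L* by a factor e^{1/2} is the same as dropping ln *L* by 1/2.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical example values: true slope, flux limits, sample size
gamma_true, S0, Smax, M = 1.5, 1.0, 100.0, 500
b = Smax / S0

# Draw M normalized flux densities x = S/S0 from the truncated power law
u = rng.uniform(size=M)
x = (1.0 - u * (1.0 - b**-gamma_true)) ** (-1.0 / gamma_true)

def log_like(gamma):
    """ln L(gamma) for the truncated power-law source count."""
    return (M * np.log(gamma)
            - (gamma + 1.0) * np.log(x).sum()
            - M * np.log(1.0 - b**-gamma))

# Evaluate ln L on a fine grid and take the maximum
grid = np.linspace(0.5, 3.0, 2501)
lnL = np.array([log_like(g) for g in grid])
gamma_hat = grid[lnL.argmax()]

# 1-sigma interval: the two gamma values where ln L has fallen 1/2
# below its maximum, i.e. where L has dropped by a factor e**0.5
inside = grid[lnL >= lnL.max() - 0.5]
sigma = 0.5 * (inside.max() - inside.min())

print(f"gamma_hat = {gamma_hat:.3f} +/- {sigma:.3f}")
```

For this sample size the half-width found from the grid should agree well with the large-*M* approximation γ̂/√M.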

This application of ML makes optimum use of the data in that the sources are not grouped and the loss of power that always results from binning is avoided.

To return to general considerations: after the ML estimate has been obtained, it is essential to perform a final check. Does the MLE model fit the data reasonably? If it does not, then either the data are erroneous (if the model is known to be right) or, conversely, the adopted or assumed model is wrong. There are many ways of carrying out such a check; two of these, the chi-square test and the Kolmogorov-Smirnov test, are described in Section 4.2.

The ML procedure may be generalized to obtain simultaneous MLEs of several parameters θ_{1}, θ_{2}, . . . , θ_{p}:

$$
\frac{\partial \ln L(\theta_1,\theta_2,\ldots,\theta_p)}{\partial \theta_i} = 0,
\qquad i = 1,\ldots,p,
$$

and the solution of these simultaneous equations yields the MLEs θ̂_{i}.
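A minimal sketch of the simultaneous case uses the familiar Normal distribution with unknown mean and standard deviation; the data below are simulated with arbitrarily chosen parameters. Here the two simultaneous equations happen to have closed-form solutions, which we verify against direct evaluation of ln *L*.

```python
import numpy as np

rng = np.random.default_rng(7)
data = rng.normal(loc=2.0, scale=0.5, size=400)  # hypothetical sample

def ln_L(mu, sigma):
    """Gaussian log-likelihood in the two parameters mu and sigma."""
    return (-0.5 * ((data - mu) / sigma) ** 2 - np.log(sigma)).sum()

# Setting d lnL/d mu = 0 and d lnL/d sigma = 0 simultaneously gives:
mu_hat = data.mean()                                # root of d lnL / d mu
sigma_hat = np.sqrt(((data - mu_hat) ** 2).mean())  # root of d lnL / d sigma

# Sanity check: nearby (mu, sigma) pairs give a smaller log-likelihood
best = ln_L(mu_hat, sigma_hat)
for dmu in (-0.01, 0.01):
    for ds in (-0.01, 0.01):
        assert ln_L(mu_hat + dmu, sigma_hat + ds) < best

print(f"mu_hat = {mu_hat:.3f}, sigma_hat = {sigma_hat:.3f}")
```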

It is important to emphasize the ML principle: when confronted with a
choice of hypotheses, choose the one that maximizes *L*, i.e. the one
giving the highest probability to the *observed event*. This sounds
reasonable, but in fact the proofs of certain theorems concerning the
"goodness" of MLEs (see e.g. Martin 1971) are required to justify the
procedure.