**7.1 The Least Squares Method**

Let us suppose that measurements at *n* points,
*x*_{i}, are made of the
variable *y*_{i} with an error
σ_{i} (*i* =
1, 2, . . ., *n*), and that it is desired
to fit a function *f*(*x*; *a*_{1},
*a*_{2}, . . ., *a*_{m}) to these data, where
*a*_{1}, *a*_{2}, . . .,
*a*_{m} are unknown parameters to be determined. Of
course, the number of
points must be greater than the number of parameters. The method of
least squares states that the best values of *a*_{j} are
those for which the sum

$$S = \sum_{i=1}^{n} \frac{\left[ y_i - f(x_i; a_1, \ldots, a_m) \right]^2}{\sigma_i^2} \qquad (70)$$

is a minimum. Examining (70) we can see that this is just the sum of
the squared deviations of the data points from the curve
*f*(*x*_{i}), each
weighted by the inverse square of the corresponding error on *y*_{i}. The reader
might also
recognize this as the chi-square in (22). For this reason, the
method is also sometimes referred to as *chi-square
minimization*. Strictly speaking this is not quite correct as
*y*_{i} must
be Gaussian distributed with mean *f*(*x*_{i};
*a*_{j}) and variance σ_{i}^{2} in order
for *S* to be a true chi-square. However, as this is almost always the
case for measurements in physics, this is a valid hypothesis most of
the time. The least squares method, however, is totally general and
does not require knowledge of the parent distribution. If the parent
distribution is known the method of maximum likelihood may also be
used. In the case of Gaussian distributed errors this yields identical
results.
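As an illustration, the sum in (70) can be evaluated directly for any candidate set of parameters. The following is a minimal Python sketch assuming NumPy; the straight-line model and the data points are hypothetical, chosen only to make the example self-contained.

```python
import numpy as np

def chi_square(y, sigma, f_vals):
    """Sum of squared deviations weighted by the measurement errors, eq. (70)."""
    return np.sum(((y - f_vals) / sigma) ** 2)

# Hypothetical measurements roughly following y = 1 + 2x
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])
sigma = np.array([0.2, 0.2, 0.3, 0.3, 0.4])

def f(x, a1, a2):
    """Trial model, linear in the parameters a1 and a2."""
    return a1 + a2 * x

# Value of S for one candidate parameter set (a1, a2) = (1, 2)
S = chi_square(y, sigma, f(x, 1.0, 2.0))
```

The least squares method then amounts to varying *a*_{1} and *a*_{2} until `S` reaches its minimum.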

To find the values of *a*_{j}, one must now solve the
system of equations

$$\frac{\partial S}{\partial a_j} = 0, \qquad j = 1, 2, \ldots, m \qquad (71)$$

Depending on the function *f*(*x*), (71) may or may not yield an
analytic solution. In general, numerical methods requiring a computer
must be used to minimize *S*.
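When *f* is linear in the parameters, as in a straight-line fit, setting the derivatives in (71) to zero leads to the weighted normal equations, which do have an analytic solution. A minimal sketch assuming NumPy, again with hypothetical data:

```python
import numpy as np

# Hypothetical measurements roughly following y = 1 + 2x
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])
sigma = np.array([0.2, 0.2, 0.3, 0.3, 0.4])

# Design matrix for f(x; a1, a2) = a1 + a2*x
A = np.column_stack([np.ones_like(x), x])
W = 1.0 / sigma**2            # weights 1/sigma_i^2 from eq. (70)

# Setting dS/da_j = 0 (eq. 71) for a linear model gives the
# weighted normal equations (A^T W A) a = A^T W y
ATA = A.T @ (W[:, None] * A)
ATy = A.T @ (W * y)
a_best = np.linalg.solve(ATA, ATy)   # best-fit [a1, a2]
```

For a model that is nonlinear in the parameters, the same `S` would instead be minimized iteratively, e.g. with a general-purpose optimizer.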

Assuming we have the best values for *a*_{j}, it is
necessary to estimate
the errors on the parameters. For this, we form the so-called
covariance or *error matrix*, *V*_{ij},

$$\left( V^{-1} \right)_{ij} = \frac{1}{2} \left. \frac{\partial^2 S}{\partial a_i \, \partial a_j} \right|_{\min} \qquad (72)$$

where the second derivative is evaluated at the minimum. (Note the
second derivatives form the inverse of the error matrix). The diagonal
elements *V*_{ii} can then be shown to be the variances for
*a*_{i}, while the
off-diagonal elements *V*_{ij} represent the covariances
between *a*_{i} and *a*_{j}.
Thus,

$$\sigma^2(a_i) = V_{ii}, \qquad \operatorname{cov}(a_i, a_j) = V_{ij} \qquad (73)$$

and so on.
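For a model linear in the parameters, the second derivatives in (72) are constant: (1/2) ∂²*S*/∂*a*_{i}∂*a*_{j} reduces to (*A*^{T}*WA*)_{ij}, where *A* is the design matrix and *W* the diagonal matrix of weights 1/σ_{i}². The error matrix is then obtained by a single inversion. A sketch for the same hypothetical straight-line fit, assuming NumPy:

```python
import numpy as np

# Hypothetical measurement points and errors from the straight-line fit
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
sigma = np.array([0.2, 0.2, 0.3, 0.3, 0.4])

A = np.column_stack([np.ones_like(x), x])  # design matrix for a1 + a2*x
W = 1.0 / sigma**2

# For a linear model, (1/2) d^2S/da_i da_j = (A^T W A)_ij,
# so the error matrix of eq. (72) is its inverse.
V = np.linalg.inv(A.T @ (W[:, None] * A))

var_a1, var_a2 = V[0, 0], V[1, 1]          # variances, eq. (73)
cov_a1_a2 = V[0, 1]                        # covariance between a1 and a2
err_a1, err_a2 = np.sqrt(var_a1), np.sqrt(var_a2)
```

Note that `V` depends only on the *x*_{i} and σ_{i}, not on the measured *y*_{i}: the parameter errors are fixed by the experimental design.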