Until now we have been discussing the situation in which the experimental result is N events giving precise values x_1, ..., x_N, where the x_i may or may not all be different.
From now on we shall confine our attention to the case of p measurements (not p events) at the points x_1, ..., x_p. The experimental results are (y_1 ± σ_1), ..., (y_p ± σ_p). One such type of experiment is where each measurement consists of N_i events. Then y_i = N_i and is Poisson-distributed with σ_i = √N_i. In this case the likelihood function is
$$\mathcal{L} = \prod_{i=1}^{p} \frac{\bar y(x_i)^{N_i}\, e^{-\bar y(x_i)}}{N_i!}$$
and
$$w = \ln \mathcal{L} = \sum_{i=1}^{p} \left[\, N_i \ln \bar y(x_i) - \bar y(x_i)\,\right] + \text{const}$$
We use the notation ȳ(α_i; x) for the curve that is to be fitted to the experimental points. The best-fit curve corresponds to α_i = α_i*. In this case of Poisson-distributed points, the solutions are obtained from the M simultaneous equations
$$\sum_{i=1}^{p} \left[\frac{N_i}{\bar y(x_i)} - 1\right]\frac{\partial \bar y(x_i)}{\partial \alpha_j} = 0,\qquad j = 1, \ldots, M$$
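As an illustration (not from the text), these equations can also be solved by maximizing the Poisson log-likelihood numerically. The data, the model ȳ(x) = α_1 + α_2 x, and the function names below are assumptions chosen only for this sketch:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical example: p = 4 bins at x_i, each containing N_i observed events,
# fitted with an assumed model ybar(x) = a1 + a2*x (any positive model would do).
x = np.array([0.5, 1.5, 2.5, 3.5])
N = np.array([12, 19, 31, 42])

def ybar(alpha, x):
    return alpha[0] + alpha[1] * x

def neg_log_likelihood(alpha):
    mu = np.clip(ybar(alpha, x), 1e-9, None)   # guard against non-positive trial values
    # Poisson log-likelihood up to the constant -ln(N_i!):  w = sum_i [N_i ln(mu_i) - mu_i]
    return -np.sum(N * np.log(mu) - mu)

result = minimize(neg_log_likelihood, x0=[10.0, 5.0], method="Nelder-Mead")
print("alpha* =", result.x)    # the values that satisfy the M simultaneous equations
```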
If all the N_i ≫ 1, then it is a good approximation to assume each y_i is Gaussian-distributed with standard deviation σ_i. (It is better to use ȳ_i rather than N_i for σ_i², where ȳ_i can be obtained by integrating ȳ(x) over the ith interval.) Then one can use the famous least squares method.
The remainder of this section is devoted to the case in which the y_i are Gaussian-distributed with standard deviations σ_i. See Fig. 4. We shall now see that the least-squares method is mathematically equivalent to the maximum likelihood method. In this Gaussian case the likelihood function is
$$\mathcal{L} = \prod_{i=1}^{p} \frac{1}{\sigma_i\sqrt{2\pi}}\, \exp\!\left[-\frac{\left(y_i - \bar y(\alpha; x_i)\right)^2}{2\sigma_i^2}\right] \qquad (23)$$
where
$$w(\alpha) = -\frac{1}{2} S(\alpha) + \text{const}, \qquad \text{where}\quad S(\alpha) \equiv \sum_{i=1}^{p} \frac{\left[y_i - \bar y(\alpha; x_i)\right]^2}{\sigma_i^2} \qquad (24)$$
Figure 4.
The solutions α_i = α_i* are given by minimizing S(α) (maximizing w):
$$\frac{\partial S}{\partial \alpha_j}\bigg|_{\alpha = \alpha^*} = 0, \qquad j = 1, \ldots, M \qquad (25)$$
This minimum value of S is called S*, the least squares sum.
The values of α_i which minimize S are called the least-squares solutions. Thus the maximum-likelihood and least-squares solutions are identical. According to Eq. (11), the least-squares errors are
$$(\Delta \alpha_j)^2 = \left(H^{-1}\right)_{jj}, \qquad \text{where}\quad H_{jk} \equiv \frac{1}{2}\, \frac{\partial^2 S}{\partial \alpha_j\, \partial \alpha_k}\bigg|_{\alpha = \alpha^*}$$
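As a numerical cross-check of this error prescription, one can build H by finite differences at the minimum and invert it. The sketch below is an illustration only; the function name and step size are assumptions, not part of the text:

```python
import numpy as np

def least_squares_errors(S, alpha_star, eps=1e-5):
    """Errors from Eq. (11): invert H_jk = (1/2) d^2 S / d alpha_j d alpha_k at the minimum.

    S          -- function S(alpha) returning the least-squares sum
    alpha_star -- the minimizing parameter values (array of length M)
    """
    a = np.asarray(alpha_star, dtype=float)
    M = a.size
    H = np.empty((M, M))
    for j in range(M):
        for k in range(M):
            # central finite-difference estimate of the second derivative
            app = a.copy(); app[j] += eps; app[k] += eps
            apm = a.copy(); apm[j] += eps; apm[k] -= eps
            amp = a.copy(); amp[j] -= eps; amp[k] += eps
            amm = a.copy(); amm[j] -= eps; amm[k] -= eps
            H[j, k] = 0.5 * (S(app) - S(apm) - S(amp) + S(amm)) / (4 * eps**2)
    Hinv = np.linalg.inv(H)
    return np.sqrt(np.diag(Hinv))      # Delta alpha_j = sqrt[(H^-1)_jj]
```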
Let us consider the special case in which ȳ(α_i; x) is linear in the α_i:
$$\bar y(x) = \sum_{k=1}^{M} \alpha_k f_k(x)$$
(Do not confuse this f(x) with the f(x) on page 2.)
Then
$$S = \sum_{i=1}^{p} \frac{\left[\,y_i - \sum_{k=1}^{M} \alpha_k f_k(x_i)\,\right]^2}{\sigma_i^2} \qquad (26)$$
Differentiating with respect to α_j gives
$$\frac{1}{2}\,\frac{\partial S}{\partial \alpha_j} = \sum_{i=1}^{p} \frac{\left[\,\sum_{k} \alpha_k f_k(x_i) - y_i\,\right] f_j(x_i)}{\sigma_i^2} = 0, \qquad j = 1, \ldots, M \qquad (27)$$
Define
$$g_j \equiv \sum_{i=1}^{p} \frac{y_i\, f_j(x_i)}{\sigma_i^2}, \qquad H_{jk} \equiv \sum_{i=1}^{p} \frac{f_j(x_i)\, f_k(x_i)}{\sigma_i^2} \qquad (28)$$
Then
$$\sum_{k=1}^{M} H_{jk}\, \alpha_k = g_j, \qquad j = 1, \ldots, M$$
In matrix notation the M simultaneous equations giving the least-squares solution are
$$H\,\alpha^* = g, \qquad\text{so that}\qquad \alpha^* = H^{-1} g \qquad (29)$$
is the solution for the α*'s. The errors in α are obtained using Eq. 11. To summarize:
$$\alpha^* = H^{-1} g, \qquad \Delta\alpha_j = \sqrt{\left(H^{-1}\right)_{jj}}, \qquad\text{with}\quad g_j = \sum_{i=1}^{p} \frac{y_i\, f_j(x_i)}{\sigma_i^2}, \quad H_{jk} = \sum_{i=1}^{p} \frac{f_j(x_i)\, f_k(x_i)}{\sigma_i^2} \qquad (30)$$
Equation (30) is the complete procedure for calculating the
least squares solutions and their errors. Note that even though
this procedure is called curve-fitting it is never necessary
to plot any curves. Quite often the complete experiment may
be a combination of several experiments in which several different 
curves (all functions of the α_i) may be jointly fitted. Then the S-value is the sum over all the points on all the curves. Note that since w decreases by ½ unit from its maximum w(α*) when one of the α_j has the value (α_j* ± Δα_j), the S-value must increase by one unit. That is,
$$S\!\left(\alpha_j^* \pm \Delta\alpha_j\right) = S^* + 1$$
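Eq. (30) translates almost line-for-line into code. The following is a minimal sketch (in Python, with assumed names; the basis functions f_k are supplied by the caller), not a prescribed implementation:

```python
import numpy as np

def linear_least_squares(x, y, sigma, basis):
    """Eq. (30) sketch: alpha* = H^{-1} g with
    g_j = sum_i y_i f_j(x_i)/sigma_i^2 and H_jk = sum_i f_j(x_i) f_k(x_i)/sigma_i^2.

    basis -- list of the functions f_k(x); all names here are illustrative only.
    Returns (alpha_star, delta_alpha) with delta_alpha_j = sqrt[(H^{-1})_jj].
    """
    x, y, sigma = map(np.asarray, (x, y, sigma))
    F = np.column_stack([f(x) for f in basis])      # F_ik = f_k(x_i)
    W = 1.0 / sigma**2                              # weights 1/sigma_i^2
    H = F.T @ (W[:, None] * F)                      # H_jk of Eq. (28)
    g = F.T @ (W * y)                               # g_j of Eq. (28)
    Hinv = np.linalg.inv(H)
    alpha_star = Hinv @ g                           # least-squares solution, Eq. (29)
    delta_alpha = np.sqrt(np.diag(Hinv))            # least-squares errors
    return alpha_star, delta_alpha
```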
Example 5 Linear regression with equal errors
ȳ(x) is known to be of the form ȳ(x) = α_1 + α_2 x. There are p experimental measurements (y_j ± σ). Using Eq. (30) we have
$$H = \frac{1}{\sigma^2}\begin{pmatrix} p & \sum x_i \\ \sum x_i & \sum x_i^2 \end{pmatrix}, \qquad g = \frac{1}{\sigma^2}\begin{pmatrix} \sum y_i \\ \sum x_i y_i \end{pmatrix},$$
$$\alpha_1^* = \frac{\sum x_i^2 \sum y_i - \sum x_i \sum x_i y_i}{p\sum x_i^2 - \left(\sum x_i\right)^2}, \qquad \alpha_2^* = \frac{p \sum x_i y_i - \sum x_i \sum y_i}{p\sum x_i^2 - \left(\sum x_i\right)^2}$$
These are the linear regression formulas which are programmed into many pocket calculators. They should not be used in those cases where the σ_i are not all the same. If the σ_i are all equal, the errors are
$$\left(\Delta\alpha_1\right)^2 = \frac{\sigma^2 \sum x_i^2}{p\sum x_i^2 - \left(\sum x_i\right)^2}, \qquad \left(\Delta\alpha_2\right)^2 = \frac{p\,\sigma^2}{p\sum x_i^2 - \left(\sum x_i\right)^2}$$
or
$$\Delta\alpha_1 = \sigma\sqrt{\frac{1}{p} + \frac{\bar x^2}{\sum (x_i - \bar x)^2}}, \qquad \Delta\alpha_2 = \frac{\sigma}{\sqrt{\sum (x_i - \bar x)^2}}, \qquad \bar x \equiv \frac{1}{p}\sum x_i$$
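For illustration, here is a sketch of these pocket-calculator formulas (assumed function and variable names), valid only when all measurements share the same σ:

```python
import numpy as np

def linear_regression_equal_errors(x, y, sigma):
    """Example 5 formulas for ybar(x) = a1 + a2*x with a common error sigma.
    Illustrative sketch; the names are not from the text."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    p = x.size
    D = p * np.sum(x**2) - np.sum(x)**2               # common denominator
    a1 = (np.sum(x**2) * np.sum(y) - np.sum(x) * np.sum(x * y)) / D
    a2 = (p * np.sum(x * y) - np.sum(x) * np.sum(y)) / D
    da1 = sigma * np.sqrt(np.sum(x**2) / D)           # Delta alpha_1
    da2 = sigma * np.sqrt(p / D)                      # Delta alpha_2
    return (a1, da1), (a2, da2)
```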
Example 6 Quadratic regression with unequal errors
The curve to be fitted is known to be a parabola. There are four experimental points at x = - 0.6, - 0.2, 0.2, and 0.6. The experimental results are 5 ± 2, 3 ± 1, 5 ± 1, and 8 ± 2. Find the best-fit curve.
Here f_1 = 1, f_2 = x, f_3 = x², so from Eq. (28)
$$H = \begin{pmatrix} 2.5 & 0 & 0.26 \\ 0 & 0.26 & 0 \\ 0.26 & 0 & 0.068 \end{pmatrix}, \qquad g = \begin{pmatrix} 11.25 \\ 0.85 \\ 1.49 \end{pmatrix},$$
$$H^{-1} = \begin{pmatrix} 0.664 & 0 & -2.539 \\ 0 & 3.846 & 0 \\ -2.539 & 0 & 24.414 \end{pmatrix}, \qquad \alpha^* = H^{-1} g, \qquad \Delta\alpha_j = \sqrt{\left(H^{-1}\right)_{jj}}.$$
ȳ(x) = (3.685 ± 0.815) + (3.27 ± 1.96)x + (7.808 ± 4.94)x² is the best-fit curve. This is shown with the experimental points in Fig. 5.
Figure 5. This parabola is the least squares fit to the 4 experimental points in Example 6.
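The numbers of Example 6 can be reproduced with a few lines of the Eq. (30) recipe; this sketch is illustrative and the variable names are assumptions:

```python
import numpy as np

# The four measurements of Example 6.
x     = np.array([-0.6, -0.2, 0.2, 0.6])
y     = np.array([ 5.0,  3.0, 5.0, 8.0])
sigma = np.array([ 2.0,  1.0, 1.0, 2.0])

# Basis f_1 = 1, f_2 = x, f_3 = x^2 for the parabola ybar = a1 + a2*x + a3*x^2.
F = np.column_stack([np.ones_like(x), x, x**2])
W = 1.0 / sigma**2

H = F.T @ (W[:, None] * F)          # Eq. (28)
g = F.T @ (W * y)                   # Eq. (28)
Hinv = np.linalg.inv(H)

alpha  = Hinv @ g                   # least-squares coefficients, Eq. (29)
errors = np.sqrt(np.diag(Hinv))     # least-squares errors, Eq. (30)
print(alpha)    # approximately the coefficients quoted above
print(errors)   # approximately the errors quoted above
```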
Example 7
In Example 6, what is the best estimate of y at x = 1? What is the error of this estimate?
Solution: Putting x = 1 into the above equation gives
$$y^*(1) = 3.685 + 3.27 + 7.808 = 14.763$$
 
Δy is obtained using Eq. 12.
$$(\Delta y)^2 = \sum_{j,k} \frac{\partial \bar y}{\partial \alpha_j}\, \frac{\partial \bar y}{\partial \alpha_k}\left(H^{-1}\right)_{jk} = \sum_{j,k} f_j(x)\, f_k(x)\left(H^{-1}\right)_{jk}$$
Setting x = 1 gives
$$(\Delta y)^2 = \sum_{j,k} f_j(1)\, f_k(1)\left(H^{-1}\right)_{jk}$$
So at x = 1, y = 14.763 ± 5.137.
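A small helper sketching the Eq. 12 propagation used here (assumed names; for the linear model the derivatives ∂ȳ/∂α_j are just the basis values f_j(x)):

```python
import numpy as np

def ybar_error(Hinv, f_values):
    """Eq. (12) sketch: (Delta y)^2 = sum_jk f_j(x) f_k(x) (H^{-1})_jk.

    Hinv     -- inverse of the least-squares matrix H
    f_values -- array [f_1(x), ..., f_M(x)] evaluated at the x of interest
    """
    f = np.asarray(f_values, float)
    return float(np.sqrt(f @ Hinv @ f))
```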
Least Squares When the yi are Not Independent
Let
$$V_{ij} \equiv \overline{\left(y_i - \bar y_i\right)\left(y_j - \bar y_j\right)}$$
be the error matrix of the y measurements. Now we shall treat the more general case where the off-diagonal elements need not be zero; i.e., the quantities y_i are not independent. We see immediately from Eq. 11a that the log likelihood function is
$$w = -\frac{1}{2}\sum_{i,j}\left[y_i - \bar y(x_i)\right]\left(V^{-1}\right)_{ij}\left[y_j - \bar y(x_j)\right] + \text{const}$$
The maximum likelihood solution is found by minimizing
S, where
$$S \equiv \sum_{i,j}\left[y_i - \bar y(x_i)\right]\left(V^{-1}\right)_{ij}\left[y_j - \bar y(x_j)\right] \qquad \text{(generalized least squares sum)}$$
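The text stops at the definition of the generalized sum. For the linear model ȳ(x) = Σ_k α_k f_k(x), minimizing it has the same structure as Eq. (30) with the weights 1/σ_i² replaced by the matrix V⁻¹; the sketch below (assumed names, not from the text) illustrates this:

```python
import numpy as np

def generalized_least_squares(x, y, V, basis):
    """Minimize S = (y - F a)^T V^{-1} (y - F a) for the linear model
    ybar(x) = sum_k a_k f_k(x).  Illustrative sketch only.

    V     -- full error matrix of the y measurements (off-diagonal terms allowed)
    basis -- list of the basis functions f_k
    """
    x, y, V = map(np.asarray, (x, y, V))
    F = np.column_stack([f(x) for f in basis])     # F_ik = f_k(x_i)
    Vinv = np.linalg.inv(V)
    H = F.T @ Vinv @ F                             # generalizes H_jk of Eq. (28)
    g = F.T @ Vinv @ y                             # generalizes g_j of Eq. (28)
    Hinv = np.linalg.inv(H)
    alpha_star = Hinv @ g                          # minimizes the generalized S
    delta_alpha = np.sqrt(np.diag(Hinv))           # errors from (H^{-1})_jj
    return alpha_star, delta_alpha
```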