7.1 The Least Squares Method
Let us suppose that measurements at $n$ points $x_i$ are made of the variable $y_i$ with an error $\sigma_i$ ($i = 1, 2, \ldots, n$), and that it is desired to fit a function $f(x; a_1, a_2, \ldots, a_m)$ to these data, where $a_1, a_2, \ldots, a_m$ are unknown parameters to be determined. Of course, the number of points must be greater than the number of parameters. The method of least squares states that the best values of the $a_j$ are those for which the sum

$$S = \sum_{i=1}^{n} \frac{\left[ y_i - f(x_i; a_1, \ldots, a_m) \right]^2}{\sigma_i^2} \tag{70}$$
is a minimum. Examining (70), we can see that this is just the sum of the squared deviations of the data points from the curve $f(x_i)$, weighted by the respective errors on $y_i$. The reader might also recognize this as the chi-square in (22). For this reason, the method is also sometimes referred to as chi-square minimization. Strictly speaking, this is not quite correct, as $y_i$ must be Gaussian distributed with mean $f(x_i; a_j)$ and variance $\sigma_i^2$ in order for $S$ to be a true chi-square. However, as this is almost always the case for measurements in physics, this is a valid hypothesis most of the time. The least squares method, however, is totally general and does not require knowledge of the parent distribution. If the parent distribution is known, the method of maximum likelihood may also be used; in the case of Gaussian distributed errors, this yields identical results.
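As a minimal numerical sketch, the sum (70) can be evaluated directly for a set of measurements. The data, errors, and straight-line model $f(x; a_1, a_2) = a_1 + a_2 x$ below are invented purely for illustration:

```python
import numpy as np

# Hypothetical data: n = 5 measurements y_i at points x_i with errors sigma_i.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])
sigma = np.array([0.2, 0.2, 0.3, 0.3, 0.4])

def S(a, x, y, sigma):
    """Weighted sum of squared deviations -- the quantity (70) --
    for a straight-line model f(x; a1, a2) = a1 + a2*x."""
    residuals = y - (a[0] + a[1] * x)
    return np.sum((residuals / sigma) ** 2)

print(S([1.0, 2.0], x, y, sigma))
```

The best values of $a_1$ and $a_2$ are those that make this quantity as small as possible.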
To find the values of the $a_j$, one must now solve the system of equations

$$\frac{\partial S}{\partial a_j} = 0, \qquad j = 1, 2, \ldots, m. \tag{71}$$

Depending on the function $f(x)$, (71) may or may not yield an analytic solution. In general, numerical methods requiring a computer must be used to minimize $S$.
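For a model that is linear in the parameters, (71) does have an analytic solution: the conditions $\partial S / \partial a_j = 0$ reduce to the weighted normal equations. A sketch for the same invented straight-line data as above, with the usual design-matrix formulation (each column one parameter, weights $1/\sigma_i^2$):

```python
import numpy as np

# Invented straight-line data, f(x; a1, a2) = a1 + a2*x.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])
sigma = np.array([0.2, 0.2, 0.3, 0.3, 0.4])

# Design matrix: one column per parameter (constant term, slope term).
A = np.column_stack([np.ones_like(x), x])
W = np.diag(1.0 / sigma**2)          # weight matrix, 1/sigma_i^2

# dS/da_j = 0 gives the weighted normal equations (A^T W A) a = A^T W y.
a = np.linalg.solve(A.T @ W @ A, A.T @ W @ y)
print(a)   # best-fit values of a1 (intercept) and a2 (slope)
```

For a model that is nonlinear in the $a_j$, one would instead hand $S$ to a general-purpose minimizer.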
Assuming we have the best values for the $a_j$, it is necessary to estimate the errors on the parameters. For this, we form the so-called covariance or error matrix $V_{ij}$,

$$\left( V^{-1} \right)_{ij} = \frac{1}{2} \, \frac{\partial^2 S}{\partial a_i \, \partial a_j},$$

where the second derivatives are evaluated at the minimum. (Note that the second derivatives form the inverse of the error matrix.) The diagonal elements $V_{ii}$ can then be shown to be the variances for the $a_i$, while the off-diagonal elements $V_{ij}$ represent the covariances between $a_i$ and $a_j$. Thus,

$$\sigma^2(a_i) = V_{ii}, \qquad \operatorname{cov}(a_i, a_j) = V_{ij}.$$
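Continuing the invented straight-line example: for a model linear in the parameters, the Hessian of $S$ is constant and equals $2\,A^T W A$, so the error matrix is simply $(A^T W A)^{-1}$ and the parameter errors are the square roots of its diagonal. A sketch under those assumptions:

```python
import numpy as np

# Same invented measurement points and errors as before.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
sigma = np.array([0.2, 0.2, 0.3, 0.3, 0.4])

A = np.column_stack([np.ones_like(x), x])
W = np.diag(1.0 / sigma**2)

# (V^-1)_ij = (1/2) d^2 S / da_i da_j; for a model linear in the
# parameters the Hessian of S is exactly 2 A^T W A, so:
V = np.linalg.inv(A.T @ W @ A)

errors = np.sqrt(np.diag(V))   # standard errors on a1 and a2
print(errors)
```

Note that $V$ depends only on the measurement points and their errors, not on the measured $y_i$ themselves; this is a peculiarity of linear models.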