Problem 2: Unequal errors
In the foregoing discussion, I assumed that all of the individual
$y_i$ values had precisely the same typical expected error, quantified by
the so-called "standard error," which is defined as the Gaussian $\sigma$ in
the probability distribution for $\epsilon$. (Note that the standard error is
sometimes referred to as the "mean error" - standard error and mean
error mean the same thing. They are not the same thing as the
"probable error," which you will sometimes see mentioned in older
books and papers. The probable error is defined as the half-length of
a 50% confidence interval: when you give a numerical value for your
estimate of some physical quantity and quote a probable error, you are
saying that you think there is a 50% chance that the true value of
that quantity is within the stated error bars. The standard error (= the
mean error) is the half-length of a 68.3% confidence interval: when
you quote a standard or mean error, you are saying that you think
there's slightly over a two-thirds chance that the true value is
contained within your error bars. The latter is the current standard
astronomical convention.)
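
For concreteness, here is a small Python check (not part of the original text; it assumes Gaussian errors and uses scipy) of how the two conventions relate: the half-length of a central confidence interval is the Gaussian quantile at (1 + coverage)/2 times $\sigma$, so the probable error is about 0.6745 times the standard error.

```python
# Relation between the "probable error" (50% interval) and the standard /
# mean error (68.27% interval) for a Gaussian; scipy is assumed available.
from scipy.stats import norm

sigma = 1.0                                          # standard (mean) error
probable_error = norm.ppf(0.75) * sigma              # half-length of the central 50% interval
standard_error = norm.ppf(0.5 + 0.6827 / 2) * sigma  # half-length of the central 68.27% interval

print(f"probable error = {probable_error:.4f} * sigma")  # ~0.6745
print(f"standard error = {standard_error:.4f} * sigma")  # ~1.0000
```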
In real life, it is commonly the case that the individual observations
have different known or estimated standard errors, $\sigma_i$. This situation
is nearly as easy to deal with as the case of equal errors. We now
write the Gaussian function as

$$ p(\epsilon_i) = \frac{1}{\sqrt{2\pi}\,\sigma_i}\,\exp\!\left(-\frac{\epsilon_i^2}{2\sigma_i^2}\right) $$

so

$$ \chi^2 = \sum_{i=1}^{N}\frac{\epsilon_i^2}{\sigma_i^2} = \sum_{i=1}^{N}\frac{(y_i - a - b\,x_i)^2}{\sigma_i^2}, $$

where I have been very careful to keep the individual $\sigma_i$'s inside the
summations throughout. Let us define the weight of an observation as
$w_i = \sigma^2/\sigma_i^2$, where $\sigma^2$ is just some arbitrary constant that you can pull out
of a hat; I have included it for generality and I have written it as
$\sigma^2$ to emphasize that it should be a positive constant. And
furthermore, . . . well, wait just a bit. Our conditions for a minimum
of $\chi^2$ are:

$$ \frac{\partial\chi^2}{\partial a} = -\frac{2}{\sigma^2}\sum_i w_i\,(y_i - a - b\,x_i) = 0,
\qquad
\frac{\partial\chi^2}{\partial b} = -\frac{2}{\sigma^2}\sum_i w_i\,x_i\,(y_i - a - b\,x_i) = 0. $$

You can see now that the specific value that you adopt for the
arbitrary constant $\sigma^2$ doesn't matter at all: since the summations are
going to be set equal to zero anyway, whatever value of $\sigma^2$ you use, it
can be pulled out of the summations and the equations are still true.
In matrix form we now have

$$ \begin{pmatrix} \sum_i w_i & \sum_i w_i x_i \\ \sum_i w_i x_i & \sum_i w_i x_i^2 \end{pmatrix}
\begin{pmatrix} a \\ b \end{pmatrix} =
\begin{pmatrix} \sum_i w_i y_i \\ \sum_i w_i x_i y_i \end{pmatrix} $$

or, in algebraic form,

$$ a\sum_i w_i + b\sum_i w_i x_i = \sum_i w_i y_i, \qquad
a\sum_i w_i x_i + b\sum_i w_i x_i^2 = \sum_i w_i x_i y_i. $$

In this case,

$$ \text{m.e.1} = \sqrt{\frac{\sum_i w_i\,(y_i - a - b\,x_i)^2}{N-2}}. $$

If you have used correct values for all the $\sigma_i$'s, then
$\frac{1}{N-2}\sum_i \epsilon_i^2/\sigma_i^2$ is a
so-called "chi-squared" variable with an expected value of unity; it
will equal unity more and more precisely for larger and larger sample
sizes, $N$. Thus, if the $\sigma_i$ are correct, after you have performed your
least-squares fit you should wind up with m.e.1 $\approx \sigma$.
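
As a concrete illustration (not from the original text), here is a minimal Python/numpy sketch of the procedure just described, assuming the two-parameter straight-line model $y = a + bx$: it builds the weighted normal equations, solves for $a$ and $b$, and evaluates m.e.1 with the arbitrary constant set to $\sigma = 1$, so that $w_i = 1/\sigma_i^2$. The data and the function name weighted_line_fit are invented for the example.

```python
import numpy as np

def weighted_line_fit(x, y, sigma_i):
    """Weighted least-squares fit of y = a + b*x; returns a, b, and m.e.1.

    Weights are w_i = sigma**2 / sigma_i**2 with the arbitrary constant
    sigma set to 1, so w_i = 1 / sigma_i**2.
    """
    w = 1.0 / sigma_i**2

    # Weighted normal equations:
    #   [ sum(w)    sum(w*x)   ] [a]   [ sum(w*y)   ]
    #   [ sum(w*x)  sum(w*x^2) ] [b] = [ sum(w*x*y) ]
    A = np.array([[np.sum(w),     np.sum(w * x)],
                  [np.sum(w * x), np.sum(w * x**2)]])
    rhs = np.array([np.sum(w * y), np.sum(w * x * y)])
    a, b = np.linalg.solve(A, rhs)

    # Mean error of unit weight: sqrt( sum(w * residual^2) / (N - 2) )
    resid = y - a - b * x
    me1 = np.sqrt(np.sum(w * resid**2) / (x.size - 2))
    return a, b, me1

# Made-up example: a noisy line with per-point standard errors sigma_i.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 20)
sigma_i = np.full_like(x, 0.5)
y = 2.0 + 0.7 * x + rng.normal(0.0, sigma_i)

a, b, me1 = weighted_line_fit(x, y, sigma_i)
print(f"a = {a:.3f}, b = {b:.3f}, m.e.1 = {me1:.3f}")
```

Because the $\sigma_i$ fed to the fit here are the same ones that generated the noise, m.e.1 should scatter around 1.0 from one random realization to the next, as described in the text.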
Recalling that $\sigma$ is by definition the standard error of an
observation of weight 1 (since $w_i = \sigma^2/\sigma_i^2 = 1$ when $\sigma_i = \sigma$), we
can now see why m.e.1 is called the "mean error of unit weight": it is
the mean error, or the correct value of $\sigma$, corresponding to a data
point with $w = 1$. If we are uncertain whether our assumed values of
the $\sigma_i$ are correct, we can use the derived m.e.1 as a guide. We start
off by setting $\sigma = 1$. Then, if our values of $\sigma_i$ are correct, the
derived m.e.1 should come out to have a value near 1.0. If, on the
other hand, the m.e.1 comes out with a value near 2.0, we would
suspect that we have underestimated our errors by a factor of two.
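
A quick numerical check of that diagnostic (again with invented numbers, and using numpy's polyfit with weights $1/\sigma_i$ as a shortcut for the weighted fit): the simulated data carry errors of 1.0, but we quote errors of 0.5, and m.e.1 duly comes out near 2.

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0.0, 10.0, 200)
true_sigma = 1.0                                   # error actually present in the data
assumed_sigma = np.full_like(x, 0.5 * true_sigma)  # what we (wrongly) quote
y = 2.0 + 0.7 * x + rng.normal(0.0, true_sigma, size=x.size)

# Weighted straight-line fit; np.polyfit expects weights of 1/sigma_i.
b, a = np.polyfit(x, y, 1, w=1.0 / assumed_sigma)

# m.e.1 with sigma = 1, i.e. w_i = 1 / assumed_sigma_i**2.
resid = y - a - b * x
me1 = np.sqrt(np.sum((resid / assumed_sigma)**2) / (x.size - 2))
print(f"m.e.1 = {me1:.2f}")  # near 2.0: the quoted errors were ~2x too small
```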
On the other hand, in many cases we do not know the true errors of all
our observations, but we have a good handle on their relative errors:
we may know that observation number 2 has a $\sigma$ twice as large as
observation number 1, while not knowing what $\sigma_1$ and
$\sigma_2$ are, really. In
this case, we can arbitrarily assign observation 1 unit weight, and
observation 2 weight 1/4 (since weight $\propto \sigma^{-2}$). In this case m.e.1 will
not come out to unity; it will come out to an estimate of what $\sigma_1$
("the mean error of an observation of unit weight") actually should
have been.
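
Here is a short sketch of that relative-weight case (invented numbers, again using numpy's polyfit for the fit): only the ratio of the errors is taken as known, so the first half of the points get weight 1 and the second half, whose errors are twice as large, get weight 1/4; m.e.1 then comes out near the pretended-unknown $\sigma_1$ of a weight-1 point.

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(0.0, 10.0, 100)
sigma_1 = 0.3                                   # "unknown" error of a weight-1 observation
sigma_i = np.where(np.arange(x.size) < 50, sigma_1, 2.0 * sigma_1)
y = 2.0 + 0.7 * x + rng.normal(0.0, sigma_i, size=x.size)

# Relative weights only: 1 for the first half, 1/4 for the half with doubled errors.
w = np.where(np.arange(x.size) < 50, 1.0, 0.25)

# Weighted straight-line fit; polyfit's w multiplies the unsquared residual,
# so pass sqrt(w) to minimize sum(w * residual**2).
b, a = np.polyfit(x, y, 1, w=np.sqrt(w))

resid = y - a - b * x
me1 = np.sqrt(np.sum(w * resid**2) / (x.size - 2))
print(f"m.e.1 = {me1:.3f}")  # an estimate of sigma_1 = 0.3
```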