Practical Statistics for Astronomers I

3. THE NORMAL (GAUSSIAN) DISTRIBUTION

for which the density function is

$$ f(x) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left[ -\frac{(x - \mu)^2}{2\sigma^2} \right] \qquad (9) $$

Fig. 1 shows the well-known appearance of this function, and of its integral, the corresponding distribution function. Both functions are tabulated in Table II. Because of the choice of symbols, the mean = µ, the variance = σ², and the standard deviation = σ.

Fig. 1. The Normal distribution. The probability density function (solid curve) is

Equation 10

and the distribution function (dotted curve) is

Equation 11

50 per cent of the area lies between ± pe (probable error). See Tables II and III.

Let us draw a sample of N values x_i from a population of x's which we believe to be Normally distributed - the basis for such a belief can wait for a paragraph or so. How do we estimate the population mean µ and variance σ²? Not necessarily by applying the definitions of Section 2 to our bunch of x_i, because these may not result in unbiased estimates. In fact, it may be shown (e.g. (4)) that for a Normally-distributed population, x̄_s, the arithmetic mean of the sample, is indeed an unbiased estimate of µ, but the unbiased estimate of σ² is

$$ \sigma_s^2 = \frac{1}{N - 1} \sum_{i=1}^{N} (x_i - \bar{x}_s)^2 \qquad (12) $$

which differs from the definition of σ² given earlier by the factor N / (N - 1). This factor can give rise to confusion: σ_s² is usually referred to as the sample variance, while the σ² defined in Section 2 is called the population variance, and, of course, σ_s² → σ² as N → ∞. (That there should be a difference is easy to understand: the x_i of our sample are first used to get x̄_s, an estimate of µ, and although this is an unbiased estimate of µ it is the estimate which yields a minimum value for the sum of the squares of the deviations of the sample, and thus a low estimate of the variance. The theory provides the appropriate correction factor, namely N / (N - 1).) The standard deviation (error) on x̄_s, our estimate of µ, is σ_s / √N. Moreover, if we have n estimates of µ, namely x̄_j, each having an associated error σ_j, the best estimate of µ is the weighted mean, i.e.

$$ \bar{x}_w = \frac{\sum_{j=1}^{n} w_j \bar{x}_j}{\sum_{j=1}^{n} w_j} \qquad (13) $$

where the weights are given by

$$ w_j = \frac{1}{\sigma_j^2} \qquad (14) $$

the reciprocals of the sample variances. The best estimate of the variance of x̄_w is

$$ \sigma_w^2 = \frac{1}{\sum_{j=1}^{n} 1 / \sigma_j^2} \qquad (15) $$
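The estimates described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `sample_estimates` and `weighted_mean` are hypothetical (they do not come from the text), and the inputs are assumed to be plain lists of numbers with non-zero errors σ_j.

```python
import math

def sample_estimates(xs):
    """Return (mean, unbiased variance, standard error of the mean).

    Hypothetical helper: the variance uses the N - 1 divisor, i.e. the
    mean-square deviation multiplied by N / (N - 1).
    """
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two values to estimate the variance")
    mean = sum(xs) / n
    var_s = sum((x - mean) ** 2 for x in xs) / (n - 1)
    # Standard deviation (error) on the mean: sigma_s / sqrt(N).
    return mean, var_s, math.sqrt(var_s / n)

def weighted_mean(means, sigmas):
    """Combine independent estimates xbar_j +/- sigma_j.

    Hypothetical helper: weights are the reciprocal variances, and the
    returned variance is 1 / sum(weights).
    """
    weights = [1.0 / s ** 2 for s in sigmas]
    total = sum(weights)
    xbar_w = sum(w * m for w, m in zip(weights, means)) / total
    return xbar_w, 1.0 / total
```

For instance, `weighted_mean([10.0, 20.0], [1.0, 2.0])` weights the first estimate four times as heavily as the second, so the result lies much closer to 10 than to 20.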

But to return to the Normal distribution itself; why is it Magic? Why might we expect populations, data sets, measurements, etc., to be so arranged? We know from experience that they frequently are. One obvious reason for the prominence of the Normal distribution is that two distributions frequently encountered in Nature, namely the Binomial and Poisson (see Table I), each tend to it in their respective limits n → ∞ and µ → ∞. But the real answer lies in what is perhaps the most important theorem in Statistical Inference, Measurement Theory, and Experimental Design, namely (1), (4),

The Central Limit Theorem: Suppose that independent random variables X_i of UNKNOWN probability density function are identically distributed with mean µ and variance σ² (both finite). Take a sample of n of these variables; as n becomes large, the distribution of the mean of the sample tends to a Normal distribution with mean µ and variance σ² / n.

The tendency of the Binomial and Poisson distributions to the Normal is simply a particular instance of the operation of this miraculous theorem. Indeed, the theorem applies to both discrete and continuously distributed variables, and under certain weak conditions it is even possible to relax the requirements of independence and of identical distribution for the X_i; Normal distributions for means (and even for other linear combinations of the variables) still result.
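A small simulation can make the theorem concrete. The sketch below is an illustration only, not part of the original argument: it assumes an exponential parent distribution (µ = 1, σ² = 1, strongly skewed), and the function name `clt_demo`, the sample size, the number of trials and the seed are all arbitrary choices made here.

```python
import random
import statistics

def clt_demo(n=50, trials=20000, seed=1):
    """Simulate `trials` sample means, each from n exponential draws.

    The exponential distribution with rate 1 has mu = 1 and sigma^2 = 1
    and is strongly skewed, so the parent population is far from Normal.
    """
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(trials)
    ]
    mean_of_means = statistics.fmean(means)
    var_of_means = statistics.variance(means)
    # Fraction of sample means within one predicted standard deviation
    # (sigma / sqrt(n) = 1 / sqrt(n)) of mu = 1.
    half_width = (1.0 / n) ** 0.5
    inside = sum(1 for m in means if abs(m - 1.0) < half_width)
    return mean_of_means, var_of_means, inside / trials
```

If the theorem is doing its work, the three returned numbers should lie close to µ = 1, σ² / n = 0.02 and the Normal value of roughly 0.68, even though no single draw is Normally distributed.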

The Central Limit Theorem thus allows quantitative probabilities to be estimated in experimental situations where the form of the underlying probability distribution is unknown. In particular, the individual members of many data sets which we acquire are the result of some kind of averaging - even "eyeball integration" counts! - and in such circumstances the Central Limit Theorem says that the data tend to a Normal distribution. As a result, we can (and do) estimate population parameters µ and σ² from single samples, or from combining the results from different samples, in the manner described above, happy in the thought that the Central Limit Theorem has (probably) justified the procedure. For example, we may wish to determine an intensity from several measurements of it; the best estimate is then x̄_s ± σ_s as given above. Alternatively, given several independent estimates of the intensity, x̄_j ± σ_j, the best overall estimate ⁽²⁾ is x̄_w ± σ_w, as above.

But the Central Limit Theorem cannot always yield us a Normal distribution, as we shall see in discussing signal detection. Indeed, there is a caveat before we get there. Because of the prominence of the Normal distribution it does provide a useful way of describing errors, dispersions, or differences, which is intuitively very acceptable. We use the Normal distribution for this purpose by quoting errors or differences in terms of the standard deviation, which for a Normal distribution is the root-mean-square deviation "corrected" by the factor √[N / (N - 1)]; for the Normal distribution, the region (x̄ - σ) < x < (x̄ + σ) contains 68.3 per cent of the area, i.e. 68.3 per cent of the "expected results". The caveat is that the Normal distribution has very short tails - it dies rapidly beyond ± 2σ, as Fig. 1 shows. Consider, for example, the comparison of two independent sets of (fictitious) right ascension measurements as shown in Fig. 2. If the differences were Normally distributed, a valid procedure is to calculate their mean to determine the systematic difference, and then σ²_diff, the sample variance, = (N / (N - 1)) times the mean-square deviation. Suppose that σ₁² and σ₂² are the variances known for each of the two independent sets of right ascension; then the expected variance in the differences (see, e.g. (9)) is σ₁² + σ₂², and the computed σ²_diff may be compared with this, perhaps to verify the estimates of the individual variances. However, Fig. 2 shows a few wild differences, of the type which inevitably seem to occur in comparisons of this sort, and which extend the tails far beyond those of a Normal distribution. (Such differences may arise from computational blunders or from some disturbance of the underlying probability distributions; see Section 4.) Moreover, note what happens if we compute our estimates of mean and variance as for Normal distributions - these quantities are completely dominated by the wayward values, and the Normal distribution which they imply (the dashed curve) fits the data very poorly. But we can still describe these in terms of a Normal distribution if we are careful. One way is to reject the large and offensive values from our (µ, σ) estimates; but this may not be possible, and is undoubtedly dangerous. A safer method is to use the median as a location measure, and to adopt as a dispersion measure the ± Δ (after removing the offset) within which 68.3 per cent of the values lie. This latter statistic is quite commonly used, and, of course, for the Normal distribution it corresponds exactly to ± 1σ.
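The median-and-Δ description can be sketched as follows. This is a minimal illustration, not a prescribed recipe: the function name `robust_location_dispersion` is hypothetical, and taking Δ as the smallest half-width that contains at least 68.3 per cent of the values is one possible convention among several.

```python
import math
import statistics

def robust_location_dispersion(xs, fraction=0.683):
    """Return (median, delta) for a list of values.

    delta is the smallest half-width about the median that contains at
    least `fraction` of the values (an assumed convention).
    """
    med = statistics.median(xs)
    abs_dev = sorted(abs(x - med) for x in xs)
    # Number of points that must fall inside +/- delta.
    k = math.ceil(fraction * len(abs_dev))
    return med, abs_dev[k - 1]

# Fictitious differences with one wild value, for illustration only.
diffs = [-2, -1, -1, 0, 0, 0, 1, 1, 2, 50]
med, delta = robust_location_dispersion(diffs)
mean, sigma_s = statistics.mean(diffs), statistics.stdev(diffs)
```

With these made-up numbers the single wild value drags the mean and σ_s far from the bulk of the data, while the median and Δ are unaffected by it.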

Fig. 2. A histogram of differences between two fictitious sets of right ascension measurements. The dashed curve is the Normal curve "describing" the distribution on the basis of a straightforward calculation of x̄_s and σ. A better description is obtained when x_m, the median value, is used as the location measure and the ± Δ containing 68.3 per cent of the data as the dispersion measure; the Normal curve using these parameters as measures of x̄ and σ is shown dotted.

In this latter respect, beware of probable error. This is the ± range within which 50 per cent (not 68.3 per cent) of the points lie. It is sometimes used with justification when the errors in an experiment are not Normally distributed. However, it is smaller than σ (~ 0.7σ) and therefore its use provides cosmetic improvement of results. The temptation proves too much for some observers.
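The two percentages quoted for the Normal distribution can be checked numerically. The sketch below uses the `NormalDist` class from Python's standard `statistics` module; the variable names are chosen here for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Fraction of the area within +/- 1 sigma of the mean.
frac_1sigma = std_normal.cdf(1.0) - std_normal.cdf(-1.0)

# Probable error in units of sigma: the half-width containing 50 per cent
# of the area, i.e. the 75th percentile of the standard Normal.
pe_over_sigma = std_normal.inv_cdf(0.75)
```

Here `frac_1sigma` is about 0.683 and `pe_over_sigma` about 0.67, so an error quoted as a probable error is roughly two-thirds the size of the same error quoted as a standard deviation.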

² If it is known a priori that the "real" frequency distribution of intensities is not uniform (Table I), then it may be argued that x̄_w is not unbiased even though the errors obey a Normal distribution precisely. Appropriate correction may be made (5). This is the edge of an argument between two types of statisticians, the Bayesians and the Others, which has raged for 200 years. Let us skirt it, for the moment at least.