Practical Statistics for Astronomers I

2. DEFINITIONS

A Statistician: is ``a man prepared to estimate the probability that the Sun will rise tomorrow in the light of its past performance'' (3). This definition, which I admit to taking out of context, is by an eminent statistician; it may confirm our worst suspicions about statisticians eminent or otherwise. At the very least it should emphasize care of definition - what is tomorrow when the Sun does not rise? Let us be more careful:

``Statistics'' is a term loosely used to describe both the science and the values. In fact, the science is

Statistical Inference: the determination of properties of the population from the sample,

while

Statistics are values (usually, but not necessarily, numerical) determined from some or all of the values of a sample.

Good statistics are those from which our conclusions concerning the population are stable from sample to sample, while good samples provide good statistics, and require appropriate design of experiment.

The ``goodness'' of the experiment, the sample, or the statistic is indicated by the

Level of Significance: suppose we perform an experiment to distinguish between two rival hypotheses, the null hypothesis (H₀; ``failure'', no result, no detection, no correlation) and its alternative (H₁; ``success'', etc.). Before the experiment we make ourselves very familiar with ``failure'' by determining, assuming H₀ to be true, the set of all possible values of the statistic under test, that is, the statistic we have chosen to use in deciding between H₀ and H₁. Now suppose that when we do the experiment we obtain a value of the statistic which is ``unusual'' in comparison with this set, so unusual that, say, only 1 per cent of all values computed under the H₀ hypothesis are so extreme. We can then reject H₀ in favour of H₁ at the 1 per cent level of significance. The level of significance is thus the probability of rejecting H₀ when it is, in fact, true.
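The procedure just described can be sketched numerically. The following is a minimal illustration, not a prescription: it assumes that H₀ means "the data are drawn from a Normal population of zero mean and unit standard deviation", that the statistic under test is the mean of 25 values, and that the observed value is 0.6. The function names and all of these numbers are hypothetical choices made only for the example.

```python
import random

def null_distribution(n_trials, n_samples, rng):
    """Values of the statistic (a sample mean) computed assuming H0 is true."""
    return [
        sum(rng.gauss(0.0, 1.0) for _ in range(n_samples)) / n_samples
        for _ in range(n_trials)
    ]

def significance(observed, null_values):
    """Fraction of H0 values at least as extreme (two-sided) as the observed one."""
    extreme = sum(1 for v in null_values if abs(v) >= abs(observed))
    return extreme / len(null_values)

rng = random.Random(42)          # fixed seed so the sketch is repeatable
null_values = null_distribution(n_trials=10000, n_samples=25, rng=rng)

observed = 0.6                   # hypothetical measured value of the statistic
p = significance(observed, null_values)
reject_h0 = p <= 0.01            # reject H0 at the 1 per cent level?
```

Here the set of values under H₀ is built by simulation rather than from theory; where the distribution of the statistic is known analytically, the tabulated distribution serves the same purpose.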

Now consider N values of x_i, where i = 1, 2, ..., N and x may have a continuous or a discrete distribution. The following definitions are general:

1. Location measures

Equation 1 (arithmetic mean): $\mu = \frac{1}{N} \sum_{i=1}^{N} x_i$

Median: arrange the x_i in order of size and renumber. Then

$$x_{\rm med} = \begin{cases} x_j \ \text{where } j = N/2 + 0.5, & N \text{ odd} \\ \tfrac{1}{2}\,(x_j + x_{j+1}) \ \text{where } j = N/2, & N \text{ even.} \end{cases}$$

Mode: x_mode is the value of x_i occurring most frequently.
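The location measures can be sketched in a few lines of Python. This is an illustration only; the function names and the sample values are hypothetical. The median follows the odd/even rule given above, translated from 1-based to 0-based indexing.

```python
from collections import Counter

def mean(xs):
    """Arithmetic mean of the values."""
    return sum(xs) / len(xs)

def median(xs):
    """Middle value of the sorted data (N odd) or mean of the two middle values (N even)."""
    s = sorted(xs)
    n = len(s)
    if n % 2 == 1:
        # 1-based j = N/2 + 0.5 corresponds to 0-based index (n - 1) // 2
        return s[(n - 1) // 2]
    j = n // 2  # 1-based j = N/2, so the pair (x_j, x_j+1) is s[j - 1], s[j]
    return 0.5 * (s[j - 1] + s[j])

def mode(xs):
    """Most frequently occurring value; on a tie, the one met first in xs."""
    return Counter(xs).most_common(1)[0][0]

sample = [3, 1, 4, 1, 5, 9, 2, 6]
x_mean, x_med, x_mode = mean(sample), median(sample), mode(sample)
```

The mode is only meaningful when values repeat, so for continuous data it would normally be taken from a binned histogram rather than from the raw values.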

2. Dispersion measures

Equation 2

Equation 3

Equation 4

3. Moments

Equation 5: $\mu_n = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^n$

(Moments may be taken about any value of x; those about the arithmetic mean as above are termed central moments.) Note that µ₂ = σ²; this and the next few moments characterize a probability distribution (defined below). The first two moments are useless, since µ₀ ≡ 1 and µ₁ ≡ 0.

Skewness: β₁ = µ₃² / µ₂³ indicates deviation from symmetry;
β₁ = 0 for symmetry about µ.

Kurtosis: β₂ = µ₄ / µ₂² indicates degree of peakiness;
β₂ = 3 for Normal distribution.
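As a sketch of how these quantities follow from a sample, the code below computes central moments about the arithmetic mean and then β₁ and β₂ as defined above. The function names and the five-point sample are hypothetical and chosen only so the arithmetic is easy to follow.

```python
def central_moment(xs, n):
    """n-th moment of the values about their arithmetic mean."""
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** n for x in xs) / len(xs)

def skewness_beta1(xs):
    """beta_1 = mu_3^2 / mu_2^3; zero for a sample symmetric about its mean."""
    return central_moment(xs, 3) ** 2 / central_moment(xs, 2) ** 3

def kurtosis_beta2(xs):
    """beta_2 = mu_4 / mu_2^2; equal to 3 for a Normal distribution."""
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2

sample = [1, 2, 3, 4, 5]           # symmetric about its mean of 3
mu0 = central_moment(sample, 0)    # always 1
mu1 = central_moment(sample, 1)    # always 0
mu2 = central_moment(sample, 2)    # the variance, sigma^2
beta1 = skewness_beta1(sample)
beta2 = kurtosis_beta2(sample)
```

For this flat, symmetric sample β₁ vanishes and β₂ falls well below the Normal value of 3, as expected for a distribution with no pronounced peak.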

Finally, consider probability distributions: if x is a continuous random variable, then f (x) is its probability density function if it meets these conditions:

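(1) Probability [a < x < b] = $\int_a^b f(x)\,dx$,
(2) $\int_{-\infty}^{\infty} f(x)\,dx = 1$, and
(3) f (x) is a single-valued non-negative number for all real x.

The corresponding distribution function is

Equation 8: $F(x) = \int_{-\infty}^{x} f(y)\,dy$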
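These properties can be checked numerically for any particular density. The sketch below assumes the Normal density with µ = 0 and σ = 1, and uses a simple trapezoidal rule with the infinite limits truncated at ±10σ; the function names, step count and limits are hypothetical choices for illustration.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal (Gaussian) probability density function."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def integrate(f, a, b, steps=20000):
    """Composite trapezoidal rule for the integral of f from a to b."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for k in range(1, steps):
        total += f(a + k * h)
    return total * h

# Total probability: the density must integrate to unity.
total_prob = integrate(normal_pdf, -10.0, 10.0)

# Probability that x lies within one standard deviation of the mean.
prob_1sigma = integrate(normal_pdf, -1.0, 1.0)

# Distribution function evaluated at x = 1 (lower limit truncated at -10).
F_at_1 = integrate(normal_pdf, -10.0, 1.0)
```

The last quantity illustrates why distribution functions are usually tabulated in integral form: the probability of any interval is then a difference of two tabulated values.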

Table 1. The common probability distributions.

| Distribution | Density function | Mean | Variance | Raison d'Être |
|---|---|---|---|---|
| Uniform | $f(x;a,b) = 1/(b-a)$ for $a < x < b$, zero otherwise | $(a+b)/2$ | $(b-a)^2/12$ | In the study of rounding errors, and as a tool in theoretical studies of other continuous distributions. |
| Binomial | $f(x;p,q) = \frac{n!}{x!\,(n-x)!}\,p^x q^{n-x}$ | $np$ | $npq$ | x is the number of ``successes'' in an experiment with two possible outcomes, one (``success'') of probability p, and the other (``failure'') of probability q = 1 - p. Becomes a Normal distribution as n → ∞. |
| Poisson | $f(x;\mu) = \mu^x e^{-\mu}/x!$ | $\mu$ | $\mu$ | The limit for the Binomial distribution as p << 1, setting µ ≡ np. It is the ``count-rate'' distribution, e.g. take a star from which an average of µ photons are received per Δt (out of a total of n emitted; p << 1); the probability of receiving x photons in a Δt is f (x;µ). Tends to the Normal distribution as µ → ∞. |
| Normal (Gaussian) | $f(x;\mu,\sigma) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right]$ | $\mu$ | $\sigma^2$ | The essential distribution; see text. The Central Limit Theorem ensures that the majority of ``scattered things'' are dispersed according to f (x;µ,σ). |
| Chi-square | $f(\chi^2;\nu) = \frac{(\chi^2)^{\nu/2-1}\,e^{-\chi^2/2}}{2^{\nu/2}\,\Gamma(\nu/2)}$ | $\nu$ | $2\nu$ | Vital in the comparison of samples, model testing; characterizes the dispersion of observed samples from the expected dispersion, because if x_i is a sample of ν variables Normally and independently distributed with means µ_i and variances σ_i², then $\chi^2 = \sum_{i=1}^{\nu} (x_i - \mu_i)^2/\sigma_i^2$ obeys f (χ²;ν). Invariably tabulated and used in integral form. Tends to Normal distribution as ν → ∞. |
| Student t | $f(t;\nu) = \frac{\Gamma[(\nu+1)/2]}{\sqrt{\pi\nu}\,\Gamma(\nu/2)}\left(1+\frac{t^2}{\nu}\right)^{-(\nu+1)/2}$ | 0 | $\nu/(\nu-2)$ (for ν > 2) | For comparison of means, Normally-distributed populations; if n values x_i are taken from a Normal population (µ,σ), and if x_s and σ_s are found as in text, then $t = (x_s - \mu)/(\sigma_s/\sqrt{n})$ is distributed as f (t;ν) where ``degrees of freedom'' ν = n - 1. Statistic t can also be formulated to compare means for samples from Normal populations with same σ, different µ (4). Tends to Normal as ν → ∞. |
| F | $f(F;\nu_1,\nu_2) = \frac{\Gamma[(\nu_1+\nu_2)/2]}{\Gamma(\nu_1/2)\,\Gamma(\nu_2/2)}\left(\frac{\nu_1}{\nu_2}\right)^{\nu_1/2}\frac{F^{\nu_1/2-1}}{(1+\nu_1 F/\nu_2)^{(\nu_1+\nu_2)/2}}$ | $\nu_2/(\nu_2-2)$ (for ν₂ > 2) | $\frac{2\nu_2^2(\nu_1+\nu_2-2)}{\nu_1(\nu_2-2)^2(\nu_2-4)}$ (for ν₂ > 4) | For comparison of two variances, or of more than two means; if two statistics (χ₁² and χ₂²) each follow the Chi-square distribution, then $F = (\chi_1^2/\nu_1)/(\chi_2^2/\nu_2)$ is distributed as f(F;ν₁,ν₂). Care required in application; see (4), (9). |
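The statement that the Poisson distribution is the limit of the Binomial for p << 1 with µ ≡ np can be checked directly. The sketch below is illustrative only: it assumes n = 10000 and p = 0.0003 (so µ = 3), values chosen arbitrarily, and it also confirms that the Poisson mean and variance both equal µ. The function names are hypothetical.

```python
import math

def binomial_pmf(x, n, p):
    """Probability of x successes in n trials, each of success probability p."""
    return math.comb(n, x) * p**x * (1.0 - p) ** (n - x)

def poisson_pmf(x, mu):
    """Probability of x counts when the mean count is mu."""
    return mu**x * math.exp(-mu) / math.factorial(x)

n, p = 10000, 0.0003      # many trials, small success probability
mu = n * p                # Poisson parameter, mu = np = 3

# Largest difference between the two distributions over x = 0..20.
max_diff = max(
    abs(binomial_pmf(x, n, p) - poisson_pmf(x, mu)) for x in range(21)
)

# Mean and variance of the Poisson distribution, summed far into the tail.
xs = range(61)
poisson_mean = sum(x * poisson_pmf(x, mu) for x in xs)
poisson_var = sum((x - poisson_mean) ** 2 * poisson_pmf(x, mu) for x in xs)
```

In the photon-counting example of Table 1, n plays the role of the number of photons emitted and p the tiny chance that any one of them is received, which is why count rates follow the Poisson form.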


Probability densities and distribution functions may be similarly defined for sets of discrete values x = x₁, x₂, ..., x_n, and for multivariate distributions. The better-known (continuous) functions appear in Table 1, together with location and dispersion measures; note that the previous definitions for these may be written in integral form for continuous distributions. The table includes some indication of how and/or where each distribution arises, and for most of them, I avoid further discussion. But there is one whose rôle is so fundamental that it cannot be treated in such a cavalier manner. This follows in the next section.
