Adapted from P. Coles, 1999, The Routledge Critical Dictionary of the New Cosmology, Routledge Inc., New York. Reprinted with the author's permission.

Testing theories of structure formation against observations of the large-scale distribution of galaxies requires a statistical approach. Theoretical studies of structure formation generally involve running numerical N-body simulations on powerful computers. Such simulations show how galaxies would form and cluster under well-defined assumptions about the form of the primordial density fluctuations, the nature of any dark matter, and the parameters of the underlying cosmological model, such as the density parameter and the Hubble constant. To compare the simulated Universe with observations, one derives a number (a 'statistic') that encapsulates the nature of the spatial distribution in some objective way. If the model matches the observations, the statistic should have the same numerical value for both model and reality. It should always be borne in mind, however, that no single statistic can capture every aspect of something as complicated as the distribution of galaxies in space. A model may therefore pass some statistical tests yet fail others that are more sensitive to particular aspects of the spatial pattern. Statistics can thus be used to reject models, but not to prove that a particular model is correct.

One of the simplest (and most commonly used) statistical methods for analysing galaxy clustering observations is the correlation function or, more precisely, the two-point correlation function. This measures the statistical tendency for galaxies to occur in pairs rather than individually. The correlation function, usually denoted by ξ(r), compares the number of pairs of galaxies found at a separation r with the number of such pairs that would be found if galaxies were distributed at random throughout space. More formally, the probability of finding two galaxies in small volumes dV₁ and dV₂ separated by a distance r is defined to be

$$ dP = n^2 \left[ 1 + \xi(r) \right] dV_1 \, dV_2 $$

where n is the mean number of galaxies per unit volume. A positive value of ξ(r) thus indicates that there are more pairs of galaxies with a separation r than would occur at random; galaxies are then said to be clustered on the scale r. A negative value indicates that galaxies tend to avoid each other; they are then said to be anticlustered. A completely random distribution, usually called a Poisson distribution, has ξ(r) = 0 for all values of r.
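Integrating the pair probability over a spherical shell of volume V_shell gives an expected pair count of N(N-1)/2 × (1 + ξ) × V_shell / V, so ξ can be estimated as DD/RR - 1, where DD is the observed number of pairs in the shell and RR is the number expected for a Poisson distribution. The sketch below assumes a cubic box with periodic (wrap-around) boundaries, which lets RR be computed analytically; the function name xi_estimate is hypothetical. Real surveys have edges and selection effects and instead use random catalogues with estimators such as the Landy-Szalay estimator.

```python
import bisect
import math
import random


def xi_estimate(points, box, r_edges):
    """Estimate xi(r) in radial bins for points in a periodic cube of side `box`.

    DD[k] counts distinct pairs with r_edges[k] <= r < r_edges[k+1]. The
    expected count for a Poisson (uniform random) distribution of the same
    mean density is RR[k] = N(N-1)/2 * V_shell / V, and the estimate is
    xi[k] = DD[k] / RR[k] - 1. Assumes r_edges[-1] < box / 2.
    """
    n = len(points)
    nbins = len(r_edges) - 1
    dd = [0] * nbins
    for i in range(n):
        for j in range(i + 1, n):
            d2 = 0.0
            for a, b in zip(points[i], points[j]):
                d = abs(a - b)
                d = min(d, box - d)  # minimum-image distance across the periodic box
                d2 += d * d
            r = math.sqrt(d2)
            k = bisect.bisect_right(r_edges, r) - 1  # bin index, or out of range
            if 0 <= k < nbins:
                dd[k] += 1
    n_pairs = n * (n - 1) / 2
    volume = box ** 3
    xi = []
    for k in range(nbins):
        shell = 4.0 / 3.0 * math.pi * (r_edges[k + 1] ** 3 - r_edges[k] ** 3)
        rr = n_pairs * shell / volume  # Poisson expectation for this shell
        xi.append(dd[k] / rr - 1.0)
    return xi


# Usage: a Poisson (uniform random) catalogue should give xi close to 0 in every bin.
random.seed(1)
box = 100.0
poisson = [tuple(random.uniform(0.0, box) for _ in range(3)) for _ in range(500)]
print(xi_estimate(poisson, box, [5.0, 10.0, 20.0, 30.0, 40.0]))
```

The direct double loop costs O(N²) and is only meant for small illustrative catalogues; production codes use tree or grid methods to count pairs.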

Estimates of the correlation function of galaxies indicate that ξ(r) is well approximated by a power law in r:

$$ \xi(r) \approx \left( \frac{r}{r_0} \right)^{-1.8} $$

where the constant r₀ is usually called the correlation length. The value of r₀ depends slightly on the type of galaxy chosen, but is around 5 Mpc for bright galaxies. This behaviour indicates that these galaxies are strongly clustered, in a roughly fractal pattern, on scales of up to several tens of millions of light years. On larger scales, however, ξ(r) becomes negative, indicating the presence of large voids (see large-scale structure). The correlation function ξ(r) is related mathematically to the power spectrum P(k) by a Fourier transform; P(k) is also used to describe clustering on large scales.
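Because a power law is a straight line in log-log coordinates, the correlation length r₀ and the slope γ (1.8 in the formula above) can be estimated by a least-squares fit of ln ξ against ln r, since ln ξ = γ ln r₀ - γ ln r. Note that ξ(r₀) = 1, so r₀ is the separation at which the number of pairs is twice the random expectation. The sketch below is a minimal illustration under these assumptions: the function name fit_power_law is hypothetical, it ignores measurement errors (a real analysis would weight bins by their uncertainties), and bins with ξ ≤ 0, such as the anticlustered large-scale bins, are dropped because their logarithm is undefined.

```python
import math


def fit_power_law(r_values, xi_values):
    """Fit xi(r) = (r / r0) ** (-gamma) by unweighted least squares in log-log space.

    Bins with xi <= 0 are skipped because ln(xi) is undefined there.
    Returns (r0, gamma).
    """
    pts = [(math.log(r), math.log(x)) for r, x in zip(r_values, xi_values) if x > 0]
    if len(pts) < 2:
        raise ValueError("need at least two bins with xi > 0")
    m = len(pts)
    mean_x = sum(p[0] for p in pts) / m
    mean_y = sum(p[1] for p in pts) / m
    sxx = sum((p[0] - mean_x) ** 2 for p in pts)
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in pts)
    slope = sxy / sxx                      # equals -gamma
    intercept = mean_y - slope * mean_x    # equals gamma * ln(r0)
    gamma = -slope
    r0 = math.exp(intercept / gamma)
    return r0, gamma


# Usage with synthetic (not observed) data generated from r0 = 5, gamma = 1.8,
# plus one negative large-scale bin that the fit should ignore.
r_mpc = [1.0, 2.0, 4.0, 8.0, 16.0, 30.0]
xi_vals = [(r / 5.0) ** -1.8 for r in r_mpc[:-1]] + [-0.05]
print(fit_power_law(r_mpc, xi_vals))
```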
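The Fourier relation between ξ(r) and P(k) can be written explicitly. Assuming the common convention in which P(k) is the Fourier transform of ξ with the factor (2π)³ placed in the inverse transform (normalizations differ between authors), and using the fact that for a statistically isotropic distribution both functions depend only on the magnitudes r and k, the angular integrals can be done to give:

$$ P(k) = \int \xi(r)\, e^{-i\mathbf{k}\cdot\mathbf{r}}\, d^3r = 4\pi \int_0^\infty \xi(r)\, \frac{\sin kr}{kr}\, r^2 \, dr $$

$$ \xi(r) = \frac{1}{(2\pi)^3} \int P(k)\, e^{i\mathbf{k}\cdot\mathbf{r}}\, d^3k = \frac{1}{2\pi^2} \int_0^\infty P(k)\, \frac{\sin kr}{kr}\, k^2 \, dk $$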

FURTHER READING: Peebles, P.J.E., The Large-Scale Structure of the Universe (Princeton University Press, Princeton, 1980).