Inference is the method by which we translate experimental and observational information into constraints on our mathematical models. A model is a representation of the physical processes that we believe are relevant to the quantities we plan to observe. To be useful, the model must be sophisticated enough to explain the data, yet simple enough that predictions for observational data can be obtained from it in a feasible time. At present these conditions are satisfied in cosmology, with the best models giving an excellent representation of the data, though the computation of theoretical predictions for a representative set of models is a supercomputer-class problem. Particularly valued are models able to make distinctive predictions for observations yet to be made, though Nature is under no obligation to behave distinctively.
The data which we obtain may be subject only to experimental uncertainty, or they may also carry a fundamental statistical uncertainty due to the random nature of the underlying physical processes. Both types arise in cosmology. For instance, the present expansion rate of the Universe (the Hubble constant) could in principle be measured to near-arbitrary accuracy with sufficiently advanced instrumentation. By contrast, the detailed pattern of cosmic microwave background (CMB) anisotropies, as measured by WMAP, is not believed to be predictable even in principle, being attributed to a particular random realization of quantum processes occurring during inflation. Observers at different locations in the Universe see different patterns in the CMB sky, the cosmological information being contained in statistical measures of the anisotropies such as the power spectrum. Observers at any particular location, such as ourselves, can witness only our own realization, and so there is an inherent statistical uncertainty, known as cosmic variance, that we cannot overcome but which fortunately can be modelled and incorporated alongside the measurement uncertainty.
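The size of the cosmic variance can be made concrete with a short numerical sketch (the multipole and power spectrum values here are illustrative assumptions, not from the text). For a Gaussian random field, an observer's estimate of a power spectrum multipole C_ℓ averages only the 2ℓ+1 modes visible on their sky, giving an irreducible fractional scatter of sqrt(2/(2ℓ+1)) between observers:

```python
import numpy as np

rng = np.random.default_rng(42)

ell = 10           # multipole; low ell has few modes and large cosmic variance
cl_true = 1.0      # true power spectrum value (arbitrary units)
n_skies = 200_000  # number of hypothetical observers, each with their own sky

# Each observer sees one realization: 2*ell + 1 Gaussian a_lm coefficients
# with variance cl_true, and estimates C_ell as their mean square.
alm = rng.normal(0.0, np.sqrt(cl_true), size=(n_skies, 2 * ell + 1))
estimates = (alm ** 2).mean(axis=1)

# The scatter between observers is the cosmic variance; even a perfect
# instrument cannot reduce it, since each observer sees only one sky.
print(estimates.std() / cl_true)        # measured scatter between observers
print(np.sqrt(2.0 / (2 * ell + 1)))     # analytic limit sqrt(2/(2l+1))
```

The two printed numbers agree closely: the scatter is a property of the randomness of the realization, not of the instrument, which is why it can only be modelled as an extra contribution to the error budget rather than beaten down.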
A model will typically not make unique predictions for observed quantities; those predictions will instead depend on some number of parameters of the model. Examples of cosmological parameters are the present expansion rate of the Universe, and the densities of the various constituents such as baryons, dark matter, etc. Such parameters are not (according to present understanding, anyway) predictable from some fundamental principle; rather, they are to be determined by requiring that the model does fit the data to hand. Indeed, determining the values of such parameters is often seen as the primary goal of cosmological observations, and Chapter 3 is devoted to this topic.
At a more fundamental level, several different models might be proposed as explanations of the observational data. These models would represent alternative physical processes, and as such would correspond to different sets of parameters that are to be varied in fitting to the data. It may be that the models are nested within one another, with the more complex models positing the need to include extra physical processes in order to explain the data, or the models may be completely distinct from one another. An example of nested models in cosmology is the possible inclusion of a gravitational wave contribution to the observed CMB anisotropies. An example of disjoint models would be the rival explanations of dark energy as caused by scalar field dynamics or by a modification to the gravitational field equations. Traditionally, the choice of model to fit to the data has been regarded as researcher-driven, hopefully coming from some theoretical insight, with the model to be validated by some kind of goodness-of-fit test. More recently, however, there has been growing interest in allowing the data to distinguish between competing models. This topic, model selection or model comparison, is examined in Chapter 4.
The comparison of model prediction to data is, then, a statistical inference problem where uncertainty necessarily plays a role. While a variety of techniques exist to tackle such problems, within cosmology one paradigm dominates — Bayesian inference. This article will therefore focus almost exclusively on Bayesian methods, with only a brief account of alternatives at the end of this section. The dominance of the Bayesian methodology in cosmology sets it apart from the traditional practice of particle physicists, though there is now increasing interest in applying Bayesian methods in that context (e.g. Ref. ).
2.2. Bayesian inference
The Bayesian methodology goes all the way back to Thomas Bayes and his theorem, posthumously published in 1763, followed soon after by pioneering work on probability by Laplace. The technical development of the inference system was largely carried out in the first half of the 20th century, with Jeffreys' textbook the classic source. For several decades afterwards progress was held up by an inability to carry out the necessary calculations, and only in the 1990s did use of the methodology become widespread, with the advent of powerful multiprocessor computers and advanced calculational algorithms. Initial applications were largely in the fields of social science and the analysis of medical data, the volume edited by Gilks et al. being a particularly important source. The publication of several key textbooks in the early 21st century, by Jaynes, MacKay and Gregory, the last of these being particularly useful for physical scientists seeking to apply the methods, cemented the move of such techniques into the mainstream. An interesting history of the development of Bayesianism is given in Ref. .
The essence of the Bayesian methodology is to assign probabilities to all quantities of interest, and then to manipulate those probabilities according to a series of rules, amongst which Bayes theorem is the most important. The aim is to update our knowledge in response to emerging data. An important implication of this set-up is that it requires us to specify what we thought we knew before the data were obtained, known as the prior probability. While all subsequent steps are algorithmic, the specification of the prior probability is not, and different researchers may well have different views on what is appropriate. This is often portrayed as a major drawback of the Bayesian approach. I prefer, however, to argue the opposite — that the freedom to choose priors is the opportunity to express physical insight. In any event, one should check that one's results are robust under reasonable changes to the prior assumptions.
An important result worth bearing in mind is a theorem of Cox, showing that Bayesian inference is the unique consistent generalization of Boolean logic in the presence of uncertainty. Jaynes in particular sees this as central to the motivation for the Bayesian approach.
In abstract form, Bayes theorem can be written as

P(B|A) = P(A|B) P(B) / P(A),

where a vertical line indicates the conditional probability, usually read as ‘the probability of B given A'. Here A and B could be anything at all, but let us take A to be the set of data D and B to be the parameter values θ (where θ is the N-dimensional vector of parameters being varied in the model under consideration), hence writing

P(θ|D) = P(D|θ) P(θ) / P(D).
In this expression, P(θ) is the prior probability, indicating what we thought the probability of different values of θ was before we employed the data D. One of our objectives is to use this equation to obtain the posterior probability of the parameters given the data, P(θ|D). This is achieved by computing the likelihood P(D|θ), often denoted L(θ) with the dependence on the dataset left implicit. The remaining quantity P(D), known as the evidence, is independent of θ and serves only to normalize the posterior during parameter estimation, though it becomes central to model selection.
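As a concrete illustration (a minimal sketch with made-up measurements, a known noise level, and a deliberately simple flat prior, none of which comes from the text), the posterior for a single parameter θ can be computed on a grid by multiplying prior and likelihood and normalizing, which is all that Bayes theorem requires:

```python
import numpy as np

# Hypothetical data: five noisy measurements of a single parameter theta,
# with known Gaussian noise sigma (illustrative values, not from the text).
data = np.array([1.2, 0.8, 1.1, 0.9, 1.3])
sigma = 0.5

theta = np.linspace(-2.0, 4.0, 2001)   # grid of candidate parameter values
dtheta = theta[1] - theta[0]

# Prior P(theta): flat across the grid -- a deliberate, debatable choice.
prior = np.ones_like(theta)

# Likelihood P(D|theta): a product of Gaussians, evaluated in log space
# for numerical stability, then exponentiated relative to its maximum.
loglike = -0.5 * np.sum((data[:, None] - theta[None, :]) ** 2, axis=0) / sigma**2
likelihood = np.exp(loglike - loglike.max())

# Bayes theorem: posterior proportional to likelihood times prior;
# P(D) appears only as the normalization constant on the grid.
posterior = likelihood * prior
posterior /= posterior.sum() * dtheta

print(theta[np.argmax(posterior)])   # posterior peak; near the sample mean
```

Repeating the calculation with a different choice of prior is the simplest way to check whether the result is robust under reasonable changes to the prior assumptions.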
2.3. Alternatives to Bayesian inference
The principal alternative to the Bayesian method is usually called the frequentist approach; indeed, a dichotomy is commonly set up under which any non-Bayesian method is regarded as frequentist. The underpinning concept is that of sampling theory, which refers to the frequencies of outcomes in random repeatable experiments (often caricatured as picking coloured balls from urns). According to MacKay, the principal difference between the systems is that frequentists apply probabilities only to random variables, whereas Bayesians additionally use probabilities to describe inference. Frequentist analyses commonly feature the concepts of estimators of statistical quantities, designed to have particular sampling properties, and of null hypotheses, which are set up in the hope that the data may exclude them (though without necessarily considering what the preferred alternative might be).
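The sampling-theory viewpoint can be made concrete with a short sketch (the true mean, noise level, and sample size are illustrative assumptions, not from the text). An estimator is a rule applied to data from a repeatable random experiment, and its quality is judged by long-run frequency properties, such as the coverage of a confidence interval, with no prior involved:

```python
import numpy as np

rng = np.random.default_rng(1)

# Repeatable experiment: n draws from a Gaussian of unknown mean mu_true
# with known noise sigma (values chosen purely for illustration).
mu_true, sigma, n = 2.0, 1.0, 25

def experiment():
    """One repetition: estimate mu and build a 95% confidence interval."""
    x = rng.normal(mu_true, sigma, size=n)
    est = x.mean()                        # estimator of mu (the sample mean)
    half = 1.96 * sigma / np.sqrt(n)      # 95% half-width from sampling theory
    return est, (est - half, est + half)

# The frequentist guarantee is about repetitions, not about any one dataset:
# over many repetitions, ~95% of intervals built this way cover mu_true.
n_trials = 20_000
covered = sum(lo <= mu_true <= hi
              for _, (lo, hi) in (experiment() for _ in range(n_trials)))
print(covered / n_trials)
```

The printed coverage sits close to the nominal 95%; the statement is about the procedure's behaviour under repetition, which is exactly the sense in which frequentist results can be quoted without reference to a prior.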
An advantage of frequentist methods is that they avoid the need to specify the prior probability distribution, upon which different researchers might disagree. Notwithstanding the Bayesian point of view that one should allow different researchers to disagree on the basis of prior belief, this means that frequentist terminology can be very useful for expressing results in a prior-independent way, and this is the normal practice in particle physics.
A drawback of frequentist methods is that they do not normally distinguish a model with a fixed value of a parameter from a more general model in which the parameter happens to take that value (this is discussed in greater detail below in Section 4), and they encounter particular difficulties in comparing models which are not nested.