Driven by the emergence of inexpensive sensors and computing capabilities, the amount of cosmological data has been increasing exponentially over the last 15 years or so. For example, the first map of cosmic microwave background (CMB) anisotropies obtained in 1992 by COBE [77] contained ~ 10³ pixels, which became ~ 5 × 10⁴ by 2002 with CBI [78, 79]. Current state-of-the-art maps (from the WMAP satellite [80]) involve ~ 10⁶ pixels, which are set to grow to ~ 10⁷ with Planck in the next couple of years. Similarly, angular galaxy surveys contained ~ 10⁶ objects in the 1970s, while by 2005 the Sloan Digital Sky Survey [81] had measured ~ 2 × 10⁸ objects, a number that will increase to ~ 3 × 10⁹ by 2012 when the Large Synoptic Survey Telescope [82] comes online⁷.
This data explosion drove the adoption of more efficient map making tools, faster component separation algorithms and parameter inference methods that would scale more favourably with the number of dimensions of the problem. As data sets have become larger and more precise, so has grown the complexity of the models being used to describe them. For example, if only 2 parameters could meaningfully be extracted from the COBE measurement of the large-scale CMB temperature power spectrum (namely the normalization and the spectral tilt [77, 83]), the number of model parameters had grown to 11 by 2002, when smaller-scale measurements of the acoustic peaks had become available. Nowadays, parameter spaces of up to 20 dimensions are routinely considered.
This section gives an introduction to the broad problem of cosmological parameter inference and highlights some of the tools that have been introduced to tackle it, with particular emphasis on innovative techniques. This is a vast field and any summary is bound to be only sketchy. We give throughout references to selected papers covering both historically important milestones and recent major developments.
5.1. The "vanilla" ΛCDM cosmological model
Before discussing the quantities we are interested in measuring in cosmology (the "cosmological parameters") and some of the observational probes available to do so, we briefly sketch the general framework which goes under the name of "cosmological concordance model". Because it is a relatively simple scenario containing both a cosmological constant (Λ) and cold dark matter (CDM) (more about them below), it is also known as the "vanilla" ΛCDM model.
Our current cosmological picture is based on the scenario of an expanding Universe, as implied by the observed redshift of the spectra of distant galaxies (Hubble's law). This in turn means that the Universe began from a hot and dense state, the initial singularity of the Big Bang. The existence of the cosmic microwave background lends strong support to this idea, as it is interpreted as the relic radiation from the hotter and denser primordial era. The expanding spacetime is described by Einstein's general relativity. The cosmological principle states that the Universe is isotropic (i.e., the same in all directions) and homogeneous (the same everywhere). It follows that an isotropically expanding Universe is described by the so-called Friedmann-Robertson-Walker metric,
ds^2 = -c^2 dt^2 + a(t)^2 \left[ \frac{dr^2}{1 - \kappa r^2} + r^2 \left( d\theta^2 + \sin^2\theta \, d\phi^2 \right) \right] \qquad (42)
where κ defines the geometry of spatial sections (if κ is positive, the geometry is spherical; if it is zero, the geometry is flat; if it is negative, the geometry is hyperbolic). The scale factor a(t) describes the expansion of the Universe, and it is related to redshift z by
1 + z = \frac{a(t_0)}{a(t)} \qquad (43)
where a(t0) is the scale factor today and a(t) the scale factor at redshift z. The relation between redshift and comoving distance r is obtained from the above metric via the Friedmann equation, and is given by
r(z) = \frac{c}{H_0} \int_0^z \frac{dz'}{\left[ (\Omega_\gamma + \Omega_\nu)(1+z')^4 + (\Omega_b + \Omega_{cdm})(1+z')^3 + \Omega_\kappa (1+z')^2 + \Omega_\Lambda \right]^{1/2}} \qquad (44)
where H0 is the Hubble constant and the Ω_x are the cosmological parameters describing the matter-energy content of the Universe. Standard parameters included in the vanilla model are neutrinos (Ω_ν, with a mass ≲ 1 eV), photons (Ω_γ), baryons (Ω_b), cold dark matter (Ω_cdm) and a cosmological constant (Ω_Λ). The curvature term Ω_κ is included for completeness but is currently not required by the standard cosmological model (see section 6). The comoving distance determines the apparent brightness of objects, their apparent size on the sky and the number density of objects in a comoving volume. Hence measurements of the brightness of standard candles, of the length of standard rulers or of the number density of objects at a given redshift lead to the determination of the cosmological parameters in Eq. (44) (see next section).
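As an illustration, the line-of-sight integral in Eq. (44) is straightforward to evaluate numerically. The short Python sketch below does this for a simplified model containing only radiation, matter, curvature and a cosmological constant; the function name and the parameter values are illustrative placeholders, not fits to any data set.

```python
import numpy as np
from scipy.integrate import quad

C_KM_S = 299792.458  # speed of light in km/s


def comoving_distance(z, H0=70.0, omega_m=0.3, omega_lambda=0.7, omega_r=0.0):
    """Comoving distance in Mpc, integrating Eq. (44) numerically.

    Parameter values are illustrative placeholders; omega_kappa is fixed
    by the closure relation 1 = omega_m + omega_lambda + omega_r + omega_kappa.
    """
    omega_kappa = 1.0 - (omega_m + omega_lambda + omega_r)

    def inv_E(zp):
        # 1 / [H(z)/H0], i.e. the integrand of Eq. (44)
        return 1.0 / np.sqrt(omega_r * (1 + zp) ** 4
                             + omega_m * (1 + zp) ** 3
                             + omega_kappa * (1 + zp) ** 2
                             + omega_lambda)

    integral, _ = quad(inv_E, 0.0, z)
    return (C_KM_S / H0) * integral


if __name__ == "__main__":
    for z in (0.5, 1.0, 2.0):
        print(f"r(z={z}) = {comoving_distance(z):.1f} Mpc")
```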
The currently accepted paradigm for the generation of density fluctuations in the early Universe is inflation. The idea is that quantum fluctuations in the primordial era were stretched to cosmological scales by an initial period of exponential expansion, called "inflation", possibly driven by a yet unknown scalar field. This increased the scale factor by about 26 orders of magnitude within about 10⁻³² s after the Big Bang. Although presently we have only indirect evidence for inflation, it is commonly accepted that such a short burst of exponential growth in the scale factor is required to solve the horizon problem, i.e. to explain why the CMB is so highly homogeneous across the whole sky. The quantum fluctuations also gave rise to temperature anisotropies in the CMB, whose study has proved to be one of the powerhouses of precision cosmology. From the initial state with small perturbations imprinted on a broadly uniform background, gravitational attraction generated the complex structures we see in the modern Universe, as indicated both by observational evidence and by highly sophisticated computer modelling.
Of course it is possible to consider completely different models, based for example on alternative theories of gravity (such as Bekenstein's theory [84, 85] or Jordan-Brans-Dicke theory [86]), or on a different way of comparing model predictions with observations [87 - 89]. Discriminating among models and determining which model is in best agreement with the data is a task for model comparison techniques, whose application to cosmology is discussed in section 6. Here we will take the vanilla ΛCDM model as our starting point for the following considerations on cosmological parameters and how they are measured.
5.2. Cosmological observations
The discovery of temperature fluctuations in the Cosmic Microwave Background (CMB) in 1992 by the COBE satellite [77] marked the beginning of the era of precision cosmology. Many other observations have contributed to the impressive development of the field in less than 20 years. For example, around 1990 the picture of a flat Universe with both cold dark matter and a positive cosmological constant was only beginning to emerge, and only thanks to the painstakingly difficult work of gluing together several fairly indirect clues [90]. At the time of writing (January 2008), the total density is known with an error of order 1%, and it is likely that this uncertainty will be reduced by another two orders of magnitude in the mid-term [91]. The high accuracy of modern precision cosmology rests on the combination of several different probes that constrain the physical properties of the Universe at many different redshifts.
Figure 4. State-of-the-art cosmic microwave background temperature power spectrum measurements along with the best-fit ΛCDM model (solid line), showing data from WMAP 3-yr [80], the Boomerang 2003 flight [101] and ACBAR [97] (from [97]).
Figure 5. Current matter power spectrum measurements from the Sloan Digital Sky Survey and the best-fit ΛCDM power spectrum (solid/dashed lines, without/with non-linear corrections), corresponding to the cosmological parameters extracted from WMAP 3-yr CMB data (the top and bottom curves are for two different samples). This shows that even without fitting to matter power spectrum data, the best-fit CMB model is in good agreement with the galaxy distribution data (from [105]).
5.3. Constraining cosmological parameters
As outlined in section 3.1, our inference problem is fully specified once we give the model (which parameters are allowed to vary and their prior distribution) and the likelihood function for the data sets under consideration. For the cosmological observations described above, relevant cosmological parameters can be broadly classified into four categories.
5.3.1. The joint likelihood function. When the observations are independent, the log-likelihoods for each observation simply add⁸. Defining χ² ≡ −2 ln ℒ, we have that
\chi^2_{\rm tot} = \sum_i \chi^2_i \qquad (45)
One important advantage of combining different observations as in Eq. (45) is that each observable has different degenerate directions, i.e. directions in parameter space that are poorly constrained by the data. By combining two or more types of observables, it is often the case that the data sets together have a much stronger constraining power than each one of them separately, because they mutually break parameter degeneracies. Combination of data sets should never be carried out blindly, however. The danger is that the data sets might be mutually inconsistent, in which case combining them singles out in the posterior a region that is not favoured by either of the data sets separately, which is obviously unsatisfactory. Such discrepancies might arise because of undetected systematics, or from insufficient modelling of the observations.
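As a toy illustration of degeneracy breaking, the Python sketch below treats each data set's log-likelihood as Gaussian in two parameters, so that combining them via Eq. (45) amounts to adding the corresponding inverse covariance (Fisher) matrices; all numbers are invented for illustration.

```python
import numpy as np

# Toy 2-parameter problem: each "data set" constrains a different
# linear combination of (p1, p2), i.e. each has its own degenerate direction.
F1 = np.array([[1.0, 0.95], [0.95, 1.0]])    # inverse covariance of data set 1
F2 = np.array([[1.0, -0.95], [-0.95, 1.0]])  # inverse covariance of data set 2


def sigma_p1(F):
    """Marginalised 1-sigma error on p1: sqrt of the (1,1) element of F^-1."""
    return np.sqrt(np.linalg.inv(F)[0, 0])


# For independent data sets the chi-squares (and hence the matrices) add, Eq. (45)
F_joint = F1 + F2

print("sigma(p1), data set 1 alone :", sigma_p1(F1))
print("sigma(p1), data set 2 alone :", sigma_p1(F2))
print("sigma(p1), combined         :", sigma_p1(F_joint))
```

Each data set alone leaves p1 poorly constrained (the error is inflated by the strong correlation), while the combination pins it down to a far smaller uncertainty.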
In order to account for possible discrepancies of this kind, Ref. [122] suggested replacing Eq. (45) by
\chi^2_{\rm tot} = \sum_i \alpha_i \, \chi^2_i \qquad (46)
where α_i are (unknown) weight factors ("hyperparameters") for the various data sets, which determine the relative importance of the observations. A non-informative prior is specified for the hyperparameters, which are then integrated out in a Bayesian way, obtaining an effective chi-square
\chi^2_{\rm eff} = \sum_i N_i \ln \chi^2_i \qquad (47)
where Ni is the number of data points in data set i. This method has been applied to combine different CMB observations in the pre-WMAP era [122, 123]. A technique based on the comparison of the Bayesian evidence for different data sets has been employed in [124], while Ref. [125] uses a technique similar in spirit to the hyperparameter approach outlined above to perform a binning of mutually inconsistent observations suffering from undetected systematics, as explained in [126].
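A minimal sketch of the hyperparameter combination of Eq. (47), with invented chi-square values and data-set sizes, might look as follows:

```python
import numpy as np


def chi2_effective(chi2_values, n_points):
    """Hyperparameter-marginalised effective chi-square of Eq. (47):
    chi2_eff = sum_i N_i * ln(chi2_i)."""
    chi2_values = np.asarray(chi2_values, dtype=float)
    n_points = np.asarray(n_points, dtype=float)
    return np.sum(n_points * np.log(chi2_values))


# Illustrative numbers only: three data sets with N_i points and chi2_i each
chi2_i = [95.0, 140.0, 60.0]
N_i = [100, 120, 50]

print("Standard combination, Eq. (45):      ", sum(chi2_i))
print("Hyperparameter combination, Eq. (47):", chi2_effective(chi2_i, N_i))
```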
After the likelihood has been specified, it remains to work out the posterior pdf, usually obtained numerically via MCMC technology, and report posterior constraints on the model parameters, e.g. by plotting 1 or 2-dimensional posterior contours. We now sketch the way this program has been carried out as far as cosmological parameter estimation is concerned.
5.3.2. Likelihood-based parameter determination. Up until around 2002, the method of choice for cosmological parameter estimation was either direct numerical maximum-likelihood search [127, 128] or evaluation of the likelihood on a grid in parameter space. Once the likelihood has been mapped out, (frequentist) confidence intervals for the parameters are obtained by finding the maximum-likelihood point (or, equivalently, the minimum chi-square) and by delimiting the region of parameter space where the log-likelihood drops by a specified amount (details can be found in any standard statistics textbook). If the likelihood is a multivariate Gaussian, this procedure leads to the familiar "delta chi-square" rule of thumb, i.e. that e.g. a 95.4% (2σ) confidence interval for 1 parameter is delimited by the region where the χ² ≡ −2 ln ℒ increases by Δχ² = 4.00 from its minimum value (see e.g. [43]). Of course the value of Δχ² depends both on the number of parameters being constrained and on the desired confidence level⁹.
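The Δχ² thresholds quoted above follow from percentiles of the chi-square distribution with as many degrees of freedom as there are parameters of interest; a quick numerical check in Python (using scipy) reproduces the Δχ² = 4.00 value quoted above:

```python
from scipy.stats import chi2, norm

# Confidence levels corresponding to 1, 2 and 3 sigma for a Gaussian
levels = [norm.cdf(n) - norm.cdf(-n) for n in (1, 2, 3)]

for n_params in (1, 2):
    for n_sigma, cl in zip((1, 2, 3), levels):
        delta = chi2.ppf(cl, df=n_params)  # delta chi-square threshold
        print(f"{n_params} parameter(s), {n_sigma} sigma "
              f"({100 * cl:.1f}%): delta chi2 = {delta:.2f}")
```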
Approximate confidence intervals for each parameter were then usually obtained from the above procedure after maximising the likelihood across the hidden dimensions rather than marginalising [130, 131], since the latter procedure required a computationally expensive multi-dimensional integration. The rationale was that maximisation is approximately equivalent to marginalisation for close-to-Gaussian problems (a simple proof can be found in Appendix A of [132]), although it was recognized early on that this is not always a good approximation in real-life situations [133]. Marginalisation methods based on multi-dimensional interpolation were devised and applied in order to improve on this [132, 134]. Many studies adopted this methodology, which could not quite be described as fully Bayesian yet, since it was likelihood-based and the notion of a posterior was not explicitly introduced. Often, the choice of particular theoretical scenarios (for example, a flat Universe or adiabatic initial conditions) or the inclusion of external constraints (such as bounds on the baryon density coming from Big Bang nucleosynthesis) were described as "priors". A more rigorous terminology would call the former a model choice, while the latter amounts to the inclusion (in the likelihood) of external information. Since the likelihood could be well approximated by a simple log-normal distribution, its computational cost was fairly low. With the advent of CMBFAST [135], the availability of a fast numerical code for the computation of CMB and matter power spectra meant that grids of up to 30 million points and parameter spaces of dimensionality up to order 10 could be handled in this way [134].
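The difference between maximising ("profiling") and marginalising over hidden dimensions can already be seen in a toy two-parameter grid evaluation; the likelihood below is deliberately non-Gaussian and entirely made up for illustration:

```python
import numpy as np

# Grid evaluation of a toy 2-D likelihood L(x, y) with a curved degeneracy
x = np.linspace(-4, 4, 401)
y = np.linspace(-4, 4, 401)
X, Y = np.meshgrid(x, y, indexing="ij")

# Banana-shaped likelihood: the ridge in x bends away from x = 0 as |y| grows
L = np.exp(-0.5 * ((X - Y**2 / 2) ** 2 + 0.5 * Y**2))

# Profile (maximised over y) vs marginalised (summed over y) 1-D curves in x
profile = L.max(axis=1)
marginal = L.sum(axis=1)

# Normalise both to unit peak for comparison
profile /= profile.max()
marginal /= marginal.max()

print("x at peak of profile  :", x[np.argmax(profile)])
print("x at peak of marginal :", x[np.argmax(marginal)])
print("mean of marginal in x :", np.sum(x * marginal) / np.sum(marginal))
```

For a Gaussian likelihood the two curves would coincide; here the marginal is shifted with respect to the profile, illustrating why maximisation is only an approximation to marginalisation.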
5.3.3. Bayes in the sky — The rise of MCMC. The watershed moment after which methods based on likelihood evaluation on a grid were definitively superseded by Bayesian MCMC methods can perhaps be identified with Ref. [136], which marked one of the last major studies performed using essentially frequentist techniques. Pioneering works in using MCMC technology for cosmological parameter extraction include the application to VSA data [137, 45], the use of simulated annealing [138] and the study of Ref. [139]. But it was the release of the CosmoMC code¹⁰ in 2002 [140] that made a huge impact on the cosmological community, as CosmoMC quickly became a standard and user-friendly tool for Bayesian parameter inference from CMB, large-scale structure and other data. The favourable scaling of MCMC methods with the dimensionality of the parameter space and the ease of marginalization were immediately recognized as major advantages of the method. CosmoMC employs the CAMB code [141] to compute the matter and CMB power spectra from the physical model parameters. It then employs various MCMC algorithms to sample from the posterior distribution given current CMB, matter power spectrum (galaxy power spectrum, baryonic acoustic wiggles and Lyman-α observations) and supernovae data.
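At its core, such a sampler is built around the Metropolis-Hastings acceptance rule. The sketch below illustrates the idea on a toy Gaussian target; it is not the CosmoMC implementation, and a real application would evaluate the cosmological likelihood (calling a Boltzmann code such as CAMB) inside log_posterior.

```python
import numpy as np

rng = np.random.default_rng(0)


def log_posterior(theta):
    """Toy target: a correlated 2-D Gaussian standing in for a real
    cosmological posterior (which would require a Boltzmann code)."""
    cov = np.array([[1.0, 0.8], [0.8, 1.0]])
    return -0.5 * theta @ np.linalg.solve(cov, theta)


def metropolis_hastings(n_samples, step=0.5, ndim=2):
    chain = np.empty((n_samples, ndim))
    theta = np.zeros(ndim)
    logp = log_posterior(theta)
    accepted = 0
    for i in range(n_samples):
        proposal = theta + step * rng.standard_normal(ndim)
        logp_prop = log_posterior(proposal)
        # Accept with probability min(1, posterior ratio)
        if np.log(rng.uniform()) < logp_prop - logp:
            theta, logp = proposal, logp_prop
            accepted += 1
        chain[i] = theta
    return chain, accepted / n_samples


chain, acc = metropolis_hastings(20000)
print("acceptance rate:", acc)
print("posterior mean :", chain.mean(axis=0))
print("posterior cov  :\n", np.cov(chain, rowvar=False))
```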
State-of-the-art applications of cosmological parameter inference can be found in papers such as [105, 142, 143, 97]. Table 3 summarizes recent posterior credible intervals on the parameters of the vanilla ΛCDM model introduced above, while Figure 6 shows the full 1-D posterior pdf for 6 relevant parameters (both from Ref. [105]). The initial conditions emerging from inflation are well described by one adiabatic degree of freedom and a distribution of fluctuations that deviates slightly from scale invariance, but which is otherwise fairly featureless.
Table 3. Posterior constraints on the parameters of the vanilla ΛCDM model (from [105]).

Parameter | Value | Meaning | Definition
Matter budget parameters
θ_s | 0.5918 +0.0020 −0.0020 | CMB acoustic angular scale fit (degrees) | θ_s = r_s(z_rec) / d_A(z_rec) × 180/π
ω_b | 0.0222 +0.0007 −0.0007 | Baryon density | ω_b = Ω_b h² ≈ ρ_b / (1.88 × 10⁻²⁶ kg/m³)
ω_c | 0.1050 +0.0041 −0.0040 | Cold dark matter density | ω_c = Ω_cdm h² ≈ ρ_cdm / (1.88 × 10⁻²⁶ kg/m³)
Initial conditions parameters
A_s | 0.690 +0.045 −0.044 | Scalar fluctuation amplitude | Primordial scalar power at k = 0.05/Mpc
n_s | 0.953 +0.016 −0.016 | Scalar spectral index | Primordial spectral index at k = 0.05/Mpc
Reionization history (abrupt reionization)
τ | 0.087 +0.028 −0.030 | Reionization optical depth |
Nuisance parameters (for galaxy power spectrum)
b | 1.896 +0.074 −0.069 | Galaxy bias factor | See [105] for details.
Q_nl | 30.3 +4.4 −4.1 | Nonlinear correction parameter | See [105] for details.
Derived parameters (functions of those above)
Ω_tot | 1.00 (flat Universe assumed) | Total density/critical density | Ω_tot = Ω_m + Ω_Λ = 1 − Ω_κ
h | 0.730 +0.019 −0.019 | Hubble parameter | h = [(ω_b + ω_c) / (Ω_tot − Ω_Λ)]^(1/2)
Ω_b | 0.0416 +0.0019 −0.0018 | Baryon density/critical density | Ω_b = ω_b / h²
Ω_cdm | 0.197 +0.016 −0.015 | CDM density/critical density | Ω_cdm = ω_c / h²
Ω_m | 0.239 +0.018 −0.017 | Matter density/critical density | Ω_m = Ω_b + Ω_cdm
Ω_Λ | 0.761 +0.017 −0.018 | Cosmological constant density/critical density | Ω_Λ ≈ h⁻² ρ_Λ / (1.88 × 10⁻²⁶ kg/m³)
σ_8 | 0.756 +0.035 −0.035 | Density fluctuation amplitude | See [105] for details.
Figure 6. Posterior constraints on key cosmological parameters from recent CMB and large-scale structure data; compare Table 3. Top row, from left to right: posterior pdf (normalized to the peak) for the cosmological constant density in units of the critical density, the (physical) baryon density and the cold dark matter density. Bottom row, from left to right: optical depth to reionization, scalar tilt and scalar fluctuation amplitude. Yellow: WMAP 1-yr data; orange: WMAP 3-yr data; red: adding Sloan Digital Sky Survey galaxy distribution data. Spatial flatness and adiabatic initial conditions have been assumed. This set of only 6 parameters (plus 2 nuisance parameters not shown here) appears currently sufficient to describe most cosmological observations (adapted from [105]).
Addition of extra parameters to this basic description (for example, a curvature term, or a time evolution of the cosmological constant, in which case it is generically called "dark energy") is best discussed in terms of model comparison rather than parameter inference (see next section). The breadth and range of studies aiming at constraining extra parameters is such that it would be impossible to give even a rough sketch here. However, we can say that the simple model described by the 6 cosmological parameters given in Table 3 appears at the moment appropriate and sufficient to explain most of the presently available data. MCMC is nowadays almost universally employed in one form or another in the cosmology community.
5.3.4. Recent developments in parameter inference. Nowadays, the likelihood evaluation step is becoming the bottleneck of cosmological parameter inference, as the WMAP likelihood code [80] requires the evaluation of fairly complex correlation terms, while the computational time for the actual model prediction in terms of power spectra is subdominant (except for spatially curved models or non-standard scenarios containing active seeds, such as cosmic strings). This trend is likely to become stronger with future CMB data sets, such as Planck. Currently, for relatively straightforward extensions of the concordance model presented in Table 3, a Markov chain with enough samples to give good inference can be obtained in a couple of days of computing time on an "off-the-shelf" machine with 4 CPUs.
Massive savings in computational effort can be obtained by employing neural network techniques, a computational methodology consisting of a network of interconnected processing units called "neurons". Neural networks learn the structure of the computation to be performed (for example, likelihood evaluation across the cosmological parameter space) from a training set provided by the user. Once trained, the network becomes a fast and efficient interpolation algorithm for new points in the parameter space, or for classification purposes (for example, to determine the redshift of galaxies from photometric data [144, 145]). When applied to the problem of cosmological parameter inference, neural networks can teach themselves the structure of the parameter space (for models of up to about 10 dimensions) by employing as little as a few thousand training points, for which the likelihood has to be computed numerically as usual. Once trained, the network can then interpolate extremely fast between samples to deliver a complete Markov chain within a few minutes. The speed-up can thus reach a factor of 30 or more [146, 147]. Another promising tool is a machine-learning-based algorithm called PICO [148, 149]¹¹, requiring a training set of order ~ 10⁴ samples, which are then used to perform an interpolation of the likelihood across parameter space. This procedure can achieve a speed-up of over a factor of 1000 with respect to conventional MCMC.
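The emulation idea can be sketched generically as follows; this toy example trains a standard scikit-learn regressor on an analytic stand-in for the likelihood and is not the PICO or neural-network codes cited above.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)


def expensive_loglike(theta):
    """Stand-in for a slow likelihood evaluation (which in a real analysis
    would require a Boltzmann code); here just a smooth analytic function."""
    return -0.5 * np.sum((theta / np.array([1.0, 0.5, 2.0])) ** 2, axis=-1)


# Training set: a few thousand points scattered over a 3-D parameter space
X_train = rng.uniform(-3, 3, size=(4000, 3))
y_train = expensive_loglike(X_train)

# Train the emulator once...
emulator = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000,
                        random_state=0).fit(X_train, y_train)

# ...then use fast predictions in place of the expensive likelihood,
# e.g. inside an MCMC loop
X_test = rng.uniform(-3, 3, size=(5, 3))
print("true  :", expensive_loglike(X_test))
print("approx:", emulator.predict(X_test))
```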
The forefront of Bayesian parameter extraction is quickly moving on to tackle even more ambitious problems, involving thousands of parameters. This is the case for the Gibbs sampling technique used to extract the full posterior probability distribution for the power spectrum Cℓ's directly from CMB maps, performing component separation (i.e., multi-frequency foreground removal) at the same time and fully propagating uncertainties to the level of cosmological parameters [150, 153]. This method has been tested up to ℓ ≲ 50 on WMAP temperature data and is expected to perform well up to ℓ ~ 100-200 for Planck-quality data. Equally impressive is the Hamiltonian sampling approach [154], which returns the Cℓ's posterior pdf from a (previously foreground-subtracted) CMB map. At WMAP resolution, this involves working with ~ 10⁵ parameters, but the efficiency is such that the 800 Cℓ distributions (for the temperature signal) can be obtained in about a day of computation on a high-end desktop computer.
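To give a flavour of the Hamiltonian approach, the sketch below implements a bare-bones Hamiltonian Monte Carlo update (leapfrog integration plus a Metropolis accept/reject step on the total "energy") for a toy Gaussian target; the actual CMB application replaces this target with the full map-level posterior over ~ 10⁵ parameters.

```python
import numpy as np

rng = np.random.default_rng(2)


# Toy target: a standard normal distribution in ndim dimensions
def neg_log_p(q):
    return 0.5 * np.dot(q, q)


def grad_neg_log_p(q):
    return q


def hmc_step(q, step_size=0.1, n_leapfrog=20):
    """One Hamiltonian Monte Carlo update with a leapfrog integrator."""
    p = rng.standard_normal(q.size)                     # resample momenta
    current_h = neg_log_p(q) + 0.5 * np.dot(p, p)

    q_new, p_new = q.copy(), p.copy()
    p_new -= 0.5 * step_size * grad_neg_log_p(q_new)    # initial half kick
    for _ in range(n_leapfrog - 1):
        q_new += step_size * p_new                      # drift
        p_new -= step_size * grad_neg_log_p(q_new)      # full kick
    q_new += step_size * p_new                          # final drift
    p_new -= 0.5 * step_size * grad_neg_log_p(q_new)    # final half kick

    proposed_h = neg_log_p(q_new) + 0.5 * np.dot(p_new, p_new)
    # Metropolis acceptance on the change in total energy
    if np.log(rng.uniform()) < current_h - proposed_h:
        return q_new
    return q


q = np.zeros(5)
samples = []
for _ in range(5000):
    q = hmc_step(q)
    samples.append(q)
samples = np.array(samples)
print("sample mean:", samples.mean(axis=0))
print("sample std :", samples.std(axis=0))
```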
Another frontier for Bayesian methods is high energy particle physics, which for historical and methodological reasons has so far been mostly dominated by frequentist techniques. However, the Bayesian approach to parameter inference for supersymmetric theories is rapidly gathering momentum, due to its superior handling of nuisance parameters, marginalization and the inclusion of all sources of uncertainty. The use of Bayesian MCMC for supersymmetry parameter extraction was first advocated in [155], and has since been rigorously applied to a detailed analysis of the Constrained Minimal Supersymmetric Standard Model [156 - 161], a problem that involves of order 10 free parameters. A public code called SuperBayeS is available to perform an MCMC Bayesian analysis of supersymmetric models¹². Presently, there are hints that the constraining power of the data is insufficient to override the prior choice in this context [162], but future observations, most notably by the Tevatron or the LHC [163, 164], and tighter limits on (or a detection of) the neutralino scattering cross section [159], should considerably improve the situation in this respect.
5.4. Caveats and common pitfalls
Although Bayesian inference is quickly maturing into an almost automated procedure, we should not forget that a "black box" approach to the problem always hides dangers and pitfalls. Every real-world problem presents its own peculiarities that demand careful consideration, and statistical inference remains very much a craft as much as a science. While Bayes' Theorem is never wrong, incorrect specification of the prior (for example, making unwarranted assumptions or failing to specify relevant external information) or inappropriate construction of the likelihood (e.g., not reflecting the experiment or neglecting relevant sources of uncertainty) will easily lead to wrong inferences. Below we list a few common pitfalls that need to be considered in the cosmological context.
⁷ Alex Szalay, talk at the specialist discussion meeting "Statistical challenges in cosmology and astroparticle physics", held at the Royal Astronomical Society, London, Oct 2007.
⁸ This is of course not the case when one is carrying out correlation studies, where the aim is precisely to exploit correlation among different observables (for example, the late Integrated Sachs-Wolfe effect).
⁹ An important technical point is that frequentist confidence intervals are considered random variables — they give the range within which our estimate of the parameter value will fall e.g. 95.4% of the time if we repeat our measurement N → ∞ times. The true value of the parameter is given (although unknown to us) and has no probability statement attached to it. On the contrary, Bayesian credible intervals containing for example 95.4% of the posterior probability mass represent our degree of belief about the value of the parameter itself. Often, Bayesian credible intervals are imprecisely called "confidence intervals" (a term that should be reserved for the frequentist quantity), thus fostering confusion between the two. Perhaps this happens because for Gaussian cases the two results are formally identical, although their interpretation is profoundly different. This can have important consequences if the true value is near the boundary of the parameter space, in which case the results from the frequentist and Bayesian procedures may differ substantially; see [129] for an interesting example involving the determination of neutrino masses.
¹⁰ Available from: cosmologist.info/cosmomc (accessed Jan 20th 2008).
¹¹ Available from: http://cosmos.astro.illinois.edu/pico/ (accessed Jan 14th 2008).
¹² Code available from: superbayes.org (accessed Feb 13th 2008).