The information criteria are clearly a powerful tool for establishing the appropriate set of cosmological parameters. How do they relate to the standard approach in cosmology of looking at confidence levels of parameter detection?

Using fairly low confidence levels, such as 95%, to identify new
parameters is inherently risky because of the large number of
candidate parameters. If there were only one candidate parameter and it
were detected at 95% confidence, that would certainly be interesting. However,
there are many possible parameters, and if one analyzes several of
them and finds one at 95% confidence, then one can no longer say that
the base model is ruled out at that level, *because there
were several different parameters any of which might, by chance, have
been at its 95% limit*. As an extreme example, if one considered 20
parameters it would be no surprise at all to find one at the 95% confidence
level, and that certainly would not mean the base model was excluded at
that confidence. Consequently the true statistical significance of a
parameter detection is always likely to be
less than indicated by its confidence levels (e.g.
Bromley & Tegmark
2000).
This issue can arise
both within a single paper that explores many parameters, and in a broader
sense because the community as a whole investigates many different
parameters.
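The rough arithmetic behind the 20-parameter example can be made explicit. If, under the base model, each candidate parameter independently has a 5% chance of a spurious 95% "detection", the chance of at least one such detection among N parameters is 1 - 0.95^N. A minimal sketch (the independence assumption is an idealization; real cosmological parameters are correlated, so this is only indicative):

```python
# Probability of at least one spurious 95% "detection" among N
# independent candidate parameters, assuming the base model is true.
# Independence is an idealization, not a feature of real analyses.
def false_alarm_prob(n_params, conf=0.95):
    return 1.0 - conf ** n_params

for n in (1, 5, 20):
    print(n, round(false_alarm_prob(n), 3))
```

For 20 parameters this gives a roughly 64% chance of at least one apparent 95% detection even when the base model is correct.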

This is a form of publication bias - the tendency for authors to preferentially submit, and editors to preferentially accept, papers showing positive statistical evidence. This bias is well recognized in the field of medical trials (see e.g. Sterne, Gavaghan & Egger 2000), where it can literally be a matter of life and death and tends to lead to the introduction of treatments which are at best ineffectual and may even be harmful. The stakes are not so high in cosmology, but one should be aware of its possible effects. Publication bias comes in several forms. For example, if a single paper analyzes several parameters but then focuses attention on the most discrepant, that in itself is a form of bias. A more subtle form arises when many different researchers examine different parameters for a possible effect, but only those who, by chance, found a significant effect for their parameter decide to publicize it strongly.

Publication bias is notoriously difficult to allow for, as it arises mainly from unpublished analyses of null results. However, a useful guide comes from considering the number of parameters that have been under discussion in the literature. Given the list in Table 2, it is clear that, even if the base cosmological model is correct, enough parameters are being investigated that one should not be surprised to find one or two at the 95% confidence level.

I conclude that, when considering whether a new parameter should be transferred from the candidate parameter list to the base parameter list, a 95% confidence detection should not be taken as persuasive evidence that the new parameter is needed. Because there are so many candidate parameters, a more stringent threshold is required. The BIC provides a suitably stringent criterion, and this line of argument supports the view that the AIC is too weak a criterion for cosmological model selection.
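The relative stringency of the two criteria follows from their per-parameter penalties. A sketch using the standard definitions (AIC = -2 ln L_max + 2k, BIC = -2 ln L_max + k ln N, with k parameters and N data points; the text's notation may differ):

```python
import math

# Standard definitions of the two information criteria (a sketch):
# k = number of fitted parameters, n_data = number of data points.
def aic(lnL_max, k):
    return -2.0 * lnL_max + 2.0 * k

def bic(lnL_max, k, n_data):
    return -2.0 * lnL_max + k * math.log(n_data)

# BIC's per-parameter penalty ln N exceeds AIC's flat penalty of 2
# once N > e^2 ~ 7.4, so for typical cosmological datasets (N >> 7)
# BIC demands a larger likelihood gain before admitting a parameter.
print(bic(0.0, 1, 1000) - aic(0.0, 1))  # ln(1000) - 2 ~ 4.91
```

For a dataset of a thousand points, each extra parameter must improve -2 ln L_max by about 6.9 to leave the BIC unchanged, versus only 2 for the AIC.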

Another subtle point relating to cosmological data is the inability to fully repeat an experiment. Conventionally in statistics, once a dataset has identified an effect that looks interesting (e.g. spectral index running at 95% confidence), one is expected to set that data aside and seek confirmation from a completely new dataset. This procedure is necessary to minimize publication bias effects, and failure to follow it is regarded as poor practice. Unfortunately, for the microwave anisotropies much of the noise comes from cosmic variance rather than instrumental effects, and so remeasuring does not give an independent realization of the statistical noise. For example, if one analyzes the second-year WMAP data (once it becomes available) separately from the first-year data, there will be a tendency for the same cosmological parameter values to be obtained. Finding the same outlying parameter values will therefore have less statistical significance than if the datasets were genuinely independent. Even Planck data will have noise significantly correlated with WMAP data in this sense, and properly allowing for that in determining the statistical significance of parameter detections would be tricky. This supports the use of information criteria for model selection, rather than parameter confidence levels.
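The effect of shared cosmic variance can be illustrated with a toy Monte Carlo (purely schematic, not a real CMB analysis): two "experiments" observe the same sky realization, so their measurements share one cosmic-variance draw and differ only in instrument noise, leaving their estimates strongly correlated:

```python
import random

# Toy model: the shared "sky" term stands in for cosmic variance;
# each experiment adds its own independent instrument noise.
# All variances are illustrative choices, not physical values.
random.seed(1)
n_trials = 20000
est1, est2 = [], []
for _ in range(n_trials):
    sky = random.gauss(0.0, 1.0)       # cosmic variance (shared)
    est1.append(sky + random.gauss(0.0, 0.5))  # experiment 1
    est2.append(sky + random.gauss(0.0, 0.5))  # experiment 2

mean1 = sum(est1) / n_trials
mean2 = sum(est2) / n_trials
cov = sum((a - mean1) * (b - mean2) for a, b in zip(est1, est2)) / n_trials
var1 = sum((a - mean1) ** 2 for a in est1) / n_trials
var2 = sum((b - mean2) ** 2 for b in est2) / n_trials
corr = cov / (var1 * var2) ** 0.5
print(round(corr, 2))  # ~ 0.8: far from independent measurements
```

With these (arbitrary) variances the expected correlation is 1/1.25 = 0.8, so agreement between the two "datasets" is much weaker confirmation than agreement between genuinely independent ones.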