
2. MEASUREMENT UNCERTAINTY

An idea well rooted among physicists, especially nuclear and particle physicists, is that the result of a measurement must be reported with a corresponding uncertainty. What makes the measured values subject to a degree of uncertainty is, it is commonly said, the effect of unavoidable measurement errors, usually classified as random (or statistical) and systematic (7).

Uncertainties due to statistical errors are commonly treated using the frequentist concept of confidence intervals, although the procedure is so unnatural that the interpretation of the result is unconsciously subjective (as will be shown shortly), and there are known cases (of great relevance in frontier research) in which this approach is not applicable.
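To make the standard recipe explicit, here is a minimal sketch (in Python, with purely hypothetical data) of the usual 68% confidence interval for the mean of a Gaussian sample; the frequentist statement refers to the long-run coverage of intervals constructed this way, not to the probability that the true value lies in the particular interval obtained, which is nevertheless how the result is usually read:

\begin{verbatim}
import math
import statistics

data = [10.2, 9.8, 10.5, 10.1, 9.9, 10.3]   # hypothetical measurements
n = len(data)
mean = statistics.mean(data)
s = statistics.stdev(data)                  # sample standard deviation
half_width = s / math.sqrt(n)               # 68% CL half-width (Gaussian approximation)

print("68% CL interval: [%.2f, %.2f]" % (mean - half_width, mean + half_width))
# The probability statement concerns the long-run coverage of intervals
# built by this procedure, not the probability that the true value lies
# in this particular interval.
\end{verbatim}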

As far as uncertainties due to systematic errors are concerned, there is no conventional consistent theory to handle them, as is also indirectly recognized by the ISO Guide [10]. The ``fashion'' at the moment is to add them quadratically if they are considered to be independent, or to build a covariance matrix if not. This procedure is not justified theoretically (in the frequentist approach), and I think that it is used essentially because of the reluctance of experimentalists to add linearly the dozens of contributions of a complicated HEP measurement, as the old-fashioned ``theory'' of maximum errors suggests doing (8). The pragmatic justification for the quadratic combination of ``systematic errors'' is that one is using a rule (the famous ``error propagation'' formula (9)) which is considered to be valid at least for ``statistical errors''. But, in reality, this too is not correct. The use of this formula is again arbitrary in the case of ``statistical errors'', if these have been evaluated from confidence intervals (10). In fact, there is no logical reason why a probabilistic procedure proved for standard deviations of random variables (the observables) should also be valid for 68% confidence intervals, which are considered, somehow, uncertainties attributed to the true value.
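To make the two recipes concrete, here is a minimal numerical sketch (in Python, with purely hypothetical and independent contributions) comparing the combination in quadrature with the linear sum suggested by the ``maximum error'' prescription:

\begin{verbatim}
import math

# hypothetical, independent "systematic error" contributions
contributions = [0.5, 0.3, 0.4, 0.2, 0.6]

in_quadrature = math.sqrt(sum(c**2 for c in contributions))  # current "fashion"
linear_sum = sum(abs(c) for c in contributions)              # maximum-error prescription

print("in quadrature: %.2f" % in_quadrature)   # 0.95
print("linear sum:    %.2f" % linear_sum)      # 2.00
\end{verbatim}

The gap between the two numbers grows with the number of contributions, which is the pragmatic reason why the linear sum is avoided in measurements with many systematic effects.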

These examples show quite well the contradiction between the cultural background on probability and the practical good sense of physicists. Thanks to this good sense, frequentist ideas are constantly violated, with the positive effect that at least some results are obtained (11). It is interesting to notice that in simple routine applications these results are very close, both in value and in meaning, to those achievable starting from what I consider to be the correct point of view for handling uncertainty (subjective probability). There are, on the other hand, critical cases in which scientific conclusions may be seriously mistaken. Before discussing these cases, let us look more closely at the terms of the claimed contradiction.



7 This last statement may sound like a tautology, since ``error'' and ``uncertainty'' are often used as synonyms. This hints at the fact that in this subject there is neither uniformity of language nor of methods, as is recognized by the metrological organizations, which have made great efforts to bring some order into the field [8, 9, 10, 11, 12]. In particular, the International Organization for Standardization (ISO) has published the ``Guide to the expression of uncertainty in measurement'' [10], containing definitions, recommendations and practical examples. For example, error is defined as ``the result of a measurement minus a true value of the measurand'', uncertainty as ``a parameter, associated with the result of a measurement, that characterizes the dispersion of the values that could reasonably be attributed to the measurand'', and, finally, true value as ``a value compatible with the definition of the given particular quantity''. One can easily see that it is not just a question of practical definitions. It seems to me that there is a well-thought-out philosophical choice behind these definitions, although it is not discussed extensively in the Guide. Two issues in the Guide that I find of particular importance are the discussion on the sources of uncertainty and the admission that all contributions to the uncertainty are of a probabilistic nature. The latter is strictly related to the subjective interpretation of probability, as admitted by the Guide and discussed in depth in [7]. (The reason why these comments on the ISO Guide have been placed in this long footnote is that, unfortunately, the Guide is not yet known in the HEP community and, therefore, has no influence on the behaviour of the HEP physicists about which I am going to comment here. This is also the reason why I will often use in this paper typical expressions currently used in HEP which are in disagreement with the ISO recommendations. But I will use these expressions preferably within quote marks, like ``systematic error'' instead of ``uncertainty due to a recognized systematic error of unknown size''.)

8 In fact, one can see that when there are only 2 or 3 contributions to the ``systematic error'', there are still people who prefer to add them linearly.

9 The most well-known version is that in which correlations are neglected:

\begin{equation}
\sigma^2(Y) = \sum_i \left( \frac{\partial Y}{\partial X_i} \right)^2 \sigma^2(X_i)
\end{equation}

Y stands for the quantity of interest, the value of which depends on directly measured quantities, calibration constants and other systematic effects (all terms generically indicated by Xi). This formula comes from probability theory, but it is valid only if the Xi and Y are random variables, the sigma(Xi) are standard deviations and the linearization is reasonable. It is very interesting to look at text books to see how this formula is derived. The formula is usually first proved for random variables associated with observables and then, suddenly, it is applied to physics quantities, without any justification.
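As a simple numerical illustration of this point (in Python, with hypothetical values for the product of two independent Gaussian random variables), the linearized formula can be compared with the standard deviation obtained by directly sampling the random variables; the two agree here because the linearization is reasonable:

\begin{verbatim}
import math
import random
import statistics

# hypothetical means and standard deviations of two independent quantities
mu1, sigma1 = 10.0, 0.2
mu2, sigma2 = 5.0, 0.1

# linearized propagation for Y = X1*X2:
# sigma^2(Y) = (dY/dX1)^2 sigma1^2 + (dY/dX2)^2 sigma2^2, derivatives at the means
sigma_lin = math.sqrt((mu2 * sigma1)**2 + (mu1 * sigma2)**2)

# direct estimate from sampling the random variables
random.seed(1)
samples = [random.gauss(mu1, sigma1) * random.gauss(mu2, sigma2)
           for _ in range(100000)]
sigma_mc = statistics.stdev(samples)

print("linearized : %.3f" % sigma_lin)   # about 1.41
print("Monte Carlo: %.3f" % sigma_mc)    # close to the linearized value
\end{verbatim}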

10 As far as ``systematic errors'' are concerned, the situation is much more problematic because the ``errors'' are not even operationally well defined: they may correspond to subjectivist standard deviations (what I consider to be correct, and what corresponds to the ISO type B standard uncertainty [10]), but they can more easily be maximum deviations, a ±50% variation on a selection cut, or the absolute difference obtained using two assumptions for the systematic effect.

11 I am strongly convinced that a rigorous application of frequentist ideas leads nowhere.
