In Section 5, we introduced the statistic $\chi^2_\psi$ (eq. [26]) as a measure of the coherence of the residual field between the IRAS and TF data. Here we demonstrate that it has approximately the properties of a true $\chi^2$ statistic, and indicate how and why it departs from true $\chi^2$ behavior.
The measure of residual coherence at separation $\tau$ is
\[
\psi(\tau) \;=\; \sum_{i<j;\; |d_{ij}-\tau| \le \Delta\tau/2} \delta_{m,i}\,\delta_{m,j}\,, \qquad \mathrm{(C1)}
\]
where $d_{ij}$ is the separation in IRAS-distance space between objects $i$ and $j$, and $\delta_m$ is the normalized magnitude residual (eq. [23]). The sum runs over the $N_p(\tau)$ distinct pairs of objects with separation $\tau \pm \Delta\tau/2$; note that a given object may appear in more than one of these pairs. The hypothesis we wish to test is that the IRAS-TF residuals are incoherent, which signifies a good fit on all scales. A formal statement of this condition is that the individual $\delta_{m,i}$ are independent random variables. Furthermore, the $\delta_m$ have been constructed to have mean zero and unit variance. Thus, our hypothesis of uncorrelated residuals implies that the expectation value of the product $\delta_{m,i}\,\delta_{m,j}$ vanishes for $i \ne j$, and that the expectation value of its square is unity.
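In code, $\psi(\tau)$ is simply a binned sum of residual products over distinct pairs. The following is a minimal sketch, assuming 3-D object positions in km s$^{-1}$ units and a uniform bin width; the function name and array layout are ours, not the paper's:

```python
import numpy as np

def pair_sums(pos, delta_m, bin_width=200.0, n_bins=3):
    """psi[b]: sum of delta_m[i]*delta_m[j] over distinct pairs i < j whose
    separation falls in bin b of width bin_width; n_p[b]: pair count N_p."""
    n = len(pos)
    psi = np.zeros(n_bins)
    n_p = np.zeros(n_bins, dtype=int)
    for i in range(n):
        for j in range(i + 1, n):                # each distinct pair once (i < j)
            d = np.linalg.norm(pos[i] - pos[j])  # separation in distance space
            b = int(d // bin_width)              # index of the separation bin
            if b < n_bins:
                psi[b] += delta_m[i] * delta_m[j]
                n_p[b] += 1
    return psi, n_p
```

Here bin $b$ collects pairs with $b\,\Delta\tau \le d_{ij} < (b+1)\,\Delta\tau$, i.e. $\tau$ is the bin center.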
It follows that
\[
E[\psi(\tau)] \;=\; \sum_{i<j;\; |d_{ij}-\tau| \le \Delta\tau/2} E[\delta_{m,i}\,\delta_{m,j}] \;=\; 0. \qquad \mathrm{(C2)}
\]
The variance of $\psi(\tau)$ is
\[
E[\psi^2(\tau)] \;=\; \sum_{i<j}\; \sum_{k<l} E[\delta_{m,i}\,\delta_{m,j}\,\delta_{m,k}\,\delta_{m,l}]\,, \qquad \mathrm{(C3)}
\]
where both pair sums are restricted to separations $\tau \pm \Delta\tau/2$.
Now the expectation value within the sum will vanish under our assumption of uncorrelated residuals unless $i = k$ and $j = l$. (Notice that we cannot have $i = l$ and $j = k$ because of the ordered nature of the summation.) Thus, the only nonzero terms in equation (C3) are identical pairs, and it follows that $E[\psi^2(\tau)] = N_p(\tau)$.
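These two moments are easy to verify by simulation. The sketch below uses an arbitrary fixed list of 200 pairs to stand in for one separation bin; objects are deliberately shared among many pairs, yet with independent unit-variance residuals the mean and variance of $\psi$ come out as derived:

```python
import numpy as np

rng = np.random.default_rng(0)
n_obj, n_trials = 30, 20000

# a fixed, hypothetical pair list standing in for one separation bin;
# note that individual objects appear in many pairs
pairs = [(i, j) for i in range(n_obj) for j in range(i + 1, n_obj)][:200]
ii, jj = np.array(pairs).T
n_p = len(pairs)

psi = np.empty(n_trials)
for t in range(n_trials):
    dm = rng.standard_normal(n_obj)   # independent, zero-mean, unit-variance residuals
    psi[t] = np.sum(dm[ii] * dm[jj])  # psi for this realization

# under uncorrelated residuals: E[psi] = 0 and Var[psi] = N_p
print(psi.mean(), psi.var() / n_p)
```

The variance result holds even though objects are shared, because the cross terms in equation (C3) vanish in expectation; sharing affects higher moments, not the variance.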
Because $\psi(\tau)$ is the sum of $N_p(\tau)$ random variables, each of zero mean and unit variance, we are tempted to suppose that, by the central limit theorem, its distribution is Gaussian with mean zero and variance $N_p(\tau)$ when $N_p(\tau)$ is large. Indeed, for the 200 km s$^{-1}$ bins used in its construction (cf. Section 5.2), $N_p$ is typically of order $10^4$. And, as shown in the previous paragraph, $\psi(\tau)$ does indeed have mean zero and variance $N_p(\tau)$. One also may ask about the correlation among the $\psi(\tau)$ for different $\tau$. Specifically, one may compute
\[
E[\psi(\tau_1)\,\psi(\tau_2)] \;=\; \sum_{i<j;\; |d_{ij}-\tau_1| \le \Delta\tau/2}\;\; \sum_{k<l;\; |d_{kl}-\tau_2| \le \Delta\tau/2} E[\delta_{m,i}\,\delta_{m,j}\,\delta_{m,k}\,\delta_{m,l}]. \qquad \mathrm{(C4)}
\]
Now it is possible to have $i = k$ within this sum. However, because $\tau_1 \ne \tau_2$, if $i = k$ then $j \ne l$: the same pair cannot lie in two different separation bins. Similarly, one may have $j = l$, but in that case $i \ne k$. Thus, all of the individual expectation values in the sum vanish, and we find $E[\psi(\tau_1)\,\psi(\tau_2)] = 0$. To the extent the above considerations hold, the $\psi(\tau_i)$ are independent Gaussian random variables of variance $N_p(\tau_i)$. It then follows that the statistic
\[
\chi^2_\psi \;=\; \sum_{i=1}^{M} \frac{\psi^2(\tau_i)}{N_p(\tau_i)}
\]
is distributed like a $\chi^2$ variable with $M$ degrees of freedom. This is the statistic proposed in the main text as a measure of goodness of fit.
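Given binned coherence sums and pair counts, the goodness-of-fit statistic is a one-line reduction: each bin's squared coherence is normalized by its pair count, which is its variance under the null hypothesis. A sketch (the function name is ours; we assume bins containing no pairs are simply dropped from the count of degrees of freedom):

```python
import numpy as np

def chi2_psi(psi, n_p):
    """Return (statistic, M): sum over occupied separation bins of
    psi(tau_i)^2 / N_p(tau_i), which is approximately chi^2-distributed
    with M degrees of freedom if the psi(tau_i) are independent Gaussians
    of variance N_p(tau_i)."""
    psi = np.asarray(psi, dtype=float)
    n_p = np.asarray(n_p, dtype=float)
    keep = n_p > 0                      # ignore empty bins
    return float(np.sum(psi[keep] ** 2 / n_p[keep])), int(keep.sum())
```

For example, bins with $(\psi, N_p) = (2, 4)$ and $(3, 9)$ give $\chi^2_\psi = 1 + 1 = 2$ with $M = 2$.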
However, the central limit theorem applies only to sums of independent random variables. The individual products $\delta_{m,i}\,\delta_{m,j}$ that enter into $\psi(\tau)$ are uncorrelated in the specific sense $E(\delta_{m,i}\,\delta_{m,j}\,\delta_{m,k}\,\delta_{m,l}) = K_{ik}\,K_{jl}$ (where $K$ is the Kronecker delta symbol). However, they are not strictly independent of one another, because the same object can occur in more than one pair at a given $\tau$. We thus expect the central limit theorem to apply only approximately, so that the $\psi(\tau)$ are not strictly Gaussian. As a result, $\chi^2_\psi$ cannot be a true $\chi^2$ statistic.
Furthermore, just as a single object appears in many pairs at a given $\tau$, it can appear in pairs at different $\tau$ as well. Let us suppose object $i$ contributes to both $\psi(\tau_1)$ and $\psi(\tau_2)$. Then the latter are not strictly independent, even though the expectation value of their product vanishes, as shown above. This factor, too, will result in a departure from $\chi^2$ behavior.
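Both effects, object sharing within a bin and across bins, can be seen in a small Monte Carlo. The geometry below is entirely invented (40 objects distributed uniformly in a box, ideal uncorrelated Gaussian residuals): the sample mean of the statistic stays near $M$, as the moment calculations require, while its sample variance tends to exceed the value $2M$ that a true $\chi^2_M$ variable would have:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)
n_obj, n_trials = 40, 4000
pos = rng.uniform(0.0, 3000.0, size=(n_obj, 3))    # toy positions (km/s units)

# assign every distinct pair to a 200 km/s separation bin
pairs = np.array(list(combinations(range(n_obj), 2)))
ii, jj = pairs.T
d = np.linalg.norm(pos[ii] - pos[jj], axis=1)
bins = (d // 200.0).astype(int)
n_p = np.bincount(bins)                            # pair counts per bin
good = n_p > 0
M = int(good.sum())                                # number of occupied bins

chi2 = np.empty(n_trials)
for t in range(n_trials):
    dm = rng.standard_normal(n_obj)                # ideal uncorrelated residuals
    psi = np.bincount(bins, weights=dm[ii] * dm[jj])
    chi2[t] = np.sum(psi[good] ** 2 / n_p[good])

# a true chi^2 with M dof has mean M and variance 2M; object sharing
# leaves the mean at M but inflates the variance beyond 2M
print(M, chi2.mean(), chi2.var())
```

With only 40 objects the sharing is much stronger than in the real sample, so this deliberately exaggerates the departure; with $N_p$ of order $10^4$ the Gaussian approximation is far better.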