Practical Statistics for Astronomers II

4.4 Summary, One- and Two-Sample Tests

Table IV, adapted from Siegel & Castellan (1988), attempts a summary. It suggests a wide choice of non-parametric tests for comparing samples, but is the choice really so wide? In deciding which test(s) are relevant, the following points should be noted; they may well make the decision for you.

The two-sample and k-sample cases each contain columns of tests for related samples, i.e. matched-pair samples or samples of paired replicates. Such designs are common experimental practice in the biological and behavioural sciences, where the concept of the control sample is highly developed. They are less common in astronomy, where we generally observe rather than experiment and so can rarely construct a matched control sample, but they have been exploited on occasion. The powerful tests available for such experiments are listed in Table IV and described by Siegel & Castellan.
The table runs downward in order of increasing sophistication of measurement level: from ``nominal'' (the test objects are simply dumped into classes or bins), through ``ordinal'' (objects are ranked or ordered), to ``interval'' (objects are placed on a scale, not necessarily numerical, on which distance along the scale matters). None of the tests requires measurement on a ``ratio'' scale, the strongest scale of measurement, which adds a true zero point to the properties of the ``interval'' scale. (Temperature in degrees Celsius is measured on an ``interval'' scale; temperature in kelvins is on a ``ratio'' scale.) An important consideration in test selection is therefore the level of measurement each test requires: the table is cumulative downward, in the sense that at any level of measurement, all tests listed above that level are also applicable.
The efficiency of a particular test depends very much on the individual application. Is the aim to test goodness-of-fit and general difference, i.e. is this sample drawn from a given population, or are these samples drawn from the same population? Or is a particular property of the distribution of interest, such as the location (central tendency: mean or median) or the dispersion (extremes, variance, rms)? For instance, in the two-sample case the χ² and the Kolmogorov-Smirnov (two-tailed) tests are both sensitive to any type of difference between the two distributions (location, dispersion, skewness), while the Mann-Whitney U test is reasonably sensitive to most properties but is particularly powerful for discriminating location. To aid the choice, Tables V (single samples) and VI (two samples) summarize the attributes of the one- and two-sample tests.
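The related-samples tests mentioned in the first point above can be illustrated with one of the simplest, the sign test for matched pairs, which Siegel & Castellan treat among the two-sample related-samples tests. The following is a minimal Python sketch using only the standard library; the function name `sign_test` and the paired values are hypothetical, invented purely for illustration. Zero differences are discarded, and the two-sided p-value is computed exactly from the binomial distribution with probability 1/2.

```python
from math import comb


def sign_test(first, second):
    """Exact two-sided sign test for matched pairs (hypothetical helper).

    Zero differences (ties) are discarded, as is conventional for the
    sign test. Returns (n_positive, n_negative, p_value).
    """
    diffs = [b - a for a, b in zip(first, second)]
    n_pos = sum(1 for d in diffs if d > 0)
    n_neg = sum(1 for d in diffs if d < 0)
    n = n_pos + n_neg
    if n == 0:
        return n_pos, n_neg, 1.0
    k = min(n_pos, n_neg)
    # Under the null hypothesis each non-zero difference is + or - with
    # probability 1/2, so the count of the rarer sign is Binomial(n, 1/2).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return n_pos, n_neg, min(1.0, 2 * tail)


# Hypothetical paired measurements of the same 11 objects under two conditions.
first = [1.2, 0.8, 1.5, 2.0, 1.1, 0.9, 1.7, 1.3, 1.0, 1.4, 1.6]
second = [1.4, 1.0, 1.6, 2.3, 1.0, 1.2, 1.9, 1.5, 1.0, 1.8, 1.9]
print(sign_test(first, second))  # -> (9, 1, 0.021484375)
```

Because each object serves as its own control, the test uses only the direction of each within-pair change; this is what makes paired designs powerful when they can be arranged.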
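The distinction drawn above between ``interval'' and ``ratio'' scales can be made concrete with the temperature example. In this small Python sketch (the function names are our own), ratios of differences survive any change of zero point and unit, while ratios of the values themselves depend on where the zero happens to sit, and so mean something only on the Kelvin (``ratio'') scale.

```python
def celsius_to_kelvin(t_c):
    """Interval -> ratio scale: shift the zero point to absolute zero."""
    return t_c + 273.15


def celsius_to_fahrenheit(t_c):
    """Another interval scale: an affine change of unit and zero point."""
    return 9.0 * t_c / 5.0 + 32.0


def ratios(a, b, c):
    """Return (ratio of values b/a, ratio of intervals (c-b)/(b-a))."""
    return b / a, (c - b) / (b - a)


temps_c = [10.0, 20.0, 30.0]  # three temperatures in degrees Celsius

for name, conv in [("Celsius", lambda x: x),
                   ("Fahrenheit", celsius_to_fahrenheit),
                   ("Kelvin", celsius_to_kelvin)]:
    value_ratio, interval_ratio = ratios(*(conv(x) for x in temps_c))
    # value_ratio changes with the scale; interval_ratio stays at 1
    print(f"{name:10s} b/a = {value_ratio:.3f}  (c-b)/(b-a) = {interval_ratio:.3f}")
```

Only the Kelvin ratio (about 1.035) is physically meaningful; the Celsius ``twice as hot'' (2.000) is an artefact of the arbitrary zero point.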
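The point about sensitivity can be checked numerically. The Python sketch below uses only the standard library; the function names and the two test samples are our own, not taken from Table VI. It builds two samples with the same centre but different spread. The Mann-Whitney U statistic sits exactly at its null expectation, while the two-sample Kolmogorov-Smirnov statistic detects the difference in dispersion. The KS p-value uses the asymptotic Kolmogorov distribution, only an approximation for finite samples, and the U z-score omits the usual tie correction.

```python
import math
from bisect import bisect_right


def mann_whitney_u(a, b):
    """Mann-Whitney U: number of pairs (x, y) with x > y, ties counted as 1/2."""
    greater = sum(1 for x in a for y in b if x > y)
    ties = sum(1 for x in a for y in b if x == y)
    return greater + 0.5 * ties


def u_z_score(u, n1, n2):
    """Large-sample normal approximation for U (no tie correction)."""
    mean = n1 * n2 / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    return (u - mean) / sd


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov D = max |F_a(x) - F_b(x)|.

    Both empirical distribution functions are step functions that change
    only at data values, so checking every pooled value finds the maximum.
    """
    sa, sb = sorted(a), sorted(b)
    n1, n2 = len(sa), len(sb)
    return max(abs(bisect_right(sa, x) / n1 - bisect_right(sb, x) / n2)
               for x in sa + sb)


def ks_pvalue(d, n1, n2, terms=100):
    """Asymptotic two-tailed p-value Q(lam) from the Kolmogorov distribution.

    For lam < 0.3, Q(lam) is within about 1e-5 of 1, where the alternating
    series is unreliable, so 1.0 is returned directly.
    """
    lam = math.sqrt(n1 * n2 / (n1 + n2)) * d
    if lam < 0.3:
        return 1.0
    s = sum((-1) ** (k - 1) * math.exp(-2.0 * (k * lam) ** 2)
            for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * s))


# Same centre (zero), different spread: b is a three-times-wider copy of a.
a = [i / 100 for i in range(-100, 101)]
b = [3 * i / 100 for i in range(-100, 101)]
n1, n2 = len(a), len(b)

u = mann_whitney_u(a, b)  # equals n1*n2/2 exactly: both samples symmetric about 0
z = u_z_score(u, n1, n2)
d = ks_statistic(a, b)
p = ks_pvalue(d, n1, n2)
print(f"U = {u}, z = {z:.2f}")      # no location shift, so U finds nothing
print(f"D = {d:.3f}, p = {p:.1e}")  # KS detects the difference in dispersion
```

The U test answers ``is one sample systematically larger?'', which here it is not; the KS test answers ``do the distributions differ at all?'', which they clearly do.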

The choice of test may thus come down to Hobson's choice. If it does not, however, and two (or more) alternatives remain, beware of this plot of the devil: it might be possible to ``test the tests'' in search of support for a point of view. If such a procedure is followed, the resulting loss of significance must be quantified: for a chosen significance level $p$ and a total of $N$ independent tests, the chance that exactly one test will (randomly) come up significant is $N p (1 - p)^{N-1} \approx N p$ for small $p$. The application of efficient statistical procedure has power, but the application of common sense has more.
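The arithmetic of this penalty is easily tabulated. A minimal Python sketch, assuming (as the formula does) that the $N$ tests are independent, which tests applied to the same data usually are not. Alongside the probability of exactly one spurious result it gives the probability of at least one, $1 - (1 - p)^N$, which is the quantity that matters when shopping among tests.

```python
def p_exactly_one(p, n):
    """P(exactly one of n independent tests is significant at level p)."""
    return n * p * (1.0 - p) ** (n - 1)


def p_at_least_one(p, n):
    """P(at least one of n independent tests is significant at level p)."""
    return 1.0 - (1.0 - p) ** n


for n in (1, 2, 5, 10, 20):
    print(f"N={n:2d}  exactly one={p_exactly_one(0.05, n):.3f}  "
          f"at least one={p_at_least_one(0.05, n):.3f}  Np={n * 0.05:.3f}")
```

For $N = 20$ tests at $p = 0.05$ the $Np$ approximation gives 1.0, a reminder that it holds only while $Np$ itself is small; the chance of at least one spurious ``significant'' result is then already about 0.64.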