4. METHODOLOGICAL CHALLENGES FROM ASTRONOMICAL SURVEYS
Many astronomical surveys are not amenable to traditional multivariate
analysis and classification, and call for methodological advances by
statisticians. Four major difficulties are outlined here.
First, fluxes or other measured quantities are subject to
heteroscedastic measurement errors with known variances. That is,
each variable of each object has an associated measurement of its
uncertainty, and these uncertainties can differ from object to
object. Surprisingly, statistical methodology is very poorly developed
for such situations. For instance, there is no clustering algorithm
that weights points by their known measurement errors. Only the
LISREL model of the multivariate linear regression problem can begin
to treat known heteroscedastic measurement errors
(Jöreskog & Sörbom 1989).
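As a minimal illustration of what weighting by known errors means in the simplest univariate case, the sketch below computes an inverse-variance weighted mean. It is not a method from the works cited here and does not address the multivariate clustering problem. The function name `weighted_mean` and the sample fluxes are hypothetical, and the errors are assumed to be independent and Gaussian.

```python
import math


def weighted_mean(values, sigmas):
    """Inverse-variance weighted mean and its standard error.

    values: measured quantities (e.g. fluxes), one per object
    sigmas: known 1-sigma measurement errors, one per object
    Assumes independent Gaussian errors (an assumption of this sketch).
    """
    if len(values) != len(sigmas) or not values:
        raise ValueError("values and sigmas must be non-empty and equal in length")
    if any(s <= 0 for s in sigmas):
        raise ValueError("measurement errors must be positive")
    # Each point is weighted by the inverse of its known variance.
    weights = [1.0 / (s * s) for s in sigmas]
    total = sum(weights)
    mean = sum(w * x for w, x in zip(weights, values)) / total
    return mean, math.sqrt(1.0 / total)


# Hypothetical sample: two well-measured fluxes and one noisy one.
fluxes = [10.0, 12.0, 20.0]
errors = [1.0, 1.0, 10.0]
mean, err = weighted_mean(fluxes, errors)
```

With these sample values the poorly measured third flux contributes little, so the weighted mean lies near 11 rather than at the unweighted mean of 14.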
Second, objects may be undetected at one or many wavebands, leading to
upper limits or censored data in one or many variables. Survival
analysis, a mature field of statistics developed principally for
biomedical and industrial reliability applications, treats such
censored datasets. A suite of survival methods is now widely used
in astronomy (Feigelson 1992).
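The univariate case can be sketched as follows: a Kaplan-Meier (product-limit) estimate of the distribution function for a sample containing upper limits, that is, left-censored data. This is an illustrative sketch rather than code from the cited work. The function name `km_upper_limits` and the sample values are hypothetical, and the censoring is assumed to be independent of the true values.

```python
def km_upper_limits(values, detected):
    """Kaplan-Meier estimate of P(X < x) for data with upper limits.

    values:   measured value for a detection, or the upper limit otherwise
    detected: True for a detection, False for an upper limit
    Returns (x, P(X < x)) pairs at each distinct detected value,
    ordered from the largest x to the smallest.
    """
    if len(values) != len(detected):
        raise ValueError("values and detected must have equal length")
    data = list(zip(values, detected))
    # Left-censored data are handled by stepping downward from the largest value.
    steps = sorted({v for v, d in data if d}, reverse=True)
    prob = 1.0
    curve = []
    for x in steps:
        # Objects still "at risk": detections or upper limits at or below x.
        at_risk = sum(1 for v, _ in data if v <= x)
        # Detections exactly at x.
        events = sum(1 for v, d in data if d and v == x)
        prob *= 1.0 - events / at_risk
        curve.append((x, prob))
    return curve


# Hypothetical sample: detections at 5, 3 and 2; upper limits at 4 and 1.
sample_values = [5.0, 4.0, 3.0, 2.0, 1.0]
sample_detected = [True, False, True, True, False]
curve = km_upper_limits(sample_values, sample_detected)
```

With no upper limits in the sample, the estimate reduces to the ordinary empirical distribution function.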
However, most survival statistics
apply only to univariate problems; Cox regression, the principal
multivariate technique, permits censoring only in the single dependent
variable. A more general partial co-parametric methods, Bayesian
approaches, outlier detection and robust methods, multicollinearity
and ridge regression, goodness-of-fit measures, nonparametric density
estimation, wavelet analysis, bootstrap resampling and
cross-validation, mathematical morphology, and many aspects of
traditional multivariate analysis. The methodology for understanding
multivariate databases is vast and constantly growing.
5. ASTROSTATISTICS REFERENCES AND CODES
Multivariate statistics are briefly reviewed in an astronomical
context by Babu & Feigelson (1996), and are more thoroughly described
(with FORTRAN codes) by Murtagh & Heck (1987). Many monographs
presenting multivariate statistics are available, such as Johnson &
Wichern (1992). While commercial statistical packages are the most
powerful tools for implementing statistical procedures, a considerable
amount of software is in the public domain on the World Wide Web. An
informative essay on statistical software by Wegman (1997) can be
found at http://www.galaxy.gmu.edu/papers/astr1.html.
Information on commercial statistical software packages such as SAS,
SPSS and S-PLUS is available at
http://www.stat.cornell.edu/compsites.html.
Significant archives of on-line public domain statistical software
reside at StatLib (http://lib.stat.cmu.edu) and the Guide to Available
Mathematical Software (http://gams.nist.gov). StatLib provides many
state-of-the-art codes useful to astronomers such as XGobi, ODRPACK,
loess and MARS. Penn State operates the Statistical Consulting Center
for Astronomy (http://www.stat.psu.edu/scca) for astronomers with
statistical questions, and is initiating a site with links to
statistical software on the Web (http://www.astro.psu.edu/statcodes).
Acknowledgements
This work was supported by NSF DMS 9626189, NASA NAGW-2120 and
NAS 5-32669.