


Many astronomical surveys are not amenable to traditional multivariate analysis and classification, and present serious needs for methodological advances by statisticians. Four major difficulties are outlined here.

First, fluxes or other measured quantities are subject to heteroscedastic measurement errors with known variances. That is, each variable of each object has an associated measurement of its uncertainty, and these uncertainties can differ from object to object. Surprisingly, statistical methodology is very poorly developed for such situations. For instance, there is no clustering algorithm that weights points by their known measurement errors. Only the LISREL model of the multivariate linear regression problem can begin to treat known heteroscedastic measurement errors (Jöreskog & Sörbom 1989).
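To make the notion of known heteroscedastic errors concrete, the following minimal sketch treats only the simplest univariate case: combining several flux measurements whose standard deviations are known and unequal, using inverse-variance weights. It is an illustration, not a method taken from the references cited here, and it does not address the multivariate clustering or regression problems described above. The function name and the sample fluxes are hypothetical.

```python
import math


def weighted_mean(values, sigmas):
    """Inverse-variance weighted mean and its standard error.

    values: measured quantities (e.g. fluxes)
    sigmas: known 1-sigma measurement errors, one per value
    """
    if len(values) != len(sigmas) or not values:
        raise ValueError("values and sigmas must be non-empty and equal in length")
    if any(s <= 0 for s in sigmas):
        raise ValueError("measurement errors must be positive")

    # Each point is weighted by 1 / sigma^2, so precise points count more.
    weights = [1.0 / s**2 for s in sigmas]
    wsum = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / wsum
    return mean, math.sqrt(1.0 / wsum)


# Hypothetical fluxes with unequal, known errors.
fluxes = [10.0, 12.0, 11.0]
errors = [1.0, 2.0, 1.0]
mean, err = weighted_mean(fluxes, errors)
print(f"weighted mean = {mean:.3f} +/- {err:.3f}")
```

The second measurement, with twice the error, receives one quarter of the weight of the others. An unweighted mean would discard exactly this information.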

Second, objects may be undetected at one or many wavebands, leading to upper limits or censored data in one or many variables. A mature field of statistics known as survival analysis, developed principally for biomedical and industrial reliability applications, treats censored datasets. A suite of survival methods is now widely used in astronomy (Feigelson 1992). However, most survival statistics apply only to univariate problems; Cox regression, the principal multivariate technique, permits censoring only in the single dependent variable. A more general partial co [...] parametric methods, Bayesian approaches, outlier detection and robust methods, multicollinearity and ridge regression, goodness-of-fit measures, nonparametric density estimation, wavelet analysis, bootstrap resampling and cross-validation, mathematical morphology, and many aspects of traditional multivariate analysis. The methodology for understanding multivariate databases is vast and constantly growing.
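As an illustration of the univariate survival methods mentioned above, the following is a minimal sketch of the Kaplan-Meier product-limit estimator for right-censored data. It is not the code of any package cited here; the function name and sample values are hypothetical. Astronomical upper limits are left-censored, and a common device is to reverse the sign of the variable so that they become right-censored before applying an estimator of this kind.

```python
def kaplan_meier(values, detected):
    """Product-limit estimate of S(t) = P(T > t) for right-censored data.

    values:   measured value, or the censoring limit for a non-detection
    detected: True if the value is a detection, False if it is censored
    Returns a list of (t, S(t)) pairs, one per distinct detected value.
    """
    if len(values) != len(detected):
        raise ValueError("values and detected must have equal length")

    data = sorted(zip(values, detected))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_censored = 0
        # Collect all objects tied at this value.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                n_events += 1
            else:
                n_censored += 1
            i += 1
        if n_events > 0:
            # Objects censored at t still count as at risk at t.
            surv *= 1.0 - n_events / n_at_risk
            curve.append((t, surv))
        n_at_risk -= n_events + n_censored
    return curve


# Hypothetical sample: the second and fifth objects are censored.
curve = kaplan_meier([1.0, 2.0, 3.0, 4.0, 5.0],
                     [True, False, True, True, False])
for t, s in curve:
    print(f"S({t}) = {s:.4f}")
```

The estimator steps down only at detected values, while censored points reduce the number of objects at risk for later steps. This is how the limits contribute information without being treated as detections.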



Multivariate statistics are briefly reviewed in an astronomical context by Babu & Feigelson (1996), and are more thoroughly described (with FORTRAN codes) by Murtagh & Heck (1987). Many monographs presenting multivariate statistics are available, such as Johnson & Wichern (1992). While commercial statistical packages are the most powerful tools for implementing statistical procedures, a considerable amount of software is in the public domain on the World Wide Web. An informative essay on statistical software by Wegman (1997) can be found online.

Information on commercial statistical software packages such as SAS, SPSS and S-PLUS is available online.

Significant archives of on-line public domain statistical software reside at StatLib and the Guide to Available Mathematical Software. StatLib provides many state-of-the-art codes useful to astronomers such as XGobi, ODRPACK, loess and MARS. Penn State operates the Statistical Consulting Center for Astronomy for astronomers with statistical questions, and is initiating a site with links to statistical software on the Web.



This work was supported by NSF DMS 9626189, NASA NAGW-2120 and NAS 5-32669.
