We thus find that astronomy and astrophysics today require a vast range of statistical capabilities. In statistical jargon, it helps for astronomers to know something about: sampling theory, survival analysis with censoring and truncation, measurement error models, multivariate classification and analysis, harmonic and autoregressive time series analysis, wavelet analysis, spatial point processes and continuous surfaces, density estimation, linear and non-linear regression, model selection, and bootstrap resampling. In some cases, astronomers need combinations of methodologies that have not yet been fully developed (Section 6 below).
Faced with such a complex of challenges, mechanical exposure to a wider variety of techniques is a necessary but not sufficient prerequisite for high-quality statistical analyses. Astronomers also need to be imbued with established principles of statistical inference: hypothesis testing and parameter estimation, nonparametric and parametric inference, Bayesian and frequentist approaches, and the assumptions underlying any given statistical method together with the conditions under which it applies.
Unfortunately, we find that the majority of the thousands of astronomical studies requiring statistical analyses use a very limited set of classical methods. The most common tools used by astronomers are: Fourier transforms for temporal analysis (developed by Fourier in 1807), least squares regression and $\chi^2$ goodness-of-fit (Legendre in 1805, Pearson in 1900, Fisher in 1924), the nonparametric Kolmogorov-Smirnov 1- and 2-sample tests (Kolmogorov in 1933), and principal components analysis for multivariate tables (Hotelling in 1936).
Even traditional methods are often misused. Feigelson & Babu [9] found that astronomers use interchangeably up to six different fits for bivariate linear least squares regression: ordinary least squares (OLS), inverse regression, orthogonal regression, major axis regression, the OLS mean, and the OLS bisector. Not only did this lead to confusion in comparing studies (e.g., in measuring the expansion of the Universe via Hubble's constant, $H_0$), but astronomers did not realize that the confidence intervals on the fitted parameters cannot be correctly estimated with standard analytical formulae. Similarly, Protassov et al. [24] found that the majority of astronomical applications of the F test, or more generally the likelihood ratio test, are inconsistent with asymptotic statistical theory.
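To make concrete why these fits should not be used interchangeably, the following is a minimal sketch in Python, assuming NumPy is available. It computes the slope of each of the six lines from the same sample sums of squares. The function name `bivariate_slopes` and the dictionary keys are hypothetical labels chosen for illustration. The slope expressions are the usual closed forms for these estimators, and reading "major axis regression" as the reduced major axis (geometric mean) slope is an assumption on our part. The sketch ignores measurement errors and does not attempt confidence intervals.

```python
import numpy as np


def bivariate_slopes(x, y):
    """Return the slopes of six bivariate least-squares lines for (x, y).

    Illustrative sketch only: no measurement errors, no uncertainty estimates.
    Every line passes through the centroid, so each intercept is
    mean(y) - slope * mean(x).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    sxx = np.sum(dx * dx)
    syy = np.sum(dy * dy)
    sxy = np.sum(dx * dy)
    if sxx == 0.0 or sxy == 0.0:
        raise ValueError("slopes are undefined when Sxx or Sxy is zero")

    b_yx = sxy / sxx           # OLS of Y on X
    b_xy = syy / sxy           # inverse regression: OLS of X on Y, as dY/dX
    sign = np.sign(sxy)
    diff = b_xy - 1.0 / b_yx   # helper term for the orthogonal slope

    return {
        "ols_y_on_x": b_yx,
        "ols_x_on_y": b_xy,
        # Minimizes perpendicular distances to the line.
        "orthogonal": 0.5 * (diff + sign * np.sqrt(4.0 + diff * diff)),
        # Geometric mean of the two OLS slopes (assumed meaning of
        # "major axis regression" in the text).
        "reduced_major_axis": sign * np.sqrt(b_yx * b_xy),
        # Arithmetic mean of the two OLS slopes.
        "ols_mean": 0.5 * (b_yx + b_xy),
        # Line bisecting the angle between the two OLS lines.
        "ols_bisector": (b_yx * b_xy - 1.0
                         + np.sqrt((1.0 + b_yx ** 2) * (1.0 + b_xy ** 2)))
                        / (b_yx + b_xy),
    }


if __name__ == "__main__":
    # Synthetic data with intrinsic scatter; values are arbitrary.
    rng = np.random.default_rng(0)
    x = rng.normal(size=200)
    y = 2.0 * x + rng.normal(scale=1.5, size=200)
    for name, slope in bivariate_slopes(x, y).items():
        print(f"{name:>20s}: {slope:.3f}")
```

The six slopes coincide only when the points are perfectly collinear. With any scatter they differ, sometimes substantially, so a slope obtained with one fit cannot be compared directly with a slope obtained with another. Uncertainties on such slopes would in practice be estimated by resampling (for example, the bootstrap) rather than by the standard OLS formulae.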
But while the average astronomical study is limited to often-improper usage of a narrow repertoire of statistical methods, a significant tail of outliers is much more sophisticated. The maximization of likelihoods, often developed specially for the problem at hand, is perhaps the most common of these improvements. Bayesian approaches are also increasingly in vogue.
In a number of cases, sometimes buried in technical appendices of observational papers, astronomers independently develop statistical methods. Some of these are rediscoveries of known procedures; for example, Avni et al. [2] and others recovered elements of survival analysis for treatments of left-censored data arising from nondetections of known objects. Some are quite possibly mathematically incorrect, such as various revisions to $\chi^2$ for Poissonian data that assume the resulting statistic still follows the $\chi^2$ distribution. On rare occasions, truly new and correct methods have emerged; for example, astrophysicist Lynden-Bell [19] discovered the maximum-likelihood estimator for a randomly truncated dataset, for which the theoretical validity was later established by statistician Woodroofe [31].
A growing group of astronomers, recognizing the potential for new liaisons with the accomplishments of modern statistics, have promoted astrostatistical innovation through cross-disciplinary meetings and collaborations. Fionn Murtagh, an applied mathematician at Queen's University (Belfast) with long experience in astronomy, and his colleagues have run conferences and authored many useful monographs (e.g., [16], [17], [22] and [27]). We at Penn State have run a series of Statistical Challenges in Modern Astronomy meetings with both communities in attendance (e.g., [3] and [10]). Alanna Connors has organized brief statistics sessions at large astronomy meetings, and we have organized brief astronomy sessions at the large Joint Statistical Meetings. We wrote a short volume called Astrostatistics [3] intended to familiarize scholars in one discipline with relevant issues in the other discipline. Other conference series are devoted to technical issues in astronomical data analysis but typically have limited participation by statisticians. These include the dozen Astronomical Data Analysis Software and Systems conferences (e.g., [23]), several Erice workshops on Data Analysis in Astronomy (e.g., [8]), and the new SPIE Astronomical Data Analysis conferences (e.g., [26]).
Most importantly, several powerful astrostatistical research collaborations have emerged. At Harvard University and the Smithsonian Astrophysical Observatory, David van Dyk worked with scientists at the Chandra (3) X-ray Center on several issues, particularly Bayesian approaches to parametric modeling of spectra in light of complicated instrumental effects. At Carnegie Mellon University and the University of Pittsburgh, the Pittsburgh Computational Astrophysics group addressed several issues, such as developing powerful techniques for multivariate classification of extremely large datasets and applying nonparametric regression methods to cosmology. Both of these groups involved academics, researchers and graduate students from both fields working closely for several years to achieve a critical mass of cross-disciplinary capabilities.
Other astrostatistical collaborations must be mentioned. David Donoho (Statistics at Stanford University) works with Jeffrey Scargle (NASA Ames Research Center) and others on applying advanced wavelet methods to astronomical problems. James Berger (Statistics at Duke University) has worked with astronomers William Jefferys (University of Texas), Thomas Loredo (Cornell University), and Alanna Connors (Eureka Inc.) on Bayesian methodologies for astronomy. Bradley Efron (Statistics at Stanford University) has worked with astrophysicist Vahé Petrosian (also at Stanford) on survival methods for interpreting $\gamma$-ray bursts. Philip Stark (Statistics at University of California, Berkeley) has collaborated with solar physicists in the GONG program to improve analysis of oscillations of the Sun (helioseismology). More such collaborations exist in the U.S., Europe and elsewhere.
3 The Chandra X-ray Observatory is one of NASA's Great Observatories. It was launched in 1999 with a total budget around $2 billion.