Statistical Challenges in Modern Astronomy - E.D. Feigelson & G.J. Babu

3. ASTROSTATISTICS TODAY

We thus find that astronomy and astrophysics today require a vast range of statistical capabilities. In statistical jargon, it helps for astronomers to know something about: sampling theory, survival analysis with censoring and truncation, measurement error models, multivariate classification and analysis, harmonic and autoregressive time series analysis, wavelet analysis, spatial point processes and continuous surfaces, density estimation, linear and non-linear regression, model selection, and bootstrap resampling. In some cases, astronomers need combinations of methodologies that have not yet been fully developed (Section 6 below).

Faced with such a complex of challenges, mechanical exposure to a wider variety of techniques is a necessary but not sufficient prerequisite for high-quality statistical analyses. Astronomers also need to be imbued with established principles of statistical inference: hypothesis testing and parameter estimation, nonparametric and parametric inference, Bayesian and frequentist approaches, and the assumptions underlying, and the conditions of applicability for, any given statistical method.

Unfortunately, we find that the majority of the thousands of astronomical studies requiring statistical analyses use a very limited set of classical methods. The most common tools used by astronomers are: Fourier transforms for temporal analysis (developed by Fourier in 1807), least squares regression and χ² goodness-of-fit (Legendre in 1805, Pearson in 1900, Fisher in 1924), the nonparametric Kolmogorov-Smirnov 1- and 2-sample tests (Kolmogorov in 1933), and principal components analysis for multivariate tables (Hotelling in 1936).

Even traditional methods are often misused. Feigelson & Babu [9] found that astronomers use interchangeably up to six different fits for bivariate linear least squares regression: ordinary least squares (OLS), inverse regression, orthogonal regression, major axis regression, the OLS mean, and the OLS bisector. Not only did this lead to confusion in comparing studies (e.g., in measuring the expansion of the Universe via Hubble's constant, H_o), but astronomers did not realize that the confidence intervals on the fitted parameters cannot be correctly estimated with standard analytical formulae. Similarly, Protassov et al. [24] found that the majority of astronomical applications of the F test, or more generally the likelihood ratio test, are inconsistent with asymptotic statistical theory.
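To make concrete why these fits are not interchangeable, the following minimal Python sketch computes the slopes of four of the lines named above (OLS of Y on X, inverse regression, the OLS bisector, and orthogonal regression) from the same data, using the standard closed-form expressions in terms of the sums of squares. It is an illustration only, not the procedure of [9]: the function names `bivariate_slopes` and `bootstrap_interval` are hypothetical, and the percentile bootstrap over (x, y) pairs is shown as one generic way to obtain an interval without relying on the standard OLS formulae, under the assumption that the pairs are independent draws.

```python
import math
import random


def bivariate_slopes(x, y):
    """Slopes dy/dx of four least-squares lines through paired data."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxx == 0 or sxy == 0:
        raise ValueError("slopes undefined: no spread in x or zero covariance")
    ols_yx = sxy / sxx  # OLS(Y|X): minimizes vertical residuals
    ols_xy = syy / sxy  # inverse regression OLS(X|Y), expressed as dy/dx
    # OLS bisector: the line bisecting the angle between the two OLS lines.
    bisector = math.tan(0.5 * (math.atan(ols_yx) + math.atan(ols_xy)))
    # Orthogonal regression: minimizes perpendicular distances to the line.
    d = syy - sxx
    orthogonal = (d + math.sqrt(d * d + 4.0 * sxy * sxy)) / (2.0 * sxy)
    return {"ols_yx": ols_yx, "ols_xy": ols_xy,
            "bisector": bisector, "orthogonal": orthogonal}


def bootstrap_interval(x, y, method, n_boot=2000, level=0.90, seed=0):
    """Percentile bootstrap interval for one slope, resampling (x, y) pairs."""
    bivariate_slopes(x, y)  # fail early if the full sample is degenerate
    rng = random.Random(seed)
    n = len(x)
    slopes = []
    while len(slopes) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        try:
            fit = bivariate_slopes([x[i] for i in idx], [y[i] for i in idx])
        except ValueError:
            continue  # degenerate resample; draw again
        slopes.append(fit[method])
    slopes.sort()
    lower = slopes[int((1.0 - level) / 2.0 * n_boot)]
    upper = slopes[int((1.0 + level) / 2.0 * n_boot) - 1]
    return lower, upper


if __name__ == "__main__":
    # Made-up illustrative data with scatter in y.
    xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
    ys = [1.4, 2.1, 5.9, 6.2, 9.8, 10.1, 14.0, 14.6]
    print(bivariate_slopes(xs, ys))
    print(bootstrap_interval(xs, ys, "bisector"))
```

For perfectly collinear data all four slopes coincide; with scatter they separate, and for positively correlated data the OLS(Y|X) slope is the shallowest and the inverse-regression slope the steepest, so a value such as H_o quoted from one fit cannot be compared directly with a value from another.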

But while the average astronomical study is limited to often-improper usage of a small repertoire of statistical methods, a significant tail of outliers is much more sophisticated. The maximization of likelihoods, often developed specially for the problem at hand, is perhaps the most common of these improvements. Bayesian approaches are also increasingly in vogue.

In a number of cases, sometimes buried in technical appendices of observational papers, astronomers independently develop statistical methods. Some of these are rediscoveries of known procedures; for example, Avni et al. [2] and others recovered elements of survival analysis for treatments of left-censored data arising from nondetections of known objects. Some are quite possibly mathematically incorrect, such as various revisions to χ² for Poissonian data that assume the resulting statistic still follows the χ² distribution. On rare occasions, truly new and correct methods have emerged; for example, astrophysicist Lynden-Bell [19] discovered the maximum-likelihood estimator for a randomly truncated dataset, for which the theoretical validity was later established by statistician Woodroofe [31].
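As an illustration of what survival analysis offers for nondetections, the sketch below applies the Kaplan-Meier product-limit estimator to left-censored data, where each nondetection supplies only an upper limit. It is a minimal example, not the specific procedure of [2]: the function name `km_left_censored` is hypothetical, and the estimator assumes that the censoring is independent of the true values. The code works on the original scale; it is equivalent to mirroring the data so that upper limits become right-censored points and then applying the usual Kaplan-Meier formula.

```python
def km_left_censored(values, is_upper_limit):
    """Kaplan-Meier estimate of P(X < x) at each distinct detected value x.

    values[i] is a measured value when is_upper_limit[i] is False, and an
    upper limit (the true value lies below values[i]) when it is True.
    Returns (x, P(X < x)) pairs ordered from the largest detected x down.
    """
    pairs = list(zip(values, is_upper_limit))
    detected = sorted({v for v, limit in pairs if not limit}, reverse=True)
    prob = 1.0
    curve = []
    for x in detected:
        # Risk set of the mirrored problem: every observation, detection
        # or upper limit, whose recorded value does not exceed x.
        n_at_risk = sum(1 for v, _ in pairs if v <= x)
        n_detected = sum(1 for v, limit in pairs if not limit and v == x)
        prob *= 1.0 - n_detected / n_at_risk
        curve.append((x, prob))
    return curve


if __name__ == "__main__":
    # Made-up fluxes; the third entry is a nondetection with upper limit 3.
    fluxes = [1.0, 2.0, 3.0, 4.0]
    limits = [False, False, True, False]
    for x, p in km_left_censored(fluxes, limits):
        print(x, p)
```

With no upper limits the estimate reduces to the ordinary empirical distribution function. Each upper limit has its probability mass redistributed among the smaller detected values, rather than being discarded or treated as a detection at the limit.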

A growing group of astronomers, recognizing the potential for new liaisons with the accomplishments of modern statistics, have promoted astrostatistical innovation through cross-disciplinary meetings and collaborations. Fionn Murtagh, an applied mathematician at Queen's University (Belfast) with long experience in astronomy, and his colleagues have run conferences and authored many useful monographs (e.g., [16], [17], [22] and [27]). We at Penn State have run a series of Statistical Challenges in Modern Astronomy meetings with both communities in attendance (e.g., [3] and [10]). Alanna Connors has organized brief statistics sessions at large astronomy meetings, and we have organized brief astronomy sessions at the large Joint Statistical Meetings. We wrote a short volume called Astrostatistics [3] intended to familiarize scholars in one discipline with relevant issues in the other discipline. Other conference series are devoted to technical issues in astronomical data analysis but typically have limited participation by statisticians. These include the dozen Astronomical Data Analysis Software and Systems conferences (e.g., [23]), several Erice workshops on Data Analysis in Astronomy (e.g., [8]), and the new SPIE Astronomical Data Analysis conferences (e.g., [26]).

Most importantly, several powerful astrostatistical research collaborations have emerged. At Harvard University and the Smithsonian Astrophysical Observatory, David van Dyk worked with scientists at the Chandra³ X-ray Center on several issues, particularly Bayesian approaches to parametric modeling of spectra in light of complicated instrumental effects. At Carnegie Mellon University and the University of Pittsburgh, the Pittsburgh Computational Astrophysics group addressed several issues, such as developing powerful techniques for multivariate classification of extremely large datasets and applying nonparametric regression methods to cosmology. Both of these groups involved academics, researchers and graduate students from both fields working closely for several years to achieve a critical mass of cross-disciplinary capabilities.

Other astrostatistical collaborations must be mentioned. David Donoho (Statistics at Stanford University) works with Jeffrey Scargle (NASA Ames Research Center) and others on applying advanced wavelet methods to astronomical problems. James Berger (Statistics at Duke University) has worked with astronomers William Jefferys (University of Texas), Thomas Loredo (Cornell University), and Alanna Connors (Eureka Inc.) on Bayesian methodologies for astronomy. Bradley Efron (Statistics at Stanford University) has worked with astrophysicist Vahé Petrosian (also at Stanford) on survival methods for interpreting gamma-ray bursts. Philip Stark (Statistics at University of California, Berkeley) has collaborated with solar physicists in the GONG program to improve analysis of oscillations of the Sun (helioseismology). More such collaborations exist in the U.S., Europe and elsewhere.

³ The Chandra X-ray Observatory is one of NASA's Great Observatories. It was launched in 1999 with a total budget around $2 billion.