Statistical Challenges in Modern Astronomy - E.D. Feigelson & G.J. Babu

2. STATISTICAL NEEDS OF ASTRONOMY TODAY

Contemporary astronomy abounds in questions of a statistical nature. In addition to the exploratory data analysis and simple heuristic (usually linear) modeling common in other fields, astronomers often interpret data in terms of complicated non-linear models based on deterministic astrophysical processes. The phenomena studied must obey known behaviors of atomic and nuclear physics, gravitation and mechanics, thermodynamics and radiative processes, and so forth. "Modeling" data may thus involve both the selection of a model family based on an astrophysical understanding of the conditions under study, and a statistical effort to find parameters for the specified model. A wide variety of issues thus arise:

Does an observed group of stars (or galaxies or molecular clouds or γ-ray sources) constitute a typical and unbiased sample of the vast underlying population of similar objects?

When and how should we divide/classify these objects into 2, 3 or more subclasses?

What is the intrinsic physical relationship between two or more properties of a class of objects, especially when confounding variables or observational selection effects are present?

How do we answer such questions in the presence of observations with measurement errors and flux limits?

When is a blip in a spectrum (or image or time series) a real signal rather than a random event from Gaussian (or often Poissonian) noise or confounding variables?

How do we interpret the vast range of temporally variable objects: periodic signals from rotating stars or orbiting extrasolar planets, stochastic signals from accreting neutron stars or black holes, explosive signals from magnetic reconnection flares or γ-ray bursts?

How do we model points in 2-, 3-, ..., 6-dimensional spaces representing photons in an image, galaxies in the Universe, or Galactic stars in phase space?

How do we quantify continuous structures seen in the sky, such as the cosmic microwave background and the interstellar and intergalactic gaseous media?

How do we fit astronomical spectra to highly non-linear astrophysical models based on atomic physics and radiative processes, including confidence limits on the best-fit parameters?

From a superficial examination of the astronomical literature ⁽²⁾, we can show that such questions are very common today. Of the ≈15,000 refereed papers published annually, 1% have "statistics" or "statistical" in their title, 5% have "statistics" in their abstract, 10% treat time-variable objects, 5-10% (est.) present or analyze multivariate datasets, and 5-10% (est.) fit parametric models. Accounting for overlaps, we estimate that roughly 3,000 distinct studies each year require non-trivial statistical methodologies. Roughly 10% of these are principally concerned with statistical methods; indeed, some of these purport to develop new methods or improve on established ones.
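The arithmetic behind these figures can be sketched in a few lines of Python. This is only a back-of-the-envelope tally: the percentages, the total of about 15,000 papers, and the estimate of about 3,000 distinct studies come from the text, while the variable names and the "naive sum" comparison are illustrative assumptions, not the authors' actual procedure for handling overlaps.

```python
# Rough tally of the bibliometric percentages quoted in the text.
# Only the percentages, the ~15,000 total, and the ~3,000 estimate come from
# the text; the variable names and the "naive sum" framing are illustrative.

TOTAL_PAPERS = 15_000        # approximate refereed papers per year
DISTINCT_ESTIMATE = 3_000    # authors' overlap-corrected estimate

# (low, high) percentage of papers in each category.
percent_ranges = {
    "'statistics'/'statistical' in title": (1, 1),
    "'statistics' in abstract": (5, 5),
    "time-variable objects": (10, 10),
    "multivariate datasets (est.)": (5, 10),
    "parametric model fits (est.)": (5, 10),
}

# Convert percentages to paper counts using integer arithmetic.
counts = {
    name: (TOTAL_PAPERS * lo // 100, TOTAL_PAPERS * hi // 100)
    for name, (lo, hi) in percent_ranges.items()
}

# Summing categories double-counts papers that fall into several of them.
naive_low = sum(lo for lo, _ in counts.values())
naive_high = sum(hi for _, hi in counts.values())

# About 10% of the distinct studies are principally about statistical methods.
methods_papers = DISTINCT_ESTIMATE * 10 // 100

for name, (lo, hi) in counts.items():
    span = f"{lo}" if lo == hi else f"{lo}-{hi}"
    print(f"{name:40s} {span:>10s} papers/yr")
print(f"naive sum (with double counting): {naive_low}-{naive_high}")
print(f"authors' distinct-study estimate: {DISTINCT_ESTIMATE}")
print(f"principally methodological:       {methods_papers}")
```

The naive sum of the categories is 3,900 to 5,400 papers per year, so the quoted figure of roughly 3,000 distinct studies implies that a substantial fraction of papers fall into more than one category. Ten percent of that figure corresponds to about 300 principally methodological papers per year.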

² Such bibliometric measures are easily accomplished as the entire astronomical research literature is on-line (in full text at subscribing institutions) through the NASA-supported Astrophysics Data System, http://adsabs.harvard.edu/abstract_service.html.