
5. SOME GRAND METHODOLOGICAL CHALLENGES FOR THE COMING DECADE

While it is risky to prognosticate the directions of future research, and judgments will always differ regarding the relative importance of research goals, we can outline a few "grand challenges" for astrostatistical research for the next decade or two.

5.1. Multivariate analysis with measurement errors and censoring

Traditional multivariate analysis is designed mainly for applications in the social and human sciences, where the sources of variance are largely unknowable. Measurement errors are usually ignored, or are treated as exogenous variables in the parametric models [12]. But astrophysicists often devote as much effort to the precise determination of their errors as to the measurement of the quantities of interest. The instruments are carefully calibrated to reduce systematic uncertainties, and background levels and random fluctuations are carefully evaluated to determine random errors. Except in the simple case of bivariate regression [1, 5, 9], this information on measurement errors is usually squandered.
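To illustrate the simplest case that does use this information, a linear regression in which each point carries its own known Gaussian error on the response has a closed-form inverse-variance-weighted solution. The following sketch in Python with numpy is only illustrative: the function name is ours, and the restriction to errors in the dependent variable alone is an assumption, so it does not address the full errors-in-variables problem described above.

import numpy as np

def weighted_line_fit(x, y, sigma_y):
    """Chi-square fit of y = a + b*x with known, point-by-point
    (heteroscedastic) Gaussian errors sigma_y on y."""
    w = 1.0 / sigma_y**2                       # inverse-variance weights
    S = w.sum()
    Sx, Sy = (w * x).sum(), (w * y).sum()
    Sxx, Sxy = (w * x * x).sum(), (w * x * y).sum()
    delta = S * Sxx - Sx**2
    a = (Sxx * Sy - Sx * Sxy) / delta          # intercept
    b = (S * Sxy - Sx * Sy) / delta            # slope
    return a, b, np.sqrt(Sxx / delta), np.sqrt(S / delta)

# simulated data with unequal but known errors
rng = np.random.default_rng(0)
x = np.linspace(1.0, 10.0, 40)
sigma_y = rng.uniform(0.1, 1.0, x.size)
y = 2.0 + 0.5 * x + rng.normal(0.0, sigma_y)
print(weighted_line_fit(x, y, sigma_y))

Extending such weighting self-consistently to errors in all variables, and to more than two dimensions, is precisely the open problem at issue here.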

While heteroscedastic measurement errors with known variances are common throughout the physical sciences, astronomy is nearly unique in frequently encountering nondetections when known objects are observed at new wavelengths. These are data points where the signal lies below (say) three times the noise level. Here again, modern statistics has insufficient tools. Survival analysis for censored data assumes that the value below which the data point must lie is known with infinite precision, rather than being generated from a distribution of noise. Astronomer Herman Marshall [20] makes an interesting attempt to synthesize measurement errors and nondetections, but statistician Leon Gleser [14] argues that he has only recovered Fisher's failed theory of fiducial distributions. Addressing this issue within a self-consistent statistical theory is a profound challenge that lies at the heart of interpreting the data astronomers obtain at the telescope.
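The desired synthesis can at least be written down concretely as a likelihood in which each nondetection contributes the probability, under its own noise distribution, that the measurement fell below the quoted limit. A minimal sketch in Python with scipy, assuming a constant true flux and invented illustrative numbers:

import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize_scalar

# Detections: measured fluxes with known Gaussian errors.
y_det = np.array([4.1, 5.3, 3.8])
sig_det = np.array([0.5, 0.8, 0.4])
# Nondetections: 3-sigma upper limits with the noise level behind them.
lim_cens = np.array([3.0, 2.4])
sig_cens = np.array([1.0, 0.8])

def neg_log_like(mu):
    """-log L for a constant true flux mu.  Detections contribute the
    Gaussian density; each nondetection contributes the probability,
    under its own noise distribution, that the measurement fell
    below the quoted limit."""
    ll = norm.logpdf(y_det, loc=mu, scale=sig_det).sum()
    ll += norm.logcdf((lim_cens - mu) / sig_cens).sum()
    return -ll

fit = minimize_scalar(neg_log_like, bounds=(0.0, 10.0), method="bounded")
print("MLE of flux:", fit.x)

Here the censored term lets the noise distribution, not an infinitely precise limit, carry the information from each nondetection; whether such constructions can be made fully self-consistent is the open question.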

5.2. Statistical inference and visualization with very-large-N datasets

The need for computational software for extremely large databases - multi-terabyte image and spectrum libraries and multi-billion object catalogs - is discussed in Section 4. A suite of approximate methods based on flowing data streams, or on adaptive sampling of large datasets resident on disk, should be sought. Visualization methods involving smoothing, multidimensional shading and variable transparency should be brought into the astronomer's toolbox. Here considerable work is already being conducted by computer scientists and applied mathematicians in other applied fields, so independent development by astrostatisticians may not be necessary to achieve certain goals.
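As one simple building block for such stream-based approximations, reservoir sampling maintains a uniform random subsample of fixed size from a stream of unknown length in a single pass. A minimal sketch in Python (the function name and the commented catalog file are illustrative, not from any particular package):

import random

def reservoir_sample(stream, k, seed=None):
    """Algorithm R: a uniform random sample of k items from a data
    stream of unknown length, in one pass and O(k) memory."""
    rng = random.Random(seed)
    reservoir = []
    for n, item in enumerate(stream):
        if n < k:
            reservoir.append(item)
        else:
            j = rng.randint(0, n)    # uniform over the n+1 items seen so far
            if j < k:
                reservoir[j] = item  # replace with probability k/(n+1)
    return reservoir

# e.g., a representative subsample of a billion-row catalog read once:
# sample = reservoir_sample(open("catalog.dat"), k=100000)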

5.3. A cookbook for construction of likelihoods and Bayesian computation

While the concepts of likelihoods and their applications in maximum likelihood estimation, Bayes' Theorem and Bayes factors are becoming increasingly well known in astronomical research, their application to real-life problems is still an art for the expert rather than a tool for the masses. Part of the problem is conceptual: astronomers need training in how to construct likelihoods for familiar parametric situations (e.g., power law distributions or a Poisson process). Part of the problem is computational: astronomers need methods and software for the oft-complex computations. Many such methods, such as Markov chain Monte Carlo, are already well established and can be directly adopted for astronomy [13]. For example, astronomers are often not fully aware of the broad applicability of the EM Algorithm for maximizing likelihoods [21] (6).
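As an illustration of the first, conceptual step, the likelihood of a pure power law above a known lower cutoff can be written down directly and maximized in closed form. A minimal sketch in Python with numpy (the sharp, known cutoff xmin is an assumption, and the function names are ours):

import numpy as np

def powerlaw_loglike(alpha, x, xmin):
    """log-likelihood of a pure power law
    p(x) = ((alpha - 1)/xmin) * (x/xmin)**(-alpha),  x >= xmin, alpha > 1."""
    return (len(x) * np.log((alpha - 1.0) / xmin)
            - alpha * np.log(x / xmin).sum())

def powerlaw_mle(x, xmin):
    """Closed-form MLE for the slope: set d(logL)/d(alpha) = 0."""
    return 1.0 + len(x) / np.log(x / xmin).sum()

# e.g., recover the slope of a simulated power-law sample
xmin = 1.0
u = np.random.default_rng(0).random(500)
x = xmin * u ** (-1.0 / (2.35 - 1.0))   # simulated sample with alpha = 2.35
print(powerlaw_mle(x, xmin))            # should return a value near 2.35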

5.4. Links between astrophysical theory and wavelets

Wavelet analysis has become a powerful and sophisticated tool for the study of features in data. Originally intended mainly for modelling time series, it is now increasingly used by astronomers for spatial analysis of images as well [11, 25]. In some ways it can be viewed as a generalization of Fourier analysis in which the basis function need not be sinusoidal in shape and, most importantly, the pattern need not extend over the entire dataset. Wavelets are thus effective in quantitatively describing complicated overlapping structures on many scales, and can also be used for signal denoising and compression. In addition, wavelets have a strong mathematical foundation.
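The Haar wavelet provides the simplest concrete illustration of this locality: each level of the transform splits a signal into pairwise averages, capturing coarse structure, and pairwise differences, capturing detail confined to a small stretch of the data. A minimal sketch in Python with numpy (the function names are ours, and the signal length is assumed divisible by two at each level):

import numpy as np

def haar_step(signal):
    """One level of the Haar wavelet transform: pairwise sums capture
    coarse structure, pairwise differences capture detail that is
    localized in time rather than spread over the whole dataset."""
    s = np.asarray(signal, dtype=float)
    approx = (s[0::2] + s[1::2]) / np.sqrt(2.0)
    detail = (s[0::2] - s[1::2]) / np.sqrt(2.0)
    return approx, detail

def haar_decompose(signal, levels):
    """Re-split the coarse part repeatedly; each detail array then
    describes structure at one characteristic scale."""
    details = []
    approx = signal
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.append(d)
    return approx, details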

Despite its increasing popularity in astronomical applications, wavelet analysis suffers from a profound limitation in comparison with Fourier analysis. A peak in a Fourier spectrum is immediately interpretable as a vibrational, rotational or orbital motion of solid bodies. A bump or a continuum slope in a wavelet decomposition often has no analogous physically intuitive interpretation. We therefore recommend that astrophysicists seek links between physical theory - often involving continuous media such as turbulent plasmas in the interstellar medium and hierarchical structure formation in the early Universe - and wavelets. One fascinating example is the demonstration that the wavelet spectrum and Lyapunov exponent of the quasi-periodic X-ray emission from Sco X-1, which reflects the processes in an accretion disk around a neutron star, exhibit a transient chaotic behavior similar to that of water condensing and dripping onto an automobile windshield or a dripping handrail [32].

5.5. Time series models for astrophysical phenomena

The quasi-periodic oscillation of Sco X-1 is only one of many examples of complex accretional behavior onto neutron stars and black holes seen in X-ray and gamma-ray astronomy. The accreting Galactic black hole GRS 1915+105 exhibits a bewildering variety of distinct states of stochastic, quasi-periodic and explosive behavior. The prompt emission from gamma-ray bursts shows a fantastic diversity of temporal behaviors, from simple smooth fast-rise exponential-decay profiles to stochastic spiky profiles. Violent magnetic reconnection flares on the surfaces of the Sun and other magnetically active stars also show complex behaviors. Many of these datasets are multivariate, with time series available in several spectral bands, often showing lags or hardness-ratio variations of astrophysical interest.

There are also important astronomical endeavors which seek astrophysically interesting signals amidst the oft-complex noise characteristics of the detectors. The Arecibo, Parkes and VLA radio telescopes, for example, conduct searches for new radio pulsars or for extraterrestrial intelligence in nearby planetary systems. The Laser Interferometer Gravitational-Wave Observatory (LIGO) and related detectors search for both continuous periodic signals and brief bursts from the perturbations of space-time predicted by Einstein's General Relativity. Here the signals sought are orders of magnitude fainter than instrumental variations.
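Although each of the observatories named above uses its own specialized pipelines, the conceptual core of many such searches is matched filtering: correlating the data against a template of the expected waveform, which is optimal for a known signal in white Gaussian noise. A toy time-domain sketch in Python with numpy (real searches whiten the data and work in the frequency domain against the measured noise spectrum; the chirp waveform and all numbers here are invented for illustration):

import numpy as np

def matched_filter(data, template):
    """Correlate the data with a unit-norm copy of the expected
    waveform; peaks in the output flag candidate signals even when
    no single sample stands out above the noise."""
    t = template - template.mean()
    t = t / np.sqrt((t * t).sum())             # unit-norm template
    out = np.correlate(data - data.mean(), t, mode="valid")
    return out / out.std()                     # crude noise normalization

# a weak chirp (amplitude 1) buried in noise of amplitude 2
rng = np.random.default_rng(1)
n, m = 4096, 256
chirp = np.sin(2.0 * np.pi * (0.01 + 1e-5 * np.arange(m)) * np.arange(m))
data = rng.normal(0.0, 2.0, n)
data[1000:1000 + m] += chirp                   # injected signal
snr = matched_filter(data, chirp)
print("peak at offset", snr.argmax(), "significance ~", snr.max())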



6 The seminal study of the EM Algorithm is that of Dempster, Laird & Rubin (1977) [7], one of the most frequently cited papers in statistics. However, the method was independently derived three years earlier by astronomer Leon Lucy [18] as an "iterative technique for the rectification of observed distributions" based on Bayes' Theorem. That study is widely cited in the astronomical literature; its most frequent application is in image deconvolution, where it is known as the Lucy-Richardson algorithm.
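For concreteness, the Lucy-Richardson iteration in one dimension can be sketched as follows; each pass is one EM step under a Poisson data model, re-weighting the current estimate by the back-projected ratio of observed to predicted counts. (Python with numpy; the iteration count and the flat starting estimate are conventional choices made here for illustration.)

import numpy as np

def richardson_lucy(observed, psf, n_iter=50, eps=1e-12):
    """Minimal 1-D Lucy-Richardson deconvolution.  Each pass is one
    EM step for Poisson data: re-weight the current estimate by the
    back-projected ratio of observed to predicted counts."""
    psf = psf / psf.sum()                      # normalized kernel
    psf_mirror = psf[::-1]
    estimate = np.full(observed.shape, observed.mean(), dtype=float)
    for _ in range(n_iter):
        predicted = np.convolve(estimate, psf, mode="same")
        ratio = observed / np.maximum(predicted, eps)
        estimate = estimate * np.convolve(ratio, psf_mirror, mode="same")
    return estimate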
