Digital Sky Surveys: Software Tools and Technologies

1.3. Scientific Analysis and Exploration

The end goal of digital sky surveys is to enable scientific investigations. In some cases the goals are so specific that the best approach is to use analysis software designed and optimized for them, as in studies of the cosmic microwave background or in gravitational microlensing experiments. However, most modern survey data sets are so information-rich that a wide variety of scientific studies can be done with the same data. This calls for general-purpose tools for the exploration, visualization, and analysis of large survey data sets.

The tools used with the new data sets will need to employ cutting-edge techniques from computer science. Examples include clustering techniques that detect rare, anomalous, or otherwise unusual objects (e.g., as outliers in the parameter space) and select them for further investigation, such as follow-up spectroscopy; genetic algorithms to improve current detection and supervised classification methods; new data visualization and presentation techniques that convey most of the multidimensional information in a form more easily grasped by a human user; and semi-autonomous AI or software agents that explore large data parameter spaces and report occurrences of unusual instances or classes of objects.

The nature of sky surveys and of the resulting data sets dictates the kinds of science one may wish to do with them. For example, most surveys cover a large solid angle but do not go as deep as typical "pointed observations". This, in turn, drives the kinds of data analysis software needed for their scientific exploration. Broadly speaking, surveys are well suited to the following kinds of astronomical investigations:

Multiwavelength Astronomy

Combining surveys done at different wavelengths gives a more panchromatic view of the universe. Optical identifications are typically needed for most of the follow-up work; an obvious example is the optical identification of radio or X-ray sources. A basic software requirement is therefore efficient and accurate matching of sources detected at different wavelengths. If the positional accuracies and source densities of the matched surveys are comparable, this is a relatively straightforward task; if either of these conditions is not satisfied, more sophisticated algorithms may be needed to assign matching probabilities to multiple candidates. These techniques can generally be optimized for particular classes of sources by using as much a priori knowledge as possible.
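
The matching step can be sketched as follows. This is a minimal illustration under stated assumptions, not a method taken from any particular survey pipeline: positions are (RA, Dec) in degrees, every candidate within a fixed search radius is kept, and candidates are given relative weights assuming circular Gaussian positional errors with a single combined sigma. The function names and parameters (crossmatch, radius_arcsec, sigma_arcsec) are hypothetical. The weighting does not model the background density of unrelated sources, which the more sophisticated probabilistic methods mentioned above would include.

```python
import math

def angular_sep_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation (arcsec) between two points given in degrees, via the haversine formula."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    # Clamp h to 1.0 to guard against rounding just above 1 for antipodal points.
    return math.degrees(2 * math.asin(math.sqrt(min(1.0, h)))) * 3600.0

def crossmatch(sources, catalog, radius_arcsec, sigma_arcsec):
    """For each source, list (catalog_index, separation_arcsec, weight) for all
    catalog entries within radius_arcsec, sorted by decreasing weight.

    Assumption: weights are Gaussian likelihoods exp(-r^2 / 2 sigma^2),
    normalized over the candidates of each source (no background term).
    """
    results = []
    for ra, dec in sources:
        cands = []
        for j, (cra, cdec) in enumerate(catalog):
            r = angular_sep_arcsec(ra, dec, cra, cdec)
            if r <= radius_arcsec:
                cands.append((j, r, math.exp(-0.5 * (r / sigma_arcsec) ** 2)))
        total = sum(w for _, _, w in cands) or 1.0  # avoid division by zero on underflow
        ranked = sorted(((j, r, w / total) for j, r, w in cands), key=lambda c: -c[2])
        results.append(ranked)
    return results
```

For example, crossmatch([(150.0, 2.0)], optical_positions, radius_arcsec=5.0, sigma_arcsec=1.5) would return every optical position within 5 arcsec of a radio source, with the closest candidate ranked first. The brute-force double loop costs O(N·M) and is only meant for illustration; survey-scale matching normally relies on a spatial index (e.g., a k-d tree or a sky pixelization) to find nearby candidates.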

Statistical Astronomy

Studies of large-scale structure (when galaxies are detected in large numbers) or of Galactic structure (when stars are). The sheer number of detected sources makes Poisson fluctuations unimportant, so systematic errors may limit the results instead. Accurate and uniform flux calibration and source classification are essential for such applications, and possible biases should be modeled. In addition, the sheer size of most surveys allows the analyses to be further subdivided (e.g., by morphological or spectral type) for more specific results.
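
A short worked example makes the Poisson-versus-systematics argument concrete. The source counts and the 1% systematic error below are illustrative numbers chosen for this sketch, not values from any particular survey. For N detected sources, the fractional Poisson uncertainty is

```latex
\frac{\sigma_N}{N} = \frac{\sqrt{N}}{N} = \frac{1}{\sqrt{N}}, \qquad
N = 10^{6} \;\Rightarrow\; \frac{\sigma_N}{N} = 10^{-3} = 0.1\%, \qquad
N = 10^{5} \;\Rightarrow\; \frac{\sigma_N}{N} \approx 3.2\times10^{-3} \approx 0.3\%.
```

A hypothetical 1% systematic error in the counts (e.g., from non-uniform calibration or misclassification) would thus exceed the Poisson term by an order of magnitude for a sample of 10^6 sources, and still by a factor of about 3 after the sample is split into ten subsamples of 10^5 each.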

Rare Object Searches

Searches for rare types of objects, or groups of objects, whose frequency of occurrence is so low that huge input source catalogs are needed to find them. This "needle in a haystack" search is facilitated by the vast number of sources in current surveys: a one-in-a-million object would appear about a thousand times in a billion-source survey. Typically such rare objects are found as outliers in some parameter space, e.g., colors: the bulk of "ordinary" sources (normal stars, galaxies, etc.) form well-defined clusters in the parameter space of observables, and rare or peculiar types of objects may be found away from them. Unsupervised classification and cluster analysis techniques may be especially useful for this task. Such data mining also makes follow-up spectroscopy of unusual objects significantly more efficient. We note that rare objects may belong to known classes (e.g., distant quasars, brown dwarfs, etc.), which may be known a priori to occupy a particular portion of the given parameter space. Alternatively, previously unknown or unexpected types of objects may be discovered in this manner.
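
The outlier idea can be sketched with one simple unsupervised criterion: the distance from each source to its k-th nearest neighbour in color space. This criterion is swapped in for illustration and is not a method prescribed here. Sources inside the dense clusters formed by ordinary stars and galaxies have small scores, while isolated sources have large ones, and this works even when the bulk population forms several separate clusters. The function names and the choice of k are hypothetical, and the input is assumed to be a list of color tuples (e.g., two colors per source).

```python
import math

def knn_outlier_scores(points, k=5):
    """Distance from each point to its k-th nearest neighbour (brute force, O(N^2)).

    Points inside dense clusters get small scores; isolated points get large ones.
    Assumes len(points) > k.
    """
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores

def flag_outliers(points, k=5, n_flag=1):
    """Indices of the n_flag most isolated points, i.e. candidates for follow-up."""
    scores = knn_outlier_scores(points, k)
    return sorted(range(len(points)), key=lambda i: scores[i], reverse=True)[:n_flag]
```

With two tight clusters of made-up colors standing in for stars and galaxies and one source placed far from both, flag_outliers returns the index of that isolated source. In practice the ranked list would be cut at a score threshold or a fixed number of candidates to fit the available follow-up time, and a k-d tree would replace the brute-force distance loop for large catalogs. A search for a known class that occupies an a priori region of the parameter space could instead apply a predefined color cut.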