1.1. Definitions and Caveats
Sky surveys are at the historical core of astronomy. Charting and monitoring the sky gave rise to our science, and today large digital sky surveys are transforming the ways astronomy is done. In this Chapter we review some of the general issues related to the strategic goals, planning, and execution of modern sky surveys, and describe some of the currently popular ones, at least as of this writing (late 2011). This is a rapidly evolving field, and the reader should consult the usual sources of information about the more recent work.
Some caveats are in order. The very term "sky surveys" is perhaps too broad and loosely used, encompassing a very wide range of types of studies and methods. Thus, we focus here largely on the wide-field, panoramic sky surveys, as opposed, e.g., to specific studies of deep fields, or to heavily specialized surveys of particular objects, types of measurements, etc. We also have a bias towards the visible regime, reflecting, at least partly, the authors' expertise, but also a deeper structure of astronomy: much of the science requires the presence of a visible counterpart, regardless of the wavelength coverage of the original detection. We also focus mainly on the imaging surveys, with a nod to the spectroscopic ones. However, many general features of surveys, in terms of the methods, challenges, strategies, and so on, are applicable across wavelengths and types of observations. There are often no sharp boundaries between different kinds of surveys, and the divisions can be somewhat arbitrary. Finally, while we outline very briefly the kinds of science that are done with sky surveys, we do not go into any depth for any particular kind of study or object, as those are covered elsewhere in these volumes.
It is tempting to offer a working definition of a survey in this context. By a wide-field survey, we mean a large data set obtained over an area of the sky that is at least of the order of ~1% of the entire sky (admittedly an arbitrary choice), and that can support a variety of scientific studies, even if the survey is devised with a very specific scientific goal in mind. However, there is often a balance between the depth and the area coverage. A deep survey (e.g., studies of various deep fields) may cover only a small area and contain a relatively modest number of sources by the standards of wide-field surveys, and yet it can represent a survey in its own right, feeding a multitude of scientific studies. Wide-field coverage by itself is not a defining characteristic: for example, all-sky studies of the CMBR may be better characterized as focused experiments, rather than as surveys, although that boundary is also getting fuzzier.
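To put the ~1% threshold in more familiar units, the following minimal sketch converts it to square degrees. The function names and the default threshold argument are our own illustrative choices; only the ~1% figure comes from the working definition above.

```python
import math

# Area of the full celestial sphere: 4*pi steradians, converted to
# square degrees (about 41,253 sq. deg.).
FULL_SKY_SQ_DEG = 4 * math.pi * (180 / math.pi) ** 2


def sky_fraction(area_sq_deg: float) -> float:
    """Fraction of the whole sky covered by a footprint of the given area."""
    return area_sq_deg / FULL_SKY_SQ_DEG


def is_wide_field(area_sq_deg: float, threshold: float = 0.01) -> bool:
    """Apply the (admittedly arbitrary) ~1% working threshold."""
    return sky_fraction(area_sq_deg) >= threshold


print(f"Full sky: {FULL_SKY_SQ_DEG:.0f} sq. deg.")
print(f"1% of the sky: {0.01 * FULL_SKY_SQ_DEG:.0f} sq. deg.")
```

In other words, the working definition corresponds to a footprint of roughly 400 square degrees or more; a typical deep field, covering a square degree or less, falls well below it.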
We also understand that "a large data set" is a very relative and rapidly changing concept, since data rates increase exponentially, following Moore's law, so perhaps one should always bear in mind a conditional qualifier "at that time". Some types of surveys are better characterized by a "large" number of sources (the same time-dependent conditional applies), which is also a heavily wavelength-dependent measure; for example, nowadays a thousand is still a large number of γ-ray sources, but a trivial number of visible ones.
Perhaps the one unifying characteristic is that surveys tend to support a broad variety of studies, many of which haven't been thought of by the survey's originators. Another unifying characteristic is the exploratory nature of surveys, which we address in more detail in Sec. 3 below. Both approaches can improve our knowledge of a particular scientific domain, and can lead to surprising new discoveries.
The meaning of the word "survey" in the astronomical context has also changed over the years. It used to refer to what we would now call a sky atlas (initially hand-drawn sky charts, and later photographic images), whereas catalogs of sources in them were more of a subsidiary or derived data product. Nowadays the word largely denotes catalogs of sources and their properties (positions, fluxes, morphology, etc.), with the original images provided almost as subsidiary information, but with an understanding that sometimes they need to be reprocessed for a particular purpose. Also, as the complexity of data has increased, we see a growing emphasis on carefully documented metadata ("data about the data") that are essential for the understanding of the coverage, quality, and limitations of the primary survey data.
1.2. The Types and Goals of Sky Surveys
We may classify surveys in regard to their scientific motivation and strategy, their wavelength regime, ground-based vs. space-based, the type of observations (e.g., imaging, spectroscopy, polarimetry, etc.), their area coverage and depth, their temporal character (one-time vs. multi-epoch), and as panoramic (covering a given area of the sky with all sources therein) or targeted (observing a defined list of sources). A given survey can have any combination of these characteristics. For example, radio surveys generally produce data cubes, with two spatial dimensions and one frequency dimension, and are thus both imaging and spectroscopic, often including polarization as well. X-ray and γ-ray images generally also provide some energy resolution. Slitless spectroscopy surveys (images taken through an objective prism, grating, or a grism) provide wavelength-dispersed images of individual sources. Surveys can also be distinguished by their angular, temporal, or energy resolution.
Surveys may be scientifically motivated by a census of a particular type of source, e.g., stars, galaxies, or quasars, that may be used for statistical studies such as the Galactic structure or the Large-Scale Structure (LSS) in the universe. They may aim to discover significant numbers of a particular type of object, often relatively rare ones, for follow-up studies, e.g., Supernovae (SNe), high-redshift galaxies or quasars, brown dwarfs, etc. When a new domain of an observable parameter space opens up, e.g., a previously unexplored wavelength regime, it usually starts with a panoramic survey, to see what kinds of objects or phenomena populate it.
Therein lies perhaps the key scientific distinction between surveys and the traditional, targeted astronomical observations: surveys aim to map and characterize the astrophysical contents of the sky or of the populations of objects of particular kinds in a systematic manner, whereas the traditional observations focus on detailed properties of individual sources or relatively small numbers of them. Surveys are often the ways to find such targets for detailed studies.
The first type of survey science - use of large, statistical samples of objects of some kind (stars, galaxies, etc.) as probes of some collective properties (e.g., Galactic structure, or LSS) - may be done with the survey data alone, or may be supplemented by additional data from other sources. The other two types of survey science - as a discovery mechanism for rare, unusual, or new types of objects or phenomena, or as a pure initial exploration of some new domain of the observable parameter space - require targeted follow-up observations. Thus surveys have become a backbone of much of astronomical research today, forming a fundamental data infrastructure of astronomy. This may make them seem less glamorous than the successful targeted observations that may be enabled by surveys, but it does not diminish their scientific value.
Imaging surveys are commonly transformed into catalogs of detected sources and their properties, but in some cases images themselves represent a significant scientific resource, e.g., if they contain extended structures of diverse morphologies; for example, images of star-forming regions, or stellar bubbles and SN remnants in Hα images.
The process of detection and characterization of discrete sources in imaging surveys involves many challenges and inevitably introduces biases, since these processes always assume that the sources have certain characteristics in terms of a spatial extent, morphology, and so on. We discuss these issues further in Sec. 5.
Like most astronomical observations, surveys are often enabled by new technologies, and push them to their limits. Improved detector and telescope technologies can open new wavelength regimes, or deliver greater sensitivity or resolution, thus providing some qualitatively new view of the sky. A more recent phenomenon is that information and computation technologies (ICT) have dramatically increased our ability to gather and process large quantities of data, and that quantitative change has led to some interesting qualitative changes in the ways we study the universe.
A direct manifestation of this is the advent of large synoptic sky surveys, which cover large areas of the sky repeatedly and often, thus opening the time domain as a new arena for exploration and discovery. They are sometimes described as a transition from a panoramic cosmic photography to a panoramic cosmic cinematography. We describe some examples below.
Spectroscopic surveys, other than the data cubes generated in radio astronomy, typically target lists of objects selected from imaging surveys. In the case of extragalactic surveys, the primary goal is typically to obtain redshifts, as well as to determine some physical properties of the targets, e.g., star formation rates, or the presence and classification of active galactic nuclei (AGN), if any. If the targets are observed with long-slit or multi-slit mask spectrographs, or integral field units (IFU), information can be obtained about the kinematics of resolved structures in galaxies, typically emission-line gas. In the case of Galactic surveys, the goals are typically to measure radial velocities, and sometimes also the chemical abundances of stars.
Spectroscopic surveys depend critically on the quality of the input catalogs from which the targets are selected, inheriting any biases that may be present. Their observing strategies in terms of the depth, source density, spectroscopic resolution, etc., are determined by the scientific goals. Since spectroscopy is far more expensive than imaging in terms of the observing time, some redshift surveys have adopted a sparse-sampling strategy, e.g., by observing every Nth (where N = 2, or 10, or ...) source in a sorted list of targets, thus covering a larger area, but with a corresponding loss of information.
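The sparse-sampling idea can be illustrated with a short sketch. The catalog entries, the choice of magnitude as the sort key, and the function name are hypothetical, chosen only for illustration; actual surveys define their own target ordering and selection rules.

```python
def sparse_sample(sorted_targets, n, offset=0):
    """Return every n-th entry of an already sorted target list.

    `offset` (0 <= offset < n) selects which of the n interleaved
    subsamples is returned.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    if not 0 <= offset < n:
        raise ValueError("offset must satisfy 0 <= offset < n")
    return sorted_targets[offset::n]


# Hypothetical input catalog: (identifier, magnitude) pairs.
catalog = [("src-a", 18.2), ("src-b", 17.1), ("src-c", 19.5),
           ("src-d", 16.8), ("src-e", 18.9), ("src-f", 17.7)]

# Sort by magnitude (brightest first), then keep every 2nd target.
targets = sorted(catalog, key=lambda entry: entry[1])
selected = sparse_sample(targets, n=2)

print([name for name, _ in selected])
print(f"sampling fraction: {len(selected) / len(targets):.2f}")
```

With a sampling fraction of 1/N, the same amount of observing time covers roughly N times the area at the same depth, at the price of losing the unobserved sources, and with them most of the information on small-scale clustering.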
Our observations of the sky are no longer confined to the electromagnetic window. Increasingly, the sky is being monitored in high-energy cosmic rays (Kotera & Olinto 2011), neutrinos (Halzen & Klein 2010), and even gravitational waves (Centrella 2010). So far, these information channels have been characterized by a paucity of identified sources, largely due to the lack of directional accuracy, with the exceptions of the Sun and SN 1987A in neutrinos, but they will likely play a significant role in the future.
Finally, as numerical simulations become ever larger and more complex, and theory is expressed as data (the output of simulations), we may start to see surveys of simulations, as a means of characterizing and quantifying them. These surveys of theoretical universes would have to be compared to the measurements obtained in the surveys of the actual universe. New knowledge often arises as theories are confronted with data, and in the survey regime, we will be doing that on a large scale.
1.3. The Data Explosion
In the 1990s, astronomy transitioned from a relatively data-poor science to an immensely data-rich one, and the principal agents of change were large digital sky surveys. They, in turn, were enabled by the rapid advances in ICT. Sky surveys became the dominant data sources in astronomy, and this trend continues (Brunner et al. 2001c). The data volume in astronomy doubles at a Moore's law pace, every year to a year and a half (Szalay & Gray 2001, Gray & Szalay 2006), reflecting the growth of the technology that produces the data. (Obviously, the sheer size of data sets by itself does not imply a large scientific value; for example, very deep images from space-based observatories may have a modest size in bits, but an immense scientific value.)
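As a rough illustration of what such a doubling time implies, the sketch below computes the cumulative growth factor over a given period. The ten-year horizon is an arbitrary choice of ours, and the function name is illustrative; only the doubling times of 1 to 1.5 years come from the text.

```python
def growth_factor(years: float, doubling_time_years: float) -> float:
    """Factor by which a quantity grows if it doubles every
    `doubling_time_years` years, sustained over `years` years."""
    if doubling_time_years <= 0:
        raise ValueError("doubling time must be positive")
    return 2.0 ** (years / doubling_time_years)


# Growth over one decade for the two ends of the quoted range.
for doubling_time in (1.0, 1.5):
    factor = growth_factor(10, doubling_time)
    print(f"doubling every {doubling_time} yr -> x{factor:.0f} per decade")
```

Sustained over a decade, these rates correspond to a growth by two to three orders of magnitude, which is why a "large" data set must always be understood as large "at that time".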
In the past, surveys and their derived catalogs could be published as printed papers or small sets of volumes that could be looked up "by hand" (this is still true in some regimes, e.g., γ-ray astronomy, or other nascent fields). But as the data volumes entered the Terascale regime in the 1990s, and the catalogs of sources started containing millions of objects, there was an inevitable transition to a purely electronic publication and dissemination, e.g., in the form of web-accessible archives, which also provide access to the necessary metadata and other documentation. Databases, data mining, web services, and other computational tools and techniques became a standard part of astronomy's tool chest, although the community is still gradually gaining familiarity with them. This is an aspect of an inevitable culture change, as we enter the era of a data-rich, data-intensive science.
The growth of data quantity, coupled with an improved data homogeneity, enabled a new generation of statistical or population studies: with samples of hundreds of millions of sources, the Poissonian errors were no longer important, and one could look for subtle effects simply not accessible with the more limited data sets. Equally important was the growth of data quality and data complexity. The increased information content of the modern sky surveys enabled profitable data mining: the data could be used for a much broader variety of studies than was possible in the past.
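The statement about Poissonian errors can be made concrete with a small sketch. It assumes pure counting statistics, for which the fractional uncertainty on a count of N sources is 1/sqrt(N); the sample sizes used here are illustrative, and real measurements are also subject to systematic errors that do not shrink with N.

```python
import math


def poisson_fractional_error(n_sources: int) -> float:
    """Fractional (relative) Poisson uncertainty on a count of n_sources,
    i.e. sqrt(N) / N = 1 / sqrt(N)."""
    if n_sources <= 0:
        raise ValueError("n_sources must be positive")
    return 1.0 / math.sqrt(n_sources)


# Illustrative sample sizes, from a small sample to a modern survey catalog.
for n in (100, 10_000, 100_000_000):
    error = poisson_fractional_error(n)
    print(f"N = {n:>11,d}: fractional error = {error:.2e}")
```

For a sample of a hundred sources the counting error is about 10%, whereas for a hundred million sources it is about one part in ten thousand, so that systematic effects, rather than counting statistics, become the limiting factor.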
For these reasons, survey-enabled astronomy became both popular and respectable. But it was obvious that data fusion across different surveys (e.g., over different wavelengths) has an even higher scientific potential, as it can reveal knowledge that is present in the combined data, but cannot be recognized in any individual data set, no matter how large. Historical examples from multi-wavelength cross-correlations abound, e.g., the discoveries of quasars, ultraluminous starbursts, interpretation of γ-ray bursts, etc. The new, data-rich astronomy promised to open this discovery arena wholesale.
There are many non-trivial challenges posed by the handling of large, complex data sets, and knowledge discovery in them: how to process and calibrate the raw data; how to store, combine, and access them using modern computing hardware and networks; and how to visualize, explore, and analyze these great data sets quickly and efficiently. This is a rapidly developing field that increasingly entails collaborative efforts between astronomers and computer scientists.
The rise of data centers was the response to dealing with individual large data sets, surveys, or data collections. However, their fusion and the scientific synthesis required more than just their interoperability. This prompted the rise of the Virtual Observatory (VO) concept, as a general, distributed research environment for astronomy with large and complex data sets (Brunner et al. 2001a, Hanisch 2001, 2010, Djorgovski & Williams 2005). Today, sky surveys are naturally included in an evolving world-wide ecosystem of astronomical data resources and services. The reader is directed to the VO-related websites or their future equivalents, for an up-to-date description of the data assets and services, and access to them.
Astronomy was not alone in facing the challenges and the opportunities of an exponential data growth. Virtual scientific organizations with similar mandates emerged in many other fields, and continue to do so. This entire arena of a computationally-enabled, data-driven science is sometimes referred to as Cyber-Infrastructure, or e-Science, unified by the common challenges and new scientific methodologies (Atkins et al. 2003, Hey & Trefethen 2003, 2005, Djorgovski 2005, Hey et al. 2009, Bell et al. 2009). Nowadays we also see the blossoming of "science informatics", e.g., Astroinformatics (by analogy with its bio-, geo-, etc., counterparts). These are broader concepts of scientific and methodological environments and communities of interest, that seek to develop and apply new tools for the data-rich science in the 21st century.