
2. CROSS-IDENTIFICATIONS AND DATA INTEGRATION SERVICES

2.1. Multi-wavelength Cross-Identifications and Statistical Associations

Cross-identification refers to the process of establishing which observation in a specific catalog (for example, the FIRST radio survey) corresponds to the same astrophysical source in surveys at other wavelengths (for example, the far-infrared IRAS Faint Source Catalog). The process is much more difficult than it may first appear, because observations taken with different telescopes and at various wavelengths often differ in substantial ways. Positional uncertainties may differ by factors of 2 to 10, or even more; and they may be ellipses with different dimensions and position angles for each source (e.g., IRAS) rather than a constant value across the sky. There are sometimes systematic errors in the astrometry, such that positions in one survey will be offset from those in another. If the angular resolution of one survey (e.g., X-rays from ROSAT) is much lower than another (e.g., near-infrared detections from 2MASS), more than one source in the second survey may contribute to the emission seen in the first survey. If only positional proximity is used to make matches, different flux sensitivity limits and calibration uncertainties (flux error bars) can lead to incorrect cross-identification of a source in one catalog with a physically unrelated source in another catalog that is nearby only in projection along the line of sight. Also, astrophysical sources can display very different structures at different wavelengths because different physical processes are being observed. For example, a radio source may have a core plus lobe emission located on one or both sides of the core, but only the core (galaxy nucleus) is typically detected in a survey at visual wavelengths. The amount of extinction due to dust in the interstellar medium of a galaxy also depends strongly on wavelength; this can shift the centroid of a source measured at a visual wavelength from that measured at an ultraviolet or infrared wavelength.
In extreme cases like the merging system Arp 220, a double central morphology in the blue band is produced by a dust lane, while true double nuclei reside inside the dust lane and are visible only at infrared and radio wavelengths. Finally, objects populate a hierarchical Universe: active nuclei, supernovae, and star-forming regions reside in their host galaxies; many galaxies are members of pairs and groups; galaxies, pairs and groups are typically members of clusters, and galaxy clusters reside in superclusters separated by vast voids.

For these reasons, complex relationships between objects are needed (e.g., one-to-one, one-to-many, many-to-one, many-to-many), in addition to statistical associations for cases in which simple, confident one-to-one relationships cannot be established. NED activities revolve around a systematic process of establishing realistic cross-identifications and statistical associations between millions of entries in multi-wavelength catalogs and publications. Cross-identifications are established in an iterative way, being refined as new information becomes available. The NED team works closely with the astronomical community in this process, often resolving disputes and documenting errors in extensive notes that are made available to users. It is sometimes assumed that the NED team establishes cross-IDs in an "old-fashioned", manual way. In fact, the process involves a fairly sophisticated computer program that cross-correlates the positions and positional uncertainties of objects in NED with those in a new input catalog; the output is sorted into lists containing (a) sources that are obviously new to NED, (b) secure matches (cross-identifications) with previously known sources, and (c) "fuzzy" cases that need follow-up analysis of the statistical association parameters for the many reasons outlined above. The association parameters include the separation in arcseconds, the position angle in degrees, and dimensionless parameters r and p that represent measures of the "goodness of fit" of the convolved positional uncertainty ellipses of an input object and a nearby NED object. (2) The NED interface currently makes these catalog source association parameters available to users before they are resolved and turned into cross-identifications (where possible); prior to October 2000 these parameters were utilized only internally by the NED team.
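
The association parameters described above can be illustrated with a minimal Python sketch. This is not NED's program. It assumes a flat-sky approximation that is valid only for small separations, and 1-sigma Gaussian error ellipses given as (major axis, minor axis, position angle). Following footnote 2, it interprets r as the separation in units of the convolved uncertainty (so r**2 plays the role of the chi-square value) and p as the convolved Gaussian density per steradian. The function names, dictionary keys, and the thresholds in `classify` are all hypothetical.

```python
import math

ARCSEC_PER_RAD = 180.0 * 3600.0 / math.pi


def ellipse_cov(major, minor, pa_deg):
    """(east-east, east-north, north-north) covariance in arcsec^2 of a
    1-sigma error ellipse; pa_deg is measured from north through east."""
    s, c = math.sin(math.radians(pa_deg)), math.cos(math.radians(pa_deg))
    a2, b2 = major ** 2, minor ** 2
    return (a2 * s * s + b2 * c * c, (a2 - b2) * s * c, a2 * c * c + b2 * s * s)


def association_params(src, ref):
    """Separation, position angle, r and log10(p) for two catalog entries.

    Each entry is a dict with ra, dec (degrees) and err = (major, minor, pa)
    in arcsec, arcsec, degrees. Assumed valid only for small separations.
    """
    dra = (ref["ra"] - src["ra"] + 180.0) % 360.0 - 180.0  # handle RA wrap
    mean_dec = math.radians(0.5 * (src["dec"] + ref["dec"]))
    d_east = dra * math.cos(mean_dec) * 3600.0
    d_north = (ref["dec"] - src["dec"]) * 3600.0

    sep = math.hypot(d_east, d_north)
    pa = math.degrees(math.atan2(d_east, d_north)) % 360.0

    # Convolving two Gaussians adds their covariance matrices.
    cov_src, cov_ref = ellipse_cov(*src["err"]), ellipse_cov(*ref["err"])
    cee, cen, cnn = (x + y for x, y in zip(cov_src, cov_ref))
    det = cee * cnn - cen * cen
    r2 = (cnn * d_east ** 2 - 2.0 * cen * d_east * d_north
          + cee * d_north ** 2) / det

    # log10 of the Gaussian density at the observed offset, per steradian.
    log10_p = (-0.5 * r2 / math.log(10.0)
               - math.log10(2.0 * math.pi * math.sqrt(det))
               + 2.0 * math.log10(ARCSEC_PER_RAD))
    return {"sep": sep, "pa": pa, "r": math.sqrt(r2), "log10_p": log10_p}


def classify(r_values, r_secure=2.0, r_reject=5.0):
    """Sort an input source into 'new', 'secure' or 'fuzzy' (illustrative cuts)."""
    nearby = [r for r in r_values if r < r_reject]
    if not nearby:
        return "new"
    if len(nearby) == 1 and nearby[0] < r_secure:
        return "secure"
    return "fuzzy"
```

For example, two sources 1 arcsec apart in right ascension on the celestial equator, each with circular 1 arcsec uncertainties, give a separation of 1 arcsec, a position angle of 90 degrees, and r of about 0.71. Under the hypothetical cuts above, a single candidate with that r value is sorted into the "secure" list, while an input source with two nearby candidates is "fuzzy".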

2.2. Database Management, Growth and Related Activities

NED does not simply ingest complete catalogs and maintain them in their original form and format. While fundamental data such as positions, redshifts, sizes, and flux/magnitude measurements are assimilated into NED for the purposes of cross-identification and construction of multi-wavelength SEDs, other data are connected in context using pointers to remote archives. For example, extended sources in the Two Micron All Sky Survey (2MASS) and the Sloan Digital Sky Survey (SDSS) catalogs have many types of magnitude measurements. Only the magnitudes from the recommended "default" method from each survey catalog are folded directly into NED; the rest are being made easily accessible using hyperlinks that query complete catalog entries at the survey/mission archive sites. NED provides hyperlinks that issue queries of many major archive services (e.g., High Energy Astrophysics Science Archive Research Center [HEASARC], Multi-Mission Archive at STScI [MAST], Infrared Science Archive [IRSA] at IPAC, SIMBAD and VizieR services at CDS [Strasbourg, France], NRAO, and the NASA Astrophysics Data System [ADS] abstract service and its linked journal Web sites) in a convenient "1-click" fashion that utilizes source names, survey/catalog cross-identifications, and sky positions as the "glue" between the distributed data sets. An illustration of this innovative capability (available online in NED since April 2000) is given in Figure 3. The database contents and relationship pointers are revised and augmented constantly to keep up with new online survey data and knowledge appearing in the literature. Updates to the public database occur approximately every three months after periods of data entry, quality assurance checks, and testing using an internal development version.

Important recent additions to NED's holdings include extragalactic supernovae, the Hubble Deep Fields (North & South), the Faint Images of the Radio Sky at Twenty-centimeters (FIRST), Eighth Cambridge Radio Catalog (8C), Molonglo Reference Catalog (MRC), Texas, Westerbork and MIT-Greenbelt radio surveys, the Automated Plate Measurement (APM) Bright Galaxy Catalog, the Canadian Network for Observational Cosmology (CNOC) Catalog, and the Las Campanas Redshift Survey (LCRS). In addition to folding in data appearing in the current literature, the team is establishing cross-identifications and probabilistic associations between new observations from large surveys (more than 10^6 objects) and previously known sources in NED. Large surveys which are being assimilated in an incremental fashion at the time of writing include the Two Micron All Sky Survey (2MASS), NRAO VLA Sky Survey (NVSS), Automated Plate Measurement/United Kingdom Schmidt (APM/UKS), and Sloan Digital Sky Survey (SDSS). The total data holdings have been roughly doubling each year. Larger database disks have recently been configured to accommodate a system containing about ten times the current holdings, enough for essential data and relationships for about 50 million objects. Ongoing upgrades to the data management and catalog cross-correlation and association processes, combined with the rapid increase in computer speed and decrease in data storage costs, will allow NED to scale up its data integration activities to handle order-of-magnitude increases in the number of unique extragalactic sources (and candidates) with their cross-identifications and associations in coming years.

The NED staff works in coordination with other NASA archive centers, referred to collectively as the Space Science Data System (SSDS) (3), the CDS (Strasbourg, France) (4), the AAA publications board, journal editors, authors and referees, IAU Working Groups on Nomenclature, Data and Journals, and the broader astronomical community to improve data handling and archive services. The team provides extensive user support (Help Desk) to answer questions and take users' input for priorities on new developments. (5)

2.3. Database Contents

The database contents of NED at the time of writing (July 2001) are summarized in Table 1. This information is updated periodically on the NED home page (Figure 1) after each update to the public database.

Table 1: NED database contents as of July 2001.
4.7 million cross-identifications in thousands of multi-wavelength surveys and journal articles
3.7 million unique extragalactic objects
3.4 million photometric measurements covering gamma-rays through radio wavelengths (with uncertainties) and dynamic spectral energy distribution plots
2.0 million detailed position measurements with uncertainties
1.5 million bibliographic references to 48,000 articles
167,000 redshifts from the published literature
628,000 science-grade FITS images and remote links with "thumbnail" previews
50,000 detailed notes from catalogs and journal publications
26,000 journal article abstracts and 1,150 Ph.D. thesis abstracts

The essential data for sources in NED include positions, redshifts, morphological types, nuclear spectral types, panchromatic photometry, and images. When available, uncertainties in the measurements are also stored and provided by the interface. Photometry data are stored in original units and converted to common frequency (Hz) units and flux density units (W m^-2 Hz^-1) for construction of Spectral Energy Distributions (see Figure 5); the data are also tagged with their aperture sizes or status as a "total flux" measurement. The extragalactic object types available in NED are summarized in Table 2.

Table 2: Extragalactic object types in NED.
Galaxies QSOs Radio Sources
Galaxy Pairs QSO Groups Infrared Sources
Galaxy Triples Gravitational Lenses Visual Sources
Galaxy Groups Absorption Line Systems UV Excess Sources
Galaxy Clusters Emission Line Sources X-ray Sources
Supernovae Gamma-ray Sources
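
The photometry handling described before Table 2 (measurements kept in their original units, converted to frequency in Hz and flux density in W m^-2 Hz^-1, and tagged with an aperture) can be sketched in Python as follows. This is a minimal illustration, not NED's internal schema: the `PhotPoint` record, its field names, and the helper functions are hypothetical, and the magnitude zero point must be supplied by the caller for the band in question.

```python
from dataclasses import dataclass

C_M_PER_S = 299_792_458.0   # speed of light in m/s (exact SI value)
JY_TO_SI = 1.0e-26          # 1 jansky expressed in W m^-2 Hz^-1


def wavelength_um_to_hz(wavelength_um):
    """Frequency in Hz for a wavelength given in micrometres."""
    return C_M_PER_S / (wavelength_um * 1.0e-6)


def mag_to_jy(mag, zero_point_jy):
    """Flux density in Jy for a magnitude, given the band zero point in Jy."""
    return zero_point_jy * 10.0 ** (-0.4 * mag)


@dataclass
class PhotPoint:
    value: float      # measurement as published
    unit: str         # original unit, kept alongside the converted values
    freq_hz: float    # common frequency unit
    flux_si: float    # common flux density unit, W m^-2 Hz^-1
    aperture: str     # "total" or an aperture size such as "14 arcsec"


def from_jy(flux_jy, wavelength_um, aperture="total"):
    """Build a record from a flux density published in janskys."""
    return PhotPoint(flux_jy, "Jy", wavelength_um_to_hz(wavelength_um),
                     flux_jy * JY_TO_SI, aperture)


def from_mag(mag, wavelength_um, zero_point_jy, aperture="total"):
    """Build a record from a magnitude and a caller-supplied zero point."""
    return PhotPoint(mag, "mag", wavelength_um_to_hz(wavelength_um),
                     mag_to_jy(mag, zero_point_jy) * JY_TO_SI, aperture)
```

Keeping the published value and unit next to the converted quantities means a spectral energy distribution can be plotted on common axes while the original measurement remains traceable.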



2 The first parameter, r, is the distance between the two sources in units of the standard deviations of the convolved uncertainties on the principal axes of the error ellipsoid; mathematically, this is the chi-square parameter with two degrees of freedom evaluated for the observed separation and catalog uncertainties, assuming Gaussian errors, and is dimensionless. The second parameter, p (stored in NED as a base-10 logarithm), is the expected-error density function evaluated for the observed separation. The expected-error density function is the convolution of the error density functions for the two catalogs involved, assuming Gaussian errors; the density function has units of probability mass per steradian. More information about these parameters is available in NED's online documentation.

3 http://ssds.nasa.gov/

4 http://cdsweb.u-strasbg.fr

5 Inquiries may be emailed to ned@ipac.caltech.edu
