Published in "Proceedings of SPIE: Astronomical Data Analysis", eds. J. Starck & F. D. Murtagh, 2001, Volume 4477, pages 20-34.



Joseph M. Mazzarella, Barry F. Madore, George Helou, and the NED Team a

California Institute of Technology, Jet Propulsion Laboratory, MS 100-22, Pasadena, CA 91125

Abstract. We review the capabilities of the NASA/IPAC Extragalactic Database (NED) for information retrieval and knowledge discovery in the context of a globally distributed virtual observatory. Since its inception in 1990, NED has provided astronomers world-wide with the results of a systematic cross-correlation of catalogs covering all wavelengths, along with thousands of extragalactic observations culled from published journal articles. NED is continuously being expanded and revised to include new catalogs and published observations, each undergoing a process of cross-identification to capture the current state of knowledge about extragalactic sources in a panchromatic fashion. In addition to assimilating data from the literature, the team is incrementally folding in millions of observations from new large-scale sky surveys such as 2MASS, NVSS, APM, and SDSS. At the time of writing, the system contains over 3.3 million unique objects with 4.2 million cross-identifications. We summarize the recent evolution of NED from its initial emphasis on object name-, position-, and literature-based queries into a research environment that also assists statistical data exploration and discovery using large samples of objects. Newer capabilities enable intelligent "Web mining" of entries in geographically distributed astronomical archives that are indexed by object names and positions in NED; sample building using constraints on redshifts, object types, and other parameters; and access to image and spectral archives for targeted or serendipitous discoveries. A pilot study demonstrates how NED is being used in conjunction with linked survey archives to characterize the properties of galaxy classes and so form a training set for machine learning algorithms; an initial goal is the production of statistical likelihoods that newly discovered sources belong to known classes, represent statistical outliers, or are candidates for fundamentally new types of objects.
Challenges and opportunities for tighter integration of NED capabilities into data mining tools for astronomy archives are also discussed.

Keywords: databases, information retrieval, knowledge discovery, classification, data mining


a For additional information, send correspondence to J.M.M. or B.F.M.