Published in "Proceedings of SPIE: Astronomical Data Analysis", eds. J. Starck & F. D. Murtagh, 2001, Volume 4477, pages 20-34.


CAPABILITIES OF THE NASA/IPAC EXTRAGALACTIC DATABASE IN THE ERA OF A GLOBAL VIRTUAL OBSERVATORY

Joseph M. Mazzarella, Barry F. Madore, George Helou, and the NED Team a


California Institute of Technology, Jet Propulsion Laboratory, MS 100-22, Pasadena, CA 91125


Abstract. We review the capabilities of the NASA/IPAC Extragalactic Database (NED, http://ned.ipac.caltech.edu) for information retrieval and knowledge discovery in the context of a globally distributed virtual observatory. Since its inception in 1990, NED has provided astronomers world-wide with the results of a systematic cross-correlation of catalogs covering all wavelengths, along with thousands of extragalactic observations culled from published journal articles. NED is continuously being expanded and revised to include new catalogs and published observations, each undergoing a process of cross-identification to capture the current state of knowledge about extragalactic sources in a panchromatic fashion. In addition to assimilating data from the literature, the team is incrementally folding in millions of observations from new large-scale sky surveys such as 2MASS, NVSS, APM, and SDSS. At the time of writing, the system contains over 3.3 million unique objects with 4.2 million cross-identifications. We summarize the recent evolution of NED from its initial emphasis on object name-, position-, and literature-based queries into a research environment that also assists statistical data exploration and discovery using large samples of objects. Newer capabilities enable intelligent "Web mining" of entries in geographically distributed astronomical archives that are indexed by object names and positions in NED, sample building using constraints on redshifts, object types and other parameters, as well as image and spectral archives for targeted or serendipitous discoveries. A pilot study demonstrates how NED is being used in conjunction with linked survey archives to characterize the properties of galaxy classes to form a training set for machine learning algorithms; an initial goal is the production of statistical likelihoods that newly discovered sources belong to known classes, represent statistical outliers, or are candidates for fundamentally new types of objects. Challenges and opportunities for tighter integration of NED capabilities into data mining tools for astronomy archives are also discussed.


Keywords: databases, information retrieval, knowledge discovery, classification, data mining


Table of Contents

INTRODUCTION

CROSS-IDENTIFICATIONS AND DATA INTEGRATION SERVICES
Multi-wavelength Cross-Identifications and Statistical Associations
Database Management, Growth and Related Activities
DATABASE CONTENTS

INTERFACES AND QUERY SERVICES
Web Query Services
Objects
Global Archive Connectivity
Sample Building with "By Parameter" Queries
Data
Literature
Tools
Information
Batch Mode
Client/Server Mode Connectivity
Current and Future Technologies

NED IN THE ERA OF A GLOBAL VIRTUAL OBSERVATORY
Current Capabilities
Future Enhancements

NED AS A RESOURCE FOR KDD
Fusion and Classification Using Large Astronomical Databases
A Pilot Study

SUMMARY

REFERENCES




a For additional information send correspondence to J.M.M. (mazz@ipac.caltech.edu) or B.F.M. (barry@ipac.caltech.edu).
