Published in the International Journal of Modern Physics D,
Volume 19, Issue 07, pp. 1049-1106 (2010).
astro-ph/0906.2173
Abstract: We review the current state of data mining and machine learning in astronomy. Data mining can have a somewhat mixed connotation from the point of view of a researcher in this field. If used correctly, it can be a powerful approach, holding the potential to fully exploit the exponentially increasing amount of available data, promising great scientific advance. However, if misused, it can be little more than the black-box application of complex computing algorithms that may give little physical insight, and provide questionable results. Here, we give an overview of the entire data mining process, from data collection through to the interpretation of results. We cover common machine learning algorithms, such as artificial neural networks and support vector machines; applications from a broad range of astronomy, emphasizing those where data mining techniques directly resulted in improved science; and important current and future directions, including probability density functions, parallel algorithms, petascale computing, and the time domain. We conclude that, so long as one carefully selects an appropriate algorithm, and is guided by the astronomical problem at hand, data mining can be very much the powerful tool, and not the questionable black box.
Table of Contents
INTRODUCTION
Why Data Mining?
OVERVIEW OF DATA MINING AND MACHINE LEARNING METHODS
Data Collection
Preprocessing of Data
Attribute Selection
Selection and Use of Machine Learning Algorithms
Improving Results
Application of Algorithms and Some Limitations
USES IN ASTRONOMY
Object Classification
Photometric Redshifts
Other Astrophysical Applications
THE FUTURE
Probability Density Functions
Real-Time Processing and the Time Domain
Petascale Computing
Parallel and Distributed Data Mining
The Virtual Observatory
Visualization
Novel Supercomputing Hardware
CONCLUSIONS
REFERENCES