In this chapter a brief summary is given of the main methods available for univariate density estimation. Some of the methods will be discussed in greater detail in later chapters, but it is helpful to have a general view of the subject before examining any particular method in detail. Many of the important applications of density estimation are to multivariate data, but since all the multivariate methods are generalizations of univariate methods, it is worth getting a feel for the univariate case first.
Two data sets will be used to help illustrate some of the methods. The first comprises the lengths of 86 spells of psychiatric treatment undergone by patients used as controls in a study of suicide risks reported by Copas and Fryer (1980). The data are given in Table 2.1. The second data set, observations of eruptions of Old Faithful geyser in Yellowstone National Park, USA, is taken from Weisberg (1980), and is reproduced in Table 2.2. I am most grateful to John Copas and to Sanford Weisberg for making these data sets available to me.
Table 2.1 Lengths of treatment spells (in days) of control patients in the suicide study.
1 | 25 | 40 | 83 | 123 | 256 |
1 | 27 | 49 | 84 | 126 | 257 |
1 | 27 | 49 | 84 | 129 | 311 |
5 | 30 | 54 | 84 | 134 | 314 |
7 | 30 | 56 | 90 | 144 | 322 |
8 | 31 | 56 | 91 | 147 | 369 |
8 | 31 | 62 | 92 | 153 | 415 |
13 | 32 | 63 | 93 | 163 | 573 |
14 | 34 | 65 | 93 | 167 | 609 |
14 | 35 | 65 | 103 | 175 | 640 |
17 | 36 | 67 | 103 | 228 | 737 |
18 | 37 | 75 | 111 | 231 | |
21 | 38 | 76 | 112 | 235 | |
21 | 39 | 79 | 119 | 242 | |
22 | 39 | 82 | 122 | 256 | |
Table 2.2 Eruption lengths (in minutes) of 107 eruptions of Old Faithful geyser.
4.37 | 3.87 | 4.00 | 4.03 | 3.50 | 4.08 | 2.25 |
4.70 | 1.73 | 4.93 | 1.73 | 4.62 | 3.43 | 4.25 |
1.68 | 3.92 | 3.68 | 3.10 | 4.03 | 1.77 | 4.08 |
1.75 | 3.20 | 1.85 | 4.62 | 1.97 | 4.50 | 3.92 |
4.35 | 2.33 | 3.83 | 1.88 | 4.60 | 1.80 | 4.73 |
1.77 | 4.57 | 1.85 | 3.52 | 4.00 | 3.70 | 3.72 |
4.25 | 3.58 | 3.80 | 3.77 | 3.75 | 2.50 | 4.50 |
4.10 | 3.70 | 3.80 | 3.43 | 4.00 | 2.27 | 4.40 |
4.05 | 4.25 | 3.33 | 2.00 | 4.33 | 2.93 | 4.58 |
1.90 | 3.58 | 3.73 | 3.73 | 1.82 | 4.63 | 3.50 |
4.00 | 3.67 | 1.67 | 4.60 | 1.67 | 4.00 | 1.80 |
4.42 | 1.90 | 4.63 | 2.93 | 3.50 | 1.97 | 4.28 |
1.83 | 4.13 | 1.83 | 4.65 | 4.20 | 3.93 | 4.33 |
1.83 | 4.53 | 2.03 | 4.18 | 4.43 | 4.07 | 4.13 |
3.95 | 4.10 | 2.72 | 4.58 | 1.90 | 4.50 | 1.95 |
4.83 | 4.12 | | | | | |
It is convenient to define some standard notation. Except where otherwise stated, it will be assumed that we are given a sample of $n$ real observations $X_1, \ldots, X_n$ whose underlying density is to be estimated. The symbol $\hat{f}$ will be used to denote whatever density estimator is currently being considered.
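To make the notation concrete, the following minimal sketch stores the Old Faithful observations of Table 2.2 as a sample of size n and wraps a density estimator as a function of x. The names `X`, `n` and `f_hat` and the parameter `bins` are illustrative choices, not taken from the text. The histogram is used only as a placeholder estimator, computed with NumPy's `numpy.histogram`, and any other estimator could be substituted.

```python
import numpy as np

# Old Faithful eruption lengths from Table 2.2, in the order the rows appear.
X = np.array([
    4.37, 3.87, 4.00, 4.03, 3.50, 4.08, 2.25,
    4.70, 1.73, 4.93, 1.73, 4.62, 3.43, 4.25,
    1.68, 3.92, 3.68, 3.10, 4.03, 1.77, 4.08,
    1.75, 3.20, 1.85, 4.62, 1.97, 4.50, 3.92,
    4.35, 2.33, 3.83, 1.88, 4.60, 1.80, 4.73,
    1.77, 4.57, 1.85, 3.52, 4.00, 3.70, 3.72,
    4.25, 3.58, 3.80, 3.77, 3.75, 2.50, 4.50,
    4.10, 3.70, 3.80, 3.43, 4.00, 2.27, 4.40,
    4.05, 4.25, 3.33, 2.00, 4.33, 2.93, 4.58,
    1.90, 3.58, 3.73, 3.73, 1.82, 4.63, 3.50,
    4.00, 3.67, 1.67, 4.60, 1.67, 4.00, 1.80,
    4.42, 1.90, 4.63, 2.93, 3.50, 1.97, 4.28,
    1.83, 4.13, 1.83, 4.65, 4.20, 3.93, 4.33,
    1.83, 4.53, 2.03, 4.18, 4.43, 4.07, 4.13,
    3.95, 4.10, 2.72, 4.58, 1.90, 4.50, 1.95,
    4.83, 4.12,
])

n = X.size  # the sample size n of the notation


def f_hat(x, sample=X, bins=10):
    """Placeholder density estimate at x: a histogram of `sample`.

    The histogram and the default of 10 bins are assumptions made for
    illustration only; they are not prescribed by the text.
    """
    # density=True scales the bin heights so the histogram integrates to 1.
    heights, edges = np.histogram(sample, bins=bins, density=True)
    x = np.asarray(x, dtype=float)
    # Index of the bin [edges[i], edges[i+1]) that contains each x.
    idx = np.searchsorted(edges, x, side="right") - 1
    # The right-most edge is counted as part of the last bin.
    idx = np.where(x == edges[-1], len(heights) - 1, idx)
    inside = (x >= edges[0]) & (x <= edges[-1])
    # The estimate is zero outside the range covered by the bins.
    return np.where(inside, heights[np.clip(idx, 0, len(heights) - 1)], 0.0)
```

Calling `f_hat(3.0)` returns the estimated density at a single point, and `f_hat(np.linspace(1.5, 5.0, 200))` evaluates it on a grid for plotting. Like any density, the estimate is non-negative and integrates to one.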