
2. SURVEY OF EXISTING METHODS

2.1. Introduction

In this chapter a brief summary is given of the main methods available for univariate density estimation. Some of the methods will be discussed in greater detail in later chapters, but it is helpful to have a general view of the subject before examining any particular method in detail. Many of the important applications of density estimation are to multivariate data, but since all the multivariate methods are generalizations of univariate methods, it is worth getting a feel for the univariate case first.

Two data sets will be used to help illustrate some of the methods. The first comprises the lengths of 86 spells of psychiatric treatment undergone by patients used as controls in a study of suicide risks reported by Copas and Fryer (1980). The data are given in Table 2.1. The second data set, observations of eruptions of Old Faithful geyser in Yellowstone National Park, USA, is taken from Weisberg (1980), and is reproduced in Table 2.2. I am most grateful to John Copas and to Sanford Weisberg for making these data sets available to me.

Table 2.1 Lengths of treatment spells (in days) of control patients in suicide study.

1 25 40 83 123 256
1 27 49 84 126 257
1 27 49 84 129 311
5 30 54 84 134 314
7 30 56 90 144 322
8 31 56 91 147 369
8 31 62 92 153 415
13 32 63 93 163 573
14 34 65 93 167 609
14 35 65 103 175 640
17 36 67 103 228 737
18 37 75 111 231  
21 38 76 112 235  
21 39 79 119 242  
22 39 82 122 256  

Table 2.2 Eruption lengths (in minutes) of 107 eruptions of Old Faithful geyser.

4.37 3.87 4.00 4.03 3.50 4.08 2.25
4.70 1.73 4.93 1.73 4.62 3.43 4.25
1.68 3.92 3.68 3.10 4.03 1.77 4.08
1.75 3.20 1.85 4.62 1.97 4.50 3.92
4.35 2.33 3.83 1.88 4.60 1.80 4.73
1.77 4.57 1.85 3.52 4.00 3.70 3.72
4.25 3.58 3.80 3.77 3.75 2.50 4.50
4.10 3.70 3.80 3.43 4.00 2.27 4.40
4.05 4.25 3.33 2.00 4.33 2.93 4.58
1.90 3.58 3.73 3.73 1.82 4.63 3.50
4.00 3.67 1.67 4.60 1.67 4.00 1.80
4.42 1.90 4.63 2.93 3.50 1.97 4.28
1.83 4.13 1.83 4.65 4.20 3.93 4.33
1.83 4.53 2.03 4.18 4.43 4.07 4.13
3.95 4.10 2.72 4.58 1.90 4.50 1.95
4.83 4.12          

It is convenient to define some standard notation. Except where otherwise stated, it will be assumed that we are given a sample of $n$ real observations $X_1, \ldots, X_n$ whose underlying density is to be estimated. The symbol $\hat{f}$ will be used to denote whatever density estimator is currently being considered.
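As a rough illustration of this notation, the two samples can be held as plain Python lists, with n simply the length of each list. The sketch below copies the values of Tables 2.1 and 2.2 row by row; the names `treatment_days`, `eruption_minutes` and `describe` are assumptions introduced here for illustration and do not appear in the original text.

```python
# Table 2.1: lengths of treatment spells (in days), copied row by row.
treatment_days = [
    1, 25, 40, 83, 123, 256,
    1, 27, 49, 84, 126, 257,
    1, 27, 49, 84, 129, 311,
    5, 30, 54, 84, 134, 314,
    7, 30, 56, 90, 144, 322,
    8, 31, 56, 91, 147, 369,
    8, 31, 62, 92, 153, 415,
    13, 32, 63, 93, 163, 573,
    14, 34, 65, 93, 167, 609,
    14, 35, 65, 103, 175, 640,
    17, 36, 67, 103, 228, 737,
    18, 37, 75, 111, 231,
    21, 38, 76, 112, 235,
    21, 39, 79, 119, 242,
    22, 39, 82, 122, 256,
]

# Table 2.2: eruption lengths (in minutes), copied row by row.
eruption_minutes = [
    4.37, 3.87, 4.00, 4.03, 3.50, 4.08, 2.25,
    4.70, 1.73, 4.93, 1.73, 4.62, 3.43, 4.25,
    1.68, 3.92, 3.68, 3.10, 4.03, 1.77, 4.08,
    1.75, 3.20, 1.85, 4.62, 1.97, 4.50, 3.92,
    4.35, 2.33, 3.83, 1.88, 4.60, 1.80, 4.73,
    1.77, 4.57, 1.85, 3.52, 4.00, 3.70, 3.72,
    4.25, 3.58, 3.80, 3.77, 3.75, 2.50, 4.50,
    4.10, 3.70, 3.80, 3.43, 4.00, 2.27, 4.40,
    4.05, 4.25, 3.33, 2.00, 4.33, 2.93, 4.58,
    1.90, 3.58, 3.73, 3.73, 1.82, 4.63, 3.50,
    4.00, 3.67, 1.67, 4.60, 1.67, 4.00, 1.80,
    4.42, 1.90, 4.63, 2.93, 3.50, 1.97, 4.28,
    1.83, 4.13, 1.83, 4.65, 4.20, 3.93, 4.33,
    1.83, 4.53, 2.03, 4.18, 4.43, 4.07, 4.13,
    3.95, 4.10, 2.72, 4.58, 1.90, 4.50, 1.95,
    4.83, 4.12,
]


def describe(sample):
    """Return (n, smallest, largest) for a sample of real observations."""
    return len(sample), min(sample), max(sample)


if __name__ == "__main__":
    samples = [
        ("treatment spells (days)", treatment_days),
        ("eruption lengths (minutes)", eruption_minutes),
    ]
    for name, sample in samples:
        n, smallest, largest = describe(sample)
        print(f"{name}: n = {n}, range = [{smallest}, {largest}]")
```

The reported sample sizes should agree with the 86 spells and 107 eruptions stated in the text, which gives a quick check that the tables have been transcribed in full.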
