Suppose we were to toss an unbiased coin 4 times in succession. What is the probability, P(k), of obtaining k heads? There are 2^{4} = 16 different ways the coin might land; each is equally probable. Let's write down all 16 but group them according to how many heads appear, using the binary notation 1 = heads, 0 = tails:
| Sequence | No. of heads, k | P(k) |
|---|---|---|
| 0000 | 0 | P(0) = 1/16 |
| 1000 | 1 | P(1) = 4/16 = 1/4 |
| 0100 | 1 | |
| 0010 | 1 | |
| 0001 | 1 | |
| 1100 | 2 | P(2) = 6/16 = 3/8 |
| 1010 | 2 | |
| 1001 | 2 | |
| 0110 | 2 | |
| 0101 | 2 | |
| 0011 | 2 | |
| 1110 | 3 | P(3) = 4/16 = 1/4 |
| 1101 | 3 | |
| 1011 | 3 | |
| 0111 | 3 | |
| 1111 | 4 | P(4) = 1/16 |
From Eq. A.1, the probability of a given sequence is 1/16. From Eq. A.2, the probability of obtaining the sequence 1000, 0100, 0010, or 0001, i.e., a sequence in which 1 head appears, is 1/16 + 1/16 + 1/16 + 1/16 = 1/4. The third column lists these probabilities, P(k).
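The counting in the table can be checked by brute force. The following is a minimal sketch, not part of the original text; the names `N_TOSSES`, `sequences`, `counts`, and `P` are chosen here for illustration. It lists every sequence of 0s and 1s of length 4, groups the sequences by number of heads, and divides by the total number of sequences.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

N_TOSSES = 4  # number of tosses per experiment, as in the text

# All 2**4 = 16 equally probable sequences, with 1 = heads and 0 = tails.
sequences = list(product((0, 1), repeat=N_TOSSES))

# Count how many sequences contain k heads (the sum of a sequence is its head count).
counts = Counter(sum(seq) for seq in sequences)

# P(k) = (number of sequences with k heads) / (total number of sequences).
P = {k: Fraction(counts[k], len(sequences)) for k in range(N_TOSSES + 1)}

for k, p in P.items():
    print(f"P({k}) = {p}")
```

Exact fractions are used instead of floating-point numbers so that the results can be compared directly with the third column of the table.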
The results of these calculations can be summarized by plotting P(k) as a function of k, as shown in Fig. A.1. Such a plot is called a theoretical probability distribution. The peak occurs at k = 2, which is the most probable value. Since the curve is symmetric about k = 2, this value also must be the mean value. One can compute the mean or expectation value of k by adding the number of heads obtained for each of the sequences shown in the table and dividing by 16, or - and this is the same thing - by weighting each possible value of k by the probability of obtaining that value, P(k), and computing the sum:

$$\langle k \rangle = \sum_{k=0}^{4} k\,P(k).$$
The value of this sum is (0)(1/16) + (1)(1/4) + (2)(3/8) + (3)(1/4) + (4)(1/16) = 0 + 1/4 + 3/4 + 3/4 + 1/4 = 2, as expected. Note that

$$\sum_{k=0}^{4} P(k) = 1/16 + 1/4 + 3/8 + 1/4 + 1/16 = 1.$$
The probability of obtaining 0, 1, 2, 3, or 4 heads is 1. The distribution is properly normalized.
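As a quick check of the arithmetic, the mean and the normalization can be computed directly from the tabulated probabilities. This is an illustrative sketch only; the dictionary `P` and the names `total` and `mean` are introduced here and do not appear in the original.

```python
from fractions import Fraction

# P(k) for 4 tosses of an unbiased coin, copied from the table above.
P = {0: Fraction(1, 16), 1: Fraction(1, 4), 2: Fraction(3, 8),
     3: Fraction(1, 4), 4: Fraction(1, 16)}

total = sum(P.values())                  # normalization: sum over k of P(k)
mean = sum(k * p for k, p in P.items())  # <k> = sum over k of k P(k)

print("sum of P(k) =", total)
print("<k> =", mean)
```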
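Note also that if a is a constant (does not depend on the variable k),

$$\langle ak \rangle = \sum_{k=0}^{4} a k\,P(k) = a \sum_{k=0}^{4} k\,P(k) = a \langle k \rangle \tag{A.11}$$

and

$$\langle k + a \rangle = \sum_{k=0}^{4} (k + a)\,P(k) = \sum_{k=0}^{4} k\,P(k) + a \sum_{k=0}^{4} P(k) = \langle k \rangle + a. \tag{A.12}$$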
It is useful to have some measure of the width or spread of a distribution about its mean. One might compute <k - µ>, the expectation value of the deviation of k from the mean µ = <k>, but the answer always comes out 0, because positive and negative deviations cancel. It makes more sense to compute the expectation value of the square of the deviation, namely

$$\sigma^{2} = \langle (k - \mu)^{2} \rangle = \sum_{k=0}^{4} (k - \mu)^{2}\,P(k). \tag{A.13}$$
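This quantity is called the variance. Its square root, σ, is called the standard deviation. Since µ = <k> and µ^{2} = <k>^{2} are constants, Eq. A.13 can be simplified. It follows from Eqs. A.11 and A.12 that

$$\sigma^{2} = \langle k^{2} - 2\mu k + \mu^{2} \rangle = \langle k^{2} \rangle - 2\mu \langle k \rangle + \mu^{2} = \langle k^{2} \rangle - \langle k \rangle^{2},$$

where

$$\langle k^{2} \rangle = \sum_{k=0}^{4} k^{2}\,P(k).$$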
For the distribution of Fig. A.1, <k^{2}> = (0)(1/16) + (1)(1/4) + (4)(3/8) + (9)(1/4) + (16)(1/16) = 0 + 1/4 + 6/4 + 9/4 + 1 = 5, <k>^{2} = 4, and σ^{2} = 5 - 4 = 1.
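The same numbers can be reproduced in a few lines. The sketch below is illustrative rather than part of the original text, and all variable names are chosen here. It computes the mean deviation (which vanishes), then the variance both from its definition as the expectation value of the squared deviation and from the shortcut <k^{2}> - <k>^{2}, so the two routes can be compared.

```python
from fractions import Fraction
from math import sqrt

# P(k) for 4 tosses of an unbiased coin (theoretical distribution of Fig. A.1).
P = {0: Fraction(1, 16), 1: Fraction(1, 4), 2: Fraction(3, 8),
     3: Fraction(1, 4), 4: Fraction(1, 16)}

mu = sum(k * p for k, p in P.items())                   # <k>
mean_dev = sum((k - mu) * p for k, p in P.items())      # <k - mu>
var_def = sum((k - mu) ** 2 * p for k, p in P.items())  # <(k - mu)^2>
mean_sq = sum(k ** 2 * p for k, p in P.items())         # <k^2>
var_short = mean_sq - mu ** 2                           # <k^2> - <k>^2
sigma = sqrt(var_def)                                   # standard deviation

print("<k - mu> =", mean_dev)
print("<k^2> =", mean_sq)
print("variance (definition) =", var_def)
print("variance (shortcut) =", var_short)
print("sigma =", sigma)
```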
It is instructive to sit down and actually flip a coin 4 times in succession, count the number of heads, and then repeat the experiment a large number of times. One can then construct an experimental probability distribution, with P(0) equal to the number of experiments that give 0 heads divided by the total number of experiments, P(1) equal to the number of experiments that give 1 head divided by the total number of experiments, etc. Two such distributions are shown in Fig. A.2. In the first (x), the total number of experiments was 10. In the second (o), the total number was 100.
Figure A.2. Two experimental probability distributions. In one (x), a coin was flipped 4 times in 10 successive experiments; the mean value is 2.30, the standard deviation is 1.22. In the other (o), a coin was flipped 4 times in 100 successive experiments; the mean value is 2.04, the standard deviation is 1.05. The dashed curve is the theoretical probability distribution of Fig. A.1.
If you do not like flipping coins, you can build a probability machine of the sort shown in Fig. A.3. If the machine is level and well made, the probability that a ball bounces to the right or the left on striking the next pin is 1/2. The number of successive trials is equal to the number of rows of pins. If you drop 100 balls through this machine, they will pile up in the bins at the bottom, forming a distribution like the one shown in Fig. A.2. Or do the experiments on a computer: ask for a random number uniformly distributed between 0 and 1; if the number is less than or equal to 1/2, call it a head; if it is greater than 1/2, call it a tail. The data shown in Fig. A.2 were generated in this way.
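The computer experiment described above might be sketched as follows. This is not the program used to generate Fig. A.2; the function name `run_experiments`, its parameters, and the seed are hypothetical choices made for illustration. The standard deviation is computed here by dividing by the number of experiments, which is an assumption, since the text does not say which convention was used for the figure.

```python
import random
from collections import Counter
from math import sqrt


def run_experiments(n_experiments, n_tosses=4, seed=None):
    """Return the experimental P(k), its mean, and its standard deviation."""
    rng = random.Random(seed)
    heads_per_experiment = []
    for _ in range(n_experiments):
        # A uniform random number <= 1/2 counts as a head, otherwise a tail.
        n_heads = sum(1 for _toss in range(n_tosses) if rng.random() <= 0.5)
        heads_per_experiment.append(n_heads)

    # P(k) = (number of experiments giving k heads) / (number of experiments).
    counts = Counter(heads_per_experiment)
    P = {k: counts[k] / n_experiments for k in range(n_tosses + 1)}

    mean = sum(k * p for k, p in P.items())
    # Assumption: divide by N rather than N - 1 when computing the variance.
    variance = sum((k - mean) ** 2 * p for k, p in P.items())
    return P, mean, sqrt(variance)


# 10 and 100 experiments, as in Fig. A.2; the seed is arbitrary.
for n in (10, 100):
    P, mean, sd = run_experiments(n, seed=1)
    print(n, P, round(mean, 2), round(sd, 2))
```

Because the sequence of random numbers depends on the generator and the seed, the output will not match the particular values quoted for Fig. A.2, but it should scatter about the theoretical mean of 2 and standard deviation of 1 in the same way.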
The theoretical expectations are more closely met the larger the number of samples. What is the likelihood that the deviations between the experimental distributions and the theoretical distribution, evident in Fig. A.2, occur by chance? By how much are the mean values of the experimental distributions likely to differ from the mean value predicted by the theory? Questions of this kind often are encountered in the laboratory. They are dealt with in books on data reduction and error analysis and will not be pursued here.