A probability distribution defines the characteristics of generating random numbers. One way to characterize this is with a cumulative distribution function (CDF), defined as: F(x) = the probability that a random number is less than or equal to x for all possible x.
Think about a uniform distribution whose minimum value is a and whose maximum value is b. What is F(b)? It is one, since all values are less than or equal to b. The mean of a and b is (a+b)/2. So what about F((b+a)/2)? It is 0.5, since half of the numbers will be below and mean and half will be above the mean.
We plot CDF's with the x values on the x-axis, and the F(x) values on the Y axis. Note, the Y axis therefore only goes from zero to one, and the CDF has to be a monotonically increasing function. Here is the CDF for the uniform distribution whose minimum value is a and whose maximum value is b:
![]() |
Mathematically, this is:
We can use a CDF to generate random numbers using drand48(). Let y be a number generated with drand48(). What we do to generate the number according to a CDF is to find the value of x for which F(x) = y, and x is our random number.
Try it for the uniform distribution above. Suppose we generate y = drand48(). We need to find the value of x for which:
Using algebra we can massage that properly:
So when we want to emit random numbers, we do:
number = drand48() * (b - a) + a;
The CDF for an exponential distribution is defined for x ≥ 0, and is:
Here are CDF's for three values of λ:
![]() |
Note a few things -- the function is asymptotic, meaning any value of x, from one to a gazillion can occur. It's just that the gazillion has a really low probability of occurring.
The mean value is 1/λ. Note that well over half of the values generated will be below the mean - roughly 60 percent. This means that when you buy a light bulb that is rated to last for 1000 hours, chances are 60 percent that it will fail before 1000 hours. However, there's also a chance that the light bulb will last for 1000 years. It's just a really small chance.
We can generate values from an exponential distribution by using the CDF, just as we did above:
So, to generate random numbers according to an exponential, we do:
number = -1.0 * log(1.0 - drand48()) / lambda;How about we try this out? Look at expon.cpp:
#include <stdio.h>
#include <iostream>
#include <math.h>
#include <string>
#include <set>
#include <map>
using namespace std;
typedef set <string> fnset;
main(int argc, char **argv)
{
double lambda;
int iterations;
int i;
if (argc != 3 || sscanf(argv[1], "%lf", &lambda) != 1 || lambda <= 0 || sscanf(argv[2], "%d", &iterations) != 1) {
cerr << "usage: expon lambda iterations\n";
exit(1);
}
srand48(time(0));
for (i = 0; i < iterations; i++) {
cout << log(1-drand48())/lambda*-1 << endl;
}
}
|
Let's try it out on a simulation scenario. Suppose we buy a crate of 1000 light bulbs, each of which has an average lifetime of 1000 hours. We can simulate the failure time of each bulb by choosing 1000 numbers from an exponential distribution where λ = 1/1000 = 0.001:
UNIX> expon 0.001 1000 > bulb.txt UNIX> head bulb.txt 286.928 120.251 1234.99 185.829 4899.49 2493.85 3980.23 1912.57 697.718 687.042 UNIX> sort -n bulb.txt | head -n 5 1.06986 1.20739 2.45384 2.62195 2.73788 UNIX> sort -n bulb.txt | tail -n 5 5608.55 5671.58 5792.11 7189.54 8034.5 UNIX>You can see from the first head statement that the random numbers are indeed all over the place. The two sort statements show you the five bulbs that fail the soonest, and the five that last the longest. It's amazing, no? One unlucky bulb fails after a little over an hour, while one lasts almost a year: 8035 hours = 335 days.
Suppose we sort bulb.txt, and plot the numbers as a graph: the y-axis will be the line number of the file, and the x-axis will have the value on that line:
![]() |
Think to yourself -- how is that related to the CDF for an exponential? Well, divide the values on the y-axis by 1000 and what do you get? You should get an approximation for the CDF for λ = 1000. To show that, the graph below graphs two lines: the previous graph with the y-axis divided by 1000, and the CDF function y = 1 - e-0.001 x:
![]() |
Awesome, no?