CS302 Lecture Notes - Uniform and Exponential Random Numbers


When we generate a sequence of random numbers, there need to be parameters involved to say what the properties that sequence has. We usually assume a "uniform" distribution between two numbers. For example, drand48() returns a number uniformly distributed between 0 and 1. This means that all numbers between 0 and 1 have an equal chance of being generated. When we roll a die, we are generating a random integer uniformly distributed between 1 and 6, meaning that each of those values has an equal chance of being generated.

A probability distribution defines the characteristics of generating random numbers. One way to characterize this is with a cumulative distribution function (CDF), defined as: F(x) = the probability that a random number is less than or equal to x for all possible x.

Think about a uniform distribution whose minimum value is a and whose maximum value is b. What is F(b)? It is one, since all values are less than or equal to b. The mean of a and b is (a+b)/2. So what about F((b+a)/2)? It is 0.5, since half of the numbers will be below and mean and half will be above the mean.

We plot CDF's with the x values on the x-axis, and the F(x) values on the Y axis. Note, the Y axis therefore only goes from zero to one, and the CDF has to be a monotonically increasing function. Here is the CDF for the uniform distribution whose minimum value is a and whose maximum value is b:

Mathematically, this is:

F(x) = (x-a)/(b-a)

We can use a CDF to generate random numbers using drand48(). Let y be a number generated with drand48(). What we do to generate the number according to a CDF is to find the value of x for which F(x) = y, and x is our random number.

Try it for the uniform distribution above. Suppose we generate y = drand48(). We need to find the value of x for which:

y = F(x) = (x-a)/(b-a)

Using algebra we can massage that properly:

y = (x-a)/(b-a)
(b-a)y = x-a
(b-a)y + a = x.

So when we want to emit random numbers, we do:

   number = drand48() * (b - a) + a;


Exponential Distributions

Exponential Distributions come up in real-life scenarios quite a bit. Examples are arrival events, like people showing up to a store, or cars on a highway, and failure events, such as light bulbs failing. The uniform distribution above was defined by a maximum and a minimum. An exponential distribution is parameterized by a variable λ, which is the mean arrival rate/failure rate. For example, light bulbs may fail at a rate of 1 every 1000 hours, or people may enter a store at a rate of 30 people an hour.

The CDF for an exponential distribution is defined for x ≥ 0, and is:

F(x,λ) = 1- e-λx

Here are CDF's for three values of λ:

Note a few things -- the function is asymptotic, meaning any value of x, from one to a gazillion can occur. It's just that the gazillion has a really low probability of occurring.

The mean value is 1/λ. Note that well over half of the values generated will be below the mean - roughly 60 percent. This means that when you buy a light bulb that is rated to last for 1000 hours, chances are 60 percent that it will fail before 1000 hours. However, there's also a chance that the light bulb will last for 1000 years. It's just a really small chance.

We can generate values from an exponential distribution by using the CDF, just as we did above:

y = F(x,&lambda)
y = 1- e-λx
y - 1 = -e-λx
1 - y = e-λx
ln(1 - y) = ln(e-λx)
ln(1 - y) = -λx
-ln(1 - y)/λ = x

So, to generate random numbers according to an exponential, we do:

  number = -1.0 * log(1.0 - drand48()) / lambda;
How about we try this out? Look at expon.cpp:

#include <stdio.h>
#include <iostream>
#include <math.h>
#include <string>
#include <set>
#include <map>
using namespace std;

typedef set <string> fnset;

main(int argc, char **argv)
{
  double lambda;
  int iterations;
  int i;
  
  if (argc != 3 || sscanf(argv[1], "%lf", &lambda) != 1 || lambda <= 0 || sscanf(argv[2], "%d", &iterations) != 1) {
    cerr << "usage: expon lambda iterations\n";
    exit(1);
  }

  srand48(time(0));

  for (i = 0; i < iterations; i++) {
    cout << log(1-drand48())/lambda*-1 << endl;
  }
}
    

Let's try it out on a simulation scenario. Suppose we buy a crate of 1000 light bulbs, each of which has an average lifetime of 1000 hours. We can simulate the failure time of each bulb by choosing 1000 numbers from an exponential distribution where λ = 1/1000 = 0.001:

UNIX> expon 0.001 1000 > bulb.txt
UNIX> head bulb.txt
286.928
120.251
1234.99
185.829
4899.49
2493.85
3980.23
1912.57
697.718
687.042
UNIX> sort -n bulb.txt | head -n 5
1.06986
1.20739
2.45384
2.62195
2.73788
UNIX> sort -n bulb.txt | tail -n 5
5608.55
5671.58
5792.11
7189.54
8034.5
UNIX> 
You can see from the first head statement that the random numbers are indeed all over the place. The two sort statements show you the five bulbs that fail the soonest, and the five that last the longest. It's amazing, no? One unlucky bulb fails after a little over an hour, while one lasts almost a year: 8035 hours = 335 days.

Suppose we sort bulb.txt, and plot the numbers as a graph: the y-axis will be the line number of the file, and the x-axis will have the value on that line:

Think to yourself -- how is that related to the CDF for an exponential? Well, divide the values on the y-axis by 1000 and what do you get? You should get an approximation for the CDF for λ = 1000. To show that, the graph below graphs two lines: the previous graph with the y-axis divided by 1000, and the CDF function y = 1 - e-0.001 x:

Awesome, no?