Generating Other Distributions from Uniform Random Variables
In this post, we explore how to generate random variables from various distributions using uniform random variables. We will cover techniques such as the Inverse Transform Method, the Box-Muller Transform, and the Acceptance-Rejection Method, along with their implementations in Python and R.
In one of my previous posts, we discussed how to generate random variables using Linear Congruential Generators (LCGs). An LCG produces uniform random variables (as we shall see shortly), but in many applications we need random variables from other distributions, such as the normal, exponential, or Poisson distributions. In this post, we will explore various techniques for generating random variables from other distributions using uniform random variables.
Uniform Random Variables
Before we dive into generating other distributions, let’s briefly review how to generate uniform random variables using LCGs. An LCG is defined by the recurrence relation:
\[
x_{n+1} = (ax_n+c)\text{ mod } m, \quad n=0, 1, 2, 3, \ldots
\]
A well-known good choice of \(a\), \(c\), and \(m\) is
\[
a=7^5, \quad c=0,\quad m=2^{31}-1
\]
Let us implement this basic LCG in Python first:
class LCG:
    def __init__(self, seed, a=7**5, c=0, m=2**31 - 1):
        self.state = seed
        self.a = a
        self.c = c
        self.m = m

    def next(self):
        # The LCG formula: x_{n+1} = (ax_n + c) % m
        self.state = (self.a * self.state + self.c) % self.m
        return self.state / self.m  # Return a value in the range [0, 1]

prng = LCG(seed=42)
generate_seq = [prng.next() for i in range(10)]
print(generate_seq)
This will generate a sequence of 10 random numbers uniformly distributed in the range [0, 1]. We can generate a larger sequence and visualize it to check for uniformity and randomness. But let’s save that for the R implementation, which is more concise for this purpose. In fact, the rest of the code in this post will be implemented in R, so let’s switch to R now.
Let's implement the LCG in R:
prngr <- function(seed, a, c = 0, m, n) {
  current_state <- seed
  seq <- numeric(n)
  for (i in 1:n) {
    current_state <- (a * current_state + c) %% m
    seq[i] <- current_state / m
  }
  return(seq)
}

rand_seq <- prngr(seed = 42, a = 7^5, m = 2^31 - 1, n = 10000)
print(rand_seq[1:10])
The histogram below shows that the generated values are roughly uniformly distributed, while the time series plot reveals no obvious patterns in the sequence, which is a good sign for a pseudorandom number generator.
hist(rand_seq, main = "Histogram of Generated Uniform Random Variables",
     xlab = "Value", ylab = "Frequency", col = "lightblue")

plot(rand_seq[1:1000], type = "l",
     main = "Time Series of Generated Uniform Random Variables",
     xlab = "Index", ylab = "Value")
Adjusting the Range
Let’s consider a version of the LCG which generates integer values in the range 0 to \(m-1\) instead of floating-point numbers in the range [0, 1]. This can be achieved by simply returning the state without dividing by \(m\):
prngrInt <- function(seed, a, c = 0, m, n) {
  current_state <- seed
  seq <- numeric(n)
  for (i in 1:n) {
    current_state <- (a * current_state + c) %% m
    seq[i] <- current_state
  }
  return(seq)
}

rand_int <- prngrInt(seed = 42, a = 7^5, m = 2^31 - 1, n = 10000)
print(rand_int[1:10])
But we want to play a game of dice, which means we want to generate random integers between 1 and 6. How do we take an integer in the range 0 to \(m-1\) and convert it to an integer in the range 1 to 6? We can use the modulo operator to achieve this:
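The dice-roll code appears to have been dropped here; a minimal sketch, reusing the rand_int sequence generated above (note there is a slight modulo bias because \(m\) is not a multiple of 6, but it is negligible in practice):

```r
# Map integers in 0..m-1 to die values in 1..6
rand_dice <- (rand_int %% 6) + 1
print(rand_dice[1:10])
```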
This will give us random integers between 1 and 6, simulating the roll of a die. The histogram of the generated dice rolls should show that each number from 1 to 6 appears with roughly equal frequency, indicating that our method is working correctly.
hist(rand_dice, breaks = seq(0.5, 6.5, by = 1),
     main = "Histogram of Simulated Dice Rolls",
     xlab = "Die Value", ylab = "Frequency", col = "lightblue")
U(0,1) to U(a,b)
In general, we can transform random numbers uniformly distributed between 0 and 1 into random numbers uniformly distributed between \(a\) and \(b\) as follows: \[
X = a + (b-a)U
\] where \(U\) is a random variable uniformly distributed on [0, 1]. For example, to generate random numbers between 5 and 10, we can use the following code:
a <- 5
b <- 10
rand_uniform_ab <- a + (b - a) * rand_seq
print(rand_uniform_ab[1:10])
This will give us random numbers uniformly distributed between 5 and 10. The histogram of these numbers should show a uniform distribution between 5 and 10.
hist(rand_uniform_ab,
     main = "Histogram of Uniform Random Variables between 5 and 10",
     xlab = "Value", ylab = "Frequency", col = "lightblue")
Generating Other Distributions
Now that we have a good understanding of how to generate uniform random variables, we can use them to generate random variables from other distributions. In the remainder of this post, we will explore some common techniques for doing this, including the Inverse Transform Method, Acceptance-Rejection Method, and the Box-Muller Transform.
Inverse Transform Method
The Inverse Transform Method is one of the most straightforward ways to generate random variables from a continuous distribution, provided that the cumulative distribution function (CDF) is invertible.
Let \(U\) be a uniform random variable on the interval (0, 1). For any continuous probability distribution with CDF \(F\), the random variable \(X\) defined by: \[
X = F^{-1}(U)
\]
has the distribution \(F\). Let’s apply this to the exponential distribution. The CDF of an exponential distribution with rate parameter \(\lambda\) is \[
F(x) = 1 - e^{-\lambda x}
\]
Setting \(U = 1 - e^{-\lambda x}\) and solving for \(x\) gives: \[
x = -\frac{1}{\lambda} \ln(1 - U)
\] Since \(U\) is uniformly distributed on (0, 1), \(1 - U\) is also uniformly distributed on (0, 1). Thus, we can simplify this to \[
x = -\frac{1}{\lambda} \ln(U)
\]
Here is how we can implement this in R using our previously generated uniform sequence:
lambda <- 2  # Rate parameter

# Using rand_seq from our LCG
rand_exp <- -(1 / lambda) * log(rand_seq)

hist(rand_exp, breaks = 50, freq = FALSE,
     main = "Generated Exponential Distribution",
     xlab = "Value", col = "lightgreen")

# Overlay the theoretical density curve
curve(dexp(x, rate = lambda), add = TRUE, col = "red", lwd = 2)
The Acceptance-Rejection Method
Sometimes the inverse CDF is mathematically difficult or impossible to compute explicitly. In these cases, the Acceptance-Rejection Method is incredibly useful. Suppose we want to simulate a continuous random variable with a target probability density function (PDF) \(f(x)\). We choose a “proposal” density \(g(x)\) which we already know how to simulate, and we find a constant \(c\) such that \(f(x) \leq c \cdot g(x)\) for all \(x\).
The algorithm works as follows:
Generate \(Y\) having density \(g\).
Generate a uniform random number \(U \sim U(0,1)\).
If \(U \leq \dfrac{f(Y)}{c \, g(Y)}\), accept \(X = Y\). Otherwise, reject \(Y\) and repeat.
Let’s simulate a Beta(2, 2) distribution. The PDF is
\[
f(x) = 6x(1-x), \quad 0 \leq x \leq 1
\]
Since the domain is [0, 1], we can use the uniform distribution as our proposal distribution, meaning \(g(x) = 1\). The maximum value of \(f(x)\) occurs at \(x = 0.5\), which gives \(f(0.5) = 1.5\). Therefore, we can set our bounding constant \(c = 1.5\).
n_samples <- 10000
accepted_samples <- numeric(0)
c_val <- 1.5

# We will generate pairs of uniforms to act as Y and U
for (i in 1:n_samples) {
  # In practice, you'd pull from your LCG sequence.
  # Using runif here for clarity in the loop structure.
  Y <- runif(1)
  U <- runif(1)
  f_Y <- 6 * Y * (1 - Y)
  g_Y <- 1
  if (U <= f_Y / (c_val * g_Y)) {
    accepted_samples <- c(accepted_samples, Y)
  }
}

hist(accepted_samples, breaks = 40, freq = FALSE,
     main = "Generated Beta(2,2) via Acceptance-Rejection",
     xlab = "Value", col = "lightpink")
curve(dbeta(x, 2, 2), add = TRUE, col = "darkred", lwd = 2)
The Box-Muller Transform
The Box-Muller transform is an elegant mathematical trick designed specifically to generate independent, standard normally distributed random variables.
Given two independent uniform random variables \(U_1\) and \(U_2\) on the interval (0, 1), the Box-Muller transform generates two independent standard normal variables \(Z_1\) and \(Z_2\): \[
Z_1 = \sqrt{-2 \ln U_1} \cos(2\pi U_2), \quad Z_2 = \sqrt{-2 \ln U_1} \sin(2\pi U_2)
\]
We can split our rand_seq of 10,000 uniform variables into two sets of 5,000 to represent \(U_1\) and \(U_2\):
# Split the sequence in half
U1 <- rand_seq[1:5000]
U2 <- rand_seq[5001:10000]

# Apply the transform
Z1 <- sqrt(-2 * log(U1)) * cos(2 * pi * U2)
Z2 <- sqrt(-2 * log(U1)) * sin(2 * pi * U2)

# Combine them for a total of 10,000 standard normal variables
rand_normal <- c(Z1, Z2)

hist(rand_normal, breaks = 50, freq = FALSE,
     main = "Standard Normal Distribution (Box-Muller)",
     xlab = "Value", col = "lavender")
curve(dnorm(x), add = TRUE, col = "purple", lwd = 2)
Conclusion
Starting with nothing but a basic Linear Congruential Generator, we’ve walked through how to generate uniform variables and then mathematically bend them into entirely new shapes. Whether you are using the Inverse Transform for simple inverses, Acceptance-Rejection for complex bounded densities, or Box-Muller for normals, the \(U(0,1)\) random variable remains the fundamental building block of statistical computing and Monte Carlo simulations.