import plotly.graph_objects as go
import numpy as np
# Unix rand() parameters
a = 65539
c = 0
m = 2**31
def generate_points(n, seed=42):
state = seed
vals = []
for _ in range(n + 2):
state = (a * state + c) % m
vals.append(state / m)
return vals
# Generate 10,000 points
data = generate_points(10000)
x = data[:-2]
y = data[1:-1]
z = data[2:]
# Create interactive scatter plot
fig = go.Figure(data=[go.Scatter3d(
x=x, y=y, z=z,
mode='markers',
marker=dict(size=2, color=z, colorscale='Viridis', opacity=0.6)
)])
fig.update_layout(
title="Interactive 3D Spectral Test: RANDU",
scene=dict(xaxis_title='x_i', yaxis_title='x_i+1', zaxis_title='x_i+2')
)Randomness in Computing
Computers are deterministic machines. Given the same input and state, they are designed to produce the same output. This predictability is vital for stability, but it poses a challenge for tasks that require unpredictability, such as cryptography, statistical simulations (Monte Carlo methods), and gaming. To bridge this gap, computers typically use Pseudo-Random Number Generators (PRNGs). These are algorithms that use mathematical formulas to produce sequences of numbers that appear random but are actually determined by an initial value called a seed. If you know the seed and the algorithm, you can replicate the entire sequence perfectly.
But before we dive into PRNGs, let’s briefly discuss what randomness is and why it’s important.
What is Randomness?
Randomness refers to the lack of pattern or predictability in events. In a truly random process, each outcome is independent of the previous ones, and there is no way to predict future outcomes based on past events. For example, rolling a fair six-sided die is a random process because each roll is independent and has an equal probability of landing on any of the six faces. In computing, we often need random numbers for various applications, such as:
- Cryptography: Generating secure keys and nonces.
- Simulations: Modeling complex systems in physics, finance, and other fields.
- Gaming: Creating unpredictable game mechanics and outcomes.
For a sequence of numbers to be called random, they should satisfy some properties:
- A random sequence should not be biased. As the sequence length increases, no element should be any more frequent than others. For example, an unfair coin is biased (so long sequences of coin tosses may have more heads):
H T T H H T H H T T H T H H H T H …
However, it is not good enough to be only unbiased. For example, the sequence
1 0 1 0 1 0 1 0 1 0 1 0 1 0 …
is unbiased, but it is very predictable. So it cannot be said to be random.
The sequence should possess high information content (Shannon Entropy). Unpredictable sequences have high entropy.
The sequence should be incompressible, whose shortest possible description is approximately the same length as the sequence itself. The sequence contains no patterns or regularities that could be used to predict or generate the rest of the sequence more concisely than just listing the sequence itself verbatim.
Pseudorandomness
Since true randomness is difficult to achieve in a deterministic machine, we rely on pseudorandomness. A pseudorandom number generator or PRNG in short, produces a sequence of numbers that appears random but cannot be really random as it is generated by a deterministic process. The quality of a PRNG is often evaluated based on how well it mimics true randomness and how long it takes for the sequence to repeat (the period).
A PRNG is a machine (software) that produces a sequence \(x_1\), \(x_2\), \(x_3\), \(x_4\), \(x_5\), … from an initial seed \(x_0\). The basic idea is that the internal state determines the next number.
One of the simplest and most widely used types of PRNGs is the Linear Congruential Generator (LCG).
Linear Congruential Generators (LCGs)
An LCG generates a sequence of pseudorandom numbers using the following formula:
\[ x_{n+1} = (ax_n+c) \mod m \]
In this formula, \(a\), \(c\), \(m\) are positive integer constants where \(n=0\), \(1\), \(2\), \(3\), \(\ldots\) with \(x_0\) being the seed. Being a modulo operation, the numbers will repeat after some period, but it is good enough for many purposes provided \(a\), \(c\), and \(m\) are chosen suitably.
Let’s first implement a basic LCG in Python:
class LCG:
def __init__(self, seed, a=1, c=7, m=12):
self.state = seed
self.a = a
self.c = c
self.m = m
def next(self):
# The LCG formula: x_{n+1} = (ax_n + c) % m
self.state = (self.a * self.state + self.c) % self.m
return self.state
# Initialize with seed 6 and the default a, c, m
prng1 = LCG(seed=6)
sequence = [prng1.next() for _ in range(12)]
print(f"Sequence: {sequence}")This produces
Sequence: [1, 8, 3, 10, 5, 0, 7, 2, 9, 4, 11, 6]Since this topic came up as part of a course in Statistical Computing which uses R, let’s implement a variant in R:
create_lcg <- function(seed, a, c, m) {
current_state <- seed
function() {
current_state <<- (a * current_state + c) %% m
return(current_state)
}
}
# Create a PRNG
prng2 <- create_lcg(seed=6, a=1, c=7, m=12)
results <- replicate(12, prng2())
print(results)This of course produces the same output:
[1] 1 8 3 10 5 0 7 2 9 4 11 6Does the sequence look random? Notice that it prints all the integers from 0 to 11 in this sequence in some seemingly random order. But since the operations are modulo 12, a longer sequence will give repetitions:
results <- replicate(20, prng2())
print(results)This produces
[1] 1 8 3 10 5 0 7 2 9 4 11 6 1 8 3 10 5 0 7 2Notice that the sequence repeats, as expected. So this generator has a period that is too short.
In the above example, the period is the maximum possible for \(m=12\). But this only comes with suitable choice for \(a\) and \(c\). Consider another PRNG with \(a=1\), \(c=8\):
prng3 <- create_lcg(seed=6, a=1, c=8, m=12)
results <- replicate(12, prng3())
print(results)This gives:
[1] 2 10 6 2 10 6 2 10 6 2 10 6This sequence has a period of only 3, not even the theoretical 12.
Moral: The choice of \(a\), \(c\), and \(m\) matters.
Picking the constants a, c, m
The maximum period is \(m\). So picking up large value for \(m\) and appropriate values for \(a\) and \(c\) that works with this \(m\) ensures a very long sequence before numbers begin to repeat.
The LCG will have the maximum possible period of \(m\) if and only if:
\(c\) and \(m\) are relatively prime,
\((a-1)\) is divisible by all prime factors of \(m\), and
if \(m\) is a multiple of 4, then \((a-1)\) is also a multiple of 4.
Let’s check for prng1, that was defined earlier (a = 1, c = 7, m = 12)
7 and 12 are relatively prime ✔️
\(a-1 = 0\) which is divisible by all prime factors of \(m =12\), which are 2 and 3 ✔️
\(m=12\) is a multiple of 4 and \(a-1 = 0\) is also a multiple of 4 ✔️
Since all the conditions are fulfilled, prng1 generates a sequence which has a period of 12 (the maximum possible).
For prng2, since \(c = 8\) and \(m = 12\) are not relatively prime, the first condition breaks and the period is no longer the maximum possible value (it is in fact only 3).
Two Suitable Choices
Two useful uniform random number generators of this type which have been found to be useful are
\(a=7^5=16\ 807\), \(c=0\), \(m = 2^{31}-1=2\ 147\ 483\ 647\)
\(a=29\ 903\ 947\), \(c=0\), \(m=2^{31}-1=2\ 147\ 483\ 647\)
A comparative computational study of various random number generators can be found at the following citation (Ribeiro, Souza, and Vieira 2005).
A Cautionary Tale: Netscape Security Flaw (1995)
In the early days of the Netscape Navigator browser, the implementation of the SSL (Secure Sockets Layer) protocol relied on a PRNG to generate “random” encryption keys. However, the developers used a seed derived from three easily discoverable system variables:
The System Time: Specifically, the time in seconds and microseconds.
The Process ID (PID): A unique identifier for the running browser process.
The Parent Process ID (PPID): The identifier for the process that launched the browser.
Researchers Ian Goldberg and David Wagner discovered that because these variables were not truly secret, an attacker could narrow down the possible seeds significantly. Since the system time was only accurate to a certain degree and PIDs are often assigned sequentially, the “search space” (the number of possible keys) was reduced from billions to just a few million. A standard workstation at the time could “brute force” the encryption key in less than a minute by simply guessing seeds based on the approximate time the connection started.