Inside the Pseudorandom Number Generator
The Mersenne Twister is a strong pseudorandom number generator (PRNG). A version available in many programming languages, MT19937, has a period of 2^19937 – 1. A sequence's period defines how long it continues before repeating itself. Sequences with too short of a period can be observed, recorded, and reused by an attacker. Sequences with long periods force the adversary to select alternate attack methods to passive observation. The period of MT19937 far outlasts the number of seconds until our world ends in fire or ice (or is wiped out by a Vogon construction fleetA for that matter). The strength of MT19937 also lies in the fact that one 32-bit value produced by it cannot be used to predict the subsequent 32-bit value. This ensures a certain degree of unpredictability.
Yet, all is not perfect in terms of nonpredictability. The MT19937 algorithm keeps track of its state in 624 32-bit values. If an attacker were able to gather 624 sequential values, then the entire sequence – forward and backward – could be reverse engineered. This feature is not specific to the Mersenne Twister; most PRNGs have a state mechanism that is used to generate the next value in the sequence. Knowledge of the state effectively compromises the sequence's predictability.
Linear congruential generators (LCG) use a different approach to creating numeric sequences. They predate the Internet, going as far back as 1948.B Simple LCG algorithms create a sequence from a formula based on a constant multiplier, a constant additive value, and a constant modulo. The details of an LCG aren't important at the moment, but here is an example of the formula. The values of a, k, and m must be secret to preserve the unpredictability of the sequence.
The period of an LCG is far shorter than MT19937. However, an effective attack does not need to observe more than a few sequential values. George Marsaglia describes an algorithm for identifying and cracking a PRNG based on a congruential generator.C The crack requires fewer than two dozen sequential samples from the sequence. The description of the cracking algorithm may sound complicated to math-averse ears, but rest assured the execution is simple. Briefly, the attack determines the modulo m of the LCG by finding the greatest common divisor (GCD) of the volumes of parallelepipedsD described by vectors taken from the LCG sequence. This translates into the following Python script.
#!/usr/bin/python
import array
from fractions import gcd
from itertools import imap, product
from numpy.linalg import det
from operator import mul, sub
values = array.array('l', [308,785,930,695,864,237,1006,819,204,777, 378,495,376,357,70,747,356])
vectors = [ [values[i] − values[0], values[i+1] − values[1]] for i in range(1, len(values)−1) ]
volumes = []
for i in range(0, len(vectors)−2, 2):
v = abs(det([ vectors[i], vectors[i+1] ]))
volumes.insert(−1, v)
print gcd(volumes[0], volumes[1])
The GCD reported by this script will be the modulo m used in the LCG (in some cases, more than one GCD may need to be calculated before reaching the correct value). We already have a series of values for x, so all that remains is to solve for a and k. The values are easily found by solving two equations for two unknowns.
This section should not be misread as a suggestion to create your own PRNG. The Mersenne Twister is a strong PRNG. A similarly strong algorithm is called the Lagged Fibonacci. Instead, this section highlights some very simple ways that a generator may inadvertently leak its internal state. Enumerating 624 sequential 32-bit values might not be feasible against a busy Web site, different requests may use different seeds, or maybe numbers in the sequence are randomly skipped over. In any case, it's important that the site be aware of how it is generating random numbers and where those numbers are being used. The generation should come from a well-accepted method as opposed to home-brewed algorithms. The values should not be used such that the internal state of a PRNG can be reproduced.
We shouldn't end this section without recommending a book more salient to random numbers: The Art of Computer Programming, Volume 2 by Donald Knuth. It is a canonical resource regarding the generation and analysis of random numbers.