Back to Code organization. Forward to Random integers. Up to Contents.

The state of MT19937 is given by a set of 19937 bits. (The discussion here is illustrated with the MT19937 generator. The properties of the SFMT19937 generator are similar.) Over the course of the period of the generator all possible states are visited. (The state consisting of all zeros is disallowed.) Thus the sequence consists of X_i where X_{i + L} = X_i and L = 2¹⁹⁹³⁷ − 1. When using the generator we need to specify a starting state j so that the sequence is then Y_{i; j} = X_{j + i}.

However, rather than specified an unwieldy set of 19937 bits, we instead specify a vector s of 32-bit integers. The length of s is arbitrary (it can even be zero), however in most applications, its length will be small—often a length of 1 provides sufficient "seed space". The process of seeding MT19937 consists of "mixing" the s in some way to provide the necessary starting state j. Thus the random sequence is now viewed as Z_{i; s} = Y_{i; j(s)} = X_{j(s) + i}.

Now the user's view of a typical random number generator is that Z_{i; s} and Z_{i; s'} are independent provided s and s' are distinct. A necessary condition for independence is that |j(s) − j(s')| > R where R is the maximum number of random numbers needed for a particular seed. If we assume that the seeding function j(s) produces randomly distributed starting positions and if maximum number of seeds we might use is S, then the probability of overlapping sequences, i.e., that |j(s) − j(s')| < R for some s and s' is S²R / L.

Suppose we take R = 10²⁰⁰ and S = 10¹⁰⁰, then the probability of overlap is a tiny 10⁻⁵⁶⁰⁰. (On the other hand with rand() for which the period L is 2³², we have an appreciable probability of overlap with R = 2000 and S = 1000.) This means that we can safely assume that the sequences Z_{i; s} are independent and this then means that from the user's perspective the most useful representation of the state of the generator is [i, s] which is given by [Count(), Seed()].

It's frequently desirable to start each run of a code with a different "arbitrary" seed. The current time (in seconds) is frequently used for this purpose. However, if many runs are started simultaneously, many are likely to use the same seed. Random::SeedWord() can instead be called. This returns an unsigned long generated from various sources (/dev/urandom, the microsecond clock, etc.).

Because Random::SeedWord() returns a result in [0, 2³²), there's a strong probability of collisions after 2¹⁶ invocations. If you expect that your code will be invoked more often than that, then instead use Random::SeedVector() to seed the generator. This returns a vector of unsigned longs which is almost certainly unique. However successive calls to Random::SeedVector() may return the same result. If multiple random number sequences are required, for example in a multi-threaded application, then Random::SeedVector() can be called once by the master thread to define a master seed and each slave thread would set its seed to a vector obtained by appending a thread index to the master seed. The default constructor for Random sets the seed to the vector [Random::SeedVector()].

Whenever random numbers are used it is important to record the seed used. Without this information, it will be impossible to repeated (e.g., to track down a bug). Random::SeedString() returns the seed vector as a string allowing it to be printed on standard output easily. Thus

RandomLib::Random r;
r.Reseed();            // sets seed to [Random::SeedVector()]
std::cout << "Random seed set to " << r.SeedString() << \n";

The seed may be set with the constructor as follows:

using RandomLib::Random;
Random r1;                  // set seed to []
Random r2(1234);            // set seed to [1234]
Random r3("[1,2,3,4]");     // set seed to [1,2,3,4]
unsigned v[] = {1,2,3,4};
Random r4(v, v+4);          // seed set via iterators
Random r5(std::vector<unsigned>(v, v+4)); // seed set via vector
Random r6(Random::SeedWord());            // use a "random" integer seed
Random r7(Random::SeedVector());          // use a "unique" vector seed

After a random object is created, you can change its seed with Reseed(...) with

using RandomLib::Random;
Random r;                   // created with seed []
r.Reseed();                 // use a "unique" vector seed
r.Reseed(1234);             // set seed to [1234]
r.Reseed("[1,2,3,4]");      // set seed to [1,2,3,4]
unsigned v[] = {1,2,3,4};
r.Reseed(v, v+4);           // seed set via iterators
r.Reseed(std::vector<unsigned>(v, v+4)); // seed set via vector
r.Reseed(Random::SeedWord());            // use a "random" integer seed
r.Reseed(Random::SeedVector());          // same as r.Reseed()
r.Reseed(std::vector<unsigned>(0));      // set seed to []

Tools are provided to convert between the string and vector representations of a seed

using RandomLib::Random;
std::vector<unsigned long> seed(Random::StringToVector("[1,2,3,4]");
std::cout << Random::VectorToString(seed) << "\n";

The original C interface for the Mersenne Twister provided two seed-setting interfaces: (a) init_genrand, which took a single unsigned long as argument and (b) init_by_array, which took an array (length > 0) of unsigned longs as an argument. Thus, the set of allowed seeds was {a, [a], [a, b], [a, b, c], ...}, where a and [a] were distinct. But this then presents a confusing interface to the user. In addition, it's not clear how best to report back the seed to the user.

In this implementation, seeds are always vectors (of arbitrary length, including zero). Thus the set of allowable seeds is {[], [a], [a, b], [a, b, c], ...} which is easily and unambiguously represented by the STL vector container. Note also that [], [0], [0, 0], etc. are all distinct as are [a], [a, 0], [a, 0, 0]. Seeding with Reseed(n) merely sets the seed to a vector of length one, [n].

Also, since the init_by_array routine in the original MT19937 implementation has some weaknesses (the most serious of which is distinct short seeds can result in the same state, this library uses, by default, SFMT's method for mixing the seed into the random generator state. (This method has been adopted by the proposed C++11 standard, see P. Becker, Working Draft, Standard for Programming Language C++, Oct. 2007, Sec. 26.4.7.1.) In this implementation, this mixing class is called MixerSFMT.

Two other mixing classes MixerMT0 and MixerMT1 are provided. MixerMT0 implements the mixing methods in MT19937 and MT19937_64. These methods have some defects which are partially corrected by the MixerMT1 classes with are described in the (superseded) proposal to the C++11 standards, Brown, et al., Random Number Generation in C++0x: A Comprehensive Proposal, version 3.

Back to Code organization. Forward to Random integers. Up to Contents.