RandomLib
1.10

The state of MT19937 is given by a set of 19937 bits. (The discussion here is illustrated with the MT19937 generator. The properties of the SFMT19937 generator are similar.) Over the course of the period of the generator all possible states are visited. (The state consisting of all zeros is disallowed.) Thus the sequence consists of X_{i} where X_{i + L} = X_{i} and L = 2^{19937} − 1. When using the generator we need to specify a starting state j so that the sequence is then Y_{i; j} = X_{j + i}.
However, rather than specified an unwieldy set of 19937 bits, we instead specify a vector s of 32bit integers. The length of s is arbitrary (it can even be zero), however in most applications, its length will be small—often a length of 1 provides sufficient "seed space". The process of seeding MT19937 consists of "mixing" the s in some way to provide the necessary starting state j. Thus the random sequence is now viewed as Z_{i; s} = Y_{i; j(s)} = X_{j(s) + i}.
Now the user's view of a typical random number generator is that Z_{i; s} and Z_{i; s'} are independent provided s and s' are distinct. A necessary condition for independence is that j(s) − j(s') > R where R is the maximum number of random numbers needed for a particular seed. If we assume that the seeding function j(s) produces randomly distributed starting positions and if maximum number of seeds we might use is S, then the probability of overlapping sequences, i.e., that j(s) − j(s') < R for some s and s' is S^{2}R / L.
Suppose we take R = 10^{200} and S = 10^{100}, then the probability of overlap is a tiny 10^{−5600}. (On the other hand with rand() for which the period L is 2^{32}, we have an appreciable probability of overlap with R = 2000 and S = 1000.) This means that we can safely assume that the sequences Z_{i; s} are independent and this then means that from the user's perspective the most useful representation of the state of the generator is [i, s] which is given by [Count(), Seed()].
It's frequently desirable to start each run of a code with a different "arbitrary" seed. The current time (in seconds) is frequently used for this purpose. However, if many runs are started simultaneously, many are likely to use the same seed. Random::SeedWord() can instead be called. This returns an unsigned long generated from various sources (/dev/urandom, the microsecond clock, etc.).
Because Random::SeedWord() returns a result in [0, 2^{32}), there's a strong probability of collisions after 2^{16} invocations. If you expect that your code will be invoked more often than that, then instead use Random::SeedVector() to seed the generator. This returns a vector of unsigned longs which is almost certainly unique. However successive calls to Random::SeedVector() may return the same result. If multiple random number sequences are required, for example in a multithreaded application, then Random::SeedVector() can be called once by the master thread to define a master seed and each slave thread would set its seed to a vector obtained by appending a thread index to the master seed. The default constructor for Random sets the seed to the vector [Random::SeedVector()].
Whenever random numbers are used it is important to record the seed used. Without this information, it will be impossible to repeated (e.g., to track down a bug). Random::SeedString() returns the seed vector as a string allowing it to be printed on standard output easily. Thus
The seed may be set with the constructor as follows:
After a random object is created, you can change its seed with Reseed(...) with
Tools are provided to convert between the string and vector representations of a seed
The original C interface for the Mersenne Twister provided two seedsetting interfaces: (a) init_genrand, which took a single unsigned long as argument and (b) init_by_array, which took an array (length > 0) of unsigned longs as an argument. Thus, the set of allowed seeds was {a, [a], [a, b], [a, b, c], ...}, where a and [a] were distinct. But this then presents a confusing interface to the user. In addition, it's not clear how best to report back the seed to the user.
In this implementation, seeds are always vectors (of arbitrary length, including zero). Thus the set of allowable seeds is {[], [a], [a, b], [a, b, c], ...} which is easily and unambiguously represented by the STL vector container. Note also that [], [0], [0, 0], etc. are all distinct as are [a], [a, 0], [a, 0, 0]. Seeding with Reseed(n) merely sets the seed to a vector of length one, [n].
Also, since the init_by_array routine in the original MT19937 implementation has some weaknesses (the most serious of which is distinct short seeds can result in the same state, this library uses, by default, SFMT's method for mixing the seed into the random generator state. (This method has been adopted by the proposed C++11 standard, see P. Becker, Working Draft, Standard for Programming Language C++, Oct. 2007, Sec. 26.4.7.1.) In this implementation, this mixing class is called MixerSFMT.
Two other mixing classes MixerMT0 and MixerMT1 are provided. MixerMT0 implements the mixing methods in MT19937 and MT19937_64. These methods have some defects which are partially corrected by the MixerMT1 classes with are described in the (superseded) proposal to the C++11 standards, Brown, et al., Random Number Generation in C++0x: A Comprehensive Proposal, version 3.