Skip to content

Commit dcd83ac

Browse files
committed
Rework MersenneTwisterEngine using Ilya Yaroshenko's faster algorithm
This patch updates MersenneTwisterEngine's implementation to use the new and much faster implementation Ilya prepared for `mir.random`. The original Mersenne Twister algorithm (as per the reference C code by Matsumoto and Nishimura) first regenerates the contents of the entire state array, before looping over the resulting values and tempering them in order to generate the actual variates. Once the entire state array has been iterated over, the array's values are then regenerated, and so on and so forth. Ilya's implementation reworks this idea to intertwine update of a single entry in the state array with the tempering of another value to produce the next variate. This ensures that all the registers concerned stay 'hot' in the CPU's memory and hence ensures significant speedup. (Just as a mention: as an experiment while adapting this code for phobos I tried splitting apart the lines responsible for updating the internal state array from those responsible for tempering the next variate, so that they could be run sequentially instead of mixed up together; this resulted in a fairly significant speed hit.) In contrast to most (all?) other Mersenne Twister implementations, this one iterates backwards through the state array, which allows for a few extra compiler optimizations that speed up the algorithm. It has thus been necessary to rework the range-based `seed` method to take account of this. Besides the algorithmic change, this patch introduces two new template variables: an extra tempering variable `d`, and the initialization multiplier `f`, which brings the template implementation in line with that of the C++11 standard. These extra template variables are needed in order to effectively implement the standard 64-bit version of the Mersenne Twister, which will be added in a follow-up patch. Finally, this implementation introduces corrections to the handling of the word size `w` to ensure that the right sequences are produced when the word size is less than the number of bits in `UIntType`. Unittests will be added for this in a follow-up patch. ------------------------------------------------------------------------ Where this implementation differs from that in `mir.random`: Ilya's original design has been reworked as a drop-in replacement for the existing Phobos implementation, which is currently implemented as a forward range (although it should ideally be an input range, but that design flaw should be fixed as a separate issue). There appears to be no significant speed hit from reworking Ilya's code as a range rather than as a functor. However, some other aspects of the original design have required rather intrusive changes in order to get the full speed benefits without introducing breaking change. The original MersenneTwisterEngine allows for implicit instantiation of generator instances without providing a seed. This is handled by checking in the `front` and `popFront` method whether the generator has been properly initialized, and seeding it with the default seed if not. However, these runtime checks on every single call result in a massive speed hit. The current implementation therefore takes a number of steps in order to ensure that the internal state of the generator can be set to its default-seeded values at compile time: * all the internal state variables have been wrapped up in a nested `State` struct to facilitate generation; * the internals of the `seed` and `popFront` methods have been separated out into CTFE'able private static methods (`seedImpl` and `popFrontImpl`) which take a reference to a `State` instance as input; * a CTFE'able private static `defaultState` method has been added which generates a `State` instance matching that generated by instantiating the generator with the `defaultSeed`. The `defaultState` method is then used to initialize the default value of the internal `State` instance at compile-time, replicating the effect of the original runtime seeding check. These latter workarounds could be removed in future if the generator were updated to `@disable this();` and therefore always require explicit seeding, but this would be a breaking change and so is avoided for now.
1 parent 73e262a commit dcd83ac

File tree

1 file changed

+184
-86
lines changed

1 file changed

+184
-86
lines changed

std/random.d

Lines changed: 184 additions & 86 deletions
Original file line numberDiff line numberDiff line change
@@ -44,6 +44,7 @@ License: $(HTTP www.boost.org/LICENSE_1_0.txt, Boost License 1.0).
4444
Authors: $(HTTP erdani.org, Andrei Alexandrescu)
4545
Masahiro Nakagawa (Xorshift random generator)
4646
$(HTTP braingam.es, Joseph Rushton Wakeling) (Algorithm D for random sampling)
47+
Ilya Yaroshenko (Mersenne Twister implementation, adapted from $(HTTPS github.com/libmir/mir-random, mir.random))
4748
Credits: The entire random number library architecture is derived from the
4849
excellent $(HTTP open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2461.pdf, C++0X)
4950
random number facility proposed by Jens Maurer and contributed to by
@@ -527,16 +528,17 @@ alias MinstdRand = LinearCongruentialEngine!(uint, 48271, 0, 2147483647);
527528
The $(LUCKY Mersenne Twister) generator.
528529
*/
529530
struct MersenneTwisterEngine(UIntType, size_t w, size_t n, size_t m, size_t r,
530-
UIntType a, size_t u, size_t s,
531+
UIntType a, size_t u, UIntType d, size_t s,
531532
UIntType b, size_t t,
532-
UIntType c, size_t l)
533+
UIntType c, size_t l, UIntType f)
533534
if (isUnsigned!UIntType)
534535
{
535536
static assert(0 < w && w <= UIntType.sizeof * 8);
536537
static assert(1 <= m && m <= n);
537538
static assert(0 <= r && 0 <= u && 0 <= s && 0 <= t && 0 <= l);
538539
static assert(r <= w && u <= w && s <= w && t <= w && l <= w);
539540
static assert(0 <= a && 0 <= b && 0 <= c);
541+
static assert(n <= sizediff_t.max);
540542

541543
///Mark this as a Rng
542544
enum bool isUniformRandom = true;
@@ -549,21 +551,79 @@ Parameters for the generator.
549551
enum size_t shiftSize = m; /// ditto
550552
enum size_t maskBits = r; /// ditto
551553
enum UIntType xorMask = a; /// ditto
552-
enum UIntType temperingU = u; /// ditto
554+
enum size_t temperingU = u; /// ditto
555+
enum UIntType temperingD = d; /// ditto
553556
enum size_t temperingS = s; /// ditto
554557
enum UIntType temperingB = b; /// ditto
555558
enum size_t temperingT = t; /// ditto
556559
enum UIntType temperingC = c; /// ditto
557560
enum size_t temperingL = l; /// ditto
561+
enum UIntType initializationMultiplier = f; /// ditto
558562

559563
/// Smallest generated value (0).
560564
enum UIntType min = 0;
561565
/// Largest generated value.
562566
enum UIntType max = UIntType.max >> (UIntType.sizeof * 8u - w);
563-
static assert(a <= max && b <= max && c <= max);
567+
// note, `max` also serves as a bitmask for the lowest `w` bits
568+
static assert(a <= max && b <= max && c <= max && f <= max);
569+
564570
/// The default seed value.
565571
enum UIntType defaultSeed = 5489u;
566572

573+
// Bitmasks used in the 'twist' part of the algorithm
574+
private enum UIntType lowerMask = (cast(UIntType) 1u << r) - 1,
575+
upperMask = (~lowerMask) & this.max;
576+
577+
/*
578+
Collection of all state variables
579+
used by the generator
580+
*/
581+
private struct State
582+
{
583+
/*
584+
State array of the generator. This
585+
is iterated through backwards (from
586+
last element to first), providing a
587+
few extra compiler optimizations by
588+
comparison to the forward iteration
589+
used in most implementations.
590+
*/
591+
UIntType[n] data;
592+
593+
/*
594+
Cached copy of most recently updated
595+
element of `data` state array, ready
596+
to be tempered to generate next
597+
`front` value
598+
*/
599+
UIntType z;
600+
601+
/*
602+
Most recently generated random variate
603+
*/
604+
UIntType front;
605+
606+
/*
607+
Index of the entry in the `data`
608+
state array that will be twisted
609+
in the next `popFront()` call
610+
*/
611+
size_t index;
612+
}
613+
614+
/*
615+
State variables used by the generator;
616+
initialized to values equivalent to
617+
explicitly seeding the generator with
618+
`defaultSeed`
619+
*/
620+
private State state = defaultState();
621+
/* NOTE: the above is a workaround to ensure
622+
backwards compatibility with the original
623+
implementation, which permitted implicit
624+
construction. With `@disable this();`
625+
it would not be necessary. */
626+
567627
/**
568628
Constructs a MersenneTwisterEngine object.
569629
*/
@@ -572,36 +632,60 @@ Parameters for the generator.
572632
seed(value);
573633
}
574634

635+
/**
636+
Generates the default initial state for a Mersenne
637+
Twister; equivalent to the internal state obtained
638+
by calling `seed(defaultSeed)`
639+
*/
640+
private static State defaultState() @safe pure nothrow @nogc
641+
{
642+
if (!__ctfe) assert(false);
643+
State mtState;
644+
seedImpl(defaultSeed, mtState);
645+
return mtState;
646+
}
647+
575648
/**
576649
Seeds a MersenneTwisterEngine object.
577650
Note:
578-
This seed function gives 2^32 starting points. To allow the RNG to be started in any one of its
579-
internal states use the seed overload taking an InputRange.
651+
This seed function gives 2^w starting points (the lowest w bits of
652+
the value provided will be used). To allow the RNG to be started
653+
in any one of its internal states use the seed overload taking an
654+
InputRange.
580655
*/
581656
void seed()(UIntType value = defaultSeed) @safe pure nothrow @nogc
582657
{
583-
static if (w == UIntType.sizeof * 8)
584-
{
585-
mt[0] = value;
586-
}
587-
else
658+
this.seedImpl(value, this.state);
659+
}
660+
661+
/**
662+
Implementation of the seeding mechanism, which
663+
can be used with an arbitrary `State` instance
664+
*/
665+
private static void seedImpl(UIntType value, ref State mtState)
666+
{
667+
mtState.data[$ - 1] = value;
668+
static if (this.max != UIntType.max)
588669
{
589-
static assert(max + 1 > 0);
590-
mt[0] = value % (max + 1);
670+
mtState.data[$ - 1] &= this.max;
591671
}
592-
for (mti = 1; mti < n; ++mti)
672+
673+
foreach_reverse (size_t i, ref e; mtState.data[0 .. $ - 1])
593674
{
594-
mt[mti] =
595-
cast(UIntType)
596-
(1812433253UL * (mt[mti-1] ^ (mt[mti-1] >> (w - 2))) + mti);
597-
/* See Knuth TAOCP Vol2. 3rd Ed. P.106 for multiplier. */
598-
/* In the previous versions, MSBs of the seed affect */
599-
/* only MSBs of the array mt[]. */
600-
/* 2002/01/09 modified by Makoto Matsumoto */
601-
//mt[mti] &= ResultType.max;
602-
/* for >32 bit machines */
675+
e = f * (mtState.data[i + 1] ^ (mtState.data[i + 1] >> (w - 2))) + cast(UIntType)(n - (i + 1));
676+
static if (this.max != UIntType.max)
677+
{
678+
e &= this.max;
679+
}
603680
}
604-
popFront();
681+
682+
mtState.index = n - 1;
683+
684+
/* double popFront() to guarantee both `mtState.z`
685+
and `mtState.front` are derived from the newly
686+
set values in `mtState.data` */
687+
MersenneTwisterEngine.popFrontImpl(mtState);
688+
MersenneTwisterEngine.popFrontImpl(mtState);
605689
}
606690

607691
/**
@@ -612,14 +696,26 @@ Parameters for the generator.
612696
The number of elements required is the 'n' template parameter of the MersenneTwisterEngine struct.
613697
*/
614698
void seed(T)(T range) if (isInputRange!T && is(Unqual!(ElementType!T) == UIntType))
699+
{
700+
this.seedImpl(range, this.state);
701+
}
702+
703+
/**
704+
Implementation of the range-based seeding mechanism,
705+
which can be used with an arbitrary `State` instance
706+
*/
707+
private static void seedImpl(T)(T range, ref State mtState)
708+
if (isInputRange!T && is(Unqual!(ElementType!T) == UIntType))
615709
{
616710
size_t j;
617711
for (j = 0; j < n && !range.empty; ++j, range.popFront())
618712
{
619-
mt[j] = range.front;
713+
sizediff_t idx = n - j - 1;
714+
mtState.data[idx] = range.front;
620715
}
621716

622-
mti = n;
717+
mtState.index = n - 1;
718+
623719
if (range.empty && j < n)
624720
{
625721
import core.internal.string : UnsignedStringBuf, unsignedToTempString;
@@ -630,77 +726,83 @@ Parameters for the generator.
630726
throw new Exception(s);
631727
}
632728

633-
popFront();
634-
}
635-
636-
///
637-
@safe unittest
638-
{
639-
import std.algorithm.iteration : map;
640-
import std.range : repeat;
641-
642-
Mt19937 gen;
643-
gen.seed(map!((a) => unpredictableSeed)(repeat(0)));
729+
/* double popFront() to guarantee both `mtState.z`
730+
and `mtState.front` are derived from the newly
731+
set values in `mtState.data` */
732+
MersenneTwisterEngine.popFrontImpl(mtState);
733+
MersenneTwisterEngine.popFrontImpl(mtState);
644734
}
645735

646736
/**
647737
Advances the generator.
648738
*/
649739
void popFront() @safe pure nothrow @nogc
650740
{
651-
if (mti == size_t.max) seed();
652-
enum UIntType
653-
upperMask = ~((cast(UIntType) 1u <<
654-
(UIntType.sizeof * 8 - (w - r))) - 1),
655-
lowerMask = (cast(UIntType) 1u << r) - 1;
656-
static immutable UIntType[2] mag01 = [0x0UL, a];
657-
658-
ulong y = void;
741+
this.popFrontImpl(this.state);
742+
}
659743

660-
if (mti >= n)
661-
{
662-
/* generate N words at one time */
744+
/*
745+
Internal implementation of `popFront()`, which
746+
can be used with an arbitrary `State` instance
747+
*/
748+
private static void popFrontImpl(ref State mtState)
749+
{
750+
/* This function blends two nominally independent
751+
processes: (i) calculation of the next random
752+
variate `mtState.front` from the cached previous
753+
`data` entry `z`, and (ii) updating the value
754+
of `data[index]` and `mtState.z` and advancing
755+
the `index` value to the next in sequence.
663756
664-
int kk = 0;
665-
const limit1 = n - m;
666-
for (; kk < limit1; ++kk)
667-
{
668-
y = (mt[kk] & upperMask)|(mt[kk + 1] & lowerMask);
669-
mt[kk] = cast(UIntType) (mt[kk + m] ^ (y >> 1)
670-
^ mag01[cast(UIntType) y & 0x1U]);
671-
}
672-
const limit2 = n - 1;
673-
for (; kk < limit2; ++kk)
674-
{
675-
y = (mt[kk] & upperMask)|(mt[kk + 1] & lowerMask);
676-
mt[kk] = cast(UIntType) (mt[kk + (m -n)] ^ (y >> 1)
677-
^ mag01[cast(UIntType) y & 0x1U]);
678-
}
679-
y = (mt[n -1] & upperMask)|(mt[0] & lowerMask);
680-
mt[n - 1] = cast(UIntType) (mt[m - 1] ^ (y >> 1)
681-
^ mag01[cast(UIntType) y & 0x1U]);
757+
By interweaving the steps involved in these
758+
procedures, rather than performing each of
759+
them separately in sequence, the variables
760+
are kept 'hot' in CPU registers, allowing
761+
for significantly faster performance. */
762+
sizediff_t index = mtState.index;
763+
sizediff_t next = index - 1;
764+
if (next < 0)
765+
next = n - 1;
766+
auto z = mtState.z;
767+
sizediff_t conj = index - m;
768+
if (conj < 0)
769+
conj = index - m + n;
682770

683-
mti = 0;
771+
static if (d == UIntType.max)
772+
{
773+
z ^= (z >> u);
774+
}
775+
else
776+
{
777+
z ^= (z >> u) & d;
684778
}
685779

686-
y = mt[mti++];
687-
688-
/* Tempering */
689-
y ^= (y >> temperingU);
690-
y ^= (y << temperingS) & temperingB;
691-
y ^= (y << temperingT) & temperingC;
692-
y ^= (y >> temperingL);
780+
auto q = mtState.data[index] & upperMask;
781+
auto p = mtState.data[next] & lowerMask;
782+
z ^= (z << s) & b;
783+
auto y = q | p;
784+
auto x = y >> 1;
785+
z ^= (z << t) & c;
786+
if (y & 1)
787+
x ^= a;
788+
auto e = mtState.data[conj] ^ x;
789+
z ^= (z >> l);
790+
mtState.z = mtState.data[index] = e;
791+
mtState.index = next;
693792

694-
_y = cast(UIntType) y;
793+
/* technically we should take the lowest `w`
794+
bits here, but if the tempering bitmasks
795+
`b` and `c` are set correctly, this should
796+
be unnecessary */
797+
mtState.front = z;
695798
}
696799

697800
/**
698801
Returns the current random value.
699802
*/
700-
@property UIntType front() @safe pure nothrow @nogc
803+
@property UIntType front() @safe const pure nothrow @nogc
701804
{
702-
if (mti == size_t.max) seed();
703-
return _y;
805+
return this.state.front;
704806
}
705807

706808
///
@@ -713,10 +815,6 @@ Parameters for the generator.
713815
Always $(D false).
714816
*/
715817
enum bool empty = false;
716-
717-
private UIntType[n] mt;
718-
private size_t mti = size_t.max; /* means mt is not initialized */
719-
UIntType _y = UIntType.max;
720818
}
721819

722820
/**
@@ -728,9 +826,9 @@ generation unless memory is severely restricted, in which case a $(D
728826
LinearCongruentialEngine) would be the generator of choice.
729827
*/
730828
alias Mt19937 = MersenneTwisterEngine!(uint, 32, 624, 397, 31,
731-
0x9908b0df, 11, 7,
829+
0x9908b0df, 11, 0xffffffff, 7,
732830
0x9d2c5680, 15,
733-
0xefc60000, 18);
831+
0xefc60000, 18, 1812433253);
734832

735833
///
736834
@safe unittest
@@ -811,9 +909,9 @@ alias Mt19937 = MersenneTwisterEngine!(uint, 32, 624, 397, 31,
811909
@safe pure nothrow unittest //11690
812910
{
813911
alias MT(UIntType, uint w) = MersenneTwisterEngine!(UIntType, w, 624, 397, 31,
814-
0x9908b0df, 11, 7,
912+
0x9908b0df, 11, 0xffffffff, 7,
815913
0x9d2c5680, 15,
816-
0xefc60000, 18);
914+
0xefc60000, 18, 1812433253);
817915

818916
foreach (R; std.meta.AliasSeq!(MT!(uint, 32), MT!(ulong, 32), MT!(ulong, 48), MT!(ulong, 64)))
819917
auto a = R();

0 commit comments

Comments
 (0)