GPU-Based Random Number Generators

Ying Tan, in GPU-Based Parallel Implementation of Swarm Intelligence Algorithms, 2016

10.2.2.3 Mersenne Twister (MT)

MT [117] is one of the most widely respected RNGs; it is a twisted generalized feedback shift register (GFSR). The underlying algorithm of MT is as follows:

Set r w-bit numbers x_i, i = 1, 2, …, r, randomly as initial values.

Let

(10.6) A = \begin{pmatrix} 0 & I_{w-1} \\ a_w & a_{w-1} \cdots a_1 \end{pmatrix},

where I_{w-1} is the (w − 1) × (w − 1) identity matrix and a_i, i = 1, …, w, take values of either 0 or 1. Define

(10.7) x_{i+r} = x_{i+s} \oplus \left( x_i^{(w:(l+1))} \mid x_{i+1}^{(l:1)} \right) A,

where x_i^{(w:(l+1))} | x_{i+1}^{(l:1)} indicates the concatenation of the most significant (upper) w − l bits of x_i and the least significant l bits of x_{i+1}.

Perform the following operations sequentially:

(10.8) \begin{aligned} z &= x_{i+r} \oplus (x_{i+r} \gg t_1) \\ z &= z \oplus ((z \ll t_2) \,\&\, m_1) \\ z &= z \oplus ((z \ll t_3) \,\&\, m_2) \\ z &= z \oplus (z \gg t_4) \\ u_{i+r} &= z / (2^w - 1), \end{aligned}

where t_1, t_2, t_3, and t_4 are integer shift amounts, m_1 and m_2 are bit-masks, and "&" is the bitwise AND operation.

u_{i+r}, i = 1, 2, …, form the required sequence on the interval (0, 1].
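For concreteness, the steps above can be sketched in Python for the commonly used 32-bit parameter set: in the notation of this section, MT19937 takes w = 32, r = 624 state words, s = 397, and l = 31, with tempering parameters (t_1, t_2, t_3, t_4) = (11, 7, 15, 18), masks m_1 = 0x9D2C5680 and m_2 = 0xEFC60000, and the bottom row of A equal to 0x9908B0DF. This is a compact, unoptimized sketch, not the GPU implementation discussed in the text:

```python
class MT19937:
    def __init__(self, seed=5489):
        # Standard initializer filling the r = 624 state words from the seed
        self.mt = [0] * 624
        self.mt[0] = seed
        for i in range(1, 624):
            self.mt[i] = (1812433253 * (self.mt[i - 1] ^ (self.mt[i - 1] >> 30)) + i) & 0xFFFFFFFF
        self.index = 624

    def _twist(self):
        for i in range(624):
            # Concatenate the upper w - l bits of x_i with the lower l bits of x_{i+1}
            y = (self.mt[i] & 0x80000000) | (self.mt[(i + 1) % 624] & 0x7FFFFFFF)
            # Multiplying by A is a right shift plus a conditional XOR with A's bottom row
            self.mt[i] = self.mt[(i + 397) % 624] ^ (y >> 1) ^ (0x9908B0DF if y & 1 else 0)
        self.index = 0

    def next_u32(self):
        if self.index >= 624:
            self._twist()
        z = self.mt[self.index]
        self.index += 1
        z ^= z >> 11                    # t_1
        z ^= (z << 7) & 0x9D2C5680      # t_2, m_1
        z ^= (z << 15) & 0xEFC60000     # t_3, m_2
        z ^= z >> 18                    # t_4
        return z
```

With the default seed 5489, the 10,000th output is 4123659995, the reference value fixed in the C++11 standard for std::mt19937, which provides a quick sanity check of an implementation.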

With proper parameter values, MT can generate sequences with a period as long as 2^19937 − 1 and extremely good statistical properties [117]. Strategies for selecting good initial values are studied in [164], while Saito and Matsumoto [165] proposed an efficient implementation for fast execution on GPUs.

URL: https://www.sciencedirect.com/science/article/pii/B9780128093627500108

Leveraging Platform Weaknesses

Mike Shema, in Hacking Web Apps, 2012

Inside the Pseudo-Random Number Generator (PRNG)

The Mersenne Twister is a strong pseudo-random number generator. In non-rigorous terms, a strong PRNG has a long period (how many values it generates before repeating itself) and a statistically uniform distribution of values (bits 0 and 1 are equally likely to appear regardless of previous values). A version of the Mersenne Twister available in many programming languages, MT19937, has an impressive period of 2^19937 − 1. Sequences with too short a period can be observed, recorded, and reused by an attacker. Sequences with long periods force the adversary to select alternate attack methods. The period of MT19937 far outlasts the number of seconds until our world ends in fire or ice (or is wiped out by a Vogon construction fleet, for that matter). The strength of MT19937 also lies in the fact that one 32-bit value produced by it cannot be used to predict the subsequent 32-bit value. This ensures a certain degree of unpredictability.

Yet all is not perfect in terms of non-predictability. The MT19937 algorithm keeps track of its state in 624 32-bit values. If an attacker were able to gather 624 sequential values, then the entire sequence—forward and backward—could be reverse-engineered. This weakness is not specific to the Mersenne Twister; most PRNGs have a state mechanism that is used to generate the next value in the sequence. Knowledge of the state effectively destroys the sequence's unpredictability. This is another example of how using a PRNG incorrectly can lead to its compromise: it should be impossible for an attacker to enumerate the generator's internal state.
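The state-recovery attack works because MT19937's output tempering is a bijection: each observed output reveals exactly one raw state word once the tempering is inverted. A sketch in Python (temper mirrors MT19937's output transform; the loop counts in the inverters are just enough iterations to re-derive all 32 bits):

```python
M1, M2 = 0x9D2C5680, 0xEFC60000

def temper(y):
    # MT19937 output tempering
    y ^= y >> 11
    y ^= (y << 7) & M1
    y ^= (y << 15) & M2
    y ^= y >> 18
    return y

def _invert_right(y, shift):
    # Undo y ^= y >> shift by re-deriving bits from the top down
    x = y
    for _ in range(32 // shift + 1):
        x = y ^ (x >> shift)
    return x

def _invert_left(y, shift, mask):
    # Undo y ^= (y << shift) & mask by re-deriving bits from the bottom up
    x = y
    for _ in range(32 // shift + 1):
        x = y ^ ((x << shift) & mask)
    return x

def untemper(z):
    z = _invert_right(z, 18)
    z = _invert_left(z, 15, M2)
    z = _invert_left(z, 7, M1)
    z = _invert_right(z, 11)
    return z
```

Applying untemper to 624 consecutive outputs yields the full internal state, after which the generator can be stepped forward (or backward) at will.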

Linear congruential generators (LCGs) use a different approach to creating numeric sequences. They predate the Internet, going as far back as 1948 [D.H. Lehmer. Mathematical methods in large-scale computing units. In Proc. 2nd Sympos. on Large-Scale Digital Calculating Machinery, Cambridge, MA, 1949, pages 141–146. Harvard University Press, 1951.]. Simple LCG algorithms create a sequence from a formula based on a constant multiplier, a constant additive value, and a constant modulus. The details of an LCG aren't important at the moment, but here is an example of the formula. The values of a, k, and m must be secret in order to preserve the unpredictability of the sequence.

x_n = (a · x_{n−1} + k) mod m

The period of an LCG is far shorter than that of MT19937. However, an effective attack does not need to observe more than a few sequential values. In the Journal of Modern Applied Statistical Methods, May 2003, Vol. 2, No. 1, George Marsaglia describes an algorithm for identifying and cracking a PRNG based on a congruential generator (http://education.wayne.edu/jmasm/toc3.pdf). The crack requires fewer than two dozen sequential samples from the sequence. The description of the cracking algorithm may sound complicated to math-averse ears, but rest assured the execution is simple. In fancy terms, the attack determines the modulus m of the LCG by finding the greatest common divisor (GCD) of the volumes of parallelepipeds described by vectors taken from the LCG sequence. This translates into the following Python script.

#!/usr/bin/env python
from math import gcd

values = [308, 785, 930, 695, 864, 237, 1006, 819, 204, 777,
          378, 495, 376, 357, 70, 747, 356]

# Difference vectors relative to the first pair of observations
vectors = [[values[i] - values[0], values[i + 1] - values[1]]
           for i in range(1, len(values) - 1)]

# Absolute 2x2 determinants: areas of the parallelepipeds spanned by
# consecutive pairs of vectors
volumes = []
for i in range(0, len(vectors) - 1, 2):
    v = abs(vectors[i][0] * vectors[i + 1][1] - vectors[i][1] * vectors[i + 1][0])
    volumes.append(v)

print(gcd(volumes[0], volumes[1]))

The GCD reported by this script will be the modulus m used in the LCG (in some cases, more than one GCD may need to be calculated before reaching the correct value). We already have a series of values for x, so all that remains is to solve for a and k. The values are easily found by solving two equations for two unknowns.
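The final step can be sketched with a toy LCG. The parameters a = 137, k = 187, and m = 1023 below are made up purely for illustration, and the three-argument pow (Python 3.8+) supplies the modular inverse:

```python
# Toy LCG; a, k, m are made-up values that the "attacker" below pretends
# not to know (m would come from the GCD step above).
a, k, m = 137, 187, 1023

def lcg(x):
    return (a * x + k) % m

x1 = 42
x2 = lcg(x1)
x3 = lcg(x2)

# x2 = a*x1 + k (mod m) and x3 = a*x2 + k (mod m); subtracting gives
# x3 - x2 = a*(x2 - x1) (mod m), so a = (x3 - x2) * (x2 - x1)^-1 mod m.
a_rec = ((x3 - x2) * pow(x2 - x1, -1, m)) % m   # needs Python 3.8+
k_rec = (x2 - a_rec * x1) % m

print(a_rec, k_rec)  # prints: 137 187
```

If x2 − x1 happens to share a factor with m, the modular inverse does not exist and a different pair of observations must be used.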

This section should not be misread as a suggestion to create your own PRNG. The Mersenne Twister is a strong pseudo-random number generator. A similarly strong algorithm is the Lagged Fibonacci generator. Instead, this section highlights some very simple ways that a generator may inadvertently leak its internal state. Enumerating 624 sequential 32-bit values might not be feasible against a busy web site, different requests may use different seeds, or maybe numbers in the sequence are randomly skipped over. In any case, it's important that the site be aware of how it is generating random numbers and where those numbers are being used. The generation should come from a well-accepted method as opposed to home-brewed algorithms. The values should not be used such that the internal state of a PRNG can be reproduced.

We shouldn’t end this section without recommending a book more salient to random numbers: The Art of Computer Programming, Volume 2 by Donald Knuth. It is a canonical resource regarding the generation and analysis of random numbers.

Note

The rise of virtualized computing, whether called cloud or other trendy moniker, poses interesting questions about the underlying sources of entropy that operating systems rely upon for PRNG. The abstraction of CPUs, disk drives, video cards, etc. affects assumptions about a system’s behavior. It’s a narrow topic to watch, but there could be subtle attacks in the future that take advantage of possibly weaker or more predictable entropy in such systems.

URL: https://www.sciencedirect.com/science/article/pii/B9781597499514000072

Server Misconfiguration and Predictable Pages

Mike Shema, in Seven Deadliest Web Application Attacks, 2010

Inside the Pseudorandom Number Generator

The Mersenne Twister is a strong pseudorandom number generator (PRNG). A version available in many programming languages, MT19937, has a period of 2^19937 − 1. A sequence's period defines how long it continues before repeating itself. Sequences with too short a period can be observed, recorded, and reused by an attacker. Sequences with long periods force the adversary to select attack methods other than passive observation. The period of MT19937 far outlasts the number of seconds until our world ends in fire or ice (or is wiped out by a Vogon construction fleet, for that matter). The strength of MT19937 also lies in the fact that one 32-bit value produced by it cannot be used to predict the subsequent 32-bit value. This ensures a certain degree of unpredictability.

Yet, all is not perfect in terms of nonpredictability. The MT19937 algorithm keeps track of its state in 624 32-bit values. If an attacker were able to gather 624 sequential values, then the entire sequence – forward and backward – could be reverse engineered. This weakness is not specific to the Mersenne Twister; most PRNGs have a state mechanism that is used to generate the next value in the sequence. Knowledge of the state effectively destroys the sequence's unpredictability.

Linear congruential generators (LCGs) use a different approach to creating numeric sequences. They predate the Internet, going as far back as 1948. Simple LCG algorithms create a sequence from a formula based on a constant multiplier, a constant additive value, and a constant modulus. The details of an LCG aren't important at the moment, but here is an example of the formula. The values of a, k, and m must be secret to preserve the unpredictability of the sequence.

x_n = (a · x_{n−1} + k) mod m

The period of an LCG is far shorter than that of MT19937. However, an effective attack does not need to observe more than a few sequential values. George Marsaglia describes an algorithm for identifying and cracking a PRNG based on a congruential generator. The crack requires fewer than two dozen sequential samples from the sequence. The description of the cracking algorithm may sound complicated to math-averse ears, but rest assured the execution is simple. Briefly, the attack determines the modulus m of the LCG by finding the greatest common divisor (GCD) of the volumes of parallelepipeds described by vectors taken from the LCG sequence. This translates into the following Python script.

#!/usr/bin/python
from math import gcd

values = [308, 785, 930, 695, 864, 237, 1006, 819, 204, 777,
          378, 495, 376, 357, 70, 747, 356]

# Difference vectors relative to the first pair of observations
vectors = [[values[i] - values[0], values[i + 1] - values[1]]
           for i in range(1, len(values) - 1)]

# Absolute 2x2 determinants: areas of the parallelepipeds spanned by
# consecutive pairs of vectors
volumes = []
for i in range(0, len(vectors) - 1, 2):
    v = abs(vectors[i][0] * vectors[i + 1][1] - vectors[i][1] * vectors[i + 1][0])
    volumes.append(v)

print(gcd(volumes[0], volumes[1]))

The GCD reported by this script will be the modulus m used in the LCG (in some cases, more than one GCD may need to be calculated before reaching the correct value). We already have a series of values for x, so all that remains is to solve for a and k. The values are easily found by solving two equations for two unknowns.

This section should not be misread as a suggestion to create your own PRNG. The Mersenne Twister is a strong PRNG. A similarly strong algorithm is called the Lagged Fibonacci. Instead, this section highlights some very simple ways that a generator may inadvertently leak its internal state. Enumerating 624 sequential 32-bit values might not be feasible against a busy Web site, different requests may use different seeds, or maybe numbers in the sequence are randomly skipped over. In any case, it's important that the site be aware of how it is generating random numbers and where those numbers are being used. The generation should come from a well-accepted method as opposed to home-brewed algorithms. The values should not be used such that the internal state of a PRNG can be reproduced.

We shouldn't end this section without recommending a book more salient to random numbers: The Art of Computer Programming, Volume 2 by Donald Knuth. It is a canonical resource regarding the generation and analysis of random numbers.

URL: https://www.sciencedirect.com/science/article/pii/B9781597495431000049

Parallelization Techniques for Random Number Generators

Thomas Bradley, ... Paul Woodhams, in GPU Computing Gems Emerald Edition, 2011

16.4.2 Parallelization

The state size of the Mersenne Twister is too big for each thread to have its own copy. Therefore, the per-thread parallelization strategy used for MRG32k3a is ruled out, as is the strided generation strategy used for the Sobol generator. Instead, threads within a block have to cooperate to update state and generate numbers, and how well this can be achieved determines the performance of the generator. Note from Eq. 16.5 that the process of advancing state is quite cheap, involving three state elements and seven bit operations.

We follow a hybrid strategy. Each block will skip the state ahead to a given offset, and the threads will then generate a contiguous segment of points from the MT19937 sequence by striding (or leapfrogging). There are three main procedures: skipping ahead to a given point, advancing the state, and generating points from the state. We will examine how to parallelize the latter two procedures first, and then return to the question of skipping ahead.
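Independent of the CUDA specifics, the index assignment behind this hybrid scheme can be sketched on the CPU. The block and thread counts below are arbitrary illustration values:

```python
def hybrid_partition(total, num_blocks, threads_per_block):
    """Map each (block, thread) pair to the sequence indices it generates."""
    seg = total // num_blocks            # each block skips ahead to its own segment
    assignment = {}
    for b in range(num_blocks):
        offset = b * seg                 # block-level skip-ahead offset
        for t in range(threads_per_block):
            # Thread t strides (leapfrogs) through the block's contiguous
            # segment in steps of the thread count
            assignment[(b, t)] = list(range(offset + t, offset + seg,
                                            threads_per_block))
    return assignment

asg = hybrid_partition(total=1024, num_blocks=4, threads_per_block=8)
covered = sorted(i for idxs in asg.values() for i in idxs)
assert covered == list(range(1024))      # every index generated exactly once
```

Within a step, consecutive threads handle consecutive indices, which is what makes the state and output accesses coalesce on the device.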

URL: https://www.sciencedirect.com/science/article/pii/B9780123849885000164

Generating random numbers

Manfred Gilli, ... Enrico Schumann, in Numerical Methods and Optimization in Finance (Second Edition), 2019

6.2.2 Mersenne Twister

Software like R and, in more recent versions, MATLAB® provide the Mersenne Twister as standard. The name already indicates that one of its key ingredients is a Mersenne number, M_p = 2^p − 1, which, when written in binary notation, is a bit string of length p with all bits set to 1. For example, the binary representation of M_4 = 2^4 − 1 = 15 is 1111₂. In other words, M_p is the highest integer that can be represented with a bit string of length p. This method uses more than one previously generated number and performs bitwise operations. Its properties are similar to those of linear congruential generators, yet its period length is M_p. The commonly used version for 32-bit platforms, MT19937, has a period length of 2^19937 − 1; in other words, it takes a sequence of more than 10^6000 draws until it becomes repetitive. Also, the generator is equidistributed in up to 623 dimensions, so there should be no obvious lattice issues. For most applications (and certainly for those discussed in this book), this seems to be a safe enough choice.
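The defining property (M_p written in binary is a string of p ones) and the sheer size of the period are easy to check directly; this is a quick verification, not part of the original text:

```python
p = 19937
Mp = 2**p - 1                       # the Mersenne prime behind MT19937's period

assert bin(2**4 - 1) == '0b1111'    # M_4 = 15 is 1111 in binary
assert bin(Mp) == '0b' + '1' * p    # M_19937 is a string of 19937 one-bits
assert len(str(Mp)) > 6000          # i.e., the period exceeds 10^6000
```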

There exists a large variety of approaches that test the quality of random numbers. One of the well-known sets of such tests are the diehard tests by George Marsaglia (see Marsaglia, 1995).

URL: https://www.sciencedirect.com/science/article/pii/B9780128150658000170

27th European Symposium on Computer Aided Process Engineering

Viviani C. Onishi, ... José A. Caballero, in Computer Aided Chemical Engineering, 2017

4 Correlated Scenarios Generation

Correlated feeding scenarios are generated via a Monte Carlo sampling technique from a multivariate normal (Gaussian) distribution using a random number generator algorithm. The latter was implemented in MATLAB based on the Mersenne Twister algorithm proposed by Matsumoto and Nishimura (1998). The probability density function for continuous random variables (X1, X2, …, Xd) can be expressed as follows:

(3) f_X(X_1, X_2, \ldots, X_d) = \frac{1}{\sqrt{(2\pi)^d \, |\Sigma|}} \exp\!\left( -\frac{1}{2} (X - \mu)^T \Sigma^{-1} (X - \mu) \right)

In which μ represents the d-dimensional vector with the expected mean (nominal) values of each random variable, and |Σ| is the determinant of the d × d covariance matrix Σ. Σ is a symmetric positive-definite matrix whose diagonal elements contain the variances σ_i² of each variable, while its non-diagonal elements indicate the covariances between variables, σ_ij, obtained from the symmetric correlation matrix ρ_ij:

(4) \rho_{ij} = \sigma_{ij} / \sqrt{\sigma_i^2 \sigma_j^2}

This symmetric matrix relates each pair of correlated random variables, wherein non-diagonal elements can assume values between −1 and 1. Based on real information, we assume the uncertain correlated parameters have negative correlation (with values ranging between −1 and 0). The generated feeding scenarios are depicted in Figure 1.
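As an illustration of the construction (not the authors' MATLAB code), the two-variable case can be sampled in pure Python by applying a Cholesky-style factor of Σ to independent standard normals. All numbers below are made up, with 10% standard deviations and ρ = −0.7 as in Figure 1:

```python
import math
import random

random.seed(42)

mu1, mu2 = 100.0, 50.0            # illustrative nominal (mean) values
s1, s2 = 0.1 * mu1, 0.1 * mu2     # 10% standard deviation
rho = -0.7                        # target negative correlation

samples = []
for _ in range(100_000):
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    # 2-D Cholesky factor of the covariance matrix applied to independent normals
    x1 = mu1 + s1 * z1
    x2 = mu2 + s2 * (rho * z1 + math.sqrt(1 - rho**2) * z2)
    samples.append((x1, x2))

# The empirical correlation should be close to the target rho
n = len(samples)
m1 = sum(x for x, _ in samples) / n
m2 = sum(y for _, y in samples) / n
cov = sum((x - m1) * (y - m2) for x, y in samples) / n
v1 = sum((x - m1) ** 2 for x, _ in samples) / n
v2 = sum((y - m2) ** 2 for _, y in samples) / n
corr = cov / math.sqrt(v1 * v2)
assert abs(corr - rho) < 0.02
```

For d > 2 the same idea applies with a full Cholesky factorization of Σ, which is what multivariate-normal samplers do internally.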

Figure 1

Figure 1. Correlated feeding scenarios generated via Monte Carlo sampling from a multivariate normal (Gaussian) distribution, considering 10% standard deviation from expected mean (nominal) values and a correlation of −0.7 (200 scenarios).

URL: https://www.sciencedirect.com/science/article/pii/B9780444639653501598

High-Performance Iterated Function Systems

Christoph Schied, ... Hendrik P. A. Lensch, in GPU Computing Gems Emerald Edition, 2011

18.3.4 Static Data

Some static data is needed during runtime; this data is calculated by the host during program start-up. A set of Hammersley Points [4] is used as the starting-point pattern to ensure a stratified distribution, which increases the quality of the generated pictures. The presorting algorithm needs permutations that are generated by creating multiple arrays containing the array index and a random number calculated by the Mersenne Twister [5]. These arrays are then sorted by the random numbers, which can be discarded afterwards. The resulting set of permutations is uploaded onto the device. Since some of the implemented functions need random numbers, a set of random numbers also is created and uploaded onto the device. All of this generated data is never changed during runtime.

In the runtime phase, it is necessary to specify various parameters that characterize the fractal. Every function needs to be evaluated in p(fi) ⋅ num_threads threads, where p(fi) is the user-specified probability that function fi is chosen in the Chaos Game. The number of functions is small compared to the number of threads, and therefore, it doesn't matter if this mapping is done in warp size granularity instead of thread granularity. Each thread is mapped to a function by an index array that is compressed into a prefix sum. A thread can find out which function it has to evaluate by performing a binary search on this array. This search does not introduce a performance penalty in our experiments, but reduces the memory requirements significantly. The parameters and the prefix sum are accumulated in a struct that is uploaded into the constant memory each frame.
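The thread-to-function mapping can be sketched outside the GPU context; the probabilities and thread count below are illustrative:

```python
from bisect import bisect_right
from itertools import accumulate

def build_prefix(probs, num_threads):
    """Compressed index array: prefix[i] is the first thread assigned to
    function i + 1."""
    counts = [round(p * num_threads) for p in probs]
    counts[-1] = num_threads - sum(counts[:-1])   # absorb rounding error
    return list(accumulate(counts))

def function_for_thread(prefix, tid):
    # Binary search replaces storing one function index per thread
    return bisect_right(prefix, tid)

prefix = build_prefix([0.5, 0.3, 0.2], num_threads=1000)
assert function_for_thread(prefix, 0) == 0      # first 500 threads -> f0
assert function_for_thread(prefix, 500) == 1    # next 300 -> f1
assert function_for_thread(prefix, 999) == 2    # last 200 -> f2
```

Storing only the prefix sum (one entry per function) instead of a per-thread index array is what yields the memory saving described above.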

URL: https://www.sciencedirect.com/science/article/pii/B9780123849885000188

Network Coding in the Real World

Janus Heide, ... Torben Larsen, in Network Coding, 2012

4 Practical Problems

Most operating systems can provide “random” number generators. Unfortunately these numbers are not always sufficiently random (see Fig. 4.8). This can lead to a higher probability of linear dependency which will affect the performance of the system. The easiest way to avoid such problems is to use a random number generator that has been shown to have acceptable performance [7]. See for example [8] for an implementation of the Mersenne Twister.

Figure 4.8. Dilbert on randomness. © United Feature Syndicate INC./Dist. by PIB Copenhagen 2010.

In the binary extension field, addition and subtraction operations are identical to the bitwise XOR operation, which can be performed very efficiently. Multiplication and division are more complicated. One way to implement operations over higher order fields is by using look-up tables. In this case, multiplication and division operations are performed by a number of table look-ups together with some other basic operations. In [9] several different look-up approaches are presented and small code examples are provided. For these approaches to achieve good performance the tables should reside in the cache of the CPU, otherwise frequent memory accesses can reduce the coding throughput. To this end, it is desirable to reduce the size of the tables, which depends on the size of the underlying finite field.

Alternatively, multiplication and division can be implemented by multiplying two polynomials (that represent symbols of a finite field), modulo an irreducible polynomial. Division can be performed with the Euclidean algorithm (see [10, p. 14,122] for details and code examples). In these algorithms, several simple operations are performed in order to calculate the result. The performance of both approaches depends on the platform on which they are deployed, in particular the relative speed of the CPU and memory access. Several additional optimization techniques can be deployed as well.
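A minimal sketch of this polynomial arithmetic in GF(2^8), using the AES polynomial 0x11B as a familiar example of an irreducible polynomial (not necessarily the one used in the cited implementations):

```python
# Carry-less ("Russian peasant") multiplication in GF(2^8), reduced modulo
# x^8 + x^4 + x^3 + x + 1 (0x11B, the AES polynomial, chosen here only as a
# well-known example of an irreducible polynomial).
def gf256_mul(a, b):
    result = 0
    while b:
        if b & 1:
            result ^= a          # addition in GF(2^m) is XOR
        a <<= 1
        if a & 0x100:            # degree reached 8: reduce modulo 0x11B
            a ^= 0x11B
        b >>= 1
    return result

# Addition and subtraction are the same XOR operation
assert (0x57 ^ 0x83) == (0x83 ^ 0x57)
# Known worked example from FIPS-197: {57} * {83} = {C1} in the AES field
assert gf256_mul(0x57, 0x83) == 0xC1
```

A look-up-table implementation would precompute log/antilog tables over a generator of the multiplicative group and replace this loop with two look-ups and an addition modulo 255, which is exactly where the cache-residency concern above comes from.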

Matrix inversion can be performed with several different algorithms; the most commonly used is the Gauss-Jordan algorithm. This algorithm is simple and can be used to decode data partially, which is an advantage because packets arrive one at a time and hence the final decoding delay can be reduced. Other algorithms exist; see for example [11]. However, these often require that the matrix to be inverted has some specific properties, or they only perform better under some conditions.
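The elimination itself can be sketched compactly over the binary field, where each row fits in a machine word and every row operation is an XOR; higher-order fields follow the same pattern with field multiplication and division in place of the bit operations:

```python
def gf2_invert(rows, n):
    """Invert an n x n binary matrix given as row bitmasks (bit j = column j).
    Returns the inverse as row bitmasks, or None if the matrix is singular."""
    aug = [(rows[i] << n) | (1 << i) for i in range(n)]   # augmented [A | I]
    for col in range(n):
        # Find a pivot row with a 1 in this column of A
        pivot = next((r for r in range(col, n) if aug[r] >> (n + col) & 1), None)
        if pivot is None:
            return None                                   # singular matrix
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            # Eliminate the column in every other row (full Gauss-Jordan)
            if r != col and aug[r] >> (n + col) & 1:
                aug[r] ^= aug[col]
    return [row & ((1 << n) - 1) for row in aug]          # the I half is now A^-1
```

Because elimination proceeds column by column, the same routine naturally supports the partial decoding mentioned above: each arriving coded packet can be reduced against the rows gathered so far instead of waiting for the full matrix.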

To compare different choices of methods and optimization techniques we look at the existing literature on implementing encoding and decoding operations for NC. As decoding is more computationally demanding than encoding we will focus on reported decoding throughput. We present the results in approximately chronological order to provide an overview of the recent results in the field.

URL: https://www.sciencedirect.com/science/article/pii/B9780123809186000044

Computer networks performance modeling and simulation

Francisco J. Suárez, ... Daniel F. García, in Modeling and Simulation of Computer Networks and Systems, 2015

4.1 Random value generation

To develop simulations properly, it is mandatory to generate random values with predefined statistical distributions [6,28]. Generally, this is accomplished in two steps, which are briefly introduced in the following paragraphs.

First, a step called random-number generation, in which a sequence of random numbers between 0 and 1 is generated with a uniform distribution.

Second, a step called random-variate generation, in which the previous sequence is used to generate a new sequence with the desired distribution.

4.1.1 Random number generation

A common technique to generate a sequence of random numbers is to use a recursive function, where the last number generated, x_n, is a function of a set of the previously generated numbers. As an example:

(7.1) x_n = f(x_{n-1}, x_{n-2}, \ldots)

The sequence generated is not fully random, but pseudorandom, which is better for simulation because a sequence of pseudorandom numbers can be precisely repeated in several simulations. The type and properties of random number generators mainly depend on the generation function. Common types are Linear Congruential Generators (LCGs) [29,30] and Tausworthe Generators [31]. Currently, the Mersenne Twister [32] is the common generator used in many software packages. The length of the period of the pseudorandom sequence corresponds to a Mersenne prime. The most used version of this generator is based on the Mersenne prime 2^19937 − 1. This generator is based on a matrix linear recurrence, which is an evolution of the Generalized Feedback Shift Register pseudorandom number generators. There is a standard implementation in portable C code that uses a 32-bit word length, called MT19937, and another variant using a 64-bit word length, called MT19937-64, which generates a different sequence.

Before using a pseudorandom sequence in a simulation, a thorough analysis of the randomness of the sequence should be carried out. The first test involves plotting the distribution of the sequence to check if it seems uniformly distributed. Additionally, several statistical tests can be applied to the sequence to check if it matches a uniform distribution. The most common are the Chi-square [33] and the Kolmogorov-Smirnov [34] goodness of fit tests.
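A minimal sketch of the first of these checks, a chi-square test of uniformity, using Python's built-in generator (itself MT19937); the seed and bin count are arbitrary:

```python
import random

random.seed(12345)                 # arbitrary seed, for reproducibility

n, k = 10_000, 10                  # number of draws and equal-width bins
counts = [0] * k
for _ in range(n):
    counts[int(random.random() * k)] += 1

expected = n / k
chi2 = sum((c - expected) ** 2 / expected for c in counts)

# With k - 1 = 9 degrees of freedom, the 5% critical value is about 16.9;
# statistics persistently above it would cast doubt on uniformity.
print(chi2)
```

The Kolmogorov-Smirnov test follows the same pattern but compares the empirical CDF of the draws against the uniform CDF instead of binning.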

4.1.2 Random-variate generation

A sequence of numbers with a uniform distribution can be transformed into another sequence with a specific distribution using several methods. The most common is the inverse transformation method. It is based on the following observation: for any random variable x with cumulative distribution function (CDF) F(x), the variable u = F(x) is distributed uniformly between 0 and 1. Therefore, x can be generated from u using the inverse of F(x):

(7.2)x=F1(u)

This inverse transformation can be carried out analytically for many distributions, like exponential, geometric, logistic, Pareto or Weibull. The typical example is for the exponential distribution:

(7.3) f(x) = \lambda e^{-\lambda x}, \quad F(x) = 1 - e^{-\lambda x} = u \;\Rightarrow\; x = -\frac{1}{\lambda} \ln(1 - u) = -\frac{1}{\lambda} \ln(u),

where the last step uses the fact that 1 − u is itself uniformly distributed between 0 and 1.

But the inverse transformation can also be done numerically: generating random uniform values u* and finding the corresponding values x* that satisfy the equation u* = F(x*), typically using bisection or Newton methods. Figure 7.8 illustrates this technique, which is a good approach when an analytic inversion of F(x) is not possible.
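Both routes can be sketched for the exponential case from Eq. (7.3), where the closed-form inverse makes the bisection result checkable (a sketch using Python's standard generator):

```python
import math
import random

random.seed(7)
lam = 2.0

# Analytic inversion, Eq. (7.3): x = -(1/lam) * ln(1 - u)
draws = [-math.log(1 - random.random()) / lam for _ in range(100_000)]
# The sample mean should be close to the exponential mean 1/lam
assert abs(sum(draws) / len(draws) - 1 / lam) < 0.02

# Numeric inversion by bisection, for when F has no closed-form inverse;
# here we reuse the exponential CDF so the result can be verified.
def F(x):
    return 1 - math.exp(-lam * x)

def inverse_by_bisection(u, lo=0.0, hi=100.0, iters=60):
    for _ in range(iters):
        mid = (lo + hi) / 2
        if F(mid) < u:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

u = 0.5
assert abs(inverse_by_bisection(u) - (-math.log(1 - u) / lam)) < 1e-9
```

Sixty bisection steps shrink the bracket by a factor of 2^60, which is far below floating-point resolution on this interval; Newton's method would converge faster but needs the density f(x) as the derivative.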

Figure 7.8. Numeric inverse transformation.

There are additional techniques that can be applied in special cases. The composition technique can be used when the desired F(x) can be expressed as a weighted sum of other F(x)’s, and the convolution technique can be used when the random variable x can be expressed as a sum of other random variables.

URL: https://www.sciencedirect.com/science/article/pii/B9780128008874000079

Path Regeneration for Random Walks

Jan Novák, ... Carsten Dachsbacher, in GPU Computing Gems Emerald Edition, 2011

26.4 Implementation Details

The source code accompanying this chapter includes a naive implementation as well as an optimized version of a path tracer using the regeneration technique. For accelerating the ray-triangle intersection tests, we use a kd-tree that is built in a preprocessing step on the CPU. The GPU threads traverse the kd-tree using a limited-size stack stored in the local thread memory. For generating random numbers, each thread maintains its own instance of a Mersenne Twister random number generator.

Naive path tracing: The straightforward implementation of path tracing is based on the following concept: for each image pixel, we create a single GPU thread that constructs the required number of paths to obtain a (largely) noise-free image. The computation is partitioned into a sequence of N batches in which each thread computes exactly one path. A batch starts with generating an initial primary ray, and then the algorithm enters a loop for constructing the path. In every iteration we find the closest intersection of the ray with the scene, sample direct illumination, create a new ray for constructing the next path segment, and possibly terminate the path using Russian roulette. All the previously described tasks could be concentrated in a single large kernel; however, we achieved higher throughput with a number of specialized kernels. Such kernels benefit from higher occupancy, and the performance penalty resulting from additional global memory accesses for exchanging data between kernels is negligible in our case. Multiple lightweight kernels are also easier to debug and profile. The pipeline of the naive algorithm is shown in Figure 26.5 on the left. Kernels suffering from low utilization (e.g., Trace rays) are emphasized by a dark background.

Figure 26.5. Pipelines of naive path tracing (left), path tracing that restarts in the origin (middle), and a version that regenerates the paths at the first hit point (right). The demonstration source code combines both regeneration strategies to trade between computation of direct and indirect illumination. Light and dark backgrounds indicate full and low utilization of processing units, respectively.

Path regeneration: In order to regenerate terminated paths, we have to slightly adjust the loop over the kernels. We will need to execute an additional kernel after the Russian roulette that regenerates the terminated paths. Supposing that we always want to restart the paths in the origin (camera), we can call a slightly modified Generate Camera Rays kernel that first stores the contribution of the terminated path and then generates a new primary ray for each inactive thread. Once we exit the loop, the accumulated contribution is divided by the total number of paths traced for the respective pixel.

Our second strategy — restarting the paths at the first hit point — is suited to devoting the majority of computation time to the main source of noise, that is, to indirect illumination. This option requires executing a simple kernel that stores the sampled direct illumination and vertex parameters when the first vertex of a path is found. Instead of generating new primary rays, we will restore these values and sample the BSDF to find the new outgoing direction for gathering indirect illumination. The illustration in Figure 26.5 shows pipelines of both regeneration options.

Because it is often required to multisample direct illumination (e.g., in scenes with area light sources), using only a single sample estimate of direct illumination is not sufficient. To this end, we combine both regeneration strategies: according to a user-defined rate, we create new paths (by executing the kernel for creating primary rays), whereas in all the other cases, we regenerate the paths at the previously stored first hit point. This approach allows the user to adjust the ratio of complete paths to those built only by appending a path suffix.

The last important modification in the optimized implementation is the number of iterations that the algorithm loops over the kernels. The naive path tracer executes N batches (one for each sample) consisting of the K iterations required to terminate C percent of the paths without introducing bias. With the regeneration scheme we replace the two nested loops by a single loop that creates all the paths. The algorithm iterates over the kernels in two phases. The first, the regeneration phase, is responsible for creating the majority of paths required for synthesizing the final image. Any path terminated during this phase is immediately replaced by a new path using one of the previously described strategies. As the paths are terminated only by the Russian roulette, no bias is introduced.

The length of the regeneration phase is computed as a product of the average path length and the number of per-pixel samples. Subsequently, the algorithm enters a closing phase, which omits the regeneration to let the existing paths gradually terminate. The length of the closing phase is set to the length of one batch from the naive implementation. The total number of iterations M for any regeneration technique therefore equals the total length of both phases.

In practical applications, the regeneration phase is several orders of magnitude longer than the closing phase. The utilization of processing units with the optimized algorithm is illustrated in Figure 26.6a. Figure 26.6b shows a rendering of a chair in a room, where the left half of the image is computed by restarting the paths in the origin only, and the right half using the regeneration at the first vertex. In both cases, the number of iterations is identical. Notice the difference in the amount of noise from direct lighting (chair shadow) and indirect illumination (reflections on the floor and the area below the chair).

Figure 26.6. Figure 26.6a (left): Utilization of processing units when the regeneration of paths is used (the color coding distinguishes different path generations). Figure 26.6b (right): Regenerating at the origin (left half) helps to reduce the noise from direct illumination and minimizes aliasing artifacts. Regenerating at the first vertex (right half) is better suited to reduce the noise that is due to indirect illumination.

URL: https://www.sciencedirect.com/science/article/pii/B9780123849885000267