NAG Numerical Routines for GPUs
The new many-core GPU computing architecture, which is designed to be multi-threaded and massively parallel, is ideally suited to Monte Carlo applications.
Monte Carlo Simulation
Monte Carlo simulation methods are among the main numerical techniques used in finance for derivative pricing and risk management. Their major drawback is that they are computationally intensive, often requiring hundreds of thousands of simulated sample paths to compute a fair option value within a given standard error. On a conventional Central Processing Unit (CPU) architecture, even with the fastest desktop machines available today, computing the current risk exposure of complex financial derivative contracts may take hours.
Random Number Generators
Random Number Generators (RNGs) are an essential building block in Monte Carlo simulations. It is important to select a generator that not only has a long period and good statistical properties, but is also computationally efficient.
Parallelizing the RNGs
In order to make use of the high performance capabilities of GPUs, we need to utilize parallel algorithms. Specifically, we need to have an algorithm capable of producing random samples in parallel whilst preserving statistical independence.
Three well-known generators were selected: L'Ecuyer's MRG32k3a generator [1,2], the Mersenne Twister MT19937 of Matsumoto and Nishimura, and the Sobol generator. The Sobol and MRG32k3a generators are well suited to the GPU's architecture and are very efficient. Parallelizing the Mersenne Twister is more demanding because of its large state size, but this implementation still provides good performance.
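One common way to give each thread its own non-overlapping part of a single sequence is skip-ahead (block splitting). The text does not say which scheme the NAG routines use, so the following C sketch is only an illustration under that assumption. It implements the MRG32k3a recurrence with L'Ecuyer's published constants and jumps a state forward by n steps using powers of the transition matrices. The names Mrg32k3a, mrg_next, mrg_skip and mrg_thread_stream are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define M1 4294967087ULL  /* 2^32 - 209 */
#define M2 4294944443ULL  /* 2^32 - 22853 */

typedef struct {
    uint64_t s1[3];  /* first component, oldest value first */
    uint64_t s2[3];  /* second component, oldest value first */
} Mrg32k3a;

/* One-step transition matrices. The last row holds the recurrence
   coefficients; negative coefficients are stored as m - a. */
static uint64_t A1[3][3] = {{0, 1, 0}, {0, 0, 1}, {M1 - 810728ULL, 1403580ULL, 0}};
static uint64_t A2[3][3] = {{0, 1, 0}, {0, 0, 1}, {M2 - 1370589ULL, 0, 527612ULL}};

/* s = a * s (mod m). All entries are below 2^32, so each product fits in 64 bits. */
static void mat_vec(uint64_t a[3][3], uint64_t s[3], uint64_t m) {
    uint64_t t[3];
    for (int i = 0; i < 3; ++i) {
        uint64_t acc = 0;
        for (int k = 0; k < 3; ++k)
            acc = (acc + (a[i][k] * s[k]) % m) % m;
        t[i] = acc;
    }
    memcpy(s, t, sizeof t);
}

/* out = a * b (mod m); a temporary makes it safe for out to alias a or b. */
static void mat_mul(uint64_t out[3][3], uint64_t a[3][3], uint64_t b[3][3], uint64_t m) {
    uint64_t t[3][3];
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j) {
            uint64_t acc = 0;
            for (int k = 0; k < 3; ++k)
                acc = (acc + (a[i][k] * b[k][j]) % m) % m;
            t[i][j] = acc;
        }
    memcpy(out, t, sizeof t);
}

/* out = a^n (mod m) by square-and-multiply. */
static void mat_pow(uint64_t out[3][3], uint64_t a[3][3], uint64_t n, uint64_t m) {
    uint64_t result[3][3] = {{1, 0, 0}, {0, 1, 0}, {0, 0, 1}};
    uint64_t base[3][3];
    memcpy(base, a, sizeof base);
    while (n > 0) {
        if (n & 1) mat_mul(result, result, base, m);
        mat_mul(base, base, base, m);
        n >>= 1;
    }
    memcpy(out, result, sizeof result);
}

/* Advance one step and return a uniform sample in (0, 1). */
double mrg_next(Mrg32k3a *g) {
    mat_vec(A1, g->s1, M1);
    mat_vec(A2, g->s2, M2);
    uint64_t p1 = g->s1[2], p2 = g->s2[2];
    uint64_t diff = (p1 > p2) ? p1 - p2 : p1 + M1 - p2;
    return (double)diff * 2.328306549295727688e-10;  /* 1 / (m1 + 1) */
}

/* Jump the state forward by n steps in O(log n) matrix products. */
void mrg_skip(Mrg32k3a *g, uint64_t n) {
    uint64_t p1[3][3], p2[3][3];
    mat_pow(p1, A1, n, M1);
    mat_pow(p2, A2, n, M2);
    mat_vec(p1, g->s1, M1);
    mat_vec(p2, g->s2, M2);
}

/* State for thread thread_id when each thread consumes samples_per_thread values. */
Mrg32k3a mrg_thread_stream(Mrg32k3a seed, uint64_t thread_id, uint64_t samples_per_thread) {
    mrg_skip(&seed, thread_id * samples_per_thread);
    return seed;
}
```

With this layout, thread 3 using 1000 samples per thread starts exactly where a serial run would be after 3000 draws, so the parallel output reproduces the serial sequence and the per-thread blocks do not overlap.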
The uniform samples produced by these generators are then transformed to samples from other distributions including Normal, exponential and Gamma.
Device Function Generators
The generators are provided as compiled GPU kernels. Calling a kernel will fill a buffer in GPU memory with random numbers. When users write their own GPU pricing kernels, these numbers can be read back from the buffer and used to generate sample paths.
NAG has created GPU device function generators which avoid this round trip to GPU memory. Device function generators are CUDA functions which users can call in their own CUDA code, allowing them to embed the random number generator directly into their kernel. Numbers can then be read off the generator, used and discarded without incurring the cost of memory traffic. This is particularly well suited to kernels which need a high ratio of random numbers to floating-point calculations.
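The difference between the two styles can be sketched in plain C. This is not the NAG interface: the names RngState, rng_next, fill_buffer, path_average_from_buffer and path_average_inline are hypothetical, and a small xorshift generator stands in for the real ones. The DEVICE_FN macro is an assumption about how such a function might be marked for device use when compiled with nvcc; in an ordinary C build it expands to nothing.

```c
#include <stddef.h>
#include <stdint.h>

/* Under nvcc, mark functions as callable from device code;
   in a plain C build the macro is empty. */
#ifdef __CUDACC__
#define DEVICE_FN __host__ __device__
#else
#define DEVICE_FN
#endif

typedef struct { uint32_t x; } RngState;  /* must be seeded non-zero */

/* Stand-in generator (32-bit xorshift); NOT one of the NAG generators. */
DEVICE_FN static double rng_next(RngState *s) {
    uint32_t x = s->x;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    s->x = x;
    return (double)x * (1.0 / 4294967296.0);  /* scale by 2^-32 */
}

/* Buffer style, step 1: a separate call fills memory with samples. */
void fill_buffer(double *buffer, size_t n, uint32_t seed) {
    RngState s = { seed };
    for (size_t i = 0; i < n; ++i)
        buffer[i] = rng_next(&s);
}

/* Buffer style, step 2: the user's code reads the samples back. */
double path_average_from_buffer(const double *buffer, size_t n_steps) {
    double sum = 0.0;
    for (size_t i = 0; i < n_steps; ++i)
        sum += buffer[i];
    return sum / (double)n_steps;
}

/* Inline style: the generator state lives in a local variable and each
   sample is produced, used and discarded without touching a buffer. */
DEVICE_FN double path_average_inline(uint32_t seed, size_t n_steps) {
    RngState s = { seed };
    double sum = 0.0;
    for (size_t i = 0; i < n_steps; ++i)
        sum += rng_next(&s);
    return sum / (double)n_steps;
}
```

Both versions consume the same sequence and return the same value. The inline version never stores the samples, which is the saving in memory traffic that the device function approach is aiming for.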
The NAG Numerical Routines for GPUs are currently implemented in C using the C extensions provided by NVIDIA’s CUDA interface. We plan to support the emerging OpenCL standard.