In terms of performance and scalability, the NAG Library for SMP & multicore excels over many of its equivalents without compromising on accuracy. It was produced to enable developers and programmers to make optimal use of the processing power and shared memory parallelism of multicore CPUs and Symmetric Multi-Processor (SMP) systems.
As part of the stringent testing and quality assurance process at NAG, the NAG Library for SMP & multicore is tested against the companion vendor library on each system, to ensure that using the two together delivers the optimal combination of performance, functionality and accuracy. NAG also encourages those working with SMP systems to conduct their own tests using the NAG Benchmark Suite.
The NAG Library for SMP & multicore is available on a wide range of SMP systems and is designed to complement the vendor math library on each system. As well as providing a vastly greater range of numerical functionality, it includes optimised versions of key LAPACK routines, which often deliver superior performance to that achieved using the vendor math library alone. Some examples are shown below. Differences in performance for each algorithm and problem size directly reflect the differences in wallclock runtime.
Figure 1: Cholesky factorization of a real symmetric positive definite matrix (DPOTRF)
Figure 1 shows the improved scaling in performance on multiple processors of the NAG Library for SMP & multicore routine on a Sun UltraSPARC-III system.
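For readers unfamiliar with the operation benchmarked in Figure 1, the following is a minimal sketch of what a Cholesky factorization computes: a lower-triangular matrix L such that L times its transpose reproduces the original symmetric positive definite matrix. It is a plain, unblocked, serial illustration in Python and is not the NAG or LAPACK implementation of DPOTRF; the function name `cholesky_lower` and the sample matrix are chosen here purely for illustration.

```python
import math


def cholesky_lower(a):
    """Return lower-triangular L with L * L^T == a.

    `a` is a symmetric positive definite matrix given as a list of rows.
    Illustrative, unblocked, serial sketch; not the DPOTRF implementation.
    """
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Diagonal entry: remove the contribution of columns already computed.
        d = a[j][j] - sum(l[j][k] ** 2 for k in range(j))
        if d <= 0.0:
            raise ValueError("matrix is not positive definite")
        l[j][j] = math.sqrt(d)
        # Entries below the diagonal in column j.
        for i in range(j + 1, n):
            s = a[i][j] - sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = s / l[j][j]
    return l


# Small hypothetical test matrix (symmetric positive definite).
a = [
    [4.0, 12.0, -16.0],
    [12.0, 37.0, -43.0],
    [-16.0, -43.0, 98.0],
]
l = cholesky_lower(a)
for row in l:
    print(row)
```

The rows below the current column are independent of one another at each step, which is one source of the parallelism that a tuned multi-processor implementation can exploit.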
Figure 2: All eigenvalues and eigenvectors of a real symmetric tridiagonal matrix, reduced from a real symmetric matrix, using the implicit QL or QR method (DSTEQR)
The QR algorithm for solving symmetric eigenproblems has been extensively modified in the NAG Library for SMP & multicore to improve serial performance and take advantage of the parallelism inherent in the problem. For the largest problem size shown in Figure 2, the NAG Library for SMP & multicore, running on an IBM POWER5 system, is over 9 times faster on 4 processors than the standard LAPACK version.
Figure 3: SVD of a real bidiagonal matrix reduced from a real general matrix (DBDSQR)
The QR algorithm for solving SVD problems has been modified in the NAG Library for SMP & multicore in a similar manner to that for the symmetric eigenproblem (Figure 2). In this case, the performance advantages over the standard implementation of this algorithm are even greater. For the largest problem size shown in Figure 3, the NAG Library for SMP & multicore, running on an IBM POWER5 system, is over 89 times faster on 4 processors than the standard LAPACK version.
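A "times faster" factor of the kind quoted above is a ratio of wallclock runtimes for the same problem. The sketch below shows one way such a ratio might be measured and computed in Python. The helper names `wallclock` and `speedup` are hypothetical, and the timings in the usage lines are made-up numbers for illustration only; they are not NAG measurements and this is not the NAG Benchmark Suite.

```python
import time


def wallclock(fn, *args, repeats=3):
    """Return the best wallclock time, in seconds, of fn(*args) over `repeats` runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best


def speedup(t_reference, t_candidate):
    """Return how many times faster the candidate is than the reference."""
    return t_reference / t_candidate


# Hypothetical timings in seconds (illustrative values, not measured results).
t_reference = 18.0  # e.g. a reference routine on a given problem size
t_candidate = 2.0   # e.g. a tuned routine on the same problem and hardware
factor = speedup(t_reference, t_candidate)
print(f"{factor:.1f} times faster")

# Timing a real callable works the same way:
t_sum = wallclock(sum, range(100_000))
```

Taking the best of several runs is a common way to reduce noise from other activity on the machine; both routines should be timed on the same system, problem size and processor count for the ratio to be meaningful.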