# NAG FL Interfacec06saf (fast_​gauss)

## ▸▿ Contents

Settings help

FL Name Style:

FL Specification Language:

## 1Purpose

c06saf calculates the multidimensional fast Gauss transform.

## 2Specification

Fortran Interface
 Subroutine c06saf ( d, srcs, n, trgs, m, q, p1, p2, k, hin, lhin, tol, v, term,
 Integer, Intent (In) :: d, n, m, lhin Integer, Intent (Inout) :: p1, p2, k, ifail Real (Kind=nag_wp), Intent (In) :: srcs(d,n), trgs(d,m), q(n), hin(lhin), tol Real (Kind=nag_wp), Intent (Out) :: v(m), term(m)
#include <nag.h>
 void c06saf_ (const Integer *d, const double srcs[], const Integer *n, const double trgs[], const Integer *m, const double q[], Integer *p1, Integer *p2, Integer *k, const double hin[], const Integer *lhin, const double *tol, double v[], double term[], Integer *ifail)
The routine may be called by the names c06saf or nagf_sum_fast_gauss.

## 3Description

c06saf calculates the $d$-dimensional fast Gauss transform (FGT), $\stackrel{^}{G}\left(y\right)$, that approximates the discrete Gauss transform (DGT), $G\left(y\right)$, evaluated at a set of target points ${y}_{\mathit{j}}$, for $\mathit{j}=1,2,\dots ,m\in {ℝ}^{d}$. The DGT is defined as:
 $G(yj)= ∑ i=1 n qi e - ‖yj-xi‖ 2 2 / h i 2 , j=1,…,m$
where ${x}_{\mathit{i}}$, for $\mathit{i}=1,2,\dots ,n\in {ℝ}^{d}$, are the Gaussian source points, ${q}_{\mathit{i}}$, for $\mathit{i}=1,2,\dots ,n\in {ℝ}^{+}$, are the source weights and ${h}_{\mathit{i}}$, for $\mathit{i}=1,2,\dots ,n\in {ℝ}^{+}$, are the source standard deviations (alternatively source scales or source bandwidths).
This subroutine implements the improved FGT algorithm presented in Raykar and Duraiswami (2005). The algorithm clusters the sources into $k$ distinct clusters and then computes two Taylor series approximations per cluster with ${p}_{1}$ and ${p}_{2}$ terms respectively. You must provide ${p}_{1}$, ${p}_{2}$ and $k$ when calling the subroutine. See Section 7 below for a further discussion on accuracy when choosing their values.
The input array ${\mathbf{hin}}$ of this routine is designed to allow maximum flexibility in the supply of the standard deviation arguments by reusing, in a cyclic manner, elements of the array when it is less than $n$ elements long. For example, if all Gaussian sources have the same standard deviation then it is only necessary to set ${\mathbf{lhin}}$ to $1$ and to provide the value of the standard deviation in ${\mathbf{hin}}\left(1\right)$; the routine will then automatically expand ${\mathbf{hin}}$ to be of length $n$. For further details please see Section 2.6 in the G01 Chapter Introduction.
Greengard L and Strain J (1991) The Fast Gauss Transform SIAM J. Sci. Statist. Comput. 12(1) 79–94
Raykar V C and Duraiswami R (2005) Improved Fast Gauss Transform With Variable Source Scales University of Maryland Technical Report CS-TR-4727/UMIACS-TR-2005-34

## 5Arguments

1: $\mathbf{d}$Integer Input
On entry: $d$, the number of dimensions.
Constraint: ${\mathbf{d}}>0$.
2: $\mathbf{srcs}\left({\mathbf{d}},{\mathbf{n}}\right)$Real (Kind=nag_wp) array Input
On entry: $x$, the locations of the Gaussian sources.
3: $\mathbf{n}$Integer Input
On entry: $n$, the number of Gaussian sources.
Constraint: ${\mathbf{n}}>0$.
4: $\mathbf{trgs}\left({\mathbf{d}},{\mathbf{m}}\right)$Real (Kind=nag_wp) array Input
On entry: $y$, the locations of the target points at which the FGT will be evaluated.
5: $\mathbf{m}$Integer Input
On entry: $m$, the number of target points.
Constraint: ${\mathbf{m}}>0$.
6: $\mathbf{q}\left({\mathbf{n}}\right)$Real (Kind=nag_wp) array Input
On entry: $q$, the weights of the Gaussian sources.
7: $\mathbf{p1}$Integer Input/Output
On entry: ${p}_{1}$, the number of terms of the first Taylor series to be evaluated.
On exit: p1 is unchanged.
Constraint: ${\mathbf{p1}}>0$.
8: $\mathbf{p2}$Integer Input/Output
On entry: ${p}_{2}$, the number of terms of the second Taylor series to be evaluated.
On exit: p2 is unchanged.
Constraint: ${\mathbf{p2}}>0$.
9: $\mathbf{k}$Integer Input/Output
On entry: $k$, the number of clusters into which the source points will be aggregated.
On exit: k is unchanged.
Constraint: $1\le {\mathbf{k}}\le {\mathbf{n}}$.
10: $\mathbf{hin}\left({\mathbf{lhin}}\right)$Real (Kind=nag_wp) array Input
On entry: $h$, the standard deviations of the Gaussian sources. If ${\mathbf{lhin}}<{\mathbf{n}}$, the array will be expanded automatically by repeating hin until it is of length n. See Section 2.6 in the G01 Chapter Introduction for further information.
Constraint: ${\mathbf{hin}}\left(\mathit{i}\right)>0.0$, for $\mathit{i}=1,2,\dots ,{\mathbf{lhin}}$.
11: $\mathbf{lhin}$Integer Input
On entry: the length of the array hin.
Constraint: $1\le {\mathbf{lhin}}\le {\mathbf{n}}$.
12: $\mathbf{tol}$Real (Kind=nag_wp) Input
On entry: $\epsilon$, the desired accuracy of the FGT approximation of the DGT. Determines the radius of the source clusters: the contribution of a source point to the FGT approximation at a target point is disregarded if the source is outside the corresponding cluster radius.
Constraint: ${\mathbf{tol}}>0.0$.
13: $\mathbf{v}\left({\mathbf{m}}\right)$Real (Kind=nag_wp) array Output
On exit: $\stackrel{^}{G}\left(y\right)$, the value of the FGT evaluated at $y$.
14: $\mathbf{term}\left({\mathbf{m}}\right)$Real (Kind=nag_wp) array Output
On exit: ${\mathbf{term}}\left(j\right)$ contains the absolute value of the final Taylor series term that is largest, relative to the size of the sum of the corresponding series, across all clusters that contribute to the FGT at target point ${\mathbf{v}}\left(j\right)$.
15: $\mathbf{ifail}$Integer Input/Output
On entry: ifail must be set to $0$, $-1$ or $1$ to set behaviour on detection of an error; these values have no effect when no error is detected.
A value of $0$ causes the printing of an error message and program execution will be halted; otherwise program execution continues. A value of $-1$ means that an error message is printed while a value of $1$ means that it is not.
If halting is not appropriate, the value $-1$ or $1$ is recommended. If message printing is undesirable, then the value $1$ is recommended. Otherwise, the value $0$ is recommended. When the value $-\mathbf{1}$ or $\mathbf{1}$ is used it is essential to test the value of ifail on exit.
On exit: ${\mathbf{ifail}}={\mathbf{0}}$ unless the routine detects an error or a warning has been flagged (see Section 6).

## 6Error Indicators and Warnings

If on entry ${\mathbf{ifail}}=0$ or $-1$, explanatory error messages are output on the current error message unit (as defined by x04aaf).
Errors or warnings detected by the routine:
${\mathbf{ifail}}=1$
On entry, ${\mathbf{d}}=⟨\mathit{\text{value}}⟩$.
Constraint: ${\mathbf{d}}>0$.
${\mathbf{ifail}}=2$
On entry, ${\mathbf{n}}=⟨\mathit{\text{value}}⟩$.
Constraint: ${\mathbf{n}}>0$.
${\mathbf{ifail}}=3$
On entry, ${\mathbf{m}}=⟨\mathit{\text{value}}⟩$.
Constraint: ${\mathbf{m}}>0$.
${\mathbf{ifail}}=4$
On entry, ${\mathbf{p1}}=⟨\mathit{\text{value}}⟩$.
Constraint: ${\mathbf{p1}}>0$.
${\mathbf{ifail}}=5$
On entry, ${\mathbf{p2}}=⟨\mathit{\text{value}}⟩$.
Constraint: ${\mathbf{p2}}>0$.
${\mathbf{ifail}}=6$
On entry, ${\mathbf{k}}=⟨\mathit{\text{value}}⟩$ and ${\mathbf{n}}=⟨\mathit{\text{value}}⟩$.
Constraint: $1\le {\mathbf{k}}\le {\mathbf{n}}$.
${\mathbf{ifail}}=7$
On entry, ${\mathbf{hin}}\left(⟨\mathit{\text{value}}⟩\right)=⟨\mathit{\text{value}}⟩$.
Constraint: ${\mathbf{hin}}\left(\mathit{i}\right)>0.0$, for $\mathit{i}=1,2,\dots ,{\mathbf{lhin}}$.
${\mathbf{ifail}}=8$
On entry, ${\mathbf{lhin}}=⟨\mathit{\text{value}}⟩$.
Constraint: $1\le {\mathbf{lhin}}\le {\mathbf{n}}$.
${\mathbf{ifail}}=9$
On entry, ${\mathbf{tol}}=⟨\mathit{\text{value}}⟩$.
Constraint: ${\mathbf{tol}}>0.0$.
${\mathbf{ifail}}=10$
On exit, ${\mathbf{p1}}=⟨\mathit{\text{value}}⟩$, ${\mathbf{p2}}=⟨\mathit{\text{value}}⟩$ and ${\mathbf{k}}=⟨\mathit{\text{value}}⟩$.
p1, p2 or k may have been too small to calculate ${\mathbf{v}}\left({\mathbf{m}}\right)$ to the required accuracy tol.
${\mathbf{ifail}}=-99$
See Section 7 in the Introduction to the NAG Library FL Interface for further information.
${\mathbf{ifail}}=-399$
Your licence key may have expired or may not have been installed correctly.
See Section 8 in the Introduction to the NAG Library FL Interface for further information.
${\mathbf{ifail}}=-999$
Dynamic memory allocation failed.
See Section 9 in the Introduction to the NAG Library FL Interface for further information.

## 7Accuracy

The routine does not currently implement the procedure described in Raykar and Duraiswami (2005) for automatically determining values for p1, p2 and k. Nonzero values must, therefore, be provided for these parameters when calling the routine.
For a given set of source and target points and a specified tolerance, there is an interaction between the number of clusters, k, and the number of Taylor series terms, p1 and p2: if the sources are clustered together in fewer clusters (small k) then more terms will be needed in each cluster's Taylor series (large p1 and p2) to capture the effect of the source points further from the cluster centres. Increasing the number of clusters reduces their individual radii and requires fewer terms in their Taylor series, but increases the number of Taylor series that must be evaluated overall.
If the source and target points are uniformly distributed in a unit hypercube, Raykar and Duraiswami (2005) advise users to select ${\mathbf{k}}\sim ⌈{\left({h}_{\mathrm{max}}+{h}_{\mathrm{min}}/2\right)}^{-d}⌉$. If the points are not uniformly distributed then more clusters than this will be needed to calculate the FGT to within the specified tol without requiring prohibitively large values for p1 and p2.
There is less guidance available for selecting good values for p1 and p2. As the number of Taylor series terms is a major factor on the computation time taken by this routine, you are advised to initially try a small number, e.g., $20$ or so, and then tune p1 and p2 up or down based on the values returned. Note that p1 and p2 are not required to have identical values.
To aid the selection of values for p1, p2 and k, the routine returns in ${\mathbf{term}}\left(j\right)$ the absolute value of the final Taylor series term that is largest, relative to the size of the sum of the corresponding series, across all clusters that contribute to the FGT at target point $j$. If this value is larger than the requested tol, the routine will additionally return a nonzero ifail value and you are advised to re-run the routine with larger p1, p2 or k.

## 8Parallelism and Performance

c06saf is threaded by NAG for parallel execution in multithreaded implementations of the NAG Library.
c06saf makes calls to BLAS and/or LAPACK routines, which may be threaded within the vendor library used by this implementation. Consult the documentation for the vendor library for further information.
Please consult the X06 Chapter Introduction for information on how to control and interrogate the OpenMP environment used within this routine. Please also consult the Users' Note for your implementation for any additional implementation-specific information.

The time complexity of the algorithm implemented by this subroutine is $O\left(M+N\right)$, versus the $O\left(MN\right)$ time complexity of evaluating the DGT directly.

## 10Example

In this example values for $x$, $y$, ${p}_{1}$, ${p}_{2}$, $k$ and $\epsilon$ are read in, $\stackrel{^}{G}\left(y\right)$ calculated and the results displayed.

### 10.1Program Text

Program Text (c06safe.f90)

### 10.2Program Data

Program Data (c06safe.d)

### 10.3Program Results

Program Results (c06safe.r)