NAG Library Routine Document

e02bff (dim1_spline_deriv_vector)

1
Purpose

e02bff evaluates a cubic spline and up to its first three derivatives from its B-spline representation at a vector of points. e02bff can be used to compute the values and derivatives of cubic spline fits and interpolants produced by reference to e01baf, e02baf and e02bef.

2
Specification

Fortran Interface
Subroutine e02bff ( start, ncap7, lamda, c, deriv, xord, x, ixloc, nx, s, lds, iwrk, liwrk, ifail)
Integer, Intent (In):: start, ncap7, deriv, xord, nx, lds, liwrk
Integer, Intent (Inout):: ixloc(nx), iwrk(liwrk), ifail
Real (Kind=nag_wp), Intent (In):: lamda(ncap7), c(ncap7), x(nx)
Real (Kind=nag_wp), Intent (Inout):: s(lds,*)
C Header Interface
#include <nagmk26.h>
void  e02bff_ (const Integer *start, const Integer *ncap7, const double lamda[], const double c[], const Integer *deriv, const Integer *xord, const double x[], Integer ixloc[], const Integer *nx, double s[], const Integer *lds, Integer iwrk[], const Integer *liwrk, Integer *ifail)

3
Description

e02bff evaluates the cubic spline $s(x)$ and optionally derivatives up to order 3 for a vector of points $x_j$, for $j=1,2,\ldots,\mathit{nx}$. It is assumed that $s(x)$ is represented in terms of its B-spline coefficients $c_i$, for $i=1,2,\ldots,\bar{n}+3$, and (augmented) ordered knot set $\lambda_i$, for $i=1,2,\ldots,\bar{n}+7$, (see e02baf and e02bef), i.e.,
$$s(x) = \sum_{i=1}^{q} c_i N_i(x).$$
Here $q=\bar{n}+3$, $\bar{n}$ is the number of intervals of the spline and $N_i(x)$ denotes the normalized B-spline of degree 3 (order 4) defined upon the knots $\lambda_i,\lambda_{i+1},\ldots,\lambda_{i+4}$. The knots $\lambda_5,\lambda_6,\ldots,\lambda_{\bar{n}+3}$ are the interior knots. The remaining knots, $\lambda_1$, $\lambda_2$, $\lambda_3$, $\lambda_4$ and $\lambda_{\bar{n}+4}$, $\lambda_{\bar{n}+5}$, $\lambda_{\bar{n}+6}$, $\lambda_{\bar{n}+7}$ are the exterior knots. The knots $\lambda_4$ and $\lambda_{\bar{n}+4}$ are the boundaries of the spline.
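As an illustration of the B-spline representation above, the following minimal Python sketch evaluates the sum of coefficients times normalized cubic B-splines using the standard Cox-de Boor recurrence. It is not the algorithm used inside e02bff; the function names `bspline_basis` and `spline_value` are hypothetical, and the arrays are 0-based, so `knots[i]` holds the knot with mathematical index i+1.

```python
def bspline_basis(i, order, knots, x):
    """Normalized B-spline of the given order on knots[i..i+order] (0-based i)."""
    if order == 1:
        # Order-1 B-spline: indicator of the half-open knot interval.
        return 1.0 if knots[i] <= x < knots[i + 1] else 0.0
    value = 0.0
    left_span = knots[i + order - 1] - knots[i]
    if left_span > 0.0:  # skip terms whose support has zero width (multiple knots)
        value += (x - knots[i]) / left_span * bspline_basis(i, order - 1, knots, x)
    right_span = knots[i + order] - knots[i + 1]
    if right_span > 0.0:
        value += (knots[i + order] - x) / right_span * bspline_basis(i + 1, order - 1, knots, x)
    return value


def spline_value(knots, coeffs, x):
    """Sum of c_i * N_i(x) over i = 1..q, where q = len(knots) - 4."""
    q = len(knots) - 4
    return sum(coeffs[i] * bspline_basis(i, 4, knots, x) for i in range(q))
```

Because the order-1 B-splines are taken on half-open intervals, this sketch returns zero exactly at the right-hand boundary of the spline; a practical evaluator treats that point separately.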
Only abscissae satisfying
$$\lambda_4 \le x_j \le \lambda_{\bar{n}+4}$$
will be evaluated. At a simple knot $\lambda_i$ (i.e., one satisfying $\lambda_{i-1}<\lambda_i<\lambda_{i+1}$), the third derivative of the spline is, in general, discontinuous. At a multiple knot (i.e., two or more knots with the same value), lower derivatives, and even the spline itself, may be discontinuous. Specifically, at a point $x=u$ where (exactly) $r$ knots coincide (such a point is termed a knot of multiplicity $r$), the values of the derivatives of order $4-j$, for $j=1,2,\ldots,r$, are, in general, discontinuous. (Here $1 \le r \le 4$; $r>4$ is not meaningful.) The maximum order of the derivatives to be evaluated, $D_{\mathrm{ord}}$, and the left- or right-handedness of the computation when an abscissa corresponds exactly to an interior knot, are determined by the value of deriv.
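The multiplicity rule can be made concrete with a short Python sketch. The helper names `knot_multiplicity` and `discontinuous_derivative_orders` are hypothetical and serve only to tabulate the statement that, at a knot of multiplicity r, the derivatives of order 4-j for j=1,...,r are in general discontinuous (order 0 denotes the spline itself).

```python
def knot_multiplicity(knots, u):
    """Number of knots in the knot set that are exactly equal to u."""
    return sum(1 for t in knots if t == u)


def discontinuous_derivative_orders(r):
    """Derivative orders that are, in general, discontinuous at a knot of multiplicity r."""
    if not 1 <= r <= 4:
        raise ValueError("multiplicity must satisfy 1 <= r <= 4 for a cubic spline")
    return sorted(4 - j for j in range(1, r + 1))
```

For a simple knot this gives only the third derivative; for a double knot it gives the second and third derivatives.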
Each abscissa (point at which the spline is to be evaluated) $x_j$ contained in x has an associated enclosing interval number, $\mathit{ixloc}_j$, either supplied or returned in ixloc (see argument start). A simple call to e02bff would set start=0, in which case the contents of ixloc need never be set nor referenced, and the following description of the modes of operation can be ignored. However, where efficiency is an important consideration, the following description will help to choose the appropriate mode of operation.
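The interval numbers are used to determine which B-splines must be evaluated for a given abscissa, and are defined as
$$
\mathit{ixloc}_j =
\begin{cases}
0, & x_j < \lambda_1 \\
4, & \lambda_4 = x_j \\
k, & \lambda_k < x_j < \lambda_{k+1} \\
k, & \lambda_4 < \lambda_k = x_j \quad \text{(left derivatives)} \\
k, & x_j = \lambda_{k+1} < \lambda_{\bar{n}+4} \quad \text{(right derivatives or no derivatives)} \\
\bar{n}+4, & \lambda_{\bar{n}+4} = x_j \\
> \bar{n}+7, & x_j > \lambda_{\bar{n}+7}
\end{cases}
\tag{1}
$$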
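A binary search is one way to locate an enclosing interval in time logarithmic in the number of knots. The Python sketch below is an illustration only: `interval_number` is a hypothetical name, it uses the conventional half-open rule (the largest k whose knot does not exceed the abscissa), and it makes no attempt to reproduce the left-handed cases of definition (1) at abscissae that coincide with an interior knot. The flag values 0 and n̄+7 for out-of-range abscissae follow the exit description of ixloc.

```python
from bisect import bisect_right


def interval_number(knots, x):
    """1-based interval number k such that knot k <= x < knot k+1 (right-handed)."""
    nbar = len(knots) - 7
    lower, upper = knots[3], knots[nbar + 3]  # the two boundary knots of the spline
    if x < lower:
        return 0          # flagged: below the range of the spline
    if x > upper:
        return nbar + 7   # flagged: above the range of the spline
    if x == upper:
        return nbar + 4   # right-hand boundary
    # The count of knots <= x equals the 1-based index of the last such knot.
    return bisect_right(knots, x)
```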
The algorithm has two modes of vectorization, termed here sorted and unsorted, which are selectable by the argument start.
Furthermore, if the supplied abscissae are sufficiently ordered, as indicated by the argument xord, the algorithm will take advantage of significantly faster methods for the determination of both the interval numbers and the subsequent spline evaluations.
The sorted mode has two phases, a sorting phase and an evaluation phase. This mode is recommended if there are many abscissae to evaluate relative to the number of intervals of the spline, or the abscissae are distributed relatively densely over a subsection of the spline. In the first phase, $\mathit{ixloc}_j$ is determined for each $x_j$ and a permutation is calculated to sort the $x_j$ by interval number. The first phase may be either partially or completely bypassed using the argument start if the enclosing segments and/or the subsequent ordering are already known a priori, for example if multiple sets of spline coefficients c are to be evaluated over the same set of knots lamda.
In the second phase of the sorted mode, spline approximations are evaluated by segment, so that non-abscissa dependent calculations over a segment may be reused in the evaluation for all abscissae belonging to a specific segment. For example, all third derivatives of all abscissae in the same segment will be identical.
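The two phases of the sorted mode can be pictured with the following Python sketch, which builds a permutation that orders the abscissae by interval number and then groups them by segment so that per-segment quantities need only be computed once. The name `sort_by_interval` is hypothetical, indices are 0-based, and the layout of the result is an assumption made for illustration; it is not the layout that e02bff stores in iwrk.

```python
from itertools import groupby


def sort_by_interval(ixloc):
    """Return a permutation sorting abscissae by interval, plus the per-segment groups."""
    # sorted() is stable, so abscissae in the same interval keep their original order.
    perm = sorted(range(len(ixloc)), key=lambda j: ixloc[j])
    segments = [(k, list(group)) for k, group in groupby(perm, key=lambda j: ixloc[j])]
    return perm, segments
```

Each entry of `segments` pairs an interval number with the original positions of the abscissae that lie in it, which is the information the evaluation phase needs in order to process one segment at a time.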
In the unsorted mode of vectorization, no a priori segment sorting is performed, and if the abscissae are not sufficiently ordered, the evaluation at an abscissa will be independent of evaluations at other abscissae; also non-abscissa dependent calculations over a segment will be repeated for each abscissa in a segment. This may be quicker if the number of abscissae is small in comparison to the number of knots in the spline, and they are distributed sparsely throughout the domain of the spline. This is effectively a direct vectorization of e02bbf and e02bcf, although if the enclosing interval numbers $\mathit{ixloc}_j$ are known, these may again be provided.
If the abscissae are sufficiently ordered, then once the first abscissa in a segment is known, an efficient algorithm will be used to determine the location of the final abscissa in this segment. The spline will subsequently be evaluated in a vectorized manner for all the abscissae indexed between the first and last of the current segment.
If no derivatives are required, the spline is evaluated by taking convex combinations, a method due to de Boor (1972). Otherwise, the calculation of $s(x)$ and its derivatives is based upon,
(i) evaluating the nonzero B-splines of orders 1, 2, 3 and 4 by recurrence (see Cox (1972) and Cox (1978)),
(ii) computing all derivatives of the B-splines of order 4 by applying a second recurrence to these computed B-spline values (see de Boor (1972)),
(iii) multiplying the fourth-order B-spline values and their derivatives by the appropriate B-spline coefficients, and summing, to yield the values of $s(x)$ and its derivatives.
The method of convex combinations is significantly faster than the recurrence based method. If higher derivatives of order 2 or 3 are not required, as much computation as possible is avoided.
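The method of convex combinations mentioned above can be sketched in Python as the textbook de Boor recurrence for a cubic spline. This is an illustration under stated assumptions, not NAG's implementation: `de_boor_cubic` is a hypothetical name, arrays are 0-based, the abscissa is assumed to lie within the range of the spline, and the interior knots are assumed to lie strictly inside that range.

```python
from bisect import bisect_right


def de_boor_cubic(knots, coeffs, x):
    """Evaluate the spline at x by repeated convex combinations of four coefficients."""
    p = 3  # degree of a cubic spline
    # 0-based k with knots[k] <= x < knots[k + 1]; clamped so both boundaries work.
    k = min(max(bisect_right(knots, x) - 1, p), len(knots) - p - 2)
    d = [coeffs[j + k - p] for j in range(p + 1)]
    for r in range(1, p + 1):
        for j in range(p, r - 1, -1):
            alpha = (x - knots[j + k - p]) / (knots[j + 1 + k - r] - knots[j + k - p])
            # Each update is a convex combination because 0 <= alpha <= 1.
            d[j] = (1.0 - alpha) * d[j - 1] + alpha * d[j]
    return d[p]
```

Only the four coefficients associated with the enclosing interval are touched, and no B-spline values are formed explicitly.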

4
References

Cox M G (1972) The numerical evaluation of B-splines J. Inst. Math. Appl. 10 134–149
Cox M G (1978) The numerical evaluation of a spline from its B-spline representation J. Inst. Math. Appl. 21 135–143
de Boor C (1972) On calculating with B-splines J. Approx. Theory 6 50–62

5
Arguments

1:     start – Integer, Input
On entry: indicates the completion state of the first phase of the algorithm.
start=0
The enclosing interval numbers $\mathit{ixloc}_j$ for the abscissae $x_j$ contained in x have not been determined, and you wish to use the sorted mode of vectorization.
start=1
The enclosing interval numbers $\mathit{ixloc}_j$ have been determined and are provided in ixloc; however, the required permutation and interval-related information has not been determined, and you wish to use the sorted mode of vectorization.
start=2
You wish to use the sorted mode of vectorization, and the entire first phase has been completed, with the enclosing interval numbers supplied in ixloc, and the required permutation and interval-related information provided in iwrk (from a previous call to e02bff).
start=10
The enclosing interval numbers $\mathit{ixloc}_j$ for the abscissae $x_j$ contained in x have not been determined, and you wish to use the unsorted mode of vectorization.
start=11
The enclosing interval numbers $\mathit{ixloc}_j$ for the abscissae $x_j$ contained in x have been supplied in ixloc, and you wish to use the unsorted mode of vectorization.
Constraint: start=0, 1, 2, 10 or 11.
Additional: start=0 or 10 should be used unless you are sure that the knot set is unchanged between calls.
2:     ncap7 – Integer, Input
On entry: $\bar{n}+7$, where $\bar{n}$ is the number of intervals of the spline (which is one greater than the number of interior knots, i.e., the knots strictly within the range $\lambda_4$ to $\lambda_{\bar{n}+4}$ over which the spline is defined). Note that if e02bef was used to generate the knots and spline coefficients then ncap7 should contain the same value as returned in n by e02bef.
Constraint: ncap7 ≥ 8.
3:     lamda(ncap7) – Real (Kind=nag_wp) array, Input
On entry: lamda(j) must be set to the value of the jth member of the complete set of knots, $\lambda_j$, for $j=1,2,\ldots,\bar{n}+7$.
Constraint: the lamda(j) must be in nondecreasing order with lamda(ncap7-3) > lamda(4).
4:     c(ncap7) – Real (Kind=nag_wp) array, Input
On entry: the coefficient $c_i$ of the B-spline $N_i(x)$, for $i=1,2,\ldots,\bar{n}+3$. The remaining elements of the array are not referenced.
5:     deriv – Integer, Input
On entry: determines the maximum order of derivatives required, $D_{\mathrm{ord}}$.
If deriv<0 left derivatives are calculated, otherwise right derivatives are calculated. For abscissae satisfying $x_j=\lambda_4$ or $x_j=\lambda_{\bar{n}+4}$ only right-handed or left-handed computation will be used respectively. For abscissae which do not coincide exactly with a knot, the handedness of the computation is immaterial.
deriv=0
No derivatives required.
deriv=±1
Only $s(x)$ and its first derivative are required.
deriv=±2
Only $s(x)$ and its first and second derivatives are required.
deriv=±3
$s(x)$ and its first, second and third derivatives are required.
Note: if |deriv| is greater than 3 only the derivatives up to and including 3 will be returned.
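The way deriv encodes both the maximum derivative order and the handedness can be summarized in a few lines of Python. The helper `decode_deriv` is hypothetical; capping the order at 3 for values of large magnitude, including negative ones, is an assumption based on the note above.

```python
def decode_deriv(deriv):
    """Return (d_ord, left_handed, columns), where columns is the width needed in s."""
    d_ord = min(abs(deriv), 3)   # maximum derivative order, never more than 3
    left_handed = deriv < 0      # a negative deriv requests left derivatives
    return d_ord, left_handed, d_ord + 1
```

The third value is the minimum second dimension of the output array s, namely the maximum derivative order plus one.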
6:     xord – Integer, Input
On entry: indicates whether x is supplied in a sufficiently ordered manner. If x is sufficiently ordered e02bff will complete faster.
xord ≠ 0
The abscissae in x are ordered at least by ascending interval, in that any two abscissae contained in the same interval are only separated by abscissae in the same interval, and the intervals are arranged in ascending order. For example, $x_j<x_{j+1}$, for $j=1,2,\ldots,\mathit{nx}-1$.
xord=0
The abscissae in x are not sufficiently ordered.
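The ordering condition is equivalent to requiring that the interval numbers of successive abscissae never decrease, which the following Python sketch checks. The function `xord_flag` is hypothetical and takes the interval numbers as input; e02bff itself does not require you to compute xord this way.

```python
def xord_flag(ixloc):
    """Return 1 if the interval numbers are nondecreasing (sufficiently ordered), else 0."""
    ordered = all(a <= b for a, b in zip(ixloc, ixloc[1:]))
    return 1 if ordered else 0
```

Note that the abscissae themselves need not be sorted: two abscissae in the same interval may appear in either order.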
7:     x(nx) – Real (Kind=nag_wp) array, Input
On entry: the abscissae $x_j$, for $j=1,2,\ldots,\mathit{nx}$. If start=0 or 10 then evaluations will only be performed for those $x_j$ satisfying $\lambda_4 \le x_j \le \lambda_{\bar{n}+4}$. Otherwise evaluation will be performed unless the corresponding element of ixloc contains an invalid interval number. Please note that if $\mathit{ixloc}_j$ is a valid interval number then no check is made that $x_j$ actually lies in that interval.
Constraint: at least one abscissa must fall between lamda(4) and lamda(ncap7-3).
8:     ixloc(nx) – Integer array, Input/Output
On entry: if start=1, 2 or 11, and you wish $x_j$ to be evaluated, ixloc(j) must be the enclosing interval number $\mathit{ixloc}_j$ of the abscissa $x_j$ (see (1)). If you do not wish $x_j$ to be evaluated, you may set the interval number to be either less than 4 or greater than $\bar{n}+4$.
Otherwise, ixloc need not be set.
On exit: if start=1, 2 or 11, ixloc is unchanged on exit.
Otherwise, ixloc(j) contains the enclosing interval number $\mathit{ixloc}_j$ for the abscissa supplied in x(j), for $j=1,2,\ldots,\mathit{nx}$. Evaluations will only be performed for abscissae $x_j$ satisfying $\lambda_4 \le x_j \le \lambda_{\bar{n}+4}$. If evaluation is not performed, ixloc(j) is set to 0 if $x_j<\lambda_4$ or to $\bar{n}+7$ if $x_j>\lambda_{\bar{n}+4}$.
Constraint: if start=1, 2 or 11, at least one element of ixloc must be between 4 and ncap7-3.
9:     nx – Integer, Input
On entry: nx, the total number of abscissae contained in x, including any that will not be evaluated.
Constraint: nx ≥ 1.
10:   s(lds,*) – Real (Kind=nag_wp) array, Output
Note: the second dimension of the array s must be at least $D_{\mathrm{ord}}+1$; see deriv for the definition of $D_{\mathrm{ord}}$.
On exit: if $x_j$ is valid, s(j,d) will contain the $(d-1)$th derivative of $s(x)$ at $x_j$, for $d=1,2,\ldots,D_{\mathrm{ord}}+1$ and $j=1,2,\ldots,\mathit{nx}$. In particular, s(j,1) will contain the approximation of $s(x_j)$ for all legal values in x.
11:   lds – Integer, Input
On entry: the first dimension of the array s as declared in the (sub)program from which e02bff is called.
Constraint: lds ≥ nx, regardless of the acceptability of the elements of x.
12:   iwrk(liwrk) – Integer array, Input/Output
On entry: if start=2, iwrk must be unchanged from a previous call to e02bff with start=0 or 1.
Otherwise, iwrk need not be set.
On exit: if start=10 or 11, iwrk is unchanged on exit.
Otherwise, iwrk contains the required permutation of elements of x, if any, and information related to the division of the abscissae xj between the intervals derived from lamda.
13:   liwrk – Integer, Input
On entry: the dimension of the array iwrk as declared in the (sub)program from which e02bff is called.
Constraint: if start=0, 1 or 2, liwrk ≥ 3+3×nx.
14:   ifail – Integer, Input/Output
On entry: ifail must be set to 0, -1 or 1. If you are unfamiliar with this argument you should refer to Section 3.4 in How to Use the NAG Library and its Documentation for details.
For environments where it might be inappropriate to halt program execution when an error is detected, the value -1 or 1 is recommended. If the output of error messages is undesirable, then the value 1 is recommended. Otherwise, because for this routine the values of the output arguments may be useful even if ifail ≠ 0 on exit, the recommended value is -1. When the value -1 or 1 is used it is essential to test the value of ifail on exit.
On exit: ifail=0 unless the routine detects an error or a warning has been flagged (see Section 6).

6
Error Indicators and Warnings

If on entry ifail=0 or -1, explanatory error messages are output on the current error message unit (as defined by x04aaf).
Note: e02bff may return useful information for one or more of the following detected errors or warnings.
Errors or warnings detected by the routine:
ifail=1
On entry, at least one element of x has an enclosing interval number in ixloc outside the set allowed by the provided spline. The spline has been evaluated for all x with enclosing interval numbers inside the allowable set.
value entries of x were indexed below the lower bound value.
value entries of x were indexed above the upper bound value.
ifail=2
On entry, all elements of x had enclosing interval numbers in ixloc outside the domain allowed by the provided spline.
value entries of x were indexed below the lower bound value.
value entries of x were indexed above the upper bound value.
ifail=11
On entry, start=value.
Constraint: start=0, 1, 2, 10 or 11.
ifail=12
On entry, start=2 and nx is not consistent with the previous call to e02bff.
On entry, nx=value.
Constraint: nx=value.
ifail=21
On entry, ncap7=value.
Constraint: ncap7 ≥ 8.
ifail=31
On entry, lamda(4)=value, ncap7=value and lamda(ncap7-3)=value.
Constraint: lamda(4) < lamda(ncap7-3).
ifail=91
On entry, nx=value.
Constraint: nx ≥ 1.
ifail=111
On entry, lds=value.
Constraint: lds ≥ nx = value.
ifail=131
On entry, liwrk=value.
Constraint: liwrk ≥ 3×nx+3 = value.
ifail=-99
An unexpected error has been triggered by this routine. Please contact NAG.
See Section 3.9 in How to Use the NAG Library and its Documentation for further information.
ifail=-399
Your licence key may have expired or may not have been installed correctly.
See Section 3.8 in How to Use the NAG Library and its Documentation for further information.
ifail=-999
Dynamic memory allocation failed.
See Section 3.7 in How to Use the NAG Library and its Documentation for further information.
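The argument constraints behind the input-related error codes can be checked before calling the routine. The Python sketch below maps each documented constraint to the corresponding ifail value; `check_arguments` is a hypothetical helper, the order in which the checks are made is an assumption, and the codes that depend on the abscissae or on a previous call (ifail=1, 2 and 12) are not covered.

```python
def check_arguments(start, ncap7, lamda, nx, lds, liwrk):
    """Return the documented ifail code for the first violated constraint, or 0."""
    if start not in (0, 1, 2, 10, 11):
        return 11
    if ncap7 < 8:
        return 21
    # lamda(4) < lamda(ncap7-3) in 1-based terms; the Python list is 0-based.
    if not lamda[3] < lamda[ncap7 - 4]:
        return 31
    if nx < 1:
        return 91
    if lds < nx:
        return 111
    # The workspace constraint applies only to the sorted modes.
    if start in (0, 1, 2) and liwrk < 3 * nx + 3:
        return 131
    return 0
```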

7
Accuracy

The computed value of $s(x)$ has negligible error in most practical situations. Specifically, this value has an absolute error bounded in modulus by $18 \times c_{\max} \times$ machine precision, where $c_{\max}$ is the largest in modulus of $c_j$, $c_{j+1}$, $c_{j+2}$ and $c_{j+3}$, and $j$ is an integer such that $\lambda_{j+3} < x \le \lambda_{j+4}$. If $c_j$, $c_{j+1}$, $c_{j+2}$ and $c_{j+3}$ are all of the same sign, then the computed value of $s(x)$ has relative error bounded by $20 \times$ machine precision. For full details see Cox (1978).
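The bounds quoted above are simple to compute for the four coefficients that are active at a given abscissa. In the Python sketch below, `value_error_bounds` is a hypothetical helper, and using `sys.float_info.epsilon` as the machine precision is an assumption; the value that NAG defines as machine precision may differ by a small constant factor.

```python
import sys


def value_error_bounds(local_coeffs, eps=sys.float_info.epsilon):
    """Return (absolute bound, relative bound or None) for the four active coefficients."""
    c_max = max(abs(c) for c in local_coeffs)
    absolute = 18.0 * c_max * eps
    # The relative bound applies only when all four coefficients share one sign.
    same_sign = all(c > 0 for c in local_coeffs) or all(c < 0 for c in local_coeffs)
    relative = 20.0 * eps if same_sign else None
    return absolute, relative
```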
No complete error analysis is available for the computation of the derivatives of $s(x)$. However, for most practical purposes the absolute errors in the computed derivatives should be small. Note that this is in comparison to the derivatives of the spline, which may or may not be comparable to the derivatives of the function that has been approximated by the spline.

8
Parallelism and Performance

e02bff is threaded by NAG for parallel execution in multithreaded implementations of the NAG Library.
Please consult the X06 Chapter Introduction for information on how to control and interrogate the OpenMP environment used within this routine. Please also consult the Users' Note for your implementation for any additional implementation-specific information.

9
Further Comments

If using the sorted mode of vectorization, the time required for the first phase to determine the enclosing intervals is approximately proportional to $O(\mathit{nx}\log\bar{n})$. The time required to then generate the required permutations and interval information is $O(\mathit{nx})$ if x is ordered sufficiently, or at worst $O(\mathit{nx}\,\min(\mathit{nx},\bar{n})\log\min(\mathit{nx},\bar{n}))$ if x is not ordered. The time required by the second phase is then proportional to $O(\mathit{nx})$.
If using the unsorted mode of vectorization, the time required is proportional to $O(\mathit{nx}\log\bar{n})$ if the enclosing interval numbers are not provided, or $O(\mathit{nx})$ if they are provided. However, the repeated calculation of various quantities will typically make this slower than the sorted mode when the ratio of abscissae to knots is high, or the abscissae are densely distributed over a relatively small subset of the intervals of the spline.
Note: the routine does not test all the conditions on the knots given in the description of lamda in Section 5, since to do this would result in a computation time with a linear dependency upon $\bar{n}$ instead of $\log\bar{n}$. All the conditions are tested in e02baf and e02bef, however.

10
Example

This example fits a spline through a set of data points using e02bef and then evaluates the spline at a set of supplied abscissae.

10.1
Program Text

Program Text (e02bffe.f90)

10.2
Program Data

Program Data (e02bffe.d)

10.3
Program Results

Program Results (e02bffe.r)