NAG Toolbox: nag_correg_glm_predict (g02gp)

Purpose

nag_correg_glm_predict (g02gp) allows prediction from a generalized linear model fit via nag_correg_glm_normal (g02ga), nag_correg_glm_binomial (g02gb), nag_correg_glm_poisson (g02gc) or nag_correg_glm_gamma (g02gd).

Syntax

[eta, seeta, pred, sepred, ifail] = g02gp(errfn, link, mean_p, x, isx, b, covar, vfobs, 'n', n, 'm', m, 'ip', ip, 't', t, 'off', off, 'wt', wt, 's', s, 'a', a)
[eta, seeta, pred, sepred, ifail] = nag_correg_glm_predict(errfn, link, mean_p, x, isx, b, covar, vfobs, 'n', n, 'm', m, 'ip', ip, 't', t, 'off', off, 'wt', wt, 's', s, 'a', a)
Note: the interface to this routine has changed since earlier releases of the toolbox:
At Mark 23: wt, off, s and a were made optional; weight and offset were removed from the interface; t was made optional (default to vector of 1s)

Description

A generalized linear model consists of the following elements:
(i) A suitable distribution for the dependent variable y.
(ii) A linear model, with linear predictor η=Xβ, where X is a matrix of independent variables and β a column vector of p parameters.
(iii) A link function g(.) between the expected value of y and the linear predictor, that is g(μ)=η, or equivalently E(y)=μ=g⁻¹(η).
In order to predict from a generalized linear model, that is estimate a value for the dependent variable, y, given a set of independent variables X, the matrix X must be supplied, along with values for the parameters β and their associated variance-covariance matrix, C. Suitable values for β and C are usually estimated by first fitting the prediction model to a training dataset with known responses, using for example nag_correg_glm_normal (g02ga), nag_correg_glm_binomial (g02gb), nag_correg_glm_poisson (g02gc) or nag_correg_glm_gamma (g02gd). The predicted variable, and its standard error can then be obtained from:
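$$\hat{y} = g^{-1}(\eta), \qquad \mathrm{se}(\hat{y}) = \sqrt{\left(\left.\frac{\partial g^{-1}(x)}{\partial x}\right|_{\eta}\,\mathrm{se}(\eta)\right)^{2} + I_{\mathrm{fobs}}\,\mathrm{Var}(y)}$$
where
$$\eta = o + X\beta, \qquad \mathrm{se}(\eta) = \sqrt{\mathrm{diag}\left(X C X^{T}\right)},$$
$o$ is a vector of offsets, and $I_{\mathrm{fobs}}=0$ if the variance of future observations is not taken into account and $1$ otherwise. Here $\mathrm{diag}(A)$ indicates the diagonal elements of matrix $A$, and the square roots are taken element by element.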
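The formulas above can be traced by hand in plain MATLAB. The following is a minimal sketch only, assuming a reciprocal link and the delta-method form in which the derivative term is squared under the square root; all numeric inputs (`X`, `beta`, `C`, `varY`) are hypothetical values chosen for illustration, and the code does not call the NAG routine.

```matlab
% Hypothetical inputs, chosen only for illustration
X     = [1 32; 1 18];               % model matrix, first column is the mean term
beta  = [-0.02; 0.06];              % parameter estimates
C     = [4e-6 -1e-6; -1e-6 9e-6];   % variance-covariance matrix of beta
o     = zeros(2, 1);                % offsets
varY  = 0.13;                       % Var(y) for a future observation
Ifobs = 1;                          % 1: include variance of future observations

% Linear predictor and its standard error
eta   = o + X*beta;
seeta = sqrt(diag(X*C*X'));

% Reciprocal link: g(mu) = 1/mu, so g^{-1}(eta) = 1/eta
pred  = 1 ./ eta;
dginv = -1 ./ eta.^2;               % derivative of g^{-1} evaluated at eta

% Standard error of the prediction
sepred = sqrt((dginv .* seeta).^2 + Ifobs*varY);
```

Setting `Ifobs = 0` in this sketch leaves only the contribution from the uncertainty in the parameter estimates.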
If required, the variance for the $i$th future observation, $\mathrm{Var}(y_i)$, can be calculated as:
$$\mathrm{Var}(y_i) = \frac{\phi\, V(\theta)}{w_i}$$
where $w_i$ is a weight, $\phi$ is the scale (or dispersion) parameter, and $V(\theta)$ is the variance function. Both the scale parameter and the variance function depend on the distribution used for $y$, with:
Poisson: $V(\theta)=\mu_i$, $\phi=1$
binomial: $V(\theta)=\mu_i(t_i-\mu_i)/t_i$, $\phi=1$
Normal: $V(\theta)=1$
gamma: $V(\theta)=\mu_i^2$
In the cases of a Normal and gamma error structure, the scale parameter ($\phi$) is supplied by you. This value is usually obtained from the function used to fit the prediction model. In many cases, for a Normal error structure, $\phi=\hat{\sigma}^2$, i.e., the estimated variance.
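The variance of a future observation can be sketched in plain MATLAB as follows. This is an illustration only, with hypothetical values for `mu`, `t`, `wt` and `phi`; it follows the variance functions listed above and is not taken from the routine's implementation.

```matlab
% Hypothetical inputs for two observations
errfn = 'B';          % 'B', 'G', 'N' or 'P'
mu    = [3; 5];       % fitted means
t     = [10; 10];     % binomial denominators (used only when errfn = 'B')
wt    = [1; 2];       % weights
phi   = 0.5;          % scale parameter, ignored for Poisson and binomial

% Variance function V(theta) for each error distribution
switch errfn
  case 'P'
    V = mu;
  case 'B'
    V = mu .* (t - mu) ./ t;
  case 'N'
    V = ones(size(mu));
  case 'G'
    V = mu.^2;
  otherwise
    error('errfn must be ''B'', ''G'', ''N'' or ''P''.');
end

% The scale parameter is fixed at 1 for Poisson and binomial errors
if ismember(errfn, 'PB')
  phi = 1;
end

varY = phi .* V ./ wt;   % Var(y_i) = phi * V(theta) / w_i
```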

References

McCullagh P and Nelder J A (1983) Generalized Linear Models Chapman and Hall

Parameters

Compulsory Input Parameters

1:     errfn – string (length ≥ 1)
Indicates the distribution used to model the dependent variable, y.
errfn='B'
The binomial distribution is used.
errfn='G'
The gamma distribution is used.
errfn='N'
The Normal (Gaussian) distribution is used.
errfn='P'
The Poisson distribution is used.
Constraint: errfn='B', 'G', 'N' or 'P'.
2:     link – string (length ≥ 1)
Indicates which link function is to be used.
link='C'
A complementary log-log link is used.
link='E'
An exponent link is used.
link='G'
A logistic link is used.
link='I'
An identity link is used.
link='L'
A log link is used.
link='P'
A probit link is used.
link='R'
A reciprocal link is used.
link='S'
A square root link is used.
Details on the functional form of the different links can be found in the G02 Chapter Introduction.
Constraints:
  • if errfn='B', link='C', 'G' or 'P';
  • otherwise link='E', 'I', 'L', 'R' or 'S'.
3:     mean_p – string (length ≥ 1)
Indicates if a mean term is to be included.
mean_p='M'
A mean term, intercept, will be included in the model.
mean_p='Z'
The model will pass through the origin, zero-point.
Constraint: mean_p='M' or 'Z'.
4:     x(ldx,:) – double array
The first dimension of the array x must be at least n.
The second dimension of the array x must be at least m.
x(i,j) must contain the ith observation for the jth independent variable, for i=1,2,…,n and j=1,2,…,m.
5:     isx(m) – int64/int32/nag_int array
Indicates which independent variables are to be included in the model.
If isx(j)>0, the jth independent variable is included in the regression model.
Constraints:
  • isx(j) ≥ 0, for j=1,2,…,m;
  • if mean_p='M', exactly ip-1 values of isx must be >0;
  • if mean_p='Z', exactly ip values of isx must be >0.
6:     b(ip) – double array
The model parameters, β.
If mean_p='M', b(1) must contain the mean parameter and b(i+1) the coefficient of the variable contained in the jth column of x, where isx(j) is the ith positive value in the array isx.
If mean_p='Z', b(i) must contain the coefficient of the variable contained in the jth column of x, where isx(j) is the ith positive value in the array isx.
7:     covar(ip×(ip+1)/2) – double array
The upper triangular part of the variance-covariance matrix, C, of the model parameters. This matrix should be supplied packed by column, i.e., the covariance between parameters β(i) and β(j), that is the values stored in b(i) and b(j), should be supplied in covar(j×(j-1)/2+i), for i=1,2,…,ip and j=i,…,ip.
Constraint: the matrix represented in covar must be a valid variance-covariance matrix.
8:     vfobs – logical scalar
If vfobs=true, the variance of future observations is included in the standard error of the predicted variable (i.e., $I_{\mathrm{fobs}}=1$), otherwise $I_{\mathrm{fobs}}=0$.
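The relationship between x, isx, mean_p and b, and the packed layout expected in covar, can be sketched in plain MATLAB. This is an illustration under assumed data only: the arrays `x`, `b` and the full matrix `C` are hypothetical, and `Xmodel` is a helper name introduced here, not part of the interface.

```matlab
% Hypothetical data: n = 2 observations, m = 3 candidate variables
x      = [1.0 7.0 0.5; 2.0 8.0 0.1];
isx    = int64([1; 0; 1]);      % include columns 1 and 3 only
mean_p = 'M';
b      = [0.5; 2.0; -1.0];      % mean, then coefficients for columns 1 and 3

% Columns of x selected by the positive entries of isx, in order
cols = find(isx > 0);
if mean_p == 'M'
  Xmodel = [ones(size(x, 1), 1), x(:, cols)];   % leading column for the mean
else
  Xmodel = x(:, cols);
end
ip  = size(Xmodel, 2);          % must equal numel(b)
eta = Xmodel * b;               % linear predictor, no offset

% Hypothetical full variance-covariance matrix of b
C = [4 1 2; 1 5 3; 2 3 6] * 1e-3;

% Pack the upper triangle by column: C(i,j) goes to covar(j*(j-1)/2 + i)
covar = zeros(ip*(ip+1)/2, 1);
for j = 1:ip
  for i = 1:j
    covar(j*(j-1)/2 + i) = C(i, j);
  end
end
```

The same packed vector can be obtained with `C(triu(true(ip)))`, since MATLAB's column-major logical indexing visits the upper triangle in the same order.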

Optional Input Parameters

1:     n – int64/int32/nag_int scalar
Default: the dimension of the arrays t, off, wt and the first dimension of the array x. (An error is raised if these dimensions are not equal.)
n, the number of observations.
Constraint: n ≥ 1.
2:     m – int64/int32/nag_int scalar
Default: the dimension of the array isx and the second dimension of the array x. (An error is raised if these dimensions are not equal.)
m, the total number of independent variables.
Constraint: m ≥ 1.
3:     ip – int64/int32/nag_int scalar
Default: the dimension of the array b.
The number of independent variables in the model, including the mean or intercept if present.
Constraint: ip>0.
4:     t(:) – double array
The dimension of the array must be at least n if errfn='B', and at least 1 otherwise.
If errfn='B', t(i) must contain the binomial denominator, $t_i$, for the ith observation.
Otherwise t is not referenced.
Constraint: if errfn='B', t(i) ≥ 0.0, for i=1,2,…,n.
5:     off(:) – double array
The dimension of the array must be at least n if offset='Y', and at least 1 otherwise.
If offset='Y', off(i) must contain the offset $o_i$, for the ith observation.
Otherwise off is not referenced.
6:     wt(:) – double array
The dimension of the array must be at least n if weight='W' and vfobs=true, and at least 1 otherwise.
If weight='W' and vfobs=true, wt(i) must contain the weight, $w_i$, for the ith observation.
If the variance of future observations is not included in the standard error of the predicted variable, wt is not referenced.
Constraint: if vfobs=true and weight='W', wt(i) ≥ 0.0, for i=1,2,…,n.
7:     s – double scalar
Default: 0
If errfn='N' or 'G' and vfobs=true, the scale parameter, ϕ.
Otherwise s is not referenced and ϕ=1.
Constraint: if errfn='N' or 'G' and vfobs=true, s>0.0.
8:     a – double scalar
Default: 0
If link='E', a must contain the power of the exponential.
If link ≠ 'E', a is not referenced.
Constraint: if link='E', a ≠ 0.0.
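For the links available with non-binomial errors, the role of a and of the inverse link can be sketched as below. This is a sketch under stated assumptions: it takes the usual definitions η=μ^a (exponent), η=μ (identity), η=log μ (log), η=1/μ (reciprocal) and η=√μ (square root); the G02 Chapter Introduction remains the authoritative source for the functional forms. The struct `ginv` and the value of `a` are hypothetical names and values used for illustration.

```matlab
a = 2;   % hypothetical exponent, used only by the 'E' link

% Assumed inverse link functions g^{-1}(eta), keyed by the link code
ginv = struct( ...
  'E', @(eta) eta.^(1/a), ...   % exponent:    eta = mu^a
  'I', @(eta) eta, ...          % identity:    eta = mu
  'L', @(eta) exp(eta), ...     % log:         eta = log(mu)
  'R', @(eta) 1 ./ eta, ...     % reciprocal:  eta = 1/mu
  'S', @(eta) eta.^2);          % square root: eta = sqrt(mu)

link = 'R';
eta  = [2.0; 0.5];
pred = ginv.(link)(eta);        % predicted values for the chosen link
```

With a=0 the exponent link would map every μ to the same η, which is consistent with the constraint a ≠ 0.0.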

Output Parameters

1:     eta(n) – double array
The linear predictor, η.
2:     seeta(n) – double array
The standard error of the linear predictor, se(η).
3:     pred(n) – double array
The predicted value, ŷ.
4:     sepred(n) – double array
The standard error of the predicted value, se(ŷ). If pred(i) could not be calculated, then nag_correg_glm_predict (g02gp) returns ifail=22, and sepred(i) is set to -99.0.
5:     ifail – int64/int32/nag_int scalar
ifail=0 unless the function detects an error (see Error Indicators and Warnings).
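Because a warning exit with ifail=22 still returns results for the observations that could be predicted, the flagged entries can be separated out afterwards. The sketch below uses hypothetical output values in place of a real call, and the value stored in `pred` at the failed position is a placeholder assumed for illustration only.

```matlab
% Hypothetical outputs, standing in for the results of a call with ifail = 22
ifail  = int64(22);
pred   = [0.49552; 0; 0.88911];        % entry 2 is a placeholder
sepred = [0.35981; -99.0; 0.36098];    % -99.0 flags a failed prediction

% Keep only the observations whose prediction was calculated
ok = sepred ~= -99.0;
if ifail == 22
  fprintf('%d of %d predictions could not be calculated\n', ...
          nnz(~ok), numel(ok));
end
pred_ok   = pred(ok);
sepred_ok = sepred(ok);
```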

Error Indicators and Warnings

Note: nag_correg_glm_predict (g02gp) may return useful information for one or more of the following detected errors or warnings.
Errors or warnings detected by the function:

Cases prefixed with W are classified as warnings and do not generate an error of type NAG:error_n. See nag_issue_warnings.

   ifail=1
On entry, errfn is invalid.
   ifail=2
On entry, errfn and link combination is invalid.
On entry, link is invalid.
   ifail=3
On entry, mean_p is invalid.
   ifail=4
On entry, offset is invalid.
   ifail=5
On entry, weight is invalid.
   ifail=6
Constraint: n ≥ 1.
   ifail=8
Constraint: ldx ≥ n.
   ifail=9
Constraint: m ≥ 1.
   ifail=10
On entry, isx not consistent with ip.
   ifail=11
Constraint: ip>0.
   ifail=12
Constraint: t(i) ≥ 0.0, for all i.
   ifail=14
Constraint: wt(i) ≥ 0.0, for all i.
   ifail=15
Constraint: s>0.0.
   ifail=16
On entry, a=0.0.
   ifail=18
On entry, covar(i) < 0.0 for at least one diagonal element.
W  ifail=22
At least one predicted value could not be calculated as required. sepred is set to -99.0 for affected predicted values.
   ifail=-99
An unexpected error has been triggered by this routine. Please contact NAG.
   ifail=-399
Your licence key may have expired or may not have been installed correctly.
   ifail=-999
Dynamic memory allocation failed.

Accuracy

Not applicable.

Further Comments

None.

Example

The model
$$y = \frac{1}{\beta_1 + \beta_2 x} + \varepsilon$$
is fitted to a training dataset with five observations. The resulting model is then used to predict the response for two new observations.
function g02gp_example


fprintf('g02gp example results\n\n');

x = [ 1;  2; 3; 4; 5];
y = [25; 10; 6; 4; 3];

isx = [int64(1)];
ip  = int64(2);

link   = 'R';
mean_p = 'M';
s      = 0;

% Fit generalized linear model, with Normal errors to training data
[s, rss, idf, b, irank, se, covar, v, ifail] = ...
  g02ga( ...
         link, mean_p, x, isx, ip, y, s);

% Display parameter estimates for training data
fprintf('\nResidual sum of squares =  %12.4e\n', rss);
fprintf('Degrees of freedom      =  %d\n', idf);
fprintf('\n      Estimate     Standard error\n');
for i = 1:ip
  fprintf('%14.4f %14.4f\n', b(i), se(i));
end

% Prediction data
x = [32; 18];

% Compute predicted values
errfn  = 'N';
vfobs = true;
[eta, seeta, pred, sepred, ifail] = ...
  g02gp( ...
         errfn, link, mean_p, x, isx, b, covar, vfobs, 's', s);

% Display predicted values
fprintf('\n  i      eta          se(eta)      predicted    se(predicted)\n');
for i = 1:numel(pred)   % one row per prediction observation
  fprintf('%3d%13.5f%13.5f%13.5f%13.5f\n', i, eta(i), seeta(i), ...
          pred(i), sepred(i));
end


g02gp example results


Residual sum of squares =    3.8717e-01
Degrees of freedom      =  3

      Estimate     Standard error
       -0.0239         0.0028
        0.0638         0.0026

  i      eta          se(eta)      predicted    se(predicted)
  1      2.01807      0.08168      0.49552      0.35981
  2      1.12472      0.04476      0.88911      0.36098


© The Numerical Algorithms Group Ltd, Oxford, UK. 2009–2015