# NAG Toolbox: nag_correg_glm_predict (g02gp)

## Purpose

nag_correg_glm_predict (g02gp) allows prediction from a generalized linear model fit via nag_correg_glm_normal (g02ga), nag_correg_glm_binomial (g02gb), nag_correg_glm_poisson (g02gc) or nag_correg_glm_gamma (g02gd).

## Syntax

[eta, seeta, pred, sepred, ifail] = g02gp(errfn, link, mean_p, x, isx, b, covar, vfobs, 'n', n, 'm', m, 'ip', ip, 't', t, 'off', off, 'wt', wt, 's', s, 'a', a)
[eta, seeta, pred, sepred, ifail] = nag_correg_glm_predict(errfn, link, mean_p, x, isx, b, covar, vfobs, 'n', n, 'm', m, 'ip', ip, 't', t, 'off', off, 'wt', wt, 's', s, 'a', a)
Note: the interface to this routine has changed since earlier releases of the toolbox:
 At Mark 23: wt, off, s and a were made optional; weight and offset were removed from the interface; t was made optional (it defaults to a vector of 1s).

## Description

A generalized linear model consists of the following elements:
 (i) A suitable distribution for the dependent variable $y$.
 (ii) A linear model, with linear predictor $\eta =X\beta$, where $X$ is a matrix of independent variables and $\beta$ a column vector of $p$ parameters.
 (iii) A link function $g\left(.\right)$ between the expected value of $y$ and the linear predictor, that is $g\left(E\left(y\right)\right)=g\left(\mu \right)=\eta$, so that $\mu ={g}^{-1}\left(\eta \right)$.
In order to predict from a generalized linear model, that is, to estimate a value for the dependent variable $y$ given a set of independent variables $X$, the matrix $X$ must be supplied, along with values for the parameters $\beta$ and their associated variance-covariance matrix, $C$. Suitable values for $\beta$ and $C$ are usually estimated by first fitting the prediction model to a training dataset with known responses, using, for example, nag_correg_glm_normal (g02ga), nag_correg_glm_binomial (g02gb), nag_correg_glm_poisson (g02gc) or nag_correg_glm_gamma (g02gd). The predicted variable and its standard error can then be obtained from:
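 $\stackrel{^}{y}={g}^{-1}\left(\eta \right),\qquad \mathrm{se}\left(\stackrel{^}{y}\right)=\sqrt{{\left(\frac{\delta {g}^{-1}\left(x\right)}{\delta x}\right)}_{\eta }^{2}\mathrm{se}{\left(\eta \right)}^{2}+{I}_{\mathrm{fobs}}\mathrm{Var}\left(y\right)}$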
where
 $\eta =o+X\beta ,\qquad \mathrm{se}\left(\eta \right)=\sqrt{\mathrm{diag}\left(XC{X}^{\mathrm{T}}\right)},$
$o$ is a vector of offsets, and ${I}_{\mathrm{fobs}}=0$ if the variance of future observations is not taken into account and $1$ otherwise. Here $\mathrm{diag}A$ denotes the diagonal elements of the matrix $A$.
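The following MATLAB sketch evaluates these formulas directly, without calling the NAG Toolbox, for a reciprocal link ($\stackrel{^}{y}=1/\eta$). The design matrix, parameter values, covariance matrix and $\mathrm{Var}\left(y\right)$ are made-up illustrative values, not output from g02ga; the sketch only shows how $\mathrm{se}\left(\eta \right)$ and the delta-method term combine.

```matlab
% Prediction formulas for a reciprocal link (illustrative values only)
X     = [1 1; 1 2];          % design matrix, first column is the mean term
beta  = [0.5; 0.25];         % hypothetical parameter estimates
C     = [0.04 0; 0 0.01];    % hypothetical variance-covariance matrix of beta
o     = [0; 0];              % offsets
vary  = 0.1;                 % hypothetical Var(y): Normal errors, phi = 0.1, w = 1
ifobs = 1;                   % 1 corresponds to vfobs = true

eta   = o + X*beta;                  % linear predictor
seeta = sqrt(sum((X*C).*X, 2));      % sqrt(diag(X*C*X')) without the n-by-n product

ginv  = @(e) 1./e;                   % inverse of the reciprocal link
dginv = @(e) -1./e.^2;               % derivative of the inverse link w.r.t. eta

pred   = ginv(eta);
sepred = sqrt((dginv(eta).*seeta).^2 + ifobs*vary);
disp([eta seeta pred sepred])
```

Computing only the row sums of $(XC)\circ X$ avoids forming the full $n\times n$ matrix $XC{X}^{\mathrm{T}}$ when only its diagonal is needed.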
If required, the variance for the $i$th future observation, $\mathrm{Var}\left({y}_{i}\right)$, can be calculated as:
 $\mathrm{Var}\left({y}_{i}\right)=\frac{\varphi V\left(\theta \right)}{{w}_{i}}$
where ${w}_{i}$ is a weight, $\varphi$ is the scale (or dispersion) parameter, and $V\left(\theta \right)$ is the variance function. Both the scale parameter and the variance function depend on the distribution used for $y$, with:

| Distribution | $V\left(\theta \right)$ | $\varphi$ |
|---|---|---|
| Poisson | ${\mu }_{i}$ | $1$ |
| binomial | $\frac{{\mu }_{i}\left({t}_{i}-{\mu }_{i}\right)}{{t}_{i}}$ | $1$ |
| Normal | $1$ | supplied by you |
| gamma | ${\mu }_{i}^{2}$ | supplied by you |

For Normal and gamma error structures, the scale parameter, $\varphi$, must be supplied by you (see s). This value is usually obtained from the function used to fit the prediction model. In many cases, for a Normal error structure, $\varphi ={\stackrel{^}{\sigma }}^{2}$, i.e., the estimated variance.
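A minimal MATLAB sketch of the variance of a future observation for each error structure, written with anonymous functions so that it runs as a plain script. The names V, scale and varfo are illustrative only and are not part of the toolbox, and the inputs are made-up values.

```matlab
% Variance of a future observation, Var(y_i) = phi*V(theta)/w_i
V.P = @(mu, t) mu;                    % Poisson
V.B = @(mu, t) mu.*(t - mu)./t;       % binomial, t = binomial denominator
V.N = @(mu, t) ones(size(mu));        % Normal
V.G = @(mu, t) mu.^2;                 % gamma

% phi is the supplied scale s for 'N' and 'G', and fixed at 1 for 'P' and 'B'
scale = @(errfn, s) any(errfn == 'NG')*s + ~any(errfn == 'NG');
varfo = @(errfn, mu, t, w, s) scale(errfn, s) .* V.(errfn)(mu, t) ./ w;

mu = [2; 5]; t = [10; 10]; w = [1; 2]; s = 0.5;   % illustrative values
vP = varfo('P', mu, t, w, s);   % s is ignored for Poisson
vB = varfo('B', mu, t, w, s);   % s is ignored for binomial
vN = varfo('N', mu, t, w, s);
vG = varfo('G', mu, t, w, s);
disp([vP vB vN vG])
```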

## Parameters

### Compulsory Input Parameters

1:     $\mathrm{errfn}$ – string (length ≥ 1)
Indicates the distribution used to model the dependent variable, $y$.
${\mathbf{errfn}}=\text{'B'}$
The binomial distribution is used.
${\mathbf{errfn}}=\text{'G'}$
The gamma distribution is used.
${\mathbf{errfn}}=\text{'N'}$
The Normal (Gaussian) distribution is used.
${\mathbf{errfn}}=\text{'P'}$
The Poisson distribution is used.
Constraint: ${\mathbf{errfn}}=\text{'B'}$, $\text{'G'}$, $\text{'N'}$ or $\text{'P'}$.
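2:     $\mathrm{link}$ – string (length ≥ 1)
Indicates which link function is to be used.
${\mathbf{link}}=\text{'C'}$
A complementary log-log link is used.
${\mathbf{link}}=\text{'E'}$
An exponent link is used.
${\mathbf{link}}=\text{'G'}$
A logistic link is used.
${\mathbf{link}}=\text{'I'}$
An identity link is used.
${\mathbf{link}}=\text{'L'}$
A log link is used.
${\mathbf{link}}=\text{'P'}$
A probit link is used.
${\mathbf{link}}=\text{'R'}$
A reciprocal link is used.
${\mathbf{link}}=\text{'S'}$
A square root link is used.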
Details on the functional form of the different links can be found in the G02 Chapter Introduction.
Constraints:
• if ${\mathbf{errfn}}=\text{'B'}$, ${\mathbf{link}}=\text{'C'}$, $\text{'G'}$ or $\text{'P'}$;
• otherwise ${\mathbf{link}}=\text{'E'}$, $\text{'I'}$, $\text{'L'}$, $\text{'R'}$ or $\text{'S'}$.
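The delta-method term in $\mathrm{se}\left(\stackrel{^}{y}\right)$ needs the derivative of the inverse link. The sketch below assumes the usual forms for the non-binomial links: identity $\eta =\mu$, log $\eta =\mathrm{log}\mu$, reciprocal $\eta =1/\mu$, square root $\eta =\sqrt{\mu }$ and exponent $\eta ={\mu }^{a}$, with $a$ as in the parameter a. These forms are assumptions for illustration (the G02 Chapter Introduction is the definitive source), and the binomial links 'C', 'G' and 'P' are omitted. Each analytic derivative is checked against a central finite difference at positive values of $\eta$.

```matlab
% Assumed inverse links mu = g^{-1}(eta) and their derivatives (non-binomial links)
ginv.I = @(eta, a) eta;           dginv.I = @(eta, a) ones(size(eta));
ginv.L = @(eta, a) exp(eta);      dginv.L = @(eta, a) exp(eta);
ginv.R = @(eta, a) 1./eta;        dginv.R = @(eta, a) -1./eta.^2;
ginv.S = @(eta, a) eta.^2;        dginv.S = @(eta, a) 2*eta;
ginv.E = @(eta, a) eta.^(1/a);    dginv.E = @(eta, a) (1/a)*eta.^(1/a - 1);

% Compare each analytic derivative with a central finite difference
eta = [0.5; 1.5; 3]; a = 0.5; h = 1e-6;
links  = fieldnames(ginv);
maxerr = zeros(numel(links), 1);
for k = 1:numel(links)
    f  = ginv.(links{k});
    df = dginv.(links{k});
    fd = (f(eta + h, a) - f(eta - h, a)) / (2*h);
    maxerr(k) = max(abs(fd - df(eta, a)) ./ max(1, abs(df(eta, a))));
    fprintf('link ''%s'': max relative error %.2e\n', links{k}, maxerr(k));
end
```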
3:     $\mathrm{mean_p}$ – string (length ≥ 1)
Indicates if a mean term is to be included.
${\mathbf{mean_p}}=\text{'M'}$
A mean term (intercept) will be included in the model.
${\mathbf{mean_p}}=\text{'Z'}$
The model will pass through the origin (zero-point).
Constraint: ${\mathbf{mean_p}}=\text{'M'}$ or $\text{'Z'}$.
4:     $\mathrm{x}\left(\mathit{ldx},:\right)$ – double array
The first dimension of the array x must be at least ${\mathbf{n}}$.
The second dimension of the array x must be at least ${\mathbf{m}}$.
${\mathbf{x}}\left(\mathit{i},\mathit{j}\right)$ must contain the $\mathit{i}$th observation for the $\mathit{j}$th independent variable, for $\mathit{i}=1,2,\dots ,{\mathbf{n}}$ and $\mathit{j}=1,2,\dots ,{\mathbf{m}}$.
5:     $\mathrm{isx}\left({\mathbf{m}}\right)$ – int64, int32 or nag_int array
Indicates which independent variables are to be included in the model.
If ${\mathbf{isx}}\left(j\right)>0$, the $j$th independent variable is included in the regression model.
Constraints:
• ${\mathbf{isx}}\left(\mathit{j}\right)\ge 0$, for $\mathit{j}=1,2,\dots ,{\mathbf{m}}$;
• if ${\mathbf{mean_p}}=\text{'M'}$, exactly ${\mathbf{ip}}-1$ values of isx must be $>0$;
• if ${\mathbf{mean_p}}=\text{'Z'}$, exactly ip values of isx must be $>0$.
6:     $\mathrm{b}\left({\mathbf{ip}}\right)$ – double array
The model parameters, $\beta$.
If ${\mathbf{mean_p}}=\text{'M'}$, ${\mathbf{b}}\left(1\right)$ must contain the mean parameter and ${\mathbf{b}}\left(i+1\right)$ the coefficient of the variable held in column $j$ of x, where ${\mathbf{isx}}\left(j\right)$ is the $i$th positive value in the array isx.
If ${\mathbf{mean_p}}=\text{'Z'}$, ${\mathbf{b}}\left(i\right)$ must contain the coefficient of the variable held in column $j$ of x, where ${\mathbf{isx}}\left(j\right)$ is the $i$th positive value in the array isx.
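To see how x, isx, mean_p and b fit together, the following MATLAB sketch (hypothetical data, no toolbox call) builds the matrix $X$ that multiplies $\beta$ and forms $\eta =o+X\beta$. It assumes, as the description of b states, that the selected columns keep their order in x and that the mean term comes first.

```matlab
% Build the design matrix matching b from x, isx and mean_p, then form eta
x      = [1 10 2; 2 20 4];   % n = 2 observations, m = 3 independent variables
isx    = [1 0 1];            % use columns 1 and 3 of x
mean_p = 'M';                % include a mean term
b      = [0.5; 2; -1];       % b(1) = mean, b(2) for x(:,1), b(3) for x(:,3)
off    = [0; 1];             % offsets o_i

Xsel = x(:, isx > 0);        % selected columns, in their original order
if mean_p == 'M'
    X = [ones(size(x, 1), 1), Xsel];   % mean term first
else
    X = Xsel;
end
assert(size(X, 2) == numel(b), 'isx not consistent with ip');  % cf. ifail = 10
eta = off + X*b
```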
7:     $\mathrm{covar}\left({\mathbf{ip}}×\left({\mathbf{ip}}+1\right)/2\right)$ – double array
The upper triangular part of the variance-covariance matrix, $C$, of the model parameters. This matrix should be supplied packed by column, i.e., the covariance between parameters ${\beta }_{i}$ and ${\beta }_{j}$, that is the values stored in ${\mathbf{b}}\left(i\right)$ and ${\mathbf{b}}\left(j\right)$, should be supplied in ${\mathbf{covar}}\left(\mathit{j}×\left(\mathit{j}-1\right)/2+\mathit{i}\right)$, for $\mathit{i}=1,2,\dots ,{\mathbf{ip}}$ and $\mathit{j}=\mathit{i},\dots ,{\mathbf{ip}}$.
Constraint: the matrix represented in covar must be a valid variance-covariance matrix.
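Packing by column means that the upper triangle is stored as ${C}_{11},{C}_{12},{C}_{22},{C}_{13},{C}_{23},{C}_{33},\dots$. The MATLAB sketch below unpacks a hypothetical covar for ip = 3 into the full symmetric matrix, checks that the diagonal is non-negative (compare ifail = 18), and repacks it with a logical upper-triangular mask.

```matlab
% Unpack covar (upper triangle packed by column) into the full ip-by-ip matrix C
covar = [4 1 9 0.5 2 16];                        % hypothetical packed values
ip    = round((sqrt(8*numel(covar) + 1) - 1)/2); % invert numel = ip*(ip+1)/2
C = zeros(ip);
for j = 1:ip
    for i = 1:j
        C(i, j) = covar(j*(j-1)/2 + i);   % Cov(beta_i, beta_j)
        C(j, i) = C(i, j);                % mirror into the lower triangle
    end
end
assert(all(diag(C) >= 0), 'negative variance on the diagonal');
disp(C)

% Column-major order over the upper-triangular mask reproduces the packing
repacked = C(triu(true(ip)))';
```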
8:     $\mathrm{vfobs}$ – logical scalar
If ${\mathbf{vfobs}}=\mathit{true}$, the variance of future observations is included in the standard error of the predicted variable (i.e., ${I}_{\mathrm{fobs}}=1$), otherwise ${I}_{\mathrm{fobs}}=0$.

### Optional Input Parameters

1:     $\mathrm{n}$ – int64, int32 or nag_int scalar
Default: the dimension of the arrays t, off, wt and the first dimension of the array x. (An error is raised if these dimensions are not equal.)
$n$, the number of observations.
Constraint: ${\mathbf{n}}\ge 1$.
2:     $\mathrm{m}$ – int64, int32 or nag_int scalar
Default: the dimension of the array isx and the second dimension of the array x. (An error is raised if these dimensions are not equal.)
$m$, the total number of independent variables.
Constraint: ${\mathbf{m}}\ge 1$.
3:     $\mathrm{ip}$ – int64, int32 or nag_int scalar
Default: the dimension of the array b.
The number of independent variables in the model, including the mean or intercept if present.
Constraint: ${\mathbf{ip}}>0$.
4:     $\mathrm{t}\left(:\right)$ – double array
The dimension of the array must be at least ${\mathbf{n}}$ if ${\mathbf{errfn}}=\text{'B'}$, and at least $1$ otherwise.
If ${\mathbf{errfn}}=\text{'B'}$, ${\mathbf{t}}\left(i\right)$ must contain the binomial denominator, ${t}_{i}$, for the $i$th observation.
Otherwise t is not referenced.
Constraint: if ${\mathbf{errfn}}=\text{'B'}$, ${\mathbf{t}}\left(\mathit{i}\right)\ge 0.0$, for $\mathit{i}=1,2,\dots ,n$.
5:     $\mathrm{off}\left(:\right)$ – double array
The dimension of the array must be at least ${\mathbf{n}}$ if $\mathit{offset}=\text{'Y'}$, and at least $1$ otherwise.
If $\mathit{offset}=\text{'Y'}$, ${\mathbf{off}}\left(i\right)$ must contain the offset ${o}_{i}$, for the $i$th observation.
Otherwise off is not referenced.
6:     $\mathrm{wt}\left(:\right)$ – double array
The dimension of the array must be at least ${\mathbf{n}}$ if $\mathit{weight}=\text{'W'}$ and ${\mathbf{vfobs}}=\mathit{true}$, and at least $1$ otherwise.
If $\mathit{weight}=\text{'W'}$ and ${\mathbf{vfobs}}=\mathit{true}$, ${\mathbf{wt}}\left(i\right)$ must contain the weight, ${w}_{i}$, for the $i$th observation.
If the variance of future observations is not included in the standard error of the predicted variable, wt is not referenced.
Constraint: if ${\mathbf{vfobs}}=\mathit{true}$ and $\mathit{weight}=\text{'W'}$, ${\mathbf{wt}}\left(\mathit{i}\right)\ge 0.0$, for $\mathit{i}=1,2,\dots ,n$.
7:     $\mathrm{s}$ – double scalar
Default: $0$
If ${\mathbf{errfn}}=\text{'N'}$ or $\text{'G'}$ and ${\mathbf{vfobs}}=\mathit{true}$, the scale parameter, $\varphi$.
Otherwise s is not referenced and $\varphi =1$.
Constraint: if ${\mathbf{errfn}}=\text{'N'}$ or $\text{'G'}$ and ${\mathbf{vfobs}}=\mathit{true}$, ${\mathbf{s}}>0.0$.
8:     $\mathrm{a}$ – double scalar
Default: $0$
If ${\mathbf{link}}=\text{'E'}$, a must contain the power of the exponential.
If ${\mathbf{link}}\ne \text{'E'}$, a is not referenced.
Constraint: if ${\mathbf{link}}=\text{'E'}$, ${\mathbf{a}}\ne 0.0$.

### Output Parameters

1:     $\mathrm{eta}\left({\mathbf{n}}\right)$ – double array
The linear predictor, $\eta$.
2:     $\mathrm{seeta}\left({\mathbf{n}}\right)$ – double array
The standard error of the linear predictor, $\mathrm{se}\left(\eta \right)$.
3:     $\mathrm{pred}\left({\mathbf{n}}\right)$ – double array
The predicted value, $\stackrel{^}{y}$.
4:     $\mathrm{sepred}\left({\mathbf{n}}\right)$ – double array
The standard error of the predicted value, $\mathrm{se}\left(\stackrel{^}{y}\right)$. If ${\mathbf{pred}}\left(i\right)$ could not be calculated, then nag_correg_glm_predict (g02gp) returns ${\mathbf{ifail}}={\mathbf{22}}$, and ${\mathbf{sepred}}\left(i\right)$ is set to $-99.0$.
5:     $\mathrm{ifail}$ – int64, int32 or nag_int scalar
${\mathbf{ifail}}={\mathbf{0}}$ unless the function detects an error (see Error Indicators and Warnings).

## Error Indicators and Warnings

Note: nag_correg_glm_predict (g02gp) may return useful information for one or more of the following detected errors or warnings.
Errors or warnings detected by the function:

Cases prefixed with W are classified as warnings and do not generate an error of type NAG:error_n. See nag_issue_warnings.

${\mathbf{ifail}}=1$
On entry, errfn is invalid.
${\mathbf{ifail}}=2$
On entry, errfn and link combination is invalid.
${\mathbf{ifail}}=3$
On entry, mean_p is invalid.
${\mathbf{ifail}}=4$
On entry, offset is invalid.
${\mathbf{ifail}}=5$
On entry, weight is invalid.
${\mathbf{ifail}}=6$
Constraint: ${\mathbf{n}}\ge 1$.
${\mathbf{ifail}}=8$
Constraint: $\mathit{ldx}\ge {\mathbf{n}}$.
${\mathbf{ifail}}=9$
Constraint: ${\mathbf{m}}\ge 1$.
${\mathbf{ifail}}=10$
On entry, isx not consistent with ip.
${\mathbf{ifail}}=11$
Constraint: ${\mathbf{ip}}>0$.
${\mathbf{ifail}}=12$
Constraint: ${\mathbf{t}}\left(i\right)\ge 0.0$, for all $i$.
${\mathbf{ifail}}=14$
Constraint: ${\mathbf{wt}}\left(i\right)\ge 0.0$, for all $i$.
${\mathbf{ifail}}=15$
Constraint: ${\mathbf{s}}>0.0$.
${\mathbf{ifail}}=16$
On entry, ${\mathbf{a}}=0.0$.
${\mathbf{ifail}}=18$
On entry, ${\mathbf{covar}}\left(i\right)<0.0$ for at least one diagonal element.
W  ${\mathbf{ifail}}=22$
At least one predicted value could not be calculated as required. sepred is set to $-99.0$ for affected predicted values.
${\mathbf{ifail}}=-99$
An unexpected error has been triggered by this routine. Please contact NAG.
${\mathbf{ifail}}=-399$
Your licence key may have expired or may not have been installed correctly.
${\mathbf{ifail}}=-999$
Dynamic memory allocation failed.

## Accuracy

Not applicable.

## Further Comments

None.

## Example

The model
 $y=\frac{1}{{\beta }_{1}+{\beta }_{2}x}+\varepsilon$
is fitted, with Normal errors and a reciprocal link, to a training dataset with five observations. The fitted model is then used to predict the response for two new observations.
```
function g02gp_example

fprintf('g02gp example results\n\n');

% Training data
x = [ 1;  2; 3; 4; 5];
y = [25; 10; 6; 4; 3];

isx = [int64(1)];
ip  = int64(2);

mean_p = 'M';
link   = 'R';   % reciprocal link, as in the model y = 1/(b1 + b2*x)
s      = 0;     % s = 0 asks g02ga to estimate the scale parameter

% Fit generalized linear model, with Normal errors, to training data
[s, rss, idf, b, irank, se, covar, v, ifail] = ...
  g02ga( ...
         link, mean_p, x, isx, ip, y, s);

% Display parameter estimates for training data
fprintf('\nResidual sum of squares =  %12.4e\n', rss);
fprintf('Degrees of freedom      =  %d\n', idf);
fprintf('\n      Estimate     Standard error\n');
for i = 1:ip
  fprintf('%14.4f %14.4f\n', b(i), se(i));
end

% Prediction data
x = [32; 18];

% Compute predicted values, including the variance of future observations
errfn = 'N';
vfobs = true;
[eta, seeta, pred, sepred, ifail] = ...
  g02gp( ...
         errfn, link, mean_p, x, isx, b, covar, vfobs, 's', s);

% Display predicted values (one row per prediction)
fprintf('\n  i      eta          se(eta)      predicted    se(predicted)\n');
for i = 1:numel(pred)
  fprintf('%3d%13.5f%13.5f%13.5f%13.5f\n', i, eta(i), seeta(i), ...
          pred(i), sepred(i));
end

```
```
g02gp example results

Residual sum of squares =    3.8717e-01
Degrees of freedom      =  3

      Estimate     Standard error
       -0.0239         0.0028
        0.0638         0.0026

  i      eta          se(eta)      predicted    se(predicted)
  1      2.01807      0.08168      0.49552      0.35981
  2      1.12472      0.04476      0.88911      0.36098
```
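As a consistency check on these results, the predicted values and their standard errors can be recomputed by hand from the printed eta and se(eta) using the formulas in the Description. This MATLAB sketch does not call the toolbox; it assumes the reciprocal link, unit weights, and $\varphi =\mathrm{rss}/\mathrm{idf}$ as the Normal scale estimate returned by g02ga, and agreement is only to the printed precision.

```matlab
% Recompute pred and sepred from the printed eta and se(eta)
eta   = [2.01807; 1.12472];    % printed linear predictor
seeta = [0.08168; 0.04476];    % printed se(eta)
rss = 3.8717e-01; idf = 3;
s = rss/idf;                   % assumed Normal scale estimate phi

pred   = 1./eta;                         % reciprocal link: mu = 1/eta
sepred = sqrt((seeta./eta.^2).^2 + s);   % delta method plus Var(y) = phi (w = 1)
fprintf('%13.5f%13.5f\n', [pred sepred]');
```

Almost all of se(predicted) comes from the future-observation term $\varphi$; the parameter uncertainty contributes well under 1% of the variance here.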