NAG C Library Manual

# NAG Library Function Document: nag_glm_predict (g02gpc)

## 1  Purpose

nag_glm_predict (g02gpc) allows prediction from a generalized linear model fit via nag_glm_normal (g02gac), nag_glm_binomial (g02gbc), nag_glm_poisson (g02gcc) or nag_glm_gamma (g02gdc).

## 2  Specification

    #include <nag.h>
    #include <nagg02.h>
    void nag_glm_predict (Nag_Distributions errfn, Nag_Link link,
         Nag_IncludeMean mean, Integer n, const double x[], Integer tdx,
         Integer m, const Integer sx[], Integer ip, const double binom_t[],
         const double offset[], const double wt[], double scale,
         double ex_power, const double b[], const double cov[],
         Nag_Boolean vfobs, double eta[], double seeta[], double pred[],
         double sepred[], NagError *fail)

## 3  Description

A generalized linear model consists of the following elements:
 (i) A suitable distribution for the dependent variable $y$.
 (ii) A linear model, with linear predictor $\eta =X\beta$, where $X$ is a matrix of independent variables and $\beta$ a column vector of $p$ parameters.
 (iii) A link function $g\left(.\right)$ between the expected value of $y$ and the linear predictor, that is $g\left(\mu \right)=\eta$, where $\mu =E\left(y\right)$.
In order to predict from a generalized linear model, that is, to estimate a value for the dependent variable, $y$, given a set of independent variables $X$, the matrix $X$ must be supplied, along with values for the parameters $\beta$ and their associated variance-covariance matrix, $C$. Suitable values for $\beta$ and $C$ are usually estimated by first fitting the prediction model to a training dataset with known responses, using for example nag_glm_normal (g02gac), nag_glm_binomial (g02gbc), nag_glm_poisson (g02gcc) or nag_glm_gamma (g02gdc). The predicted variable and its standard error can then be obtained from:
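 $\hat{y}={g}^{-1}\left(\eta \right),\qquad \mathrm{se}\left(\hat{y}\right)=\sqrt{{\left(\left.\frac{\partial {g}^{-1}\left(x\right)}{\partial x}\right|_{x=\eta }\right)}^{2}\mathrm{se}{\left(\eta \right)}^{2}+{I}_{\mathrm{fobs}}\mathrm{Var}\left(y\right)}$
where
 $\eta =o+X\beta ,\qquad \mathrm{se}\left(\eta \right)=\sqrt{\mathrm{diag}\left(XC{X}^{\mathrm{T}}\right)},$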
$o$ is a vector of offsets and ${I}_{\mathrm{fobs}}=0$, if the variance of future observations is not taken into account, and $1$ otherwise. Here $\mathrm{diag}A$ indicates the diagonal elements of matrix $A$.
If required, the variance for the $i$th future observation, $\mathrm{Var}\left({y}_{i}\right)$, can be calculated as:
 $\mathrm{Var}\left({y}_{i}\right)=\frac{\varphi V\left(\theta \right)}{{w}_{i}}$
where ${w}_{i}$ is a weight, $\varphi$ is the scale (or dispersion) parameter, and $V\left(\theta \right)$ is the variance function. Both the scale parameter and the variance function depend on the distribution used for $y$, with:
| Distribution | $V\left(\theta \right)$ | $\varphi$ |
|---|---|---|
| Poisson | ${\mu }_{i}$ | $1$ |
| binomial | $\frac{{\mu }_{i}\left({t}_{i}-{\mu }_{i}\right)}{{t}_{i}}$ | $1$ |
| Normal | $1$ | supplied in ${\mathbf{scale}}$ |
| gamma | ${\mu }_{i}^{2}$ | supplied in ${\mathbf{scale}}$ |
For the Normal and gamma error structures, the scale parameter, $\varphi$, must be supplied by you. This value is usually obtained from the function used to fit the prediction model. In many cases, for a Normal error structure, $\varphi ={\hat{\sigma }}^{2}$, i.e., the estimated variance.
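The variance of a future observation can be sketched directly from the table above. In this minimal C illustration, dist_kind, variance_function and var_future_obs are hypothetical names (not the NAG Nag_Distributions type or any NAG function), and the weight $w$ is assumed to be strictly positive, since a zero weight would divide by zero.

```c
#include <math.h>

/* Hypothetical distribution codes for this sketch only. */
typedef enum { DIST_NORMAL, DIST_BINOMIAL, DIST_POISSON, DIST_GAMMA } dist_kind;

/* Variance function V(theta) evaluated at the mean mu.
   t is the binomial denominator and is ignored for other distributions. */
double variance_function(dist_kind dist, double mu, double t)
{
    switch (dist) {
    case DIST_NORMAL:   return 1.0;
    case DIST_BINOMIAL: return mu * (t - mu) / t;
    case DIST_POISSON:  return mu;
    case DIST_GAMMA:    return mu * mu;
    }
    return NAN;
}

/* Var(y_i) = phi * V(theta) / w_i. phi is fixed at 1 for the Poisson and
   binomial cases; for Normal and gamma the caller's scale is used. */
double var_future_obs(dist_kind dist, double mu, double t, double scale, double w)
{
    double phi = (dist == DIST_NORMAL || dist == DIST_GAMMA) ? scale : 1.0;
    return phi * variance_function(dist, mu, t) / w;
}
```

For instance, a binomial observation with ${\mu }_{i}=2$, ${t}_{i}=8$ and ${w}_{i}=2$ gives $V=2\times 6/8=1.5$ and hence $\mathrm{Var}\left({y}_{i}\right)=0.75$; any scale passed in is ignored for this distribution.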


## 5  Arguments

1:     errfn (Nag_Distributions, Input)
On entry: indicates the distribution used to model the dependent variable, $y$.
${\mathbf{errfn}}=\mathrm{Nag\_Binomial}$
The binomial distribution is used.
${\mathbf{errfn}}=\mathrm{Nag\_Gamma}$
The gamma distribution is used.
${\mathbf{errfn}}=\mathrm{Nag\_Normal}$
The Normal (Gaussian) distribution is used.
${\mathbf{errfn}}=\mathrm{Nag\_Poisson}$
The Poisson distribution is used.
Constraint: ${\mathbf{errfn}}=\mathrm{Nag\_Binomial}$, $\mathrm{Nag\_Gamma}$, $\mathrm{Nag\_Normal}$ or $\mathrm{Nag\_Poisson}$.
2:     link (Nag_Link, Input)
On entry: indicates which link function is to be used.
${\mathbf{link}}=\mathrm{Nag\_Compl}$
A complementary log-log link is used.
${\mathbf{link}}=\mathrm{Nag\_Expo}$
An exponent link is used.
${\mathbf{link}}=\mathrm{Nag\_Logistic}$
A logistic link is used.
${\mathbf{link}}=\mathrm{Nag\_Iden}$
An identity link is used.
${\mathbf{link}}=\mathrm{Nag\_Log}$
A log link is used.
${\mathbf{link}}=\mathrm{Nag\_Probit}$
A probit link is used.
${\mathbf{link}}=\mathrm{Nag\_Reci}$
A reciprocal link is used.
${\mathbf{link}}=\mathrm{Nag\_Sqrt}$
A square root link is used.
Details on the functional form of the different links can be found in the g02 Chapter Introduction.
Constraints:
• if ${\mathbf{errfn}}=\mathrm{Nag\_Binomial}$, ${\mathbf{link}}=\mathrm{Nag\_Compl}$, $\mathrm{Nag\_Logistic}$ or $\mathrm{Nag\_Probit}$;
• otherwise ${\mathbf{link}}=\mathrm{Nag\_Expo}$, $\mathrm{Nag\_Iden}$, $\mathrm{Nag\_Log}$, $\mathrm{Nag\_Reci}$ or $\mathrm{Nag\_Sqrt}$.
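The link constraint amounts to a simple check on the pair of arguments. The sketch below is illustrative only: dist_kind, link_kind and valid_link are hypothetical stand-ins, not the NAG enumerations Nag_Distributions and Nag_Link, and the library performs its own validation of this combination.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the distribution and link codes. */
typedef enum { DIST_NORMAL, DIST_BINOMIAL, DIST_POISSON, DIST_GAMMA } dist_kind;
typedef enum { LINK_COMPL, LINK_EXPO, LINK_LOGISTIC, LINK_IDEN,
               LINK_LOG, LINK_PROBIT, LINK_RECI, LINK_SQRT } link_kind;

/* Binomial requires a complementary log-log, logistic or probit link;
   every other distribution requires one of the remaining five links. */
bool valid_link(dist_kind dist, link_kind link)
{
    bool binomial_link = (link == LINK_COMPL || link == LINK_LOGISTIC ||
                          link == LINK_PROBIT);
    return dist == DIST_BINOMIAL ? binomial_link : !binomial_link;
}
```

Because the eight link codes split into exactly these two groups, negating the binomial test is enough to express the "otherwise" branch.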
3:     mean (Nag_IncludeMean, Input)
On entry: indicates if a mean term is to be included.
${\mathbf{mean}}=\mathrm{Nag\_MeanInclude}$
A mean term (intercept) will be included in the model.
${\mathbf{mean}}=\mathrm{Nag\_MeanZero}$
The model will pass through the origin (zero-point).
Constraint: ${\mathbf{mean}}=\mathrm{Nag\_MeanInclude}$ or $\mathrm{Nag\_MeanZero}$.
4:     n (Integer, Input)
On entry: $n$, the number of observations.
Constraint: ${\mathbf{n}}\ge 1$.
5:     x[${\mathbf{n}}×{\mathbf{tdx}}$] (const double, Input)
On entry: ${\mathbf{x}}\left[\left(\mathit{i}-1\right)×{\mathbf{tdx}}+\mathit{j}-1\right]$ must contain the $\mathit{i}$th observation for the $\mathit{j}$th independent variable, for $\mathit{i}=1,2,\dots ,n$ and $\mathit{j}=1,2,\dots ,m$.
6:     tdx (Integer, Input)
On entry: the stride separating matrix column elements in the array x.
Constraint: ${\mathbf{tdx}}\ge {\mathbf{m}}$.
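As a small illustration of this storage scheme, the hypothetical helper below (not a NAG function) returns element $\left(i,j\right)$ of the $n$ by $m$ matrix held in x, using the 1-based indices of the text and plain int in place of the NAG Integer type. When ${\mathbf{tdx}}>{\mathbf{m}}$, the extra elements at the end of each row are simply skipped.

```c
/* Element (i, j) of the n by m matrix stored row by row in x with row
   stride tdx (tdx >= m). i and j are 1-based, matching the text:
   x[(i - 1) * tdx + j - 1]. Hypothetical helper, not a NAG function. */
double x_elem(const double x[], int tdx, int i, int j)
{
    return x[(i - 1) * tdx + (j - 1)];
}
```

For example, with $n=2$, $m=2$ and ${\mathbf{tdx}}=3$, the array {1, 2, -1, 3, 4, -1} holds the rows (1, 2) and (3, 4); the third element of each row is unused padding.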
7:     m (Integer, Input)
On entry: $m$, the total number of independent variables.
Constraint: ${\mathbf{m}}\ge 1$.
8:     sx[m] (const Integer, Input)
On entry: indicates which independent variables are to be included in the model.
If ${\mathbf{sx}}\left[j-1\right]>0$, the $j$th independent variable is included in the regression model.
Constraints:
• ${\mathbf{sx}}\left[\mathit{j}-1\right]\ge 0$, for $\mathit{j}=1,2,\dots ,{\mathbf{m}}$;
• if ${\mathbf{mean}}=\mathrm{Nag\_MeanInclude}$, exactly ${\mathbf{ip}}-1$ values of sx must be $>0$;
• if ${\mathbf{mean}}=\mathrm{Nag\_MeanZero}$, exactly ip values of sx must be $>0$.
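The sx constraints reduce to a simple count. In this hypothetical sketch (not NAG code; int stands in for Integer and a bool flag for the mean argument), every entry of sx must be non-negative and ip must equal the number of positive entries, plus one when a mean term is included.

```c
#include <stdbool.h>

/* Returns true when sx and ip satisfy the constraints:
   all sx[j] >= 0, and ip == (number of sx[j] > 0) + (include_mean ? 1 : 0).
   Hypothetical helper, not a NAG function. */
bool sx_consistent(const int sx[], int m, int ip, bool include_mean)
{
    int npos = 0;
    for (int j = 0; j < m; ++j) {
        if (sx[j] < 0)
            return false;
        if (sx[j] > 0)
            ++npos;
    }
    return ip == npos + (include_mean ? 1 : 0);
}
```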
9:     ip (Integer, Input)
On entry: the number of independent variables in the model, including the mean or intercept if present.
Constraint: ${\mathbf{ip}}>0$.
10:   binom_t[n] (const double, Input)
On entry: if ${\mathbf{errfn}}=\mathrm{Nag\_Binomial}$, ${\mathbf{binom\_t}}\left[i-1\right]$ must contain the binomial denominator, ${t}_{i}$, for the $i$th observation.
Otherwise binom_t is not referenced.
Constraint: if ${\mathbf{errfn}}=\mathrm{Nag\_Binomial}$, ${\mathbf{binom\_t}}\left[\mathit{i}-1\right]\ge 0.0$, for $\mathit{i}=1,2,\dots ,n$.
11:   offset[n] (const double, Input)
On entry: if an offset is required then ${\mathbf{offset}}\left[i-1\right]$ must contain the value of the offset ${o}_{i}$, for the $i$th observation. Otherwise offset must be supplied as the null pointer, (double *)0.
12:   wt[n] (const double, Input)
On entry: if weighted estimates are required then ${\mathbf{wt}}\left[i-1\right]$ must contain the weight, ${w}_{i}$, for the $i$th observation. Otherwise wt must be supplied as the null pointer, (double *)0.
If ${\mathbf{wt}}\left[i-1\right]=0.0$, then the $i$th observation is not included in the model, in which case the effective number of observations is the number of observations with positive weights.
If wt is the null pointer, then the effective number of observations is $n$.
If the variance of future observations is not included in the standard error of the predicted variable, wt is not referenced.
Constraint: if wt is not the null pointer and ${\mathbf{vfobs}}=\mathrm{Nag\_TRUE}$, ${\mathbf{wt}}\left[\mathit{i}-1\right]\ge 0.0$, for $\mathit{i}=1,2,\dots ,{\mathbf{n}}$.
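The rule for the effective number of observations can be written as a short function. This is a hypothetical helper for illustration (not a NAG function), using int in place of Integer.

```c
#include <stddef.h>

/* Effective number of observations as described for wt:
   n when wt is a null pointer, otherwise the number of
   observations with a strictly positive weight.
   Hypothetical helper, not a NAG function. */
int effective_n(const double *wt, int n)
{
    if (wt == NULL)
        return n;
    int count = 0;
    for (int i = 0; i < n; ++i)
        if (wt[i] > 0.0)
            ++count;
    return count;
}
```

For example, the weights {1.0, 0.0, 2.5} give an effective number of observations of 2.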
13:   scale (double, Input)
On entry: if ${\mathbf{errfn}}=\mathrm{Nag\_Normal}$ or $\mathrm{Nag\_Gamma}$ and ${\mathbf{vfobs}}=\mathrm{Nag\_TRUE}$, the scale parameter, $\varphi$.
Otherwise scale is not referenced and $\varphi =1$.
Constraint: if ${\mathbf{errfn}}=\mathrm{Nag\_Normal}$ or $\mathrm{Nag\_Gamma}$ and ${\mathbf{vfobs}}=\mathrm{Nag\_TRUE}$, ${\mathbf{scale}}>0.0$.
14:   ex_power (double, Input)
On entry: if ${\mathbf{link}}=\mathrm{Nag\_Expo}$, ex_power must contain the power of the exponential.
If ${\mathbf{link}}\ne \mathrm{Nag\_Expo}$, ex_power is not referenced.
Constraint: if ${\mathbf{link}}=\mathrm{Nag\_Expo}$, ${\mathbf{ex\_power}}\ne 0.0$.
15:   b[ip] (const double, Input)
On entry: the model parameters, $\beta$.
If ${\mathbf{mean}}=\mathrm{Nag\_MeanInclude}$, ${\mathbf{b}}\left[0\right]$ must contain the mean parameter and ${\mathbf{b}}\left[i\right]$ the coefficient of the variable held in the $j$th column of x, where ${\mathbf{sx}}\left[j-1\right]$ is the $i$th positive value in the array sx.
If ${\mathbf{mean}}=\mathrm{Nag\_MeanZero}$, ${\mathbf{b}}\left[i-1\right]$ must contain the coefficient of the variable held in the $j$th column of x, where ${\mathbf{sx}}\left[j-1\right]$ is the $i$th positive value in the array sx.
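Following the ordering described for b, the hypothetical sketch below (not NAG code) assembles the design row $z$ for one observation, with a leading 1 for the mean term when it is included followed by the selected columns of x in column order, and then forms ${\eta }_{i}={o}_{i}+{z}^{\mathrm{T}}\beta$. Indices are 0-based, as is usual in C, and int stands in for Integer.

```c
#include <stdbool.h>

/* Builds the design row z (length ip) for observation i (0-based):
   z[0] = 1 when include_mean is true, then x(i, j) for each j with
   sx[j] > 0, in column order, matching the ordering of b. */
void design_row(const double x[], int tdx, int m, const int sx[],
                bool include_mean, int i, double z[])
{
    int k = 0;
    if (include_mean)
        z[k++] = 1.0;
    for (int j = 0; j < m; ++j)
        if (sx[j] > 0)
            z[k++] = x[i * tdx + j];
}

/* eta_i = o_i + sum_k z[k] * b[k]; pass offset_i = 0.0 when no offset is used. */
double linear_predictor(const double z[], const double b[], int ip, double offset_i)
{
    double eta = offset_i;
    for (int k = 0; k < ip; ++k)
        eta += z[k] * b[k];
    return eta;
}
```

For example, with x holding the rows (2, 5) and (3, 7), sx = {1, 0}, a mean term and b = {1.0, 0.5}, the second observation gives $z=\left(1,3\right)$ and, with an offset of 0.25, $\eta =0.25+1+1.5=2.75$.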
16:   cov[${\mathbf{ip}}×\left({\mathbf{ip}}+1\right)/2$] (const double, Input)
On entry: the upper triangular part of the variance-covariance matrix, $C$, of the model parameters. This matrix should be supplied packed by column, i.e., the covariance between parameters ${\beta }_{i}$ and ${\beta }_{j}$, that is the values stored in ${\mathbf{b}}\left[i-1\right]$ and ${\mathbf{b}}\left[j-1\right]$, should be supplied in ${\mathbf{cov}}\left[\mathit{j}×\left(\mathit{j}-1\right)/2+\mathit{i}-1\right]$, for $\mathit{i}=1,2,\dots ,{\mathbf{ip}}$ and $\mathit{j}=\mathit{i},\dots ,{\mathbf{ip}}$.
Constraint: the matrix represented in cov must be a valid variance-covariance matrix.
17:   vfobs (Nag_Boolean, Input)
On entry: if ${\mathbf{vfobs}}=\mathrm{Nag\_TRUE}$, the variance of future observations is included in the standard error of the predicted variable (i.e., ${I}_{\mathrm{fobs}}=1$), otherwise ${I}_{\mathrm{fobs}}=0$.
18:   eta[n] (double, Output)
On exit: the linear predictor, $\eta$.
19:   seeta[n] (double, Output)
On exit: the standard error of the linear predictor, $\mathrm{se}\left(\eta \right)$.
20:   pred[n] (double, Output)
On exit: the predicted value, $\stackrel{^}{y}$.
21:   sepred[n] (double, Output)
On exit: the standard error of the predicted value, $\mathrm{se}\left(\stackrel{^}{y}\right)$. If ${\mathbf{pred}}\left[i-1\right]$ could not be calculated, then nag_glm_predict (g02gpc) returns NE_INVALID_PRED, and ${\mathbf{sepred}}\left[i-1\right]$ is set to $-99.0$.
22:   fail (NagError *, Input/Output)
The NAG error argument (see Section 3.6 in the Essential Introduction).

## 6  Error Indicators and Warnings

NE_ALLOC_FAIL
Dynamic memory allocation failed.
NE_BAD_PARAM
On entry, argument $〈\mathit{\text{value}}〉$ had an illegal value.
On entry, the error type and link function combination supplied is invalid.
NE_INT
On entry, ${\mathbf{ip}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{ip}}>0$.
On entry, ${\mathbf{m}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{m}}\ge 1$.
On entry, ${\mathbf{n}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{n}}\ge 1$.
NE_INT_2
On entry, ${\mathbf{tdx}}=〈\mathit{\text{value}}〉$ and ${\mathbf{m}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{tdx}}\ge {\mathbf{m}}$.
NE_INT_ARRAY_CONS
On entry, sx not consistent with ip: $〈\mathit{\text{value}}〉$ values $>0$, expected $〈\mathit{\text{value}}〉$.
NE_INTERNAL_ERROR
An internal error has occurred in this function. Check the function call and any array sizes. If the call is correct then please contact NAG for assistance.
NE_INVALID_PRED
At least one predicted value could not be calculated as required. sepred is set to $-99.0$ for affected predicted values.
NE_REAL
On entry, ${\mathbf{ex\_power}}=0.0$.
Constraint: if ${\mathbf{link}}=\mathrm{Nag\_Expo}$, ${\mathbf{ex\_power}}\ne 0.0$.
On entry, ${\mathbf{scale}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{scale}}>0.0$.
NE_REAL_ARRAY_CONS
On entry, ${\mathbf{cov}}\left[i-1\right]<0.0$ for at least one diagonal element: $i=〈\mathit{\text{value}}〉$, ${\mathbf{cov}}\left[i-1\right]=〈\mathit{\text{value}}〉$.
On entry, $i=〈\mathit{\text{value}}〉$ and ${\mathbf{binom\_t}}\left[i-1\right]=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{binom\_t}}\left[i-1\right]\ge 0.0$, for all $i$.
On entry, $i=〈\mathit{\text{value}}〉$ and ${\mathbf{wt}}\left[i-1\right]=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{wt}}\left[i-1\right]\ge 0.0$, for all $i$.

## 7  Accuracy

Not applicable.

## 8  Further Comments

None.

## 9  Example

The model
 $y={\beta }_{1}+{\beta }_{2}x+\varepsilon$
is fitted to a training dataset with five observations. The resulting model is then used to predict the response for two new observations.

### 9.1  Program Text

Program Text (g02gpce.c)

### 9.2  Program Data

Program Data (g02gpce.d)

### 9.3  Program Results

Program Results (g02gpce.r)