NAG C Library Manual

NAG Library Function Document
nag_glm_predict (g02gpc)

1  Purpose

nag_glm_predict (g02gpc) allows prediction from a generalized linear model fit via nag_glm_normal (g02gac), nag_glm_binomial (g02gbc), nag_glm_poisson (g02gcc) or nag_glm_gamma (g02gdc).

2  Specification

 #include <nag.h>
 #include <nagg02.h>
 void nag_glm_predict (Nag_Distributions errfn, Nag_Link link, Nag_IncludeMean mean, Integer n, const double x[], Integer tdx, Integer m, const Integer sx[], Integer ip, const double binom_t[], const double offset[], const double wt[], double scale, double ex_power, const double b[], const double cov[], Nag_Boolean vfobs, double eta[], double seeta[], double pred[], double sepred[], NagError *fail)

3  Description

A generalized linear model consists of the following elements:
 (i) A suitable distribution for the dependent variable $y$.
 (ii) A linear model, with linear predictor $\eta =X\beta$, where $X$ is a matrix of independent variables and $\beta$ a column vector of $p$ parameters.
 (iii) A link function $g\left(.\right)$ between the expected value of $y$ and the linear predictor, that is $g\left(E\left(y\right)\right)=g\left(\mu \right)=\eta$, so that $\mu ={g}^{-1}\left(\eta \right)$.
In order to predict from a generalized linear model, that is, to estimate a value for the dependent variable, $y$, given a set of independent variables $X$, the matrix $X$ must be supplied, along with values for the parameters $\beta$ and their associated variance-covariance matrix, $C$. Suitable values for $\beta$ and $C$ are usually estimated by first fitting the prediction model to a training dataset with known responses, using, for example, nag_glm_normal (g02gac), nag_glm_binomial (g02gbc), nag_glm_poisson (g02gcc) or nag_glm_gamma (g02gdc). The predicted variable and its standard error can then be obtained from:
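 $\stackrel{^}{y}={g}^{-1}\left(\eta \right),\quad \mathrm{se}\left(\stackrel{^}{y}\right)=\sqrt{{\left({\left.\frac{\delta {g}^{-1}\left(x\right)}{\delta x}\right|}_{\eta }\right)}^{2}\mathrm{se}{\left(\eta \right)}^{2}+{I}_{\mathrm{fobs}}\mathrm{Var}\left(y\right)}$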
where
 $\eta =o+X\beta ,\quad \mathrm{se}\left(\eta \right)=\sqrt{\mathrm{diag}\left(XC{X}^{\mathrm{T}}\right)},$
$o$ is a vector of offsets, and ${I}_{\mathrm{fobs}}=0$ if the variance of future observations is not taken into account and $1$ otherwise. Here $\mathrm{diag}A$ denotes the diagonal elements of matrix $A$.
If required, the variance for the $i$th future observation, $\mathrm{Var}\left({y}_{i}\right)$, can be calculated as:
 $\mathrm{Var}\left({y}_{i}\right)=\frac{\varphi V\left(\theta \right)}{{w}_{i}}$
where ${w}_{i}$ is the weight of the $i$th observation, $\varphi$ is the scale (or dispersion) parameter, and $V\left(\theta \right)$ is the variance function. Both the scale parameter and the variance function depend on the distribution used for $y$:
 Poisson: $V\left(\theta \right)={\mu }_{i}$, $\varphi =1$
 binomial: $V\left(\theta \right)=\frac{{\mu }_{i}\left({t}_{i}-{\mu }_{i}\right)}{{t}_{i}}$, $\varphi =1$
 Normal: $V\left(\theta \right)=1$
 gamma: $V\left(\theta \right)={\mu }_{i}^{2}$
In the case of a Normal or gamma error structure, the scale parameter, $\varphi$, is supplied by you. This value is usually obtained from the function used to fit the prediction model; for a Normal error structure it is often $\varphi ={\stackrel{^}{\sigma }}^{2}$, i.e., the estimated variance.
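The variance of a future observation can be sketched as a small C function that selects $V\left(\theta \right)$ and $\varphi$ according to the list above and then divides by the weight. The enum `dist` and the function `future_obs_variance` are hypothetical names used only for this illustration; they are not the library's Nag_Distributions type or an internal NAG routine, and the sketch assumes ${w}_{i}>0$.

```c
/* Hypothetical distribution tags for this sketch only
 * (not the library's Nag_Distributions enum). */
enum dist { DIST_POISSON, DIST_BINOMIAL, DIST_NORMAL, DIST_GAMMA };

/* Var(y_i) = phi * V(theta) / w_i.
 * mu    : fitted mean mu_i
 * t     : binomial denominator t_i (ignored for other distributions)
 * w     : weight w_i, assumed > 0
 * scale : phi for Normal and gamma; Poisson and binomial use phi = 1 */
static double future_obs_variance(enum dist d, double mu, double t,
                                  double w, double scale)
{
    double v, phi = 1.0;

    switch (d) {
    case DIST_POISSON:  v = mu;                   break;
    case DIST_BINOMIAL: v = mu * (t - mu) / t;    break;
    case DIST_NORMAL:   v = 1.0;     phi = scale; break;
    case DIST_GAMMA:    v = mu * mu; phi = scale; break;
    default:            return -1.0;  /* unknown tag */
    }
    return phi * v / w;
}
```

For a binomial observation with ${\mu }_{i}=2$, ${t}_{i}=10$ and ${w}_{i}=1$, the sketch returns $2\times 8/10=1.6$.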


5  Arguments

1:     errfn – Nag_Distributions     Input
On entry: indicates the distribution used to model the dependent variable, $y$.
${\mathbf{errfn}}=\mathrm{Nag_Binomial}$
The binomial distribution is used.
${\mathbf{errfn}}=\mathrm{Nag_Gamma}$
The gamma distribution is used.
${\mathbf{errfn}}=\mathrm{Nag_Normal}$
The Normal (Gaussian) distribution is used.
${\mathbf{errfn}}=\mathrm{Nag_Poisson}$
The Poisson distribution is used.
Constraint: ${\mathbf{errfn}}=\mathrm{Nag_Binomial}$, $\mathrm{Nag_Gamma}$, $\mathrm{Nag_Normal}$ or $\mathrm{Nag_Poisson}$.
2:     link – Nag_Link     Input
On entry: indicates which link function is to be used.
${\mathbf{link}}=\mathrm{Nag_Compl}$
A complementary log-log link is used.
${\mathbf{link}}=\mathrm{Nag_Expo}$
An exponent link is used.
${\mathbf{link}}=\mathrm{Nag_Logistic}$
A logistic link is used.
${\mathbf{link}}=\mathrm{Nag_Iden}$
An identity link is used.
${\mathbf{link}}=\mathrm{Nag_Log}$
A log link is used.
${\mathbf{link}}=\mathrm{Nag_Probit}$
A probit link is used.
${\mathbf{link}}=\mathrm{Nag_Reci}$
A reciprocal link is used.
${\mathbf{link}}=\mathrm{Nag_Sqrt}$
A square root link is used.
Details on the functional form of the different links can be found in the g02 Chapter Introduction.
Constraints:
• if ${\mathbf{errfn}}=\mathrm{Nag_Binomial}$, ${\mathbf{link}}=\mathrm{Nag_Compl}$, $\mathrm{Nag_Logistic}$ or $\mathrm{Nag_Probit}$;
• otherwise ${\mathbf{link}}=\mathrm{Nag_Expo}$, $\mathrm{Nag_Iden}$, $\mathrm{Nag_Log}$, $\mathrm{Nag_Reci}$ or $\mathrm{Nag_Sqrt}$.
3:     mean – Nag_IncludeMean     Input
On entry: indicates if a mean term is to be included.
${\mathbf{mean}}=\mathrm{Nag_MeanInclude}$
A mean term (intercept) will be included in the model.
${\mathbf{mean}}=\mathrm{Nag_MeanZero}$
The model will pass through the origin (zero-point).
Constraint: ${\mathbf{mean}}=\mathrm{Nag_MeanInclude}$ or $\mathrm{Nag_MeanZero}$.
4:     n – Integer     Input
On entry: $n$, the number of observations.
Constraint: ${\mathbf{n}}\ge 1$.
5:     x[${\mathbf{n}}×{\mathbf{tdx}}$] – const double     Input
On entry: ${\mathbf{x}}\left[\left(\mathit{i}-1\right)×{\mathbf{tdx}}+\mathit{j}-1\right]$ must contain the $\mathit{i}$th observation for the $\mathit{j}$th independent variable, for $\mathit{i}=1,2,\dots ,n$ and $\mathit{j}=1,2,\dots ,m$.
6:     tdx – Integer     Input
On entry: the stride separating matrix column elements in the array x.
Constraint: ${\mathbf{tdx}}\ge {\mathbf{m}}$.
7:     m – Integer     Input
On entry: $m$, the total number of independent variables.
Constraint: ${\mathbf{m}}\ge 1$.
8:     sx[m] – const Integer     Input
On entry: indicates which independent variables are to be included in the model.
If ${\mathbf{sx}}\left[j-1\right]>0$, the $j$th independent variable is included in the regression model.
Constraints:
• ${\mathbf{sx}}\left[\mathit{j}-1\right]\ge 0$, for $\mathit{j}=1,2,\dots ,{\mathbf{m}}$;
• if ${\mathbf{mean}}=\mathrm{Nag_MeanInclude}$, exactly ${\mathbf{ip}}-1$ values of sx must be $\text{}>0$;
• if ${\mathbf{mean}}=\mathrm{Nag_MeanZero}$, exactly ip values of sx must be $\text{}>0$.
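The constraints linking sx, mean and ip can be expressed as a short check. This is a sketch only: `sx_consistent` is a hypothetical helper, `long` stands in for the library's Integer type, and a nonzero `include_mean` plays the role of ${\mathbf{mean}}=\mathrm{Nag_MeanInclude}$.

```c
/* Hypothetical check mirroring the sx constraints: every sx[j] must be
 * >= 0, and the number of positive entries must equal ip - 1 when a
 * mean term is included, or ip when it is not. Returns 1 if valid. */
static int sx_consistent(const long sx[], long m, long ip, int include_mean)
{
    long npos = 0;   /* number of independent variables selected */

    for (long j = 0; j < m; j++) {
        if (sx[j] < 0)
            return 0;
        if (sx[j] > 0)
            npos++;
    }
    return npos == (include_mean ? ip - 1 : ip);
}
```

With sx = {1, 0, 1}, two variables are selected, so ip must be 3 with a mean term or 2 without one.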
9:     ip – Integer     Input
On entry: the number of independent variables in the model, including the mean or intercept if present.
Constraint: ${\mathbf{ip}}>0$.
10:   binom_t[n] – const double     Input
On entry: if ${\mathbf{errfn}}=\mathrm{Nag_Binomial}$, ${\mathbf{binom_t}}\left[i-1\right]$ must contain the binomial denominator, ${t}_{i}$, for the $i$th observation.
Otherwise binom_t is not referenced.
Constraint: if ${\mathbf{errfn}}=\mathrm{Nag_Binomial}$, ${\mathbf{binom_t}}\left[\mathit{i}-1\right]\ge 0.0$, for $\mathit{i}=1,2,\dots ,n$.
11:   offset[n] – const double     Input
On entry: if an offset is required then ${\mathbf{offset}}\left[i-1\right]$ must contain the value of the offset ${o}_{i}$, for the $i$th observation. Otherwise offset must be supplied as the null pointer, (double *)0.
12:   wt[n] – const double     Input
On entry: if weighted estimates are required then ${\mathbf{wt}}\left[i-1\right]$ must contain the weight, ${w}_{i}$, for the $i$th observation. Otherwise wt must be supplied as the null pointer, (double *)0.
If ${\mathbf{wt}}\left[i-1\right]=0.0$, then the $i$th observation is not included in the model, in which case the effective number of observations is the number of observations with positive weights.
If wt is the null pointer, then the effective number of observations is $n$.
If the variance of future observations is not included in the standard error of the predicted variable, wt is not referenced.
Constraint: if wt is not the null pointer and ${\mathbf{vfobs}}=\mathrm{Nag_TRUE}$, ${\mathbf{wt}}\left[\mathit{i}-1\right]\ge 0.0$, for $\mathit{i}=1,2,\dots ,{\mathbf{n}}$.
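The rule for the effective number of observations can be sketched as follows. `effective_n` is a hypothetical helper, and `long` is assumed in place of Integer.

```c
#include <stddef.h>

/* Hypothetical helper: effective number of observations.
 * With no weights (wt == NULL) every observation counts;
 * otherwise only observations with a positive weight are counted. */
static long effective_n(const double *wt, long n)
{
    if (wt == NULL)
        return n;

    long count = 0;
    for (long i = 0; i < n; i++)
        if (wt[i] > 0.0)
            count++;
    return count;
}
```

With weights {1.0, 0.0, 2.0, 0.0}, two of the four observations contribute.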
13:   scale – double     Input
On entry: if ${\mathbf{errfn}}=\mathrm{Nag_Normal}$ or $\mathrm{Nag_Gamma}$ and ${\mathbf{vfobs}}=\mathrm{Nag_TRUE}$, the scale parameter, $\varphi$.
Otherwise scale is not referenced and $\varphi =1$.
Constraint: if ${\mathbf{errfn}}=\mathrm{Nag_Normal}$ or $\mathrm{Nag_Gamma}$ and ${\mathbf{vfobs}}=\mathrm{Nag_TRUE}$, ${\mathbf{scale}}>0.0$.
14:   ex_power – double     Input
On entry: if ${\mathbf{link}}=\mathrm{Nag_Expo}$, ex_power must contain the power of the exponential.
If ${\mathbf{link}}\ne \mathrm{Nag_Expo}$, ex_power is not referenced.
Constraint: if ${\mathbf{link}}=\mathrm{Nag_Expo}$, ${\mathbf{ex_power}}\ne 0.0$.
15:   b[ip] – const double     Input
On entry: the model parameters, $\beta$.
If ${\mathbf{mean}}=\mathrm{Nag_MeanInclude}$, ${\mathbf{b}}\left[0\right]$ must contain the mean parameter and ${\mathbf{b}}\left[i\right]$ the coefficient of the $j$th independent variable (the $j$th column of x), where column $j$ is the $i$th column for which ${\mathbf{sx}}\left[j-1\right]>0$.
If ${\mathbf{mean}}=\mathrm{Nag_MeanZero}$, ${\mathbf{b}}\left[i-1\right]$ must contain the coefficient of the $j$th independent variable (the $j$th column of x), where column $j$ is the $i$th column for which ${\mathbf{sx}}\left[j-1\right]>0$.
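Combining the storage conventions for x, sx, b and offset, the linear predictor ${\eta }_{i}={o}_{i}+{x}_{i}^{\mathrm{T}}\beta$ for a single observation can be sketched as below. The function `linear_predictor` is hypothetical; it uses a 0-based row index `i`, so element ${\mathbf{x}}\left[\left(i-1\right)×{\mathbf{tdx}}+j-1\right]$ of the text becomes `x[i * tdx + j]`, `long` replaces Integer, and a nonzero `include_mean` stands for ${\mathbf{mean}}=\mathrm{Nag_MeanInclude}$.

```c
#include <stddef.h>

/* Hypothetical sketch: eta_i = o_i + x_i^T beta for 0-based row i.
 * b[0] is the mean term when include_mean is nonzero; the remaining
 * coefficients follow the order of the positive entries of sx.
 * offset may be NULL, meaning no offset. */
static double linear_predictor(const double x[], long tdx, long m,
                               const long sx[], const double b[],
                               int include_mean, const double *offset,
                               long i)
{
    double eta = (offset != NULL) ? offset[i] : 0.0;
    long k = 0;                          /* next coefficient in b */

    if (include_mean)
        eta += b[k++];                   /* intercept */
    for (long j = 0; j < m; j++)
        if (sx[j] > 0)                   /* variable j is in the model */
            eta += b[k++] * x[i * tdx + j];
    return eta;
}
```

With x = {1, 2, 3, 4, 5, 6}, tdx = 3, sx = {1, 0, 1}, b = {0.5, 2, -1}, a mean term and no offset, the second row gives 0.5 + 2×4 - 1×6 = 2.5.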
16:   cov[${\mathbf{ip}}×\left({\mathbf{ip}}+1\right)/2$] – const double     Input
On entry: the upper triangular part of the variance-covariance matrix, $C$, of the model parameters. This matrix should be supplied packed by column, i.e., the covariance between parameters ${\beta }_{i}$ and ${\beta }_{j}$, that is the values stored in ${\mathbf{b}}\left[i-1\right]$ and ${\mathbf{b}}\left[j-1\right]$, should be supplied in ${\mathbf{cov}}\left[\mathit{j}×\left(\mathit{j}-1\right)/2+\mathit{i}-1\right]$, for $\mathit{i}=1,2,\dots ,{\mathbf{ip}}$ and $\mathit{j}=\mathit{i},\dots ,{\mathbf{ip}}$.
Constraint: the matrix represented in cov must be a valid variance-covariance matrix.
17:   vfobs – Nag_Boolean     Input
On entry: if ${\mathbf{vfobs}}=\mathrm{Nag_TRUE}$, the variance of future observations is included in the standard error of the predicted variable (i.e., ${I}_{\mathrm{fobs}}=1$), otherwise ${I}_{\mathrm{fobs}}=0$.
18:   eta[n] – double     Output
On exit: the linear predictor, $\eta$.
19:   seeta[n] – double     Output
On exit: the standard error of the linear predictor, $\mathrm{se}\left(\eta \right)$.
20:   pred[n] – double     Output
On exit: the predicted value, $\stackrel{^}{y}$.
21:   sepred[n] – double     Output
On exit: the standard error of the predicted value, $\mathrm{se}\left(\stackrel{^}{y}\right)$. If ${\mathbf{pred}}\left[i-1\right]$ could not be calculated, then nag_glm_predict (g02gpc) returns NE_INVALID_PRED, and ${\mathbf{sepred}}\left[i-1\right]$ is set to $-99.0$.
22:   fail – NagError *     Input/Output
The NAG error argument (see Section 3.6 in the Essential Introduction).

6  Error Indicators and Warnings

NE_ALLOC_FAIL
Dynamic memory allocation failed.
NE_BAD_PARAM
On entry, argument $〈\mathit{\text{value}}〉$ had an illegal value.
On entry, the error type and link function combination supplied is invalid.
NE_INT
On entry, ${\mathbf{ip}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{ip}}>0$.
On entry, ${\mathbf{m}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{m}}\ge 1$.
On entry, ${\mathbf{n}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{n}}\ge 1$.
NE_INT_2
On entry, ${\mathbf{tdx}}=〈\mathit{\text{value}}〉$ and ${\mathbf{m}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{tdx}}\ge {\mathbf{m}}$.
NE_INT_ARRAY_CONS
On entry, sx not consistent with ip: $〈\mathit{\text{value}}〉$ values $>0$, expected $〈\mathit{\text{value}}〉$.
NE_INTERNAL_ERROR
An internal error has occurred in this function. Check the function call and any array sizes. If the call is correct then please contact NAG for assistance.
NE_INVALID_PRED
At least one predicted value could not be calculated as required. sepred is set to $-99.0$ for affected predicted values.
NE_REAL
On entry, ${\mathbf{ex_power}}=0.0$.
On entry, ${\mathbf{scale}}=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{scale}}>0.0$.
NE_REAL_ARRAY_CONS
On entry, ${\mathbf{cov}}\left[i-1\right]<0.0$ for at least one diagonal element: $i=〈\mathit{\text{value}}〉$, ${\mathbf{cov}}\left[i-1\right]=〈\mathit{\text{value}}〉$.
On entry, $i=〈\mathit{\text{value}}〉$ and ${\mathbf{binom_t}}\left[i-1\right]=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{binom_t}}\left[i-1\right]\ge 0.0$, for all $i$.
On entry, $i=〈\mathit{\text{value}}〉$ and ${\mathbf{wt}}\left[i-1\right]=〈\mathit{\text{value}}〉$.
Constraint: ${\mathbf{wt}}\left[i-1\right]\ge 0.0$, for all $i$.

7  Accuracy

Not applicable.

8  Further Comments

None.

9  Example

The model
 $y=\frac{1}{{\beta }_{1}+{\beta }_{2}x}+\varepsilon$
is fitted to a training dataset with five observations. The resulting model is then used to predict the response for two new observations.

9.1  Program Text

Program Text (g02gpce.c)

9.2  Program Data

Program Data (g02gpce.d)

9.3  Program Results

Program Results (g02gpce.r)