NAG FL Interface
g02gpf (glm_predict)

1 Purpose

g02gpf allows prediction from a generalized linear model fit via g02gaf, g02gbf, g02gcf or g02gdf or a linear model fit via g02daf.

2 Specification

Fortran Interface
 Subroutine g02gpf ( errfn, link, mean, offset, weight, n, x, ldx, m, isx, ip, t, off, wt, s, a, b, cov, vfobs, eta, seeta, pred, sepred, ifail)
 Integer, Intent (In) :: n, ldx, m, isx(m), ip
 Integer, Intent (Inout) :: ifail
 Real (Kind=nag_wp), Intent (In) :: x(ldx,*), t(*), off(*), wt(*), s, a, b(ip), cov(ip*(ip+1)/2)
 Real (Kind=nag_wp), Intent (Out) :: eta(n), seeta(n), pred(n), sepred(n)
 Logical, Intent (In) :: vfobs
 Character (1), Intent (In) :: errfn, link, mean, offset, weight
C Header Interface
#include <nag.h>
 void g02gpf_ (const char *errfn, const char *link, const char *mean, const char *offset, const char *weight, const Integer *n, const double x[], const Integer *ldx, const Integer *m, const Integer isx[], const Integer *ip, const double t[], const double off[], const double wt[], const double *s, const double *a, const double b[], const double cov[], const logical *vfobs, double eta[], double seeta[], double pred[], double sepred[], Integer *ifail, const Charlen length_errfn, const Charlen length_link, const Charlen length_mean, const Charlen length_offset, const Charlen length_weight)
The routine may be called by the names g02gpf or nagf_correg_glm_predict.

3 Description

A generalized linear model consists of the following elements:
(i) A suitable distribution for the dependent variable $y$.
(ii) A linear model, with linear predictor $\eta =X\beta$, where $X$ is a matrix of independent variables and $\beta$ a column vector of $p$ parameters.
(iii) A link function $g\left(.\right)$ between the expected value of $y$ and the linear predictor, that is $g\left(\mu \right)=\eta$, or equivalently $E\left(y\right)=\mu ={g}^{-1}\left(\eta \right)$.
In order to predict from a generalized linear model, that is, to estimate a value for the dependent variable, $y$, given a set of independent variables $X$, the matrix $X$ must be supplied, along with values for the parameters $\beta$ and their associated variance-covariance matrix, $C$. Suitable values for $\beta$ and $C$ are usually estimated by first fitting the prediction model to a training dataset with known responses, using, for example, g02gaf, g02gbf, g02gcf or g02gdf. The predicted variable and its standard error can then be obtained from:
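 $\stackrel{^}{y}={g}^{-1}\left(\eta \right), \quad \mathrm{se}\left(\stackrel{^}{y}\right)=\sqrt{{\left(\frac{\delta {g}^{-1}\left(x\right)}{\delta x}\right)}_{\eta }^{2}{\mathrm{se}\left(\eta \right)}^{2}+{I}_{\mathrm{fobs}}\mathrm{Var}\left(y\right)}$
where
 $\eta =o+X\beta , \quad \mathrm{se}\left(\eta \right)=\sqrt{\mathrm{diag}\left(XC{X}^{\mathrm{T}}\right)},$
$o$ is a vector of offsets and ${I}_{\mathrm{fobs}}=0$, if the variance of future observations is not taken into account, and $1$ otherwise. Here $\mathrm{diag}A$ indicates the diagonal elements of matrix $A$, and the square root is taken element by element.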
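As an illustration of the prediction step, the following sketch evaluates $\eta$, $\mathrm{se}\left(\eta \right)$, $\stackrel{^}{y}$ and $\mathrm{se}\left(\stackrel{^}{y}\right)$ for a single observation. It is not the library implementation. It assumes a log link (so ${g}^{-1}\left(\eta \right)={e}^{\eta }$), a full symmetric matrix $C$ rather than packed storage, and the usual delta-method form $\mathrm{se}\left(\stackrel{^}{y}\right)=\sqrt{{\left(\delta {g}^{-1}/\delta x\right)}_{\eta }^{2}{\mathrm{se}\left(\eta \right)}^{2}+{I}_{\mathrm{fobs}}\mathrm{Var}\left(y\right)}$. The function name and argument names are hypothetical.

```python
import math

def predict_log_link(x_row, beta, cov, offset=0.0, var_y=0.0, vfobs=False):
    """Predict one observation under a log link (illustrative sketch only).

    x_row : one row of X (with a leading 1.0 if the model has a mean term)
    beta  : parameter estimates
    cov   : full symmetric variance-covariance matrix C, as a list of lists
    var_y : Var(y) for a future observation, used only when vfobs is True
    """
    p = len(beta)
    # Linear predictor: eta = o + x beta
    eta = offset + sum(x_row[k] * beta[k] for k in range(p))
    # One diagonal element of X C X^T is the quadratic form x C x^T
    var_eta = sum(x_row[i] * cov[i][j] * x_row[j]
                  for i in range(p) for j in range(p))
    se_eta = math.sqrt(var_eta)
    pred = math.exp(eta)        # g^{-1}(eta) for the log link
    dmu_deta = math.exp(eta)    # derivative of g^{-1} evaluated at eta
    i_fobs = 1.0 if vfobs else 0.0
    se_pred = math.sqrt((dmu_deta * se_eta) ** 2 + i_fobs * var_y)
    return eta, se_eta, pred, se_pred

# Made-up numbers, purely for illustration
eta, se_eta, pred, se_pred = predict_log_link(
    x_row=[1.0, 2.0], beta=[0.5, 0.25],
    cov=[[0.04, -0.01], [-0.01, 0.01]])
```

With ${I}_{\mathrm{fobs}}=0$ the standard error of the prediction reduces to the derivative of the inverse link times $\mathrm{se}\left(\eta \right)$.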
If required, the variance for the $i$th future observation, $\mathrm{Var}\left({y}_{i}\right)$, can be calculated as:
 $\mathrm{Var}\left({y}_{i}\right)=\frac{\varphi V\left(\theta \right)}{{w}_{i}}$
where ${w}_{i}$ is a weight, $\varphi$ is the scale (or dispersion) parameter, and $V\left(\theta \right)$ is the variance function. Both the scale parameter and the variance function depend on the distribution used for the $y$, with:
 Poisson: $V\left(\theta \right)={\mu }_{i}$, $\varphi =1$
 binomial: $V\left(\theta \right)=\frac{{\mu }_{i}\left({t}_{i}-{\mu }_{i}\right)}{{t}_{i}}$, $\varphi =1$
 Normal: $V\left(\theta \right)=1$
 gamma: $V\left(\theta \right)={\mu }_{i}^{2}$
In the cases of a Normal and gamma error structure, the scale parameter, $\varphi$, is supplied by you. This value is usually obtained from the routine used to fit the prediction model. In many cases, for a Normal error structure, $\varphi ={\stackrel{^}{\sigma }}^{2}$, i.e., the estimated variance.
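The variance of a future observation can be sketched as a small lookup over the four error structures. This is an illustrative reading of the formulas above, not library code. The function name `future_variance` and its argument names are hypothetical, and the case ${w}_{i}=0$ is deliberately not handled.

```python
def future_variance(errfn, mu, s=1.0, t=1.0, w=1.0):
    """Var(y_i) = phi * V / w_i for the error structures 'P', 'B', 'N', 'G'.

    mu : fitted mean for the observation
    s  : scale parameter phi, used only for 'N' and 'G'
    t  : binomial denominator, used only for 'B'
    w  : weight (must be non-zero in this sketch)
    """
    if errfn == "P":
        phi, v = 1.0, mu
    elif errfn == "B":
        phi, v = 1.0, mu * (t - mu) / t
    elif errfn == "N":
        phi, v = s, 1.0
    elif errfn == "G":
        phi, v = s, mu ** 2
    else:
        raise ValueError("errfn must be 'B', 'G', 'N' or 'P'")
    return phi * v / w
```

For example, `future_variance("N", mu=5.0, s=2.0, w=4.0)` gives the Normal-case variance $\varphi /{w}_{i}$ regardless of the mean.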

4 References

McCullagh P and Nelder J A (1983) Generalized Linear Models Chapman and Hall

5 Arguments

1: $\mathbf{errfn}$ Character(1) Input
On entry: indicates the distribution used to model the dependent variable, $y$.
${\mathbf{errfn}}=\text{'B'}$
The binomial distribution is used.
${\mathbf{errfn}}=\text{'G'}$
The gamma distribution is used.
${\mathbf{errfn}}=\text{'N'}$
The Normal (Gaussian) distribution is used.
${\mathbf{errfn}}=\text{'P'}$
The Poisson distribution is used.
Constraint: ${\mathbf{errfn}}=\text{'B'}$, $\text{'G'}$, $\text{'N'}$ or $\text{'P'}$.
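2: $\mathbf{link}$ Character(1) Input
On entry: indicates which link function is to be used.
${\mathbf{link}}=\text{'C'}$
A complementary log-log link is used.
${\mathbf{link}}=\text{'E'}$
An exponent link is used.
${\mathbf{link}}=\text{'G'}$
A logistic link is used.
${\mathbf{link}}=\text{'I'}$
An identity link is used.
${\mathbf{link}}=\text{'L'}$
A log link is used.
${\mathbf{link}}=\text{'P'}$
A probit link is used.
${\mathbf{link}}=\text{'R'}$
A reciprocal link is used.
${\mathbf{link}}=\text{'S'}$
A square root link is used.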
Details on the functional form of the different links can be found in the G02 Chapter Introduction.
Constraints:
• if ${\mathbf{errfn}}=\text{'B'}$, ${\mathbf{link}}=\text{'C'}$, $\text{'G'}$ or $\text{'P'}$;
• otherwise ${\mathbf{link}}=\text{'E'}$, $\text{'I'}$, $\text{'L'}$, $\text{'R'}$ or $\text{'S'}$.
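The compatibility rule between errfn and link can be checked before calling the routine. The sketch below is a hypothetical pre-check, not part of the library; it compares upper-case codes only, which is an assumption about how the caller supplies them.

```python
BINOMIAL_LINKS = {"C", "G", "P"}
OTHER_LINKS = {"E", "I", "L", "R", "S"}

def link_is_valid(errfn, link):
    """Return True if the (errfn, link) pair satisfies the stated constraints."""
    if errfn not in {"B", "G", "N", "P"}:
        return False
    allowed = BINOMIAL_LINKS if errfn == "B" else OTHER_LINKS
    return link in allowed
```

A pair that fails this check corresponds to the condition reported as ${\mathbf{ifail}}=2$ (or ${\mathbf{ifail}}=1$ when errfn itself is invalid).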
3: $\mathbf{mean}$ Character(1) Input
On entry: indicates if a mean term is to be included.
${\mathbf{mean}}=\text{'M'}$
A mean term, intercept, will be included in the model.
${\mathbf{mean}}=\text{'Z'}$
The model will pass through the origin, zero-point.
Constraint: ${\mathbf{mean}}=\text{'M'}$ or $\text{'Z'}$.
4: $\mathbf{offset}$ Character(1) Input
On entry: indicates if an offset is required.
${\mathbf{offset}}=\text{'Y'}$
An offset must be supplied in off.
${\mathbf{offset}}=\text{'N'}$
off is not referenced.
Constraint: ${\mathbf{offset}}=\text{'Y'}$ or $\text{'N'}$.
5: $\mathbf{weight}$ Character(1) Input
On entry: if ${\mathbf{vfobs}}=\mathrm{.TRUE.}$, indicates whether weights are used; otherwise weight is not referenced.
${\mathbf{weight}}=\text{'U'}$
No weights are used.
${\mathbf{weight}}=\text{'W'}$
Weights are used and must be supplied in wt.
Constraint: if ${\mathbf{vfobs}}=\mathrm{.TRUE.}$, ${\mathbf{weight}}=\text{'U'}$ or $\text{'W'}$.
6: $\mathbf{n}$ Integer Input
On entry: $n$, the number of observations.
Constraint: ${\mathbf{n}}\ge 1$.
7: $\mathbf{x}\left({\mathbf{ldx}},*\right)$ Real (Kind=nag_wp) array Input
Note: the second dimension of the array x must be at least ${\mathbf{m}}$.
On entry: ${\mathbf{x}}\left(\mathit{i},\mathit{j}\right)$ must contain the $\mathit{i}$th observation for the $\mathit{j}$th independent variable, for $\mathit{i}=1,2,\dots ,{\mathbf{n}}$ and $\mathit{j}=1,2,\dots ,{\mathbf{m}}$.
8: $\mathbf{ldx}$ Integer Input
On entry: the first dimension of the array x as declared in the (sub)program from which g02gpf is called.
Constraint: ${\mathbf{ldx}}\ge {\mathbf{n}}$.
9: $\mathbf{m}$ Integer Input
On entry: $m$, the total number of independent variables.
Constraint: ${\mathbf{m}}\ge 1$.
10: $\mathbf{isx}\left({\mathbf{m}}\right)$ Integer array Input
On entry: indicates which independent variables are to be included in the model.
If ${\mathbf{isx}}\left(j\right)>0$, the variable contained in the $j$th column of x is included in the regression model.
Constraints:
• ${\mathbf{isx}}\left(\mathit{j}\right)\ge 0$, for $\mathit{j}=1,2,\dots ,{\mathbf{m}}$;
• if ${\mathbf{mean}}=\text{'M'}$, exactly ${\mathbf{ip}}-1$ values of isx must be $\text{}>0$;
• if ${\mathbf{mean}}=\text{'Z'}$, exactly ip values of isx must be $\text{}>0$.
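The relationship between isx, ip and mean can be verified with a short pre-check. This is a hypothetical helper written for illustration only; the name `check_isx` is not a library routine.

```python
def check_isx(isx, ip, mean):
    """Return True if isx is consistent with ip and mean ('M' or 'Z')."""
    if any(v < 0 for v in isx):
        return False  # every element must be non-negative
    n_included = sum(1 for v in isx if v > 0)
    # With a mean term, one of the ip parameters is the intercept
    expected = ip - 1 if mean == "M" else ip
    return n_included == expected
```

For instance, two selected columns are consistent with ${\mathbf{ip}}=3$ when ${\mathbf{mean}}=\text{'M'}$, and with ${\mathbf{ip}}=2$ when ${\mathbf{mean}}=\text{'Z'}$.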
11: $\mathbf{ip}$ Integer Input
On entry: the number of independent variables in the model, including the mean or intercept if present.
Constraint: ${\mathbf{ip}}>0$.
12: $\mathbf{t}\left(*\right)$ Real (Kind=nag_wp) array Input
Note: the dimension of the array t must be at least ${\mathbf{n}}$ if ${\mathbf{errfn}}=\text{'B'}$.
On entry: if ${\mathbf{errfn}}=\text{'B'}$, ${\mathbf{t}}\left(i\right)$ must contain the binomial denominator, ${t}_{i}$, for the $i$th observation.
Otherwise t is not referenced.
Constraint: if ${\mathbf{errfn}}=\text{'B'}$, ${\mathbf{t}}\left(\mathit{i}\right)\ge 0.0$, for $\mathit{i}=1,2,\dots ,n$.
13: $\mathbf{off}\left(*\right)$ Real (Kind=nag_wp) array Input
Note: the dimension of the array off must be at least ${\mathbf{n}}$ if ${\mathbf{offset}}=\text{'Y'}$.
On entry: if ${\mathbf{offset}}=\text{'Y'}$, ${\mathbf{off}}\left(i\right)$ must contain the offset ${o}_{i}$, for the $i$th observation.
Otherwise off is not referenced.
14: $\mathbf{wt}\left(*\right)$ Real (Kind=nag_wp) array Input
Note: the dimension of the array wt must be at least ${\mathbf{n}}$ if ${\mathbf{weight}}=\text{'W'}$ and ${\mathbf{vfobs}}=\mathrm{.TRUE.}$.
On entry: if ${\mathbf{weight}}=\text{'W'}$ and ${\mathbf{vfobs}}=\mathrm{.TRUE.}$, ${\mathbf{wt}}\left(i\right)$ must contain the weight, ${w}_{i}$, for the $i$th observation.
If the variance of future observations is not included in the standard error of the predicted variable, wt is not referenced.
Constraint: if ${\mathbf{vfobs}}=\mathrm{.TRUE.}$ and ${\mathbf{weight}}=\text{'W'}$, ${\mathbf{wt}}\left(\mathit{i}\right)\ge 0.0$, for $\mathit{i}=1,2,\dots ,{\mathbf{n}}$.
15: $\mathbf{s}$ Real (Kind=nag_wp) Input
On entry: if ${\mathbf{errfn}}=\text{'N'}$ or $\text{'G'}$ and ${\mathbf{vfobs}}=\mathrm{.TRUE.}$, the scale parameter, $\varphi$.
Otherwise s is not referenced and $\varphi =1$.
Constraint: if ${\mathbf{errfn}}=\text{'N'}$ or $\text{'G'}$ and ${\mathbf{vfobs}}=\mathrm{.TRUE.}$, ${\mathbf{s}}>0.0$.
16: $\mathbf{a}$ Real (Kind=nag_wp) Input
On entry: if ${\mathbf{link}}=\text{'E'}$, a must contain the power of the exponential.
If ${\mathbf{link}}\ne \text{'E'}$, a is not referenced.
Constraint: if ${\mathbf{link}}=\text{'E'}$, ${\mathbf{a}}\ne 0.0$.
17: $\mathbf{b}\left({\mathbf{ip}}\right)$ Real (Kind=nag_wp) array Input
On entry: the model parameters, $\beta$.
If ${\mathbf{mean}}=\text{'M'}$, ${\mathbf{b}}\left(1\right)$ must contain the mean parameter and ${\mathbf{b}}\left(i+1\right)$ the coefficient of the variable contained in the $j$th column of x, where ${\mathbf{isx}}\left(j\right)$ is the $i$th positive value in the array isx.
If ${\mathbf{mean}}=\text{'Z'}$, ${\mathbf{b}}\left(i\right)$ must contain the coefficient of the variable contained in the $j$th column of x, where ${\mathbf{isx}}\left(j\right)$ is the $i$th positive value in the array isx.
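The ordering of b relative to the columns of x can be made concrete with a small mapping. The helper below is a hypothetical illustration using 1-based column numbers to match the text; `None` marks the mean parameter.

```python
def coefficient_columns(isx, mean):
    """For each element of b, in order, return the 1-based column of x
    whose coefficient it holds (None for the mean term)."""
    cols = [j + 1 for j, v in enumerate(isx) if v > 0]
    return ([None] if mean == "M" else []) + cols

# isx selects columns 2, 3 and 5 of x
layout = coefficient_columns([0, 1, 1, 0, 1], "M")
```

Here `layout` lists, position by position, what each of the four elements of b refers to: the mean first, then columns 2, 3 and 5.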
18: $\mathbf{cov}\left({\mathbf{ip}}×\left({\mathbf{ip}}+1\right)/2\right)$ Real (Kind=nag_wp) array Input
On entry: the upper triangular part of the variance-covariance matrix, $C$, of the model parameters. This matrix should be supplied packed by column, i.e., the covariance between parameters ${\beta }_{i}$ and ${\beta }_{j}$, that is the values stored in ${\mathbf{b}}\left(i\right)$ and ${\mathbf{b}}\left(j\right)$, should be supplied in ${\mathbf{cov}}\left(\mathit{j}×\left(\mathit{j}-1\right)/2+\mathit{i}\right)$, for $\mathit{i}=1,2,\dots ,{\mathbf{ip}}$ and $\mathit{j}=\mathit{i},\dots ,{\mathbf{ip}}$.
Constraint: the matrix represented in cov must be a valid variance-covariance matrix.
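The packed-by-column layout of cov can be sketched as follows. The two helpers are hypothetical and use 1-based indices $i$ and $j$ to match the formula ${\mathbf{cov}}\left(j×\left(j-1\right)/2+i\right)$ given above; they are not library routines.

```python
def packed_index(i, j):
    """1-based position in cov of element (i, j) of C, upper triangle packed by column."""
    if i > j:
        i, j = j, i  # C is symmetric, so (i, j) and (j, i) share storage
    return j * (j - 1) // 2 + i

def pack_upper(c):
    """Pack a full symmetric matrix (list of lists) into the cov layout."""
    ip = len(c)
    cov = [0.0] * (ip * (ip + 1) // 2)
    for j in range(1, ip + 1):
        for i in range(1, j + 1):
            cov[packed_index(i, j) - 1] = c[i - 1][j - 1]
    return cov
```

For a 3 by 3 matrix the storage order is $C_{11}, C_{12}, C_{22}, C_{13}, C_{23}, C_{33}$, so the diagonal elements sit at positions 1, 3 and 6.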
19: $\mathbf{vfobs}$ Logical Input
On entry: if ${\mathbf{vfobs}}=\mathrm{.TRUE.}$, the variance of future observations is included in the standard error of the predicted variable (i.e., ${I}_{\mathrm{fobs}}=1$), otherwise ${I}_{\mathrm{fobs}}=0$.
20: $\mathbf{eta}\left({\mathbf{n}}\right)$ Real (Kind=nag_wp) array Output
On exit: the linear predictor, $\eta$.
21: $\mathbf{seeta}\left({\mathbf{n}}\right)$ Real (Kind=nag_wp) array Output
On exit: the standard error of the linear predictor, $\mathrm{se}\left(\eta \right)$.
22: $\mathbf{pred}\left({\mathbf{n}}\right)$ Real (Kind=nag_wp) array Output
On exit: the predicted value, $\stackrel{^}{y}$.
23: $\mathbf{sepred}\left({\mathbf{n}}\right)$ Real (Kind=nag_wp) array Output
On exit: the standard error of the predicted value, $\mathrm{se}\left(\stackrel{^}{y}\right)$. If ${\mathbf{pred}}\left(i\right)$ could not be calculated, g02gpf returns ${\mathbf{ifail}}={\mathbf{22}}$, and ${\mathbf{sepred}}\left(i\right)$ is set to $-99.0$.
24: $\mathbf{ifail}$ Integer Input/Output
On entry: ifail must be set to $0$, $-1$ or $1$ to set behaviour on detection of an error; these values have no effect when no error is detected.
A value of $0$ causes the printing of an error message and program execution will be halted; otherwise program execution continues. A value of $-1$ means that an error message is printed while a value of $1$ means that it is not.
If halting is not appropriate, the value $-1$ or $1$ is recommended. If message printing is undesirable, then the value $1$ is recommended. Otherwise, the value $-1$ is recommended since useful values can be provided in some output arguments even when ${\mathbf{ifail}}\ne {\mathbf{0}}$ on exit. When the value $-\mathbf{1}$ or $\mathbf{1}$ is used it is essential to test the value of ifail on exit.
On exit: ${\mathbf{ifail}}={\mathbf{0}}$ unless the routine detects an error or a warning has been flagged (see Section 6).

6 Error Indicators and Warnings

If on entry ${\mathbf{ifail}}=0$ or $-1$, explanatory error messages are output on the current error message unit (as defined by x04aaf).
Errors or warnings detected by the routine:
Note: in some cases g02gpf may return useful information.
${\mathbf{ifail}}=1$
On entry, ${\mathbf{errfn}}=⟨\mathit{\text{value}}⟩$.
Constraint: ${\mathbf{errfn}}=\text{'B'}$, $\text{'G'}$, $\text{'N'}$ or $\text{'P'}$.
${\mathbf{ifail}}=2$
On entry, ${\mathbf{errfn}}=⟨\mathit{\text{value}}⟩$ and ${\mathbf{link}}=⟨\mathit{\text{value}}⟩$.
Constraint: if ${\mathbf{errfn}}=\text{'B'}$, ${\mathbf{link}}=\text{'C'}$, $\text{'G'}$ or $\text{'P'}$,
otherwise, ${\mathbf{link}}=\text{'E'}$, $\text{'I'}$, $\text{'L'}$, $\text{'R'}$ or $\text{'S'}$.
${\mathbf{ifail}}=3$
On entry, ${\mathbf{mean}}=⟨\mathit{\text{value}}⟩$.
Constraint: ${\mathbf{mean}}=\text{'M'}$ or $\text{'Z'}$.
${\mathbf{ifail}}=4$
On entry, ${\mathbf{offset}}=⟨\mathit{\text{value}}⟩$.
Constraint: ${\mathbf{offset}}=\text{'Y'}$ or $\text{'N'}$.
${\mathbf{ifail}}=5$
On entry, ${\mathbf{weight}}=⟨\mathit{\text{value}}⟩$.
Constraint: if ${\mathbf{vfobs}}=\mathrm{.TRUE.}$, ${\mathbf{weight}}=\text{'U'}$ or $\text{'W'}$.
${\mathbf{ifail}}=6$
On entry, ${\mathbf{n}}=⟨\mathit{\text{value}}⟩$.
Constraint: ${\mathbf{n}}\ge 1$.
${\mathbf{ifail}}=8$
On entry, ${\mathbf{ldx}}=⟨\mathit{\text{value}}⟩$ and ${\mathbf{n}}=⟨\mathit{\text{value}}⟩$.
Constraint: ${\mathbf{ldx}}\ge {\mathbf{n}}$.
${\mathbf{ifail}}=9$
On entry, ${\mathbf{m}}=⟨\mathit{\text{value}}⟩$.
Constraint: ${\mathbf{m}}\ge 1$.
${\mathbf{ifail}}=10$
On entry, ${\mathbf{isx}}\left(⟨\mathit{\text{value}}⟩\right)<0$.
Constraint: ${\mathbf{isx}}\left(j\right)\ge 0.0$, for $j=1,2,\dots ,{\mathbf{m}}$.
${\mathbf{ifail}}=11$
On entry, ${\mathbf{ip}}=⟨\mathit{\text{value}}⟩$.
Constraint: ${\mathbf{ip}}>0$.
${\mathbf{ifail}}=12$
On entry, ${\mathbf{t}}\left(⟨\mathit{\text{value}}⟩\right)=⟨\mathit{\text{value}}⟩$.
Constraint: ${\mathbf{t}}\left(i\right)\ge 0.0$, for all $i$.
${\mathbf{ifail}}=14$
On entry, ${\mathbf{wt}}\left(⟨\mathit{\text{value}}⟩\right)=⟨\mathit{\text{value}}⟩$.
Constraint: ${\mathbf{wt}}\left(i\right)\ge 0.0$, for all $i$.
${\mathbf{ifail}}=15$
On entry, ${\mathbf{s}}=⟨\mathit{\text{value}}⟩$.
Constraint: ${\mathbf{s}}>0.0$.
${\mathbf{ifail}}=16$
On entry, ${\mathbf{a}}=0.0$.
Constraint: if ${\mathbf{link}}=\text{'E'}$, ${\mathbf{a}}\ne 0.0$.
${\mathbf{ifail}}=18$
On entry, ${\mathbf{cov}}\left(⟨\mathit{\text{value}}⟩\right)=⟨\mathit{\text{value}}⟩$.
Constraint: ${\mathbf{cov}}\left(i\right)\ge 0.0$ for at least one diagonal element.
${\mathbf{ifail}}=22$
At least one predicted value could not be calculated as required. sepred is set to $-99.0$ for affected predicted values.
${\mathbf{ifail}}=-99$
An unexpected error has been triggered by this routine. Please contact NAG.
See Section 7 in the Introduction to the NAG Library FL Interface for further information.
${\mathbf{ifail}}=-399$
Your licence key may have expired or may not have been installed correctly.
See Section 8 in the Introduction to the NAG Library FL Interface for further information.
${\mathbf{ifail}}=-999$
Dynamic memory allocation failed.
See Section 9 in the Introduction to the NAG Library FL Interface for further information.

7 Accuracy

Not applicable.

8 Parallelism and Performance

g02gpf makes calls to BLAS and/or LAPACK routines, which may be threaded within the vendor library used by this implementation. Consult the documentation for the vendor library for further information.
Please consult the X06 Chapter Introduction for information on how to control and interrogate the OpenMP environment used within this routine. Please also consult the Users' Note for your implementation for any additional implementation-specific information.

9 Further Comments

When using g02gpf following a call to g02daf you should set ${\mathbf{errfn}}=\text{'N'}$, ${\mathbf{link}}=\text{'I'}$, ${\mathbf{offset}}=\text{'N'}$ and ${\mathbf{s}}=\frac{{\mathbf{rss}}}{{\mathbf{idf}}}$.
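The settings for a linear model fit can be gathered in one place, as in the sketch below. It assumes rss and idf are the residual sum of squares and residual degrees of freedom returned by g02daf; the function name and the dictionary layout are hypothetical and only illustrate the arithmetic.

```python
def linear_model_settings(rss, idf):
    """Collect the g02gpf option values appropriate after a g02daf fit."""
    if idf <= 0:
        raise ValueError("idf must be positive to estimate the scale parameter")
    return {
        "errfn": "N",      # Normal errors
        "link": "I",       # identity link
        "offset": "N",     # no offset
        "s": rss / idf,    # scale parameter: estimated residual variance
    }

settings = linear_model_settings(rss=12.0, idf=3)
```

The value stored under `"s"` is only referenced by g02gpf when ${\mathbf{vfobs}}=\mathrm{.TRUE.}$.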

10 Example

The model
 $y={\beta }_{1}+{\beta }_{2}x+\epsilon$
is fitted to a training dataset with five observations. The resulting model is then used to predict the response for two new observations.

10.1 Program Text

Program Text (g02gpfe.f90)

10.2 Program Data

Program Data (g02gpfe.d)

10.3 Program Results

Program Results (g02gpfe.r)