g02gc:: Correlation and Regression Analysis (NAG Toolbox)

A generalized linear model with Poisson errors consists of the following elements:

(a)	a set of $n$ observations, $y_{i}$, from a Poisson distribution:

$$\frac{μ^{y} e^{- μ}}{y!} .$$

(b)	$X$, a set of $p$ independent variables for each observation, $x_{1}, x_{2}, \dots, x_{p}$.

(c)	a linear model:

$$η = \sum β_{j} x_{j} .$$

(d)	a link between the linear predictor, $η$, and the mean of the distribution, $μ$: $η = g (μ)$. The possible link functions are:

(i)	exponent link: $η = μ^{a}$ , for a constant $a$ ,
(ii)	identity link: $η = μ$ ,
(iii)	log link: $η = \log μ$ ,
(iv)	square root link: $η = \sqrt{μ}$ ,
(v)	reciprocal link: $η = \frac{1}{μ}$ .

(e)	a measure of fit, the deviance:

$$\sum_{i = 1}^{n} \mathrm{dev} (y_{i}, \hat{μ}_{i}) = \sum_{i = 1}^{n} 2 \left( y_{i} \log \left( \frac{y_{i}}{\hat{μ}_{i}} \right) - (y_{i} - \hat{μ}_{i}) \right) .$$
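The deviance can be evaluated directly from observations and fitted means. The following is a minimal sketch in plain MATLAB (no NAG call), using the first five observations and the rounded fitted values from the Example section. Two points are assumptions rather than statements from this page: the term $y \log (y / \hat{μ})$ is taken as 0 when $y = 0$, and the deviance residual is taken to be the signed square root of each contribution. The names `ylogy`, `dev_i` and `r` are illustrative.

```matlab
% First five counts and (rounded) fitted values from the Example section.
y  = [141; 67; 114; 79; 39];
mu = [132.99; 63.47; 127.38; 77.29; 38.86];

% y*log(y/mu), with the limiting value 0 used when y = 0 (assumption).
ylogy = zeros(size(y));
pos = y > 0;
ylogy(pos) = y(pos) .* log(y(pos) ./ mu(pos));

dev_i = 2 * (ylogy - (y - mu));   % per-observation deviance contributions
dev   = sum(dev_i);               % deviance over these five observations

% Signed square roots: the usual definition of deviance residuals (assumption).
r = sign(y - mu) .* sqrt(dev_i);
fprintf('%8.4f\n', r);
```

Because the fitted values are rounded to two decimals, the residuals agree with the `residual` column of the example output only to about three decimal places.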

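The linear parameters are estimated by iterative weighted least squares. An adjusted dependent variable, $z$, is formed:

$$z = η + (y - μ) \frac{d η}{d μ}$$

and a working weight, $w$,

$$w^{- 1} = \left( τ \frac{d η}{d μ} \right)^{2},$$

where $τ = \sqrt{μ}$. At each iteration an approximation to the estimate of $β$, $\hat{β}$, is found by the weighted least squares regression of $z$ on $X$ with weights $w$.

The routine finds a $Q R$ decomposition of $w^{1 / 2} X$, i.e., $w^{1 / 2} X = Q R$, where $R$ is a $p$ by $p$ triangular matrix and $Q$ is an $n$ by $p$ column-orthogonal matrix.

If $R$ is of full rank, then $\hat{β}$ is the solution to:

$$R \hat{β} = Q^{T} w^{1 / 2} z .$$

If $R$ is not of full rank, a solution is obtained by means of a singular value decomposition (SVD) of $R$:

$$R = Q_{*} \begin{pmatrix} D & 0 \\ 0 & 0 \end{pmatrix} P^{T},$$

where $D$ is a $k$ by $k$ diagonal matrix with nonzero diagonal elements, $k$ being the rank of $R$ and $w^{1 / 2} X$.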

The initial values for the algorithm are obtained by taking $\hat{η} = g (y)$.
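The iteration described above can be sketched in a few lines of plain MATLAB for the log link, where $d η / d μ = 1 / μ$ and the working weight is therefore $w = μ$. This is an illustrative sketch, not the routine's implementation: it uses the counts from the Example section, a full-rank reparametrization of the design (an assumption, chosen so that $R$ is nonsingular), and a hypothetical stopping rule based on the change in deviance.

```matlab
% Data from the Example section: a 3-by-5 table of counts, stored row by row.
y = [141; 67; 114; 79; 39; 131; 66; 143; 72; 35; 36; 14; 38; 28; 16];
n = numel(y);

% Indicator columns for the two factors (same layout as x in the Example).
x = [kron(eye(3), ones(5,1)), kron(ones(3,1), eye(5))];

% Full-rank design (assumption): intercept plus all but the first level
% of each factor, so that R below is nonsingular.
X = [ones(n,1), x(:, [2 3 5 6 7 8])];

mu  = y;            % initial values: eta = g(y) with the log link
eta = log(mu);      % (requires y > 0; zero counts would need adjusting)
dev_old = Inf;

for iter = 1:20
    detadmu = 1 ./ mu;                      % d(eta)/d(mu) for eta = log(mu)
    z  = eta + (y - mu) .* detadmu;         % adjusted dependent variable
    w  = 1 ./ (mu .* detadmu.^2);           % working weight, tau^2 = mu
    sw = sqrt(w);

    [Q, R] = qr(bsxfun(@times, sw, X), 0);  % economy-size QR of w^(1/2) X
    beta = R \ (Q' * (sw .* z));            % solve R*beta = Q'*w^(1/2)*z

    eta = X * beta;
    mu  = exp(eta);
    dev = 2 * sum(y .* log(y ./ mu) - (y - mu));

    if abs(dev - dev_old) < 1e-10 * (1 + dev)   % hypothetical stopping rule
        break
    end
    dev_old = dev;
end

fprintf('Deviance = %.4f after %d iterations\n', dev, iter);
```

The deviance and fitted values do not depend on how the model is parametrized, so they should agree with the example output to the printed precision. The coefficients in `beta` will differ from those in the Example section, which come from a different, rank-deficient parametrization.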

The fit of the model can be assessed by examining and testing the deviance, in particular by comparing the difference in deviance between nested models, i.e., when one model is a sub-model of the other. The difference in deviance between two nested models has, asymptotically, a $χ^{2}$-distribution with degrees of freedom given by the difference in the degrees of freedom associated with the two deviances.
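As an illustration of such a test, the sketch below compares the fitted model of the Example section with the saturated model. The saturated model has zero deviance and zero residual degrees of freedom, so the difference in deviance is the reported deviance itself. The tail probability is computed with the base MATLAB function `gammainc`, using the identity $P (χ^{2}_{k} > d) = Q (k / 2, d / 2)$, where $Q$ is the regularized upper incomplete gamma function. The approximation is asymptotic, so the result is indicative only.

```matlab
dev = 9.0379;   % deviance reported in the Example section
idf = 8;        % its residual degrees of freedom

% Upper tail of the chi-square distribution with idf degrees of freedom:
% P(X > dev) = Q(idf/2, dev/2), Q being the regularized upper incomplete gamma.
p = gammainc(dev / 2, idf / 2, 'upper');
fprintf('p-value = %.3f\n', p);
```

A tail probability of about 0.34 gives no evidence against the main-effects model for these data.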

The estimated linear predictor, $\hat{η} = X \hat{β}$, can be written as $H w^{1 / 2} z$ for an $n$ by $n$ matrix $H$. The $i$th diagonal element of $H$, $h_{i}$, gives a measure of the influence of the $i$th values of the independent variables on the fitted regression model. These are known as leverages.
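The leverages can be reproduced from the $Q R$ decomposition of $w^{1 / 2} X$. The sketch below assumes that $h_{i}$ is the $i$th diagonal element of the orthogonal projection onto the column space of $w^{1 / 2} X$ (the usual hat matrix for a generalized linear model), which equals the squared norm of the $i$th row of $Q$. It uses the data of the Example section, a full-rank version of the design (an assumption), and the closed form for the fitted means of a main-effects model on a two-way table (row total times column total divided by the grand total).

```matlab
% Counts and factor indicators as in the Example section.
y = [141; 67; 114; 79; 39; 131; 66; 143; 72; 35; 36; 14; 38; 28; 16];
x = [kron(eye(3), ones(5,1)), kron(ones(3,1), eye(5))];
X = [ones(15,1), x(:, [2 3 5 6 7 8])];    % full-rank version of the design

% Fitted means in closed form: (row total * column total) / grand total.
rowtot = x(:, 1:3) * (x(:, 1:3)' * y);    % row total for each observation
coltot = x(:, 4:8) * (x(:, 4:8)' * y);    % column total for each observation
mu = rowtot .* coltot / sum(y);

sw = sqrt(mu);                            % log link: working weight w = mu
[Q, R] = qr(bsxfun(@times, sw, X), 0);    % economy-size QR of w^(1/2) X
h = sum(Q.^2, 2);                         % diagonal of the projection Q*Q'

fprintf('%6.3f\n', h);
```

The leverages sum to the rank of the design (7 here), and the values should match the `h` column of the example output.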

If part of the linear predictor can be represented by a variable with a known coefficient, then this can be included in the model by using an offset, $o$:

$$η = o + \sum β_{j} x_{j} .$$
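A common use of an offset, given here as an illustration rather than as something stated on this page, is a rate model with the log link. If each count is observed over a known exposure $t$ (a hypothetical quantity such as observation time or population size), taking $o = \log t$ gives:

```latex
η = \log μ = \log t + \sum β_{j} x_{j}
\quad \Longrightarrow \quad
μ = t \exp \left( \sum β_{j} x_{j} \right) ,
```

so the coefficients $β_{j}$ describe the rate $μ / t$ rather than the raw count.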

If the model is not of full rank, the solution given will be only one of the possible solutions. Other estimates may be obtained by applying constraints to the parameters. These solutions can be obtained by using nag_correg_glm_constrain (g02gk) after using nag_correg_glm_poisson (g02gc). Only certain linear combinations of the parameters will have unique estimates; these are known as estimable functions and can be estimated and tested using nag_correg_glm_estfunc (g02gn).
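The design used in the Example section (a mean term plus all eight indicator columns) is not of full rank, so it illustrates this situation. The sketch below shows one way to obtain a solution in plain MATLAB: a single weighted least squares step at the fitted means, solved through an SVD of $R$ and keeping only the nonzero singular values, which yields the minimum-norm solution. The rank tolerance and the closed form used for the fitted means are assumptions of this sketch, and it is not claimed to be the routine's exact procedure.

```matlab
% Counts and factor indicators as in the Example section.
y = [141; 67; 114; 79; 39; 131; 66; 143; 72; 35; 36; 14; 38; 28; 16];
x = [kron(eye(3), ones(5,1)), kron(ones(3,1), eye(5))];
X = [ones(15,1), x];                      % 9 columns, but only rank 7

% Fitted means in closed form: (row total * column total) / grand total.
rowtot = x(:, 1:3) * (x(:, 1:3)' * y);
coltot = x(:, 4:8) * (x(:, 4:8)' * y);
mu  = rowtot .* coltot / sum(y);
eta = log(mu);

z  = eta + (y - mu) ./ mu;                % adjusted dependent variable (log link)
sw = sqrt(mu);                            % square root of the working weight

[Q, R] = qr(bsxfun(@times, sw, X), 0);    % w^(1/2) X = Q*R, R is singular here
c = Q' * (sw .* z);

[Qs, D, P] = svd(R);                      % R = Qs * D * P'
d = diag(D);
k = sum(d > 1e-10 * d(1));                % numerical rank (tolerance assumed)

% Minimum-norm solution using only the k nonzero singular values.
beta = P(:, 1:k) * ((Qs(:, 1:k)' * c) ./ d(1:k));

fprintf('rank = %d\n', k);
fprintf('%8.4f\n', beta);
```

The resulting coefficients should agree with the parameter estimates printed in the Example section. In that solution the mean term equals both the sum of the three row effects and the sum of the five column effects, which is the implicit constraint that the minimum-norm solution imposes.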

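Details of the SVD are made available in the form of the matrix $P^{*}$:

$$P^{*} = \begin{pmatrix} D^{- 1} P_{1}^{T} \\ P_{0}^{T} \end{pmatrix} ,$$

where $P_{1}$ denotes the first $k$ columns of $P$ and $P_{0}$ the remaining columns, i.e., $P = (P_{1} \; P_{0})$.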

The generalized linear model with Poisson errors can be used to model contingency table data; see Cook and Weisberg (1982) and McCullagh and Nelder (1983).

References

Parameters

Compulsory Input Parameters

Optional Input Parameters

Output Parameters

Error Indicators and Warnings

Cases prefixed with W are classified as warnings and do not generate an error of type NAG:error_n. See nag_issue_warnings.

Accuracy

The accuracy depends on the value of tol as described in Arguments. As the deviance is a function of $\log μ$, the accuracy of the $\hat{β}$ will only be a function of tol. tol should therefore be set smaller than the accuracy required for $\hat{β}$.

Further Comments

Example

function g02gc_example


fprintf('g02gc example results\n\n');

x = [1, 0, 0, 1, 0, 0, 0, 0;
     1, 0, 0, 0, 1, 0, 0, 0;
     1, 0, 0, 0, 0, 1, 0, 0;
     1, 0, 0, 0, 0, 0, 1, 0;
     1, 0, 0, 0, 0, 0, 0, 1;
     0, 1, 0, 1, 0, 0, 0, 0;
     0, 1, 0, 0, 1, 0, 0, 0;
     0, 1, 0, 0, 0, 1, 0, 0;
     0, 1, 0, 0, 0, 0, 1, 0;
     0, 1, 0, 0, 0, 0, 0, 1;
     0, 0, 1, 1, 0, 0, 0, 0;
     0, 0, 1, 0, 1, 0, 0, 0;
     0, 0, 1, 0, 0, 1, 0, 0;
     0, 0, 1, 0, 0, 0, 1, 0;
     0, 0, 1, 0, 0, 0, 0, 1];

y = [141;   67;  114;   79;   39; 
     131;   66;  143;   72;   35;
      36;   14;   38;   28;   16];

[n,m] = size(x);
isx = ones(m,1,'int64');
ip = int64(m+1);

link = 'L';
mean_p = 'M';
eps = 1e-6;
maxit = int64(20);

% Fit generalized linear model with Poisson errors
[dev, idf, b, irank, se, covar, v, ifail] = ...
g02gc( ...
       link, mean_p, x, isx, ip, y, 'eps', eps, 'maxit', maxit);

%  Display results
fprintf('Deviance           = %12.4e\n', dev);
fprintf('Degrees of freedom = %2d\n', idf);
fprintf('\nVariable   Parameter estimate   Standard error\n\n');
ivar = double([1:ip]');
fprintf('%6d%16.4f%20.4f\n',[ivar b se]');
fprintf('\n     y        fv     residual       h\n\n');
for j=1:n
  fprintf('%7.1f%10.2f%12.4f%10.3f\n',y(j),v(j,2),v(j,5),v(j,6));
end

g02gc example results

Deviance           =   9.0379e+00
Degrees of freedom =  8

Variable   Parameter estimate   Standard error

     1          2.5977              0.0258
     2          1.2619              0.0438
     3          1.2777              0.0436
     4          0.0580              0.0668
     5          1.0307              0.0551
     6          0.2910              0.0732
     7          0.9876              0.0559
     8          0.4880              0.0675
     9         -0.1996              0.0904

     y        fv     residual       h

  141.0    132.99      0.6875     0.604
   67.0     63.47      0.4386     0.514
  114.0    127.38     -1.2072     0.596
   79.0     77.29      0.1936     0.532
   39.0     38.86      0.0222     0.482
  131.0    135.11     -0.3553     0.608
   66.0     64.48      0.1881     0.520
  143.0    129.41      1.1749     0.601
   72.0     78.52     -0.7465     0.537
   35.0     39.48     -0.7271     0.488
   36.0     39.90     -0.6276     0.393
   14.0     19.04     -1.2131     0.255
   38.0     38.21     -0.0346     0.382
   28.0     23.19      0.9675     0.282
   16.0     11.66      1.2028     0.206

$v (i, 1)$	contains the linear predictor value, $η_{i}$ , for $i = 1, 2, \dots, n$ .
$v (i, 2)$	contains the fitted value, ${\hat{μ}}_{i}$ , for $i = 1, 2, \dots, n$ .
$v (i, 3)$	contains the variance standardization, $\frac{1}{τ_{i}}$ , for $i = 1, 2, \dots, n$ .
$v (i, 4)$	contains the square root of the working weight, $w_{i}^{\frac{1}{2}}$ , for $i = 1, 2, \dots, n$ .
$v (i, 5)$	contains the deviance residual, $r_{i}$ , for $i = 1, 2, \dots, n$ .
$v (i, 6)$	contains the leverage, $h_{i}$ , for $i = 1, 2, \dots, n$ .
$v (i, 7)$	contains the offset, $o_{i}$ , for $i = 1, 2, \dots, n$ . If $offset ='N'$ , all values will be zero.
$v (i, j)$	for $j = 8, \dots, ip + 7$ , contains the results of the $Q R$ decomposition or the singular value decomposition.

On entry,	$n < 2$ ,
or	$m < 1$ ,
or	$ldx < n$ ,
or	$ldv < n$ ,
or	$ip < 1$ ,
or	$link \neq'E'$ , $'I'$ , $'L'$ , $'S'$ or $'R'$ ,
or	$link ='E'$ and $a = 0.0$ ,
or	$mean_p \neq'M'$ or $'Z'$ ,
or	$weight \neq'U'$ or $'W'$ ,
or	$offset \neq'N'$ or $'Y'$ ,
or	$maxit < 0$ ,
or	$tol < 0.0$ ,
or	$eps < 0.0$ .

On entry,	a value of $isx < 0$ ,
or	the value of ip is incompatible with the values of mean_p and isx,
or	ip is greater than the effective number of observations.

NAG Toolbox: nag_correg_glm_poisson (g02gc)
