NAG Toolbox: nag_correg_quantile_linreg (g02qg)

 Contents

    1  Purpose
    2  Syntax
    7  Accuracy
    9  Example

Purpose

nag_correg_quantile_linreg (g02qg) performs a multiple linear quantile regression. Parameter estimates and, if required, confidence limits, covariance matrices and residuals are calculated. nag_correg_quantile_linreg (g02qg) may be used to perform a weighted quantile regression. A simplified interface for nag_correg_quantile_linreg (g02qg) is provided by nag_correg_quantile_linreg_easy (g02qf).

Syntax

[df, b, bl, bu, ch, res, state, info, ifail] = g02qg(sorder, intcpt, weight, dat, isx, y, tau, b, iopts, opts, state, 'n', n, 'm', m, 'ip', ip, 'wt', wt, 'ntau', ntau)
[df, b, bl, bu, ch, res, state, info, ifail] = nag_correg_quantile_linreg(sorder, intcpt, weight, dat, isx, y, tau, b, iopts, opts, state, 'n', n, 'm', m, 'ip', ip, 'wt', wt, 'ntau', ntau)

Description

Given a vector of $n$ observed values, $y = \{y_i : i = 1, 2, \ldots, n\}$, an $n \times p$ design matrix $X$, a column vector, $x_i$, of length $p$ holding the $i$th row of $X$ and a quantile $\tau \in (0,1)$, nag_correg_quantile_linreg (g02qg) estimates the $p$-element vector $\beta$ as the solution to
$$\underset{\beta \in \mathbb{R}^p}{\text{minimize}} \; \sum_{i=1}^{n} \rho_\tau\left( y_i - x_i^T \beta \right) \qquad (1)$$
where $\rho_\tau$ is the piecewise linear loss function $\rho_\tau(z) = z\left(\tau - I(z<0)\right)$, and $I(z<0)$ is an indicator function taking the value 1 if $z<0$ and 0 otherwise. Weights can be incorporated by replacing $X$ and $y$ with $WX$ and $Wy$ respectively, where $W$ is an $n \times n$ diagonal matrix. Observations with zero weights can either be included or excluded from the analysis; this is in contrast to least squares regression where such observations do not contribute to the objective function and are therefore always dropped.
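As a minimal sketch of the loss function in (1), written in plain MATLAB with no NAG calls, the following evaluates the piecewise linear loss and checks that, for an intercept-only model at a quantile of 0.5, the candidate minimizing the summed loss is the sample median. The data vector and the names rho, obj and best are illustrative assumptions, not part of the toolbox.

```matlab
% Piecewise linear (check) loss: rho_tau(z) = z * (tau - I(z < 0))
rho = @(z, tau) z .* (tau - (z < 0));

% Illustrative data (assumed, not from the toolbox example)
y   = [1; 2; 3; 4; 10];
tau = 0.5;

% Objective for an intercept-only model with a single parameter c
obj = @(c) sum(rho(y - c, tau));

% Evaluate the objective at each observed value and keep the minimizer
cands = y.';
vals  = arrayfun(obj, cands);
[~, k] = min(vals);
best  = cands(k);

fprintf('minimizing candidate: %g\n', best);
```

Because the loss weights positive residuals by tau and negative residuals by 1 - tau, changing tau moves the minimizer towards the corresponding sample quantile.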
nag_correg_quantile_linreg (g02qg) uses the interior point algorithm of Portnoy and Koenker (1997), described briefly in Algorithmic Details, to obtain the parameter estimates $\hat{\beta}$, for a given value of $\tau$.
Under the assumption of Normally distributed errors, Koenker (2005) shows that the limiting covariance matrix of $\hat{\beta} - \beta$ has the form
$$\Sigma = \frac{\tau(1-\tau)}{n} H_n^{-1} J_n H_n^{-1}$$
where $J_n = n^{-1} \sum_{i=1}^{n} x_i x_i^T$ and $H_n$ is a function of $\tau$, as described below. Given an estimate of the covariance matrix, $\hat{\Sigma}$, lower ($\hat{\beta}_L$) and upper ($\hat{\beta}_U$) limits for a $100 \times \alpha\%$ confidence interval can be calculated for each of the $p$ parameters, via
$$\hat{\beta}_{L_i} = \hat{\beta}_i - t_{n-p,(1+\alpha)/2} \sqrt{\hat{\Sigma}_{ii}}, \qquad \hat{\beta}_{U_i} = \hat{\beta}_i + t_{n-p,(1+\alpha)/2} \sqrt{\hat{\Sigma}_{ii}}$$
where $t_{n-p,0.975}$ is the 97.5 percentile of the Student's $t$ distribution with $n-k$ degrees of freedom, where $k$ is the rank of the cross-product matrix $X^T X$.
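The interval formula can be checked by hand against the results printed in the Example section. The sketch below uses the intercept estimate and its variance reported there for the 0.10 quantile, and assumes a 95% interval with 233 degrees of freedom (235 observations, rank 2). The critical value is hard-coded as an approximation, since base MATLAB has no Student's t quantile function; the variable names are illustrative.

```matlab
% Values copied from the printed example results (quantile 0.100, parameter 1)
b_hat   = 110.142;     % parameter estimate
sigma11 = 3.191e+02;   % matching diagonal element of the covariance matrix

% ASSUMPTION: approximate 97.5% point of Student's t with 233 degrees of freedom
t_crit = 1.9702;

% Limits are the estimate plus or minus t times the standard error
half_width = t_crit * sqrt(sigma11);
bl = b_hat - half_width;
bu = b_hat + half_width;

fprintf('lower = %.3f, upper = %.3f\n', bl, bu);
```

The computed limits should agree with the printed limits for that parameter (74.946 and 145.337) up to the rounding of the inputs.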
Four methods for estimating the covariance matrix, Σ, are available:
(i) Independent, identically distributed (IID) errors
Under an assumption of IID errors the asymptotic relationship for $\Sigma$ simplifies to
$$\Sigma = \frac{\tau(1-\tau)}{n} \left( s(\tau) \right)^2 \left( X^T X \right)^{-1}$$
where $s$ is the sparsity function. nag_correg_quantile_linreg (g02qg) estimates $s(\tau)$ from the residuals, $r_i = y_i - x_i^T \hat{\beta}$, and a bandwidth $h_n$.
(ii) Powell Sandwich
Powell (1991) suggested estimating the matrix $H_n$ by a kernel estimator of the form
$$\hat{H}_n = (n c_n)^{-1} \sum_{i=1}^{n} K\left( \frac{r_i}{c_n} \right) x_i x_i^T$$
where $K$ is a kernel function and $c_n$ satisfies $\lim_{n \to \infty} c_n \to 0$ and $\lim_{n \to \infty} \sqrt{n}\, c_n \to \infty$. When the Powell method is chosen, nag_correg_quantile_linreg (g02qg) uses a Gaussian kernel (i.e., $K = \phi$) and sets
$$c_n = \min\left( \sigma_r, \; (q_{r3} - q_{r1}) / 1.34 \right) \times \left( \Phi^{-1}(\tau + h_n) - \Phi^{-1}(\tau - h_n) \right)$$
where $h_n$ is a bandwidth, and $\sigma_r$, $q_{r1}$ and $q_{r3}$ are, respectively, the standard deviation and the 25% and 75% quantiles for the residuals, $r_i$.
(iii) Hendricks–Koenker Sandwich
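Koenker (2005) suggested estimating the matrix $H_n$ using
$$\hat{H}_n = n^{-1} \sum_{i=1}^{n} \left[ \frac{2 h_n}{x_i^T \left( \hat{\beta}(\tau + h_n) - \hat{\beta}(\tau - h_n) \right)} \right] x_i x_i^T$$
where $h_n$ is a bandwidth and $\hat{\beta}(\tau + h_n)$ denotes the parameter estimates obtained from a quantile regression using the $(\tau + h_n)$th quantile. Similarly with $\hat{\beta}(\tau - h_n)$.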
(iv) Bootstrap
The last method uses bootstrapping to either estimate a covariance matrix or obtain confidence intervals for the parameter estimates directly. This method therefore does not assume Normally distributed errors. Samples of size $n$ are taken from the paired data $\{y_i, x_i\}$ (i.e., the independent and dependent variables are sampled together). A quantile regression is then fitted to each sample, resulting in a series of bootstrap estimates for the model parameters, $\beta$. A covariance matrix can then be calculated directly from this series of values. Alternatively, confidence limits, $\hat{\beta}_L$ and $\hat{\beta}_U$, can be obtained directly from the $(1-\alpha)/2$ and $(1+\alpha)/2$ sample quantiles of the bootstrap estimates.
Further details of the algorithms used to calculate the covariance matrices can be found in Algorithmic Details.
All three asymptotic estimates of the covariance matrix require a bandwidth, $h_n$. Two alternative methods for determining this are provided:
(i) Sheather–Hall
$$h_n = \left( \frac{1.5 \left( \Phi^{-1}(\alpha_b)\, \phi\left( \Phi^{-1}(\tau) \right) \right)^2}{n \left( 2 \left( \Phi^{-1}(\tau) \right)^2 + 1 \right)} \right)^{1/3}$$
for a user-supplied value $\alpha_b$,
(ii) Bofinger
$$h_n = \left( \frac{4.5 \left( \phi\left( \Phi^{-1}(\tau) \right) \right)^4}{n \left( 2 \left( \Phi^{-1}(\tau) \right)^2 + 1 \right)^2} \right)^{1/5}$$
nag_correg_quantile_linreg (g02qg) allows optional arguments to be supplied via the iopts and opts arrays (see Optional Parameters for details of the available options). Prior to calling nag_correg_quantile_linreg (g02qg), the optional parameter arrays iopts and opts must be initialized by calling nag_correg_optset (g02zk) with optstr set to Initialize=g02qg. If bootstrap confidence limits are required (Interval Method=BOOTSTRAP XY) then one of the random number initialization functions nag_rand_init_repeat (g05kf) (for a repeatable analysis) or nag_rand_init_nonrepeat (g05kg) (for an unrepeatable analysis) must also have been called beforehand.

References

Koenker R (2005) Quantile Regression Econometric Society Monographs, Cambridge University Press, New York
Mehrotra S (1992) On the implementation of a primal-dual interior point method SIAM J. Optim. 2 575–601
Nocedal J and Wright S J (1999) Numerical Optimization Springer Series in Operations Research, Springer, New York
Portnoy S and Koenker R (1997) The Gaussian hare and the Laplacian tortoise: computability of squared-error versus absolute error estimators Statistical Science 4 279–300
Powell J L (1991) Estimation of monotonic regression models under quantile restrictions Nonparametric and Semiparametric Methods in Econometrics Cambridge University Press, Cambridge

Parameters

Compulsory Input Parameters

1:     sorder – int64 / int32 / nag_int scalar
Determines the storage order of variates supplied in dat.
Constraint: sorder=1 or 2.
2:     intcpt – string (length ≥ 1)
Indicates whether an intercept will be included in the model. The intercept is included by adding a column of ones as the first column in the design matrix, X.
intcpt='Y'
An intercept will be included in the model.
intcpt='N'
An intercept will not be included in the model.
Constraint: intcpt='N' or 'Y'.
3:     weight – string (length ≥ 1)
Indicates if weights are to be used.
weight='W'
A weighted regression model is fitted to the data using weights supplied in array wt.
weight='U'
An unweighted regression model is fitted to the data and array wt is not referenced.
Constraint: weight='U' or 'W'.
4:     dat(lddat, :) – double array
The first dimension, lddat, of the array dat must satisfy
  • if sorder=1, lddat ≥ n;
  • otherwise lddat ≥ m.
The second dimension of the array dat must be at least m if sorder=1 and at least n if sorder=2.
The ith value for the jth variate, for i=1,2,…,n and j=1,2,…,m, must be supplied in
  • dat(i,j) if sorder=1, and
  • dat(j,i) if sorder=2.
The design matrix X is constructed from dat, isx and intcpt.
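The two storage orders for dat differ only by a transpose. The following plain MATLAB sketch (illustrative values and variable names, no NAG calls) builds the same small dataset in both layouts and checks that the ith value of the jth variate is found at the expected position in each.

```matlab
% Illustrative dataset: n = 4 observations on m = 2 variates
n = 4;
m = 2;

% sorder = 1 layout: one row per observation, one column per variate
dat_by_obs = [1 10; 2 20; 3 30; 4 40];

% sorder = 2 layout: one row per variate, one column per observation
dat_by_var = dat_by_obs.';

% The ith value of the jth variate is the same element in both layouts
i = 3;
j = 2;
same_value = dat_by_obs(i, j) == dat_by_var(j, i);

fprintf('sorder=1 size: %d x %d, sorder=2 size: %d x %d\n', ...
        size(dat_by_obs, 1), size(dat_by_obs, 2), ...
        size(dat_by_var, 1), size(dat_by_var, 2));
```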
5:     isx(m) – int64 / int32 / nag_int array
Indicates which independent variables are to be included in the model.
isx(j)=0
The jth variate, supplied in dat, is not included in the regression model.
isx(j)=1
The jth variate, supplied in dat, is included in the regression model.
Constraints:
  • isx(j)=0 or 1, for j=1,2,…,m;
  • if intcpt='Y', exactly ip-1 values of isx must be set to 1;
  • if intcpt='N', exactly ip values of isx must be set to 1.
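The constraints linking isx, intcpt and ip can be checked before the call. This is a small sketch in plain MATLAB with illustrative values; the case-insensitive comparison is an assumption made here for convenience, not a statement about how the toolbox parses intcpt.

```matlab
% Illustrative inputs: three variates in dat, two of them selected
intcpt = 'Y';
isx    = int64([1; 0; 1]);

% Every element of isx must be 0 or 1
valid_flags = all(isx == 0 | isx == 1);

% ip counts the selected variates, plus one if an intercept is included
% (ASSUMPTION: the intercept flag is compared case-insensitively here)
ip = sum(isx == 1) + strcmpi(intcpt, 'Y');

fprintf('ip = %d\n', ip);
```

With these inputs the model has an intercept and two slopes, so ip is 3.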
6:     y(n) – double array
y, the observations on the dependent variable.
7:     tau(ntau) – double array
The vector of quantiles of interest. A separate model is fitted to each quantile.
Constraint: ε < tau(j) < 1-ε where ε is the machine precision returned by nag_machine_precision (x02aj), for j=1,2,…,ntau.
8:     b(ip, ntau) – double array
If Calculate Initial Values=NO, b(i,l) must hold an initial estimate for $\hat{\beta}_i$, for i=1,2,…,ip and l=1,2,…,ntau. If Calculate Initial Values=YES, b need not be set.
9:     iopts(:) – int64 / int32 / nag_int array
Note: the dimension of this array is dictated by the requirements of associated functions that must have been previously called. This array must be the same array passed as argument iopts in the previous call to nag_correg_optset (g02zk).
Optional parameter array, as initialized by a call to nag_correg_optset (g02zk).
10:   opts(:) – double array
Note: the dimension of this array is dictated by the requirements of associated functions that must have been previously called. This array must be the same array passed as argument opts in the previous call to nag_correg_optset (g02zk).
Optional parameter array, as initialized by a call to nag_correg_optset (g02zk).
11:   state(:) – int64 / int32 / nag_int array
Note: the actual argument supplied must be the array state supplied to the initialization routines nag_rand_init_repeat (g05kf) or nag_rand_init_nonrepeat (g05kg).
If Interval Method=BOOTSTRAP XY, state contains information about the selected random number generator. Otherwise state is not referenced.

Optional Input Parameters

1:     n – int64 / int32 / nag_int scalar
Default: the dimension of the array y and the first dimension of the array dat. (An error is raised if these dimensions are not equal.)
The total number of observations in the dataset. If no weights are supplied, or no zero weights are supplied, or observations with zero weights are included in the model, then n equals the effective number of observations, $n$. Otherwise n equals $n$ plus the number of observations with zero weights.
Constraint: n ≥ 2.
2:     m – int64 / int32 / nag_int scalar
Default: the dimension of the array isx and the first dimension of the array dat. (An error is raised if these dimensions are not equal.)
m, the total number of variates in the dataset.
Constraint: m ≥ 0.
3:     ip – int64 / int32 / nag_int scalar
Default: the first dimension of the array b.
p, the number of independent variables in the model, including the intercept, see intcpt, if present.
Constraints:
  • 1 ≤ ip < n;
  • if intcpt='Y', 1 ≤ ip ≤ m+1;
  • if intcpt='N', 1 ≤ ip ≤ m.
4:     wt(:) – double array
The dimension of the array wt must be at least n if weight='W'.
If weight='W', wt must contain the diagonal elements of the weight matrix W. Otherwise wt is not referenced.
When
Drop Zero Weights=YES
If wt(i)=0.0, the ith observation is not included in the model, in which case the effective number of observations, $n$, is the number of observations with nonzero weights. If Return Residuals=YES, the values of res will be set to zero for observations with zero weights.
Drop Zero Weights=NO
All observations are included in the model and the effective number of observations is n, i.e., $n$ = n.
Constraints:
  • If weight='W', wt(i) ≥ 0.0, for i=1,2,…,n;
  • The effective number of observations ≥ 2.
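The effect of the Drop Zero Weights option on the effective number of observations can be sketched in plain MATLAB as follows. The weight vector and the logical flag drop_zero_weights are illustrative stand-ins; the flag mirrors the option setting and is not a toolbox argument.

```matlab
% Illustrative weights, two of which are zero
wt = [1.0; 0.0; 2.5; 0.0; 1.0];

% Stand-in for the optional parameter Drop Zero Weights (true means YES)
drop_zero_weights = true;

if drop_zero_weights
    % Observations with zero weight are excluded from the model
    n_eff = sum(wt ~= 0);
else
    % Every observation is kept, whatever its weight
    n_eff = numel(wt);
end

% Constraints: weights must be non-negative, at least 2 effective observations
ok = all(wt >= 0) && n_eff >= 2;

fprintf('effective number of observations: %d\n', n_eff);
```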
5:     ntau – int64 / int32 / nag_int scalar
Default: the dimension of the array tau and the second dimension of the array b. (An error is raised if these dimensions are not equal.)
The number of quantiles of interest.
Constraint: ntau ≥ 1.

Output Parameters

1:     df – double scalar
The degrees of freedom given by $n-k$, where $n$ is the effective number of observations and $k$ is the rank of the cross-product matrix $X^T X$.
2:     b(ip, ntau) – double array
b(i,l), for i=1,2,…,ip, contains the estimates of the parameters of the regression model, $\hat{\beta}$, estimated for τ=tau(l).
If intcpt='Y', b(1,l) will contain the estimate corresponding to the intercept and b(i+1,l) will contain the coefficient of the jth variate contained in dat, where isx(j) is the ith nonzero value in the array isx.
If intcpt='N', b(i,l) will contain the coefficient of the jth variate contained in dat, where isx(j) is the ith nonzero value in the array isx.
3:     bl(ip, :) – double array
The second dimension of the array bl will be ntau if Interval Method ≠ NONE.
If Interval Method ≠ NONE, bl(i,l) contains the lower limit of a 100×α% confidence interval for b(i,l), for i=1,2,…,ip and l=1,2,…,ntau.
If Interval Method=NONE, bl is not referenced.
The method used for calculating the interval is controlled by the optional parameters Interval Method and Bootstrap Interval Method. The size of the interval, α, is controlled by the optional parameter Significance Level.
4:     bu(ip, :) – double array
The second dimension of the array bu will be ntau if Interval Method ≠ NONE.
If Interval Method ≠ NONE, bu(i,l) contains the upper limit of a 100×α% confidence interval for b(i,l), for i=1,2,…,ip and l=1,2,…,ntau.
If Interval Method=NONE, bu is not referenced.
The method used for calculating the interval is controlled by the optional parameters Interval Method and Bootstrap Interval Method. The size of the interval, α is controlled by the optional parameter Significance Level.
5:     ch(ip, ip, :) – double array
The last dimension of the array ch will be ntau if Interval Method ≠ NONE and Matrix Returned=COVARIANCE, and at least ntau+1 if Interval Method ≠ NONE, IID or BOOTSTRAP XY and Matrix Returned=H INVERSE.
Depending on the supplied optional parameters, ch will either not be referenced, hold an estimate of the upper triangular part of the covariance matrix, $\Sigma$, or an estimate of the upper triangular parts of $nJ_n$ and $n^{-1}H_n^{-1}$.
If Interval Method=NONE or Matrix Returned=NONE, ch is not referenced.
If Interval Method=BOOTSTRAP XY or IID and Matrix Returned=H INVERSE, ch is not referenced.
Otherwise, for i,j=1,2,…,ip, j ≥ i and l=1,2,…,ntau:
  • If Matrix Returned=COVARIANCE, ch(i,j,l) holds an estimate of the covariance between b(i,l) and b(j,l).
  • If Matrix Returned=H INVERSE, ch(i,j,1) holds an estimate of the (i,j)th element of $nJ_n$ and ch(i,j,l+1) holds an estimate of the (i,j)th element of $n^{-1}H_n^{-1}$, for τ=tau(l).
The method used for calculating $\Sigma$ and $H_n^{-1}$ is controlled by the optional parameter Interval Method.
6:     res(n, :) – double array
The second dimension of the array res will be ntau if Return Residuals=YES.
If Return Residuals=YES, res(i,l) holds the (weighted) residuals, $r_i$, for τ=tau(l), for i=1,2,…,n and l=1,2,…,ntau.
If weight='W' and Drop Zero Weights=YES, the value of res will be set to zero for observations with zero weights.
If Return Residuals=NO, res is not referenced.
7:     state(:) – int64 / int32 / nag_int array
8:     info(ntau) – int64 / int32 / nag_int array
info(i) holds additional information concerning the model fitting and confidence limit calculations when τ=tau(i).
Code Warning
0 Model fitted and confidence limits (if requested) calculated successfully
1 The function did not converge. The returned values are based on the estimate at the last iteration. Try increasing Iteration Limit whilst calculating the parameter estimates or relaxing the definition of convergence by increasing Tolerance.
2 A singular matrix was encountered during the optimization. The model was not fitted for this value of τ.
4 Some truncation occurred whilst calculating the confidence limits for this value of τ. See Algorithmic Details for details. The returned upper and lower limits may be narrower than specified.
8 The function did not converge whilst calculating the confidence limits. The returned limits are based on the estimate at the last iteration. Try increasing Iteration Limit.
16 Confidence limits for this value of τ could not be calculated. The returned upper and lower limits are set to a large positive and large negative value respectively as defined by the optional parameter Big.
It is possible for multiple warnings to be applicable to a single model. In these cases the value returned in info is the sum of the corresponding individual nonzero warning codes.
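Because the warning codes are distinct powers of two, a summed value can be split back into its individual codes. The sketch below does this in plain MATLAB for a hypothetical value of one element of info; the variable names are illustrative.

```matlab
% Hypothetical value returned in one element of info
info_value = 5;

% Individual nonzero warning codes listed in the table above
codes = [1 2 4 8 16];

% A code is present if the corresponding binary digit of info_value is 1
is_set = mod(floor(double(info_value) ./ codes), 2) == 1;
active = codes(is_set);

fprintf('active warning codes:');
fprintf(' %d', active);
fprintf('\n');
```

Here the value 5 decomposes into codes 1 and 4, meaning the fit did not converge and some truncation occurred while calculating the confidence limits.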
9:     ifail – int64 / int32 / nag_int scalar
ifail=0 unless the function detects an error (see Error Indicators and Warnings).

Error Indicators and Warnings

Errors or warnings detected by the function:
   ifail=11
Constraint: sorder=1 or 2.
   ifail=21
On entry, intcpt had an illegal value.
   ifail=31
On entry, weight had an illegal value.
   ifail=41
Constraint: n ≥ 2.
   ifail=51
Constraint: m ≥ 0.
   ifail=71
Constraint: lddat ≥ n.
   ifail=72
Constraint: lddat ≥ m.
   ifail=81
Constraint: isx(i)=0 or 1 for all i.
   ifail=91
Constraint: 1 ≤ ip < n.
   ifail=92
On entry, ip is not consistent with isx or intcpt.
   ifail=111
Constraint: wt(i) ≥ 0.0 for all i.
   ifail=112
Constraint: effective number of observations ≥ 2.
   ifail=121
Constraint: ntau ≥ 1.
   ifail=131
On entry, tau is invalid.
   ifail=201
On entry, either the option arrays have not been initialized or they have been corrupted.
   ifail=221
On entry, state vector has been corrupted or not initialized.
   ifail=231
A potential problem occurred whilst fitting the model(s).
Additional information has been returned in info.
   ifail=-99
An unexpected error has been triggered by this routine. Please contact NAG.
   ifail=-399
Your licence key may have expired or may not have been installed correctly.
   ifail=-999
Dynamic memory allocation failed.

Accuracy

Not applicable.

Further Comments

nag_correg_quantile_linreg (g02qg) allocates internally approximately the following elements of double storage: $13n + np + 3p^2 + 6p + 3(p+1) \times \mathrm{ntau}$. If Interval Method=BOOTSTRAP XY then a further $np$ elements are required, and this increases by $p \times \mathrm{ntau} \times$ Bootstrap Iterations if Bootstrap Interval Method=QUANTILE. Where possible, any user-supplied output arrays are used as workspace and so the amount actually allocated may be less. If sorder=2, weight='U', intcpt='N' and ip=m an internal copy of the input data is avoided and the amount of locally allocated memory is reduced by $np$.

Example

A quantile regression model is fitted to data from Engel's 1857 study of household expenditure on food. The model regresses the dependent variable, household food expenditure, against two explanatory variables, a column of ones and household income. The model is fitted for five different values of τ and the covariance matrix is estimated assuming Normal IID errors. Both the covariance matrix and the residuals are returned.
function g02qg_example


fprintf('g02qg example results\n\n');

sorder = int64(1);
c1 = 'y';
weight = 'u';
dat = [ 420.1577;  541.4117;  901.1575;  639.0802;  750.8756;  945.7989;
        829.3979;  979.1648; 1309.8789; 1492.3987;  502.8390;  616.7168;
        790.9225;  555.8786;  713.4412;  838.7561;  535.0766;  596.4408;
        924.5619;  487.7583;  692.6397;  997.8770;  506.9995;  654.1587;
        933.9193;  433.6813;  587.5962;  896.4746;  454.4782;  584.9989;
        800.7990;  502.4369;  713.5197;  906.0006;  880.5969;  796.8289;
        854.8791; 1167.3716;  523.8000;  670.7792;  377.0584;  851.5430;
       1121.0937;  625.5179;  805.5377;  558.5812;  884.4005; 1257.4989;
       2051.1789; 1466.3330;  730.0989; 2432.3910;  940.9218; 1177.8547;
       1222.5939; 1519.5811;  687.6638;  953.1192;  953.1192;  953.1192;
        939.0418; 1283.4025; 1511.5789; 1342.5821;  511.7980;  689.7988;
       1532.3074; 1056.0808;  387.3195;  387.3195;  410.9987;  499.7510;
        832.7554;  614.9986;  887.4658; 1595.1611; 1807.9520;  541.2006;
       1057.6767;  800.7990; 1245.6964; 1201.0002;  634.4002;  956.2315;
       1148.6010; 1768.8236; 2822.5330;  922.3548; 2293.1920;  627.4726;
        889.9809; 1162.2000; 1197.0794;  530.7972; 1142.1526; 1088.0039;
        484.6612; 1536.0201;  678.8974;  671.8802;  690.4683;  860.6948;
        873.3095;  894.4598; 1148.6470;  926.8762;  839.0414;  829.4974;
       1264.0043; 1937.9771;  698.8317;  920.4199; 1897.5711;  891.6824;
        889.6784; 1221.4818;  544.5991; 1031.4491; 1462.9497;  830.4353;
        975.0415; 1337.9983;  867.6427;  725.7459;  989.0056; 1525.0005;
        672.1960;  923.3977;  472.3215;  590.7601;  831.7983; 1139.4945;
        507.5169;  576.1972;  696.5991;  650.8180;  949.5802;  497.1193;
        570.1674;  724.7306;  408.3399;  638.6713; 1225.7890;  715.3701;
        800.4708;  975.5974; 1613.7565;  608.5019;  958.6634;  835.9426;
       1024.8177; 1006.4353;  726.0000;  494.4174;  776.5958;  415.4407;
        581.3599;  643.3571; 2551.6615; 1795.3226; 1165.7734;  815.6212;
       1264.2066; 1095.4056;  447.4479; 1178.9742;  975.8023; 1017.8522;
        423.8798;  558.7767;  943.2487; 1348.3002; 2340.6174;  587.1792;
       1540.9741; 1115.8481; 1044.6843; 1389.7929; 2497.7860; 1585.3809;
       1862.0438; 2008.8546;  697.3099;  571.2517;  598.3465;  461.0977;
        977.1107;  883.9849;  718.3594;  543.8971; 1587.3480; 4957.8130;
        969.6838;  419.9980;  561.9990;  689.5988; 1398.5203;  820.8168;
        875.1716; 1392.4499; 1256.3174; 1362.8590; 1999.2552; 1209.4730;
       1125.0356; 1827.4010; 1014.1540;  880.3944;  873.7375;  951.4432;
        473.0022;  601.0030;  713.9979;  829.2984;  959.7953; 1212.9613;
        958.8743; 1129.4431; 1943.0419;  539.6388;  463.5990;  562.6400;
        736.7584; 1415.4461; 2208.7897;  636.0009;  759.4010; 1078.8382;
        748.6413;  987.6417;  788.0961; 1020.0225; 1230.9235;  440.5174;
        743.0772];
y   = [ 255.8394;  310.9587;  485.6800;  402.9974;  495.5608;  633.7978;
        630.7566;  700.4409;  830.9586;  815.3602;  338.0014;  412.3613;
        520.0006;  452.4015;  512.7201;  658.8395;  392.5995;  443.5586;
        640.1164;  333.8394;  466.9583;  543.3969;  317.7198;  424.3209;
        518.9617;  338.0014;  419.6412;  476.3200;  386.3602;  423.2783;
        503.3572;  354.6389;  497.3182;  588.5195;  654.5971;  550.7274;
        528.3770;  640.4813;  401.3204;  435.9990;  276.5606;  588.3488;
        664.1978;  444.8602;  462.8995;  377.7792;  553.1504;  810.8962;
       1067.9541; 1049.8788;  522.7012; 1424.8047;  517.9196;  830.9586;
        925.5795; 1162.0024;  383.4580;  621.1173;  621.1173;  621.1173;
        548.6002;  745.2353;  837.8005;  795.3402;  418.5976;  508.7974;
        883.2780;  742.5276;  242.3202;  242.3202;  266.0010;  408.4992;
        614.7588;  385.3184;  515.6200; 1138.1620;  993.9630;  299.1993;
        750.3202;  572.0807;  907.3969;  811.5776;  427.7975;  649.9985;
        860.6002; 1143.4211; 2032.6792;  590.6183; 1570.3911;  483.4800;
        600.4804;  696.2021;  774.7962;  390.5984;  612.5619;  708.7622;
        296.9192; 1071.4627;  496.5976;  503.3974;  357.6411;  430.3376;
        624.6990;  582.5413;  580.2215;  543.8807;  588.6372;  627.9999;
        712.1012;  968.3949;  482.5816;  593.1694; 1033.5658;  693.6795;
        693.6795;  761.2791;  361.3981;  628.4522;  771.4486;  757.1187;
        821.5970; 1022.3202;  679.4407;  538.7491;  679.9981;  977.0033;
        561.2015;  728.3997;  372.3186;  361.5210;  620.8006;  819.9964;
        360.8780;  395.7608;  442.0001;  404.0384;  670.7993;  297.5702;
        353.4882;  383.9376;  284.8008;  431.1000;  801.3518;  448.4513;
        577.9111;  570.5210;  865.3205;  444.5578;  680.4198;  576.2779;
        708.4787;  734.2356;  433.0010;  327.4188;  485.5198;  305.4390;
        468.0008;  459.8177;  863.9199;  831.4407;  534.7610;  392.0502;
        934.9752;  813.3081;  263.7100;  769.0838;  630.5863;  645.9874;
        319.5584;  348.4518;  614.5068;  662.0096; 1504.3708;  406.2180;
        692.1689;  588.1371;  511.2609;  700.5600; 1301.1451;  879.0660;
        912.8851; 1509.7812;  484.0605;  399.6703;  444.1001;  248.8101;
        527.8014;  500.6313;  436.8107;  374.7990;  726.3921; 1827.2000;
        523.4911;  334.9998;  473.2009;  581.2029;  929.7540;  591.1974;
        637.5483;  674.9509;  776.7589;  959.5170; 1250.9643;  737.8201;
        810.6772;  983.0009;  708.8968;  633.1200;  631.7982;  608.6419;
        300.9999;  377.9984;  397.0015;  588.5195;  681.7616;  807.3603;
        696.8011;  811.1962; 1305.7201;  442.0001;  353.6013;  468.0008;
        526.7573;  890.2390; 1318.8033;  331.0005;  416.4015;  596.8406;
        429.0399;  619.6408;  400.7990;  775.0209;  772.7611;  306.5191;
        522.6019];

isx = [int64(1)];
tau = [0.10; 0.25; 0.50; 0.75; 0.90];
state = zeros(1, 1, 'int64');
ip = 2;
b = zeros(2, 5);
iopts = zeros(100, 1, 'int64');
opts = zeros(100, 1);

% Initialize the optional argument array
[iopts, opts, ifail] = g02zk( ...
                              'Initialize = g02qg', iopts, opts);

% Set optional arguments
[iopts, opts, ifail] = g02zk( ...
                              'Return Residuals = Yes', iopts, opts);
[iopts, opts, ifail] = g02zk( ...
                              'Matrix Returned = Covariance', iopts, opts);
[iopts, opts, ifail] = g02zk( ...
                              'Interval Method = IID', iopts, opts);

% Call the model fitting routine
[df, b, bl, bu, ch, res, state, info, ifail] = ...
  g02qg( ...
         sorder, c1, weight, dat, isx, y, tau, b, iopts, opts, state);

% Display the parameter estimates
% plot setup
t = '\tau';
fig1 = figure;
hold on;
plot(dat,y,'+r');
tt{1} = 'data';
% loop over tau
for l=1:numel(tau)
  fprintf('\nQuantile: %6.3f\n\n', tau(l));
  fprintf('        Lower   Parameter   Upper\n');
  fprintf('        Limit   Estimate    Limit\n');
  for j=1:2
    fprintf('%3d   %7.3f   %7.3f   %7.3f\n', j, bl(j,l), b(j,l), bu(j,l));
  end
  fprintf('\nCovariance matrix\n');
  for i=1:ip
    fprintf('%10.3e ', ch(1:i, i, l));
    fprintf('\n');
  end
  fprintf('\n');
  plot([0 (2000-b(1,l))/b(2,l)],[b(1,l) 2000]);
  tt{l+1} = sprintf('%s = %4.2f',t,tau(l));
end
% plot labels
legend(tt,'Location','SouthEast')
xlabel('Household income');
ylabel('Household food expenditure');
title({'Quantile Regression', ...
       ' Study of Household Expenditure on Food', ...
       'Engels 1857'});
axis([0 5000 0 2000]);
hold off;
if (numel(res) > 0)
  fprintf('First 10 Residuals\n');
  fprintf('                              Quantile\n');
  fprintf('Obs.   %6.3f     %6.3f     %6.3f     %6.3f     %6.3f\n', tau);
  for i=1:10
    fprintf(' %3d %10.5f %10.5f %10.5f %10.5f %10.5f\n', i, res(i, 1:5));
  end
else
  fprintf('Residuals not returned\n');
end


g02qg example results


Quantile:  0.100

        Lower   Parameter   Upper
        Limit   Estimate    Limit
  1    74.946   110.142   145.337
  2     0.370     0.402     0.433

Covariance matrix
 3.191e+02 
-2.541e-01  2.587e-04 


Quantile:  0.250

        Lower   Parameter   Upper
        Limit   Estimate    Limit
  1    64.232    95.483   126.735
  2     0.446     0.474     0.502

Covariance matrix
 2.516e+02 
-2.004e-01  2.039e-04 


Quantile:  0.500

        Lower   Parameter   Upper
        Limit   Estimate    Limit
  1    55.399    81.482   107.566
  2     0.537     0.560     0.584

Covariance matrix
 1.753e+02 
-1.396e-01  1.421e-04 


Quantile:  0.750

        Lower   Parameter   Upper
        Limit   Estimate    Limit
  1    41.372    62.396    83.421
  2     0.625     0.644     0.663

Covariance matrix
 1.139e+02 
-9.068e-02  9.230e-05 


Quantile:  0.900

        Lower   Parameter   Upper
        Limit   Estimate    Limit
  1    26.829    67.351   107.873
  2     0.650     0.686     0.723

Covariance matrix
 4.230e+02 
-3.369e-01  3.429e-04 

First 10 Residuals
                              Quantile
Obs.    0.100      0.250      0.500      0.750      0.900
   1  -23.10718  -38.84219  -61.00711  -77.14462  -99.86551
   2  -16.70358  -41.20981  -73.81193 -100.11463 -127.96277
   3   13.48419  -37.04518 -100.61322 -157.07478 -200.13481
   4   36.09526    4.52393  -36.48522  -70.97584 -102.95390
   5   83.74310   44.08476   -6.54743  -50.41028  -87.11562
   6  143.66660   89.90799   22.49734  -37.70668  -82.65437
   7  187.39134  142.05288   84.66171   34.21603   -5.80963
   8  196.90443  140.73220   70.44951    7.44831  -38.91027
   9  194.55254  114.45726   15.70761  -75.01861 -135.36147
  10  105.62394   12.32563 -102.13482 -208.16238 -276.22311
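[Figure g02qg_fig1.png: fitted quantile regression lines plotted over the data, household food expenditure against household income]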

Algorithmic Details

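By the addition of slack variables the minimization (1) can be reformulated into the linear programming problem
$$\underset{(u, v, \beta) \in \mathbb{R}_+^n \times \mathbb{R}_+^n \times \mathbb{R}^p}{\text{minimize}} \; \tau e^T u + (1 - \tau) e^T v \quad \text{subject to} \quad y = X\beta + u - v \qquad (2)$$
and its associated dual
$$\underset{d}{\text{maximize}} \; y^T d \quad \text{subject to} \quad X^T d = 0, \; d \in [\tau - 1, \tau]^n \qquad (3)$$
where $e$ is a vector of $n$ 1s. Setting $a = d + (1 - \tau) e$ gives the equivalent formulation
$$\underset{a}{\text{maximize}} \; y^T a \quad \text{subject to} \quad X^T a = (1 - \tau) X^T e, \; a \in [0, 1]^n. \qquad (4)$$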
The algorithm introduced by Portnoy and Koenker (1997) and used by nag_correg_quantile_linreg (g02qg), uses the primal-dual formulation expressed in equations (2) and (4) along with a logarithmic barrier function to obtain estimates for β. The algorithm is based on the predictor-corrector algorithm of Mehrotra (1992) and further details can be obtained from Portnoy and Koenker (1997) and Koenker (2005). A good description of linear programming, interior point algorithms, barrier functions and Mehrotra's predictor-corrector algorithm can be found in Nocedal and Wright (1999).

Interior Point Algorithm

In this section a brief description of the interior point algorithm used to estimate the model parameters is presented. It should be noted that there are some differences in the equations given here – particularly (7) and (9) – compared to those given in Koenker (2005) and Portnoy and Koenker (1997).

Central path

Rather than optimize (4) directly, an additional slack variable $s$ is added and the constraint $a \in [0,1]^n$ is replaced with $a + s = e$, $a_i \ge 0$, $s_i \ge 0$, for $i = 1, 2, \ldots, n$.
The positivity constraint on $a$ and $s$ is handled using the logarithmic barrier function
$$B(a, s, \mu) = y^T a + \mu \sum_{i=1}^{n} \left( \log a_i + \log s_i \right).$$
The primal-dual form of the problem is used giving the Lagrangian
$$L(a, s, \beta, u, \mu) = B(a, s, \mu) - \beta^T \left( X^T a - (1 - \tau) X^T e \right) - u^T (a + s - e)$$
whose central path is described by the following first order conditions
$$\begin{aligned}
X^T a &= (1 - \tau) X^T e \\
a + s &= e \\
X\beta + u - v &= y \\
S U e &= \mu e \\
A V e &= \mu e
\end{aligned} \qquad (5)$$
where $A$ denotes the diagonal matrix with diagonal elements given by $a$, similarly with $S$, $U$ and $V$. By enforcing the inequalities on $s$ and $a$ strictly, i.e., $a_i > 0$ and $s_i > 0$ for all $i$, we ensure that $A$ and $S$ are positive definite diagonal matrices and hence $A^{-1}$ and $S^{-1}$ exist.
Rather than applying Newton's method to the system of equations given in (5) to obtain the step directions $\delta_\beta, \delta_a, \delta_s, \delta_u$ and $\delta_v$, Mehrotra substituted the steps directly into (5) giving the augmented system of equations
$$\begin{aligned}
X^T (a + \delta_a) &= (1 - \tau) X^T e \\
(a + \delta_a) + (s + \delta_s) &= e \\
X (\beta + \delta_\beta) + (u + \delta_u) - (v + \delta_v) &= y \\
(S + \Delta_s)(U + \Delta_u) e &= \mu e \\
(A + \Delta_a)(V + \Delta_v) e &= \mu e
\end{aligned} \qquad (6)$$
where $\Delta_a, \Delta_s, \Delta_u$ and $\Delta_v$ denote the diagonal matrices with diagonal elements given by $\delta_a, \delta_s, \delta_u$ and $\delta_v$ respectively.

Affine scaling step

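The affine scaling step is constructed by setting $\mu = 0$ in (5) and applying Newton's method to obtain an intermediate set of step directions
$$\begin{aligned}
X^T W X \delta_\beta &= X^T W (y - X\beta) + (\tau - 1) X^T e + X^T a \\
\delta_a &= W (y - X\beta - X \delta_\beta) \\
\delta_s &= -\delta_a \\
\delta_u &= S^{-1} U \delta_a - U e \\
\delta_v &= A^{-1} V \delta_s - V e
\end{aligned} \qquad (7)$$
where $W = \left( S^{-1} U + A^{-1} V \right)^{-1}$.
Initial step sizes for the primal ($\hat{\gamma}_P$) and dual ($\hat{\gamma}_D$) parameters are constructed as
$$\begin{aligned}
\hat{\gamma}_P &= \sigma \times \min\left\{ \min_{i,\, \delta_{a_i} < 0} \{ a_i / \delta_{a_i} \}, \; \min_{i,\, \delta_{s_i} < 0} \{ s_i / \delta_{s_i} \} \right\} \\
\hat{\gamma}_D &= \sigma \times \min\left\{ \min_{i,\, \delta_{u_i} < 0} \{ u_i / \delta_{u_i} \}, \; \min_{i,\, \delta_{v_i} < 0} \{ v_i / \delta_{v_i} \} \right\}
\end{aligned} \qquad (8)$$
where $\sigma$ is a user-supplied scaling factor. If $\hat{\gamma}_P \times \hat{\gamma}_D \ge 1$ then the nonlinearity adjustment, described in Nonlinearity Adjustment, is not made and the model parameters are updated using the current step size and directions.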

Nonlinearity Adjustment

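In the nonlinearity adjustment step a new estimate of $\mu$ is obtained by letting
$$\hat{g}(\hat{\gamma}_P, \hat{\gamma}_D) = (s + \hat{\gamma}_P \delta_s)^T (u + \hat{\gamma}_D \delta_u) + (a + \hat{\gamma}_P \delta_a)^T (v + \hat{\gamma}_D \delta_v)$$
and estimating $\mu$ as
$$\mu = \left( \frac{\hat{g}(\hat{\gamma}_P, \hat{\gamma}_D)}{\hat{g}(0,0)} \right)^3 \frac{\hat{g}(0,0)}{2n}.$$
This estimate, along with the nonlinear terms ($\Delta_u$, $\Delta_s$, $\Delta_a$ and $\Delta_v$) from (6), is calculated using the values of $\delta_a, \delta_s, \delta_u$ and $\delta_v$ obtained from the affine scaling step.
Given an updated estimate for $\mu$ and the nonlinear terms the system of equations
$$\begin{aligned}
X^T W X \delta_\beta &= X^T W \left( y - X\beta + \mu (S^{-1} - A^{-1}) e + S^{-1} \Delta_s \Delta_u e - A^{-1} \Delta_a \Delta_v e \right) + (\tau - 1) X^T e + X^T a \\
\delta_a &= W \left( y - X\beta - X \delta_\beta + \mu (S^{-1} - A^{-1}) e \right) \\
\delta_s &= -\delta_a \\
\delta_u &= \mu S^{-1} e + S^{-1} U \delta_a - U e - S^{-1} \Delta_s \Delta_u e \\
\delta_v &= \mu A^{-1} e + A^{-1} V \delta_s - V e - A^{-1} \Delta_a \Delta_v e
\end{aligned} \qquad (9)$$
are solved and updated values for $\delta_\beta, \delta_a, \delta_s, \delta_u, \delta_v, \hat{\gamma}_P$ and $\hat{\gamma}_D$ calculated.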

Update and convergence

At each iteration the model parameters (β, a, s, u, v) are updated using the step directions ($\delta_\beta$, $\delta_a$, $\delta_s$, $\delta_u$, $\delta_v$) and step lengths ($\hat\gamma_P$, $\hat\gamma_D$).
Convergence is assessed using the duality gap, that is, the difference between the objective function in the primal and dual formulations. For any feasible point (u, v, s, a) the duality gap can be calculated from equations (2) and (3) as
$$
\begin{aligned}
\tau e^T u + (1-\tau) e^T v - d^T y &= \tau e^T u + (1-\tau) e^T v - (a - (1-\tau) e)^T y \\
&= s^T u + a^T v \\
&= e^T u - a^T y + (1-\tau) e^T X \beta
\end{aligned}
$$
and the optimization terminates if the duality gap is smaller than the tolerance supplied in the optional parameter Tolerance.
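The equivalence of the three expressions for the duality gap can be checked numerically. The Python sketch below is illustrative only: the function name `duality_gaps` and the data are invented, and the point used is feasible by construction (a = (1−τ)e satisfies $X^T a = (1-\tau) X^T e$, s = e − a, and u − v equals the residual y − Xβ).

```python
def duality_gaps(X, y, tau, beta, a, s, u, v):
    """Return the duality gap computed from each of the three expressions."""
    n, p = len(X), len(X[0])
    xb = [sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
    # Primal objective minus dual objective, with d = a - (1 - tau) e
    gap_objectives = (tau * sum(u) + (1.0 - tau) * sum(v)
                      - sum((a[i] - (1.0 - tau)) * y[i] for i in range(n)))
    # Complementarity form s^T u + a^T v
    gap_complementarity = sum(s[i] * u[i] + a[i] * v[i] for i in range(n))
    # e^T u - a^T y + (1 - tau) e^T X beta
    gap_alternative = (sum(u) - sum(a[i] * y[i] for i in range(n))
                       + (1.0 - tau) * sum(xb))
    return gap_objectives, gap_complementarity, gap_alternative


# Invented feasible point for tau = 0.25.
X_gap = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y_gap = [1.0, 2.5, 2.0, 4.5]
tau_gap = 0.25
beta_gap = [1.0, 1.0]
a_gap = [1.0 - tau_gap] * 4
s_gap = [tau_gap] * 4
u_gap = [0.1, 0.6, 0.1, 0.6]   # u - v equals y - X beta
v_gap = [0.1, 0.1, 1.1, 0.1]
gaps = duality_gaps(X_gap, y_gap, tau_gap, beta_gap, a_gap, s_gap, u_gap, v_gap)

tolerance = 1e-8               # stands in for the optional parameter Tolerance
converged = gaps[1] < tolerance
```

The complementarity form $s^T u + a^T v$ is the cheapest to evaluate, since it needs no products with X or y.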

Additional information

Initial values are required for the parameters a,s,u,v and β. If not supplied by the user, initial values for β are calculated from a least squares regression of y on X. This regression is carried out by first constructing the cross-product matrix XTX and then using a pivoted QR decomposition as performed by nag_lapack_dgeqp3 (f08bf). In addition, if the cross-product matrix is not of full rank, a rank reduction is carried out and, rather than using the full design matrix, X, a matrix formed from the first p-rank columns of XP is used instead, where P is the pivot matrix used during the QR decomposition. Parameter estimates, confidence intervals and the rows and columns of the matrices returned in the argument ch (if any) are set to zero for variables dropped during the rank-reduction. The rank reduction step is performed irrespective of whether initial values are supplied by the user.
Once initial values have been obtained for β, the initial values for u and v are calculated from the residuals. If $|r_i| < \epsilon_u$ then a value of $\pm\epsilon_u$ is used instead, where $\epsilon_u$ is supplied in the optional parameter Epsilon. The initial values for a and s are always set to 1−τ and τ respectively.
The solution for δβ in both (7) and (9) is obtained using a Bunch–Kaufman decomposition, as implemented in nag_lapack_dsytrf (f07md).

Calculation of Covariance Matrix

nag_correg_quantile_linreg (g02qg) supplies four methods to calculate the covariance matrices associated with the parameter estimates for β. This section gives some additional detail on three of the algorithms; the fourth, which uses bootstrapping, is described in Description.
(i) Independent, identically distributed (IID) errors
When assuming IID errors, the covariance matrices depend on the sparsity, $s(\tau)$, which nag_correg_quantile_linreg (g02qg) estimates as follows:
(a) Let $r_i$ denote the residuals from the original quantile regression, that is $r_i = y_i - x_i^T \hat\beta$.
(b) Drop any residual where $|r_i|$ is less than $\epsilon_u$, supplied in the optional parameter Epsilon.
(c) Sort and relabel the remaining residuals in ascending order, by absolute value, so that $\epsilon_u < |r_1| < |r_2| < \dots$.
(d) Select the first l values where $l = h_n n$, for some bandwidth $h_n$.
(e) Sort and relabel these l residuals again, so that $r_1 < r_2 < \dots < r_l$, and regress them against a design matrix with two columns (p=2) and rows given by $x_i = \{1, i/(n-p)\}$ using quantile regression with τ=0.5.
(f) Use the resulting estimate of the slope as an estimate of the sparsity.
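Steps (a) to (f) can be sketched in Python as follows. This is a rough illustration under several assumptions, not the library's implementation: `h_n * n` is rounded up to an integer, the p in the denominator n−p is taken to be the number of parameters in the original model, and the median regression in step (e) is done by a brute-force search over lines through pairs of points (valid because a two-parameter least absolute deviations fit always has a solution passing through two data points) rather than by the interior point algorithm. All function names are hypothetical.

```python
import math


def lad_line(x, z):
    """Median (tau = 0.5) regression of z on (1, x).

    Exhaustive search over lines through pairs of points: exact, but
    O(l^3), so suitable only for small illustrative inputs.
    """
    best = None
    m = len(x)
    for i in range(m):
        for j in range(i + 1, m):
            slope = (z[j] - z[i]) / (x[j] - x[i])
            intercept = z[i] - slope * x[i]
            loss = sum(abs(z[k] - intercept - slope * x[k]) for k in range(m))
            if best is None or loss < best[0]:
                best = (loss, intercept, slope)
    return best[1], best[2]


def sparsity_iid(res, n, p, h_n, eps_u):
    """Estimate the sparsity from residuals `res` following steps (b) to (f)."""
    kept = [r for r in res if abs(r) >= eps_u]       # (b) drop tiny residuals
    kept.sort(key=abs)                               # (c) sort by absolute value
    l = int(math.ceil(h_n * n))                      # (d) rounding up is an assumption
    chosen = sorted(kept[:l])                        # (e) re-sort by signed value
    if len(chosen) < 2:
        raise ValueError("need at least two residuals to fit a slope")
    x = [(i + 1) / (n - p) for i in range(len(chosen))]
    _, slope = lad_line(x, chosen)
    return slope                                     # (f) slope estimates the sparsity


# Invented residuals from a model with n = 12 observations and p = 2 parameters.
res_ex = [2.0, -0.5, -2.0, -0.3, 3.0, -0.1, -3.0, 0.1, 4.0, 0.3, -4.0, 0.5]
s_hat = sparsity_iid(res_ex, n=12, p=2, h_n=0.5, eps_u=1e-8)
```

In the example the six smallest residuals lie exactly on a line of slope 2 when plotted against i/(n−p), so the fitted slope recovers that value up to rounding.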
(ii) Powell Sandwich
When using the Powell Sandwich to estimate the matrix $H_n$, the quantity
$$
c_n = \min\left( \sigma_r, \frac{q_{r3} - q_{r1}}{1.34} \right) \times \left( \Phi^{-1}(\tau + h_n) - \Phi^{-1}(\tau - h_n) \right)
$$
is calculated. Depending on the value of τ and the method used to calculate the bandwidth ($h_n$), it is possible for the quantities $\tau \pm h_n$ to be too large or too small, compared to machine precision (ε). More specifically, when $\tau - h_n \le \epsilon$ or $\tau + h_n \ge 1 - \epsilon$, a warning flag is raised in info, the value is truncated to ε or 1−ε respectively and the covariance matrix is calculated as usual.
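A minimal Python sketch of this calculation, including the truncation of $\tau \pm h_n$, is given below. It is an illustration under assumptions: $\sigma_r$, $q_{r1}$ and $q_{r3}$ are taken to be summary statistics of the residuals supplied by the caller, machine precision is taken from `sys.float_info.epsilon`, and the returned boolean merely stands in for the warning flag that the routine raises in info. The function name `powell_cn` is hypothetical.

```python
import sys
from statistics import NormalDist


def powell_cn(sigma_r, q_r1, q_r3, tau, h_n, eps=sys.float_info.epsilon):
    """Return (c_n, truncated) for the Powell Sandwich bandwidth term."""
    lower = tau - h_n
    upper = tau + h_n
    truncated = False
    if lower <= eps:            # too small: truncate to eps
        lower = eps
        truncated = True
    if upper >= 1.0 - eps:      # too large: truncate to 1 - eps
        upper = 1.0 - eps
        truncated = True
    normal = NormalDist()       # standard Normal; inv_cdf is Phi^-1
    scale = min(sigma_r, (q_r3 - q_r1) / 1.34)
    c_n = scale * (normal.inv_cdf(upper) - normal.inv_cdf(lower))
    return c_n, truncated


c_mid, flag_mid = powell_cn(1.0, -1.0, 1.0, tau=0.5, h_n=0.1)
c_low, flag_low = powell_cn(1.0, -1.0, 1.0, tau=0.05, h_n=0.1)
```

In the second call τ−h_n is negative, so the lower argument is truncated and the flag is set, while the calculation still returns a finite value.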
(iii) Hendricks–Koenker Sandwich
The Hendricks–Koenker Sandwich requires the calculation of the quantity $d_i = x_i^T \left( \hat\beta(\tau + h_n) - \hat\beta(\tau - h_n) \right)$. As with the Powell Sandwich, in cases where $\tau - h_n \le \epsilon$ or $\tau + h_n \ge 1 - \epsilon$, a warning flag is raised in info, the value is truncated to ε or 1−ε respectively and the covariance matrix is calculated as usual.
In addition, this method requires that $d_i > 0$. Hence, instead of using $2 h_n / d_i$ in the calculation of $H_n$, $\max\left( 2 h_n / (d_i + \epsilon_u), 0 \right)$ is used, where $\epsilon_u$ is supplied in the optional parameter Epsilon.

Optional Parameters

Several optional parameters in nag_correg_quantile_linreg (g02qg) control aspects of the optimization algorithm, methodology used, logic or output. Their values are contained in the arrays iopts and opts; these must be initialized before calling nag_correg_quantile_linreg (g02qg) by first calling nag_correg_optset (g02zk) with optstr set to Initialize=g02qg.
Each optional parameter has an associated default value; to set any of them to a non-default value, use nag_correg_optset (g02zk). The current value of an optional parameter can be queried using nag_correg_optget (g02zl).
The remainder of this section can be skipped if you wish to use the default values for all optional parameters.
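The following optional parameters are available: Band Width Alpha, Band Width Method, Big, Bootstrap Interval Method, Bootstrap Iterations, Bootstrap Monitoring, Calculate Initial Values, Defaults, Drop Zero Weights, Epsilon, Interval Method, Iteration Limit, Matrix Returned, Monitoring, QR Tolerance, Return Residuals, Sigma, Significance Level, Tolerance and Unit Number. A full description of each optional parameter is provided in Description of the Optional Parameters.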

Description of the Optional Parameters

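For each option, we give a summary line, a description of the optional parameter and details of constraints.
The summary line contains:
the keywords;
a parameter value, where the letters a, i and r denote options that take character, integer and real values respectively;
the default value, where the symbol ε denotes machine precision.
Keywords and character values are case and white space insensitive.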
Band Width Alpha  r
Default =1.0
A multiplier used to construct the parameter $\alpha_b$ used when calculating the Sheather–Hall bandwidth (see Description), with $\alpha_b = (1-\alpha) \times$ Band Width Alpha. Here, α is the Significance Level.
Constraint: Band Width Alpha>0.0.
Band Width Method  a
Default ='SHEATHER HALL'
The method used to calculate the bandwidth used in the calculation of the asymptotic covariance matrix Σ and $H^{-1}$ if Interval Method=HKS, KERNEL or IID (see Description).
Constraint: Band Width Method=SHEATHER HALL or BOFINGER.
Big  r
Default $=10.0^{20}$
This parameter should be set to something larger than the biggest value supplied in dat and y.
Constraint: Big>0.0.
Bootstrap Interval Method  a
Default =QUANTILE
If Interval Method=BOOTSTRAP XY, Bootstrap Interval Method controls how the confidence intervals are calculated from the bootstrap estimates.
Bootstrap Interval Method=T
t intervals are calculated. That is, the covariance matrix, $\Sigma = \{\sigma_{ij} : i,j = 1,2,\dots,p\}$, is calculated from the bootstrap estimates and the limits calculated as $\beta_i \pm t_{(n-p,\,(1+\alpha)/2)}\, \sigma_{ii}$, where $t_{(n-p,\,(1+\alpha)/2)}$ is the $(1+\alpha)/2$ percentage point from a Student's t distribution on n−p degrees of freedom, n is the effective number of observations and α is given by the optional parameter Significance Level.
Bootstrap Interval Method=QUANTILE
Quantile intervals are calculated. That is, the upper and lower limits are taken as the $(1+\alpha)/2$ and $(1-\alpha)/2$ quantiles of the bootstrap estimates, as calculated using nag_stat_quantiles (g01am).
Constraint: Bootstrap Interval Method=T or QUANTILE.
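For Bootstrap Interval Method=QUANTILE, the construction of the limits for a single coefficient can be sketched in Python as below. This is an illustration only: the quantile rule used here (linear interpolation between order statistics) is an assumption and may differ from the definition used by nag_stat_quantiles (g01am), and both function names are hypothetical.

```python
def quantile(sorted_x, q):
    """q-th quantile of sorted data by linear interpolation (assumed rule)."""
    pos = (len(sorted_x) - 1) * q
    lo = int(pos)
    hi = min(lo + 1, len(sorted_x) - 1)
    return sorted_x[lo] + (pos - lo) * (sorted_x[hi] - sorted_x[lo])


def bootstrap_quantile_interval(estimates, alpha=0.95):
    """Lower and upper limits as the (1-alpha)/2 and (1+alpha)/2 quantiles."""
    xs = sorted(estimates)
    return quantile(xs, (1.0 - alpha) / 2.0), quantile(xs, (1.0 + alpha) / 2.0)


# Invented bootstrap estimates of one coefficient: 0, 1, ..., 100.
boot = [float(k) for k in range(101)]
lower_limit, upper_limit = bootstrap_quantile_interval(boot, alpha=0.9)
```

Here `alpha` plays the role of the optional parameter Significance Level, and `estimates` holds one coefficient's values across the Bootstrap Iterations samples.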
Bootstrap Iterations  i
Default =100
The number of bootstrap samples used to calculate the confidence limits and covariance matrix (if requested) when Interval Method=BOOTSTRAP XY.
Constraint: Bootstrap Iterations>1.
Bootstrap Monitoring  a
Default =NO
If Bootstrap Monitoring=YES and Interval Method=BOOTSTRAP XY, then the parameter estimates for each of the bootstrap samples are displayed. This information is sent to the unit number specified by Unit Number.
Constraint: Bootstrap Monitoring=YES or NO.
Calculate Initial Values  a
Default =YES
If Calculate Initial Values=YES then the initial values for the regression parameters, β, are calculated from the data. Otherwise they must be supplied in b.
Constraint: Calculate Initial Values=YES or NO.
Defaults  
This special keyword is used to reset all optional parameters to their default values.
Drop Zero Weights  a
Default =YES
If a weighted regression is being performed and Drop Zero Weights=YES then observations with zero weight are dropped from the analysis. Otherwise such observations are included.
Constraint: Drop Zero Weights=YES or NO.
Epsilon  r
Default =ε
εu, the tolerance used when calculating the covariance matrix and the initial values for u and v. For additional details see Calculation of Covariance Matrix and Additional information respectively.
Constraint: Epsilon≥0.0.
Interval Method  a
Default =IID
The value of Interval Method controls whether confidence limits are returned in bl and bu and how these limits are calculated. This parameter also controls how the matrices returned in ch are calculated.
Interval Method=NONE
No limits are calculated and bl, bu and ch are not referenced.
Interval Method=KERNEL
The Powell Sandwich method with a Gaussian kernel is used.
Interval Method=HKS
The Hendricks–Koenker Sandwich is used.
Interval Method=IID
The errors are assumed to be independent and identically distributed.
Interval Method=BOOTSTRAP XY
A bootstrap method is used, where sampling is done on the pair $(y_i, x_i)$. The number of bootstrap samples is controlled by the parameter Bootstrap Iterations and the type of interval constructed from the bootstrap samples is controlled by Bootstrap Interval Method.
Constraint: Interval Method=NONE, KERNEL, HKS, IID or BOOTSTRAP XY.
Iteration Limit  i
Default =100
The maximum number of iterations to be performed by the interior point optimization algorithm.
Constraint: Iteration Limit>0.
Matrix Returned  a
Default =NONE
The value of Matrix Returned controls the type of matrices returned in ch. If Interval Method=NONE, this parameter is ignored and ch is not referenced. Otherwise:
Matrix Returned=NONE
No matrices are returned and ch is not referenced.
Matrix Returned=COVARIANCE
The covariance matrices are returned.
Matrix Returned=H INVERSE
If Interval Method=KERNEL or HKS, the matrices J and $H^{-1}$ are returned. Otherwise no matrices are returned and ch is not referenced.
The matrices returned are calculated as described in Description, with the algorithm used specified by Interval Method. In the case of Interval Method=BOOTSTRAP XY the covariance matrix is calculated directly from the bootstrap estimates.
Constraint: Matrix Returned=NONE, COVARIANCE or H INVERSE.
Monitoring  a
Default =NO
If Monitoring=YES then the duality gap is displayed at each iteration of the interior point optimization algorithm. In addition, the final estimates for β are also displayed.
The monitoring information is sent to the unit number specified by Unit Number.
Constraint: Monitoring=YES or NO.
QR Tolerance  r
Default $=\epsilon^{0.9}$
The tolerance used to calculate the rank, k, of the p×p cross-product matrix, $X^T X$. Letting Q be the orthogonal matrix obtained from a QR decomposition of $X^T X$, the rank is calculated by comparing $Q_{ii}$ with $Q_{11} \times$ QR Tolerance.
If the cross-product matrix is rank deficient, then the parameter estimates for the p-k columns with the smallest values of Qii are set to zero, along with the corresponding entries in bl, bu and ch, if returned. This is equivalent to dropping these variables from the model. Details on the QR decomposition used can be found in nag_lapack_dgeqp3 (f08bf).
Constraint: QR Tolerance>0.0.
Return Residuals  a
Default =NO
If Return Residuals=YES, the residuals are returned in res. Otherwise res is not referenced.
Constraint: Return Residuals=YES or NO.
Sigma  r
Default =0.99995
The scaling factor used when calculating the affine scaling step size (see equation (8)).
Constraint: 0.0<Sigma<1.0.
Significance Level  r
Default =0.95
α, the size of the confidence interval whose limits are returned in bl and bu.
Constraint: 0.0<Significance Level<1.0.
Tolerance  r
Default =ε
Convergence tolerance. The optimization is deemed to have converged if the duality gap is less than Tolerance (see Update and convergence).
Constraint: Tolerance>0.0.
Unit Number  i
Default taken from nag_file_set_unit_advisory (x04ab)
The unit number to which any monitoring information is sent.
Constraint: Unit Number>1.

Description of Monitoring Information

See the description of the optional argument Monitoring.

© The Numerical Algorithms Group Ltd, Oxford, UK. 2009–2015