naginterfaces.library.rand.kfold_xyw

naginterfaces.library.rand.kfold_xyw(k, fold, x, statecomm, sordx=1, y=None, w=None)

kfold_xyw generates training and validation datasets suitable for use in cross-validation or jack-knifing.

For full information, please refer to the NAG Library document for g05pv:

https://www.nag.com/numeric/nl/nagdoc_29.3/flhtml/g05/g05pvf.html

Parameters
k : int

$K$, the number of folds.

fold : int

The number of the fold to return as the validation dataset.

On the first call to kfold_xyw, fold should be set to 1 and then incremented by one at each subsequent call until all $K$ sets of training and validation datasets have been produced.

See Further Comments for more details on how a different calling sequence can be used.

x : float, ndarray, shape (:, :), modified in place

Note: the required extent for this argument in dimension 1 is determined as follows: if sordx = 1: $n$; if sordx = 2: $m$.

Note: the required extent for this argument in dimension 2 is determined as follows: if sordx = 1: $m$; if sordx = 2: $n$.

The way the data is stored in x is defined by sordx.

If sordx = 1, x[i-1, j-1] contains the $i$th observation for the $j$th variable, for $i = 1, 2, \ldots, n$ and $j = 1, 2, \ldots, m$.

If sordx = 2, x[j-1, i-1] contains the $i$th observation for the $j$th variable, for $i = 1, 2, \ldots, n$ and $j = 1, 2, \ldots, m$.

On entry: if fold = 1, x must hold $X_o$, the values of $X$ for the original dataset; otherwise, x must not be changed since the last call to kfold_xyw.

On exit: values of $X$ for the training and validation datasets, with $X_t$ held in observations 1 to nt and $X_v$ in observations nt + 1 to $n$.
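The exit layout described above can be illustrated with a small NumPy sketch. The helper name `split_x` is hypothetical and not part of the NAG library; it only assumes that the training observations occupy the first nt positions along the observation axis, as stated for x on exit.

```python
import numpy as np

def split_x(x, nt, sordx=1):
    """Return (training, validation) views of x for a given nt (hypothetical helper)."""
    if sordx == 1:
        # Observations are stored as rows: the first nt rows are the training data.
        return x[:nt, :], x[nt:, :]
    if sordx == 2:
        # Observations are stored as columns: the first nt columns are the training data.
        return x[:, :nt], x[:, nt:]
    raise ValueError("sordx must be 1 or 2")

# 6 observations on 2 variables, stored with sordx = 1 (illustrative data only).
x = np.arange(12.0).reshape(6, 2)
x_train, x_valid = split_x(x, nt=4)

# The same data stored with sordx = 2 (variables as rows).
xt_train, xt_valid = split_x(x.T, nt=4, sordx=2)
```

Because the slices are views, no data is copied; they remain valid only until the next call that reorders x.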

statecomm : dict, RNG communication object, modified in place

RNG communication structure.

This argument must have been initialized by a prior call to init_repeat() or init_nonrepeat().

sordx : int, optional

Determines how variables are stored in x.

y : None or float, ndarray, shape (:), optional, modified in place

Note: the required length for this argument is determined as follows: if y is not None: $n$; otherwise: 0.

If the original dataset does not include $y_o$ then y must be set to None.

Optionally, on entry: $y_o$, the values of $y$ for the original dataset. If fold ≠ 1, y must hold the vector returned in y by the last call to kfold_xyw.

On exit, if not None on entry: values of $y$ for the training and validation datasets, with $y_t$ held in elements 1 to nt and $y_v$ in elements nt + 1 to $n$.

w : None or float, ndarray, shape (:), optional, modified in place

Note: the required length for this argument is determined as follows: if w is not None: $n$; otherwise: 0.

Optionally, on entry: if fold ≠ 1, w must hold the vector returned in w by the last call to kfold_xyw.

On exit, if not None on entry: values of $w$ for the training and validation datasets, with $w_t$ held in elements 1 to nt and $w_v$ in elements nt + 1 to $n$.

Returns
nt : int

$n_t$, the number of observations in the training dataset.

Raises
NagValueError
(errno )

On entry, k = ⟨value⟩ and n = ⟨value⟩.

Constraint: 2 ≤ k ≤ n.

(errno )

On entry, fold = ⟨value⟩ and k = ⟨value⟩.

Constraint: 1 ≤ fold ≤ k.

(errno )

On entry, n = ⟨value⟩.

Constraint: n ≥ 1.

(errno )

On entry, m = ⟨value⟩.

Constraint: m ≥ 1.

(errno )

On entry, sordx = ⟨value⟩.

Constraint: sordx = 1 or 2.

(errno )

On entry, and .

Constraint: if , .

(errno )

On entry, and .

Constraint: if , .

(errno )

On entry, statecomm['state'] vector has been corrupted or not initialized.

Warns
NagAlgorithmicWarning
(errno )

More than 50% of the data did not move when the data was shuffled. ⟨value⟩ of the ⟨value⟩ observations stayed put.

Notes

Let $X_o$ denote a matrix of $n$ observations on $m$ variables and $y_o$ and $w_o$ each denote a vector of length $n$. For example, $X_o$ might represent a matrix of independent variables, $y_o$ the dependent variable and $w_o$ the associated weights in a weighted regression.

kfold_xyw generates a series of training datasets, denoted by the matrix, vector, vector triplet $(X_t, y_t, w_t)$ of $n_t$ observations, and validation datasets, denoted $(X_v, y_v, w_v)$ with $n_v$ observations. These training and validation datasets are generated as follows.

Each of the original $n$ observations is randomly assigned to one of $K$ equally sized groups or folds. For the $k$th sample the validation dataset consists of those observations in group $k$ and the training dataset consists of all those observations not in group $k$. Therefore, at most $K$ samples can be generated.
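The assignment scheme described above can be sketched in plain NumPy. This is only an illustration of the idea, not the algorithm used inside kfold_xyw: the function names `assign_groups` and `fold_split` are hypothetical, and NumPy's own generator stands in for the NAG RNG state.

```python
import numpy as np

def assign_groups(n, k, rng):
    """Randomly assign each of n observations to one of k near-equal groups (labels 1..k)."""
    labels = np.arange(n) % k + 1      # balanced labels before shuffling
    return rng.permutation(labels)     # random assignment of observations to groups

def fold_split(groups, fold):
    """Return (training, validation) observation indices for the given fold."""
    valid = np.flatnonzero(groups == fold)   # observations in group `fold`
    train = np.flatnonzero(groups != fold)   # all other observations
    return train, valid

rng = np.random.default_rng(0)
n, k = 10, 3
groups = assign_groups(n, k, rng)

splits = []
for fold in range(1, k + 1):
    train, valid = fold_split(groups, fold)
    # Training indices first, then validation indices, mirroring the documented layout.
    order = np.concatenate([train, valid])
    splits.append((train, valid, order))
```

Each observation appears in exactly one validation set across the $K$ samples, which is why at most $K$ distinct samples exist.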

If $n$ is not divisible by $K$ then the observations are assigned to groups as evenly as possible, therefore, any group will be at most one observation larger or smaller than any other group.
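As a quick check of the "as evenly as possible" rule, the group sizes can be computed directly. The helper `fold_sizes` is hypothetical, and which groups receive the extra observation is an assumption made here for illustration; the documentation only guarantees that sizes differ by at most one.

```python
def fold_sizes(n, k):
    """Sizes of k groups covering n observations, differing by at most one."""
    base, extra = divmod(n, k)
    # Assumption: the first `extra` groups take one additional observation.
    return [base + 1 if g < extra else base for g in range(k)]

sizes_uneven = fold_sizes(10, 3)   # n not divisible by K
sizes_even = fold_sizes(9, 3)      # n divisible by K
```

For a given fold, the training size $n_t$ is then $n$ minus the size of the validation group.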

When using $K = n$ the resulting datasets are suitable for leave-one-out cross-validation, or the training dataset on its own for jack-knifing. When using $K \neq n$ the resulting datasets are suitable for $K$-fold cross-validation. Datasets suitable for reversed cross-validation can be obtained by switching the training and validation datasets, i.e., use the $k$th group as the training dataset and the rest of the data as the validation dataset.

One of the initialization functions init_repeat() (for a repeatable sequence if computed sequentially) or init_nonrepeat() (for a non-repeatable sequence) must be called prior to the first call to kfold_xyw.
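The calling sequence described on this page (initialize the RNG once, then call with fold = 1, 2, ..., k) might look like the sketch below. The wrapper `iterate_folds` is hypothetical; the choice of genid=1 and the seed value are assumptions, and the sketch assumes the default sordx = 1 with x and y supplied as writable float arrays.

```python
def iterate_folds(x, y, k, seed=42):
    """Yield (fold, nt) for fold = 1..k; x and y are reordered in place on each step."""
    # Imported lazily so the sketch can be read without the NAG library installed.
    from naginterfaces.library import rand

    # Assumption: genid=1 (NAG basic generator) with an arbitrary single-element seed.
    statecomm = rand.init_repeat(genid=1, seed=[seed])

    for fold in range(1, k + 1):
        # x (and y) must not be modified between calls; kfold_xyw updates them in place.
        nt = rand.kfold_xyw(k, fold, x, statecomm, y=y)
        yield fold, nt
```

After each step, x[:nt] and y[:nt] hold the training data and x[nt:] and y[nt:] the validation data, so any model fitting should read from those slices without altering them.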