Tech Tip: Analysing Binary Data

The analysis of binary data, in which the response is either a yes or a no, is becoming increasingly important in many areas. Examples of binary data are whether or not an insurance claim is made in a given period, whether a lead becomes a customer, or whether a machine fails under different circumstances. Each observation can be coded as either 0 or 1.

The most widely used model for this type of data is logistic regression, a particular case of what is known as a generalised linear model. These models generalise the ordinary regression model to allow different types of response data, such as binary data, and different types of link. A link connects the explanatory part of the model (the linear predictor) to the expected value of the response. With a logistic link, the fitted values of the model, which are estimated probabilities, stay between 0 and 1. To fit a logistic regression, use the NAG function G02GBF (Fortran Library) / g02gbc (C Library), which fits generalised linear models with binomial data. Set the link parameter to select the logistic link and set every element of the denominator array to 1, so that each observation is treated as a single binary trial.
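To make the role of the logistic link concrete, here is a minimal sketch in plain Python of a logistic regression with an intercept and one explanatory variable, fitted by Newton-Raphson. It is an illustration of the model only, not the NAG routine or its interface: the function names `logistic` and `fit_logistic` and the small data set are invented for this example.

```python
import math


def logistic(eta):
    """Inverse of the logistic link: maps any real value into (0, 1)."""
    # Branch on the sign so that math.exp is never given a large positive argument.
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def fit_logistic(x, y, max_iter=25, tol=1e-10):
    """Fit P(y = 1) = logistic(b0 + b1 * x) by Newton-Raphson.

    Hypothetical helper for illustration; assumes the data are not
    perfectly separable, so that finite estimates exist.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        g0 = g1 = 0.0            # score (gradient of the log-likelihood)
        h00 = h01 = h11 = 0.0    # information matrix (negative Hessian)
        for xi, yi in zip(x, y):
            p = logistic(b0 + b1 * xi)
            w = p * (1.0 - p)    # binomial variance with denominator 1
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Solve the 2x2 system (information) * step = score.
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


# Made-up data: the response is coded 0/1 and is not perfectly separated by x.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 1, 0, 1, 0, 1, 1]

b0, b1 = fit_logistic(x, y)
fitted = [logistic(b0 + b1 * xi) for xi in x]
print("intercept:", b0, "slope:", b1)
print("fitted probabilities:", fitted)
```

Whatever values the coefficients take, the fitted probabilities lie strictly between 0 and 1, which is the property of the logistic link described above. A library routine for generalised linear models would normally be preferred in practice, since it handles several explanatory variables and numerical edge cases and reports diagnostics.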

Often the explanatory part of the model is given by categorical variables that define groups: for example, occupation, or age in ten-year bands. Before they can be added to the model, they need to be converted to a set of 0/1 variables that indicate group membership (often known as dummy variables). These dummy variables can be computed using the NAG function G04EAF (Fortran Library) / g04eac (C Library).
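The conversion from a categorical variable to dummy variables can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the interface of G04EAF/g04eac: the helper `dummy_variables`, its `drop_first` option, and the occupation labels are assumptions made for the example.

```python
def dummy_variables(factor, drop_first=True):
    """Convert a list of category labels into columns of 0/1 indicators.

    Hypothetical helper for illustration. With drop_first=True the first
    level (in sorted order) becomes the reference group and gets no column,
    which avoids collinearity with an intercept term in the model.
    """
    levels = sorted(set(factor))
    used = levels[1:] if drop_first else levels
    rows = [[1 if value == level else 0 for level in used] for value in factor]
    return used, rows


# Made-up categorical variable with three groups.
occupation = ["clerical", "manual", "professional", "manual", "clerical"]

levels, dummies = dummy_variables(occupation)
print(levels)   # ['manual', 'professional']
print(dummies)  # [[0, 0], [1, 0], [0, 1], [1, 0], [0, 0]]
```

Each row of `dummies` corresponds to one observation, and each column can be added to the model as an ordinary explanatory variable. An observation in the reference group ("clerical" here) has zeros in every column.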

For specific technical advice on using NAG's products, please contact our technical experts.
