Tech Tip: Using Data Mining Components
Suppose you have been given the task of predicting which customers will call upon a service, for example by making an insurance claim or buying a new product. The techniques available in Data Mining Components for this kind of prediction include logistic regression, multi-layer perceptron neural networks, decision trees and k nearest-neighbour discrimination. Each has particular strengths and weaknesses. One of the main differences lies in the decision boundaries each technique uses to divide up the data space when predicting outcome categories. But which is most suitable for your needs?
Logistic regression can handle very large data sets, but it uses a linear decision boundary and so has limited flexibility unless you select suitable data transformations. Neural networks are more flexible because they use complex non-linear decision boundaries, but they are harder to fit and so cannot handle as large a data set. Decision trees are most efficient when most of the predictor variables are categorical rather than continuous; here the decision boundary is a series of linear dividers. The k nearest-neighbour method is flexible and simple. It does not rely on an underlying parametric model, but it does require both the training data and the data to be predicted to be available at the same time; here the decision boundaries are local to the individual case. You can therefore select the most appropriate technique according to which way of dividing up the data space suits your problem and how much data you are examining.
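The difference between a linear boundary, a linear boundary after a data transformation, and a local boundary can be seen on a small example. The sketch below is plain Python and does not use the Data Mining Components interface. The toy data set, the function names, the choice of x*y as the transformation, and the learning rate and epoch count are all assumptions made for illustration only.

```python
import math

# Toy XOR-style data (an assumption for illustration): class 1 when x*y > 0.
GRID = [-2.0, -1.0, 1.0, 2.0]
train = [((x, y), 1 if x * y > 0 else 0) for x in GRID for y in GRID]
test = [((1.5, 1.5), 1), ((-1.5, -1.5), 1), ((1.5, -1.5), 0), ((-1.5, 1.5), 0)]


def raw(point):
    """Use the two coordinates unchanged: the boundary is a straight line."""
    return list(point)


def with_product(point):
    """Add the product x*y as a third feature (a data transformation)."""
    x, y = point
    return [x, y, x * y]


def fit_logistic(data, transform, rate=0.5, epochs=500):
    """Fit logistic regression by batch gradient descent; return (weights, bias)."""
    rows = [(transform(p), label) for p, label in data]
    weights, bias = [0.0] * len(rows[0][0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for features, label in rows:
            z = bias + sum(w * f for w, f in zip(weights, features))
            error = 1.0 / (1.0 + math.exp(-z)) - label  # predicted prob. minus label
            grad_w = [g + error * f for g, f in zip(grad_w, features)]
            grad_b += error
        weights = [w - rate * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= rate * grad_b / len(rows)
    return weights, bias


def predict_logistic(model, transform, point):
    """Predict class 1 when the fitted linear score is positive."""
    weights, bias = model
    z = bias + sum(w * f for w, f in zip(weights, transform(point)))
    return 1 if z > 0 else 0


def predict_knn(data, point, k=3):
    """Majority vote among the k training points closest to `point`."""
    nearest = sorted(data, key=lambda item: math.dist(item[0], point))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > k else 0


def accuracy(predict):
    """Fraction of the held-out test points classified correctly."""
    return sum(predict(p) == label for p, label in test) / len(test)


linear_model = fit_logistic(train, raw)
product_model = fit_logistic(train, with_product)

acc_linear = accuracy(lambda p: predict_logistic(linear_model, raw, p))
acc_product = accuracy(lambda p: predict_logistic(product_model, with_product, p))
acc_knn = accuracy(lambda p: predict_knn(train, p))

print(f"logistic regression, raw features:    {acc_linear:.2f}")
print(f"logistic regression, with x*y added:  {acc_product:.2f}")
print(f"3 nearest neighbours:                 {acc_knn:.2f}")
```

On this data no straight line separates the two classes, so logistic regression on the raw coordinates cannot classify every test point correctly, while the same model with the product term added can. The nearest-neighbour classifier needs no fitting step at all, but note that `predict_knn` must be handed the full training set every time it makes a prediction.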
For more information on Data Mining Components please visit www.nag.co.uk/dr.
For specific technical advice in using NAG's products, please contact our technical experts.