
Is your AI model Overfitting or Underfitting?

There is enormous hype about machine learning in business, with a number of vendors offering a myriad of solutions. However, it is not always easy to tell a good vendor from a bad one, especially if you are not a technical expert in the field.

In theory, machine learning is a set of functions or algorithms that learn behaviours and patterns from existing datasets in order to make predictions on unseen data. The function is chosen to meet the demands of the business problem it is solving and could be anything from a simple linear equation to a series of complex sigmoids. Whatever the learning method or algorithm, it is important for business users that every machine learning model is generalisable.

It is expected that a model will perform well on the historical data it has been trained on. This means that if a model has seen that Sarah, who is 32 years old, lives in Uxbridge, London, and earns £70k annually, repays her loans on time, the model will predict that Sarah will make timely repayments when asked about her creditworthiness.

However, for a good model, this is not enough. An important check of a model's generalisability is its performance on previously unseen data that is not significantly different from the training dataset used for the model build. This means that the model should also be able to provide a view on Tom's creditworthiness, even though it might not have seen Tom's data while learning.
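
As a minimal sketch of what generalising means in code, the snippet below trains on known borrowers and then scores a borrower the model has never seen. The features, toy data, and choice of logistic regression are all illustrative assumptions, not any vendor's actual model:

```python
# A minimal sketch of generalisation: train on known borrowers, then
# score a borrower the model has never seen. The features, data, and
# logistic regression model are illustrative assumptions.
from sklearn.linear_model import LogisticRegression

# Toy training data: [age, annual income in £k]; 1 = repaid on time.
X_train = [[32, 70],   # a Sarah-like borrower who repaid on time
           [45, 90],
           [23, 22],
           [51, 30]]
y_train = [1, 1, 0, 0]

model = LogisticRegression().fit(X_train, y_train)

# Tom was not in the training data; a generalisable model should
# still produce a sensible prediction for him.
tom = [[38, 65]]
print("predicted repayment class for Tom:", model.predict(tom)[0])
```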

This idea of generalisation is so core to machine learning that it dictates whether your business will benefit from a model or not. This is especially true in the case of banks and financial institutions, where non-generalisable models, i.e. overfit or underfit models, can cause significant monetary losses.

Why should a business worry about non-generalisable models?

Let us consider the example of a hedge fund using a machine learning model to predict when to buy or sell products. The model would usually be trained on past data that the fund has access to, and it would then be used for future trades. Now, if the model does not generalise well to the data that the market will produce tomorrow, the prices and times to trade suggested by the model could be way off the profitable mark. In such cases, the business could lose significant sums of money trading with such a model. So before letting an AI make partial or complete decisions in your business, you always have to be doubly sure of the fit of the model, i.e. its generalisability.

Why are some models not generalisable?

While machine learning is a branch of computer science, the training of every machine learning model rests on foundational statistical and mathematical approaches. In statistical terms, a good fit describes how well your model approximates the target function. However, in statistics we usually have an idea of the target function before fitting the model. This is not always the case for machine learning, which is often applied to much messier datasets where pattern recognition is much harder.

Overfitting and underfitting

Training data is often noisy (it may contain trends and errors relating to seasonal cycles, input mistakes, etc.), and a model trained on it frequently learns not only the variables that drive the target but also the noise, i.e. the errors. When a model fits the training data so closely that it treats these random fluctuations as real patterns, it is said to be overfitted. Though the model fits the data it has been trained on almost perfectly, it performs poorly on new data, which defeats the very purpose of the model build.

Overfitting can happen when we use algorithms like decision trees (e.g. to predict customer defaults on a loan) and we don't sufficiently prune the tree or limit its depth. Such a model can end up memorising every point in the dataset and will no longer be generalisable.
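
As a minimal sketch of this effect, the snippet below compares an unconstrained decision tree with a depth-limited one. scikit-learn and the synthetic dataset are assumptions made for illustration:

```python
# A minimal sketch of decision-tree overfitting, using scikit-learn
# and synthetic data (both illustrative assumptions).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, flip_y=0.1,
                           random_state=0)  # flip_y injects label noise
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Unconstrained tree: grows until it memorises the training set.
deep = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
# Depth-limited tree: a crude form of regularisation.
shallow = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_train, y_train)

for name, model in [("unconstrained", deep), ("max_depth=4", shallow)]:
    print(name,
          "train:", round(model.score(X_train, y_train), 3),
          "test:", round(model.score(X_test, y_test), 3))
# Typically the unconstrained tree scores ~1.0 on train but noticeably
# lower on test -- the classic overfitting signature.
```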

While an overfit model performs well in terms of accuracy on the training set, an underfit model will perform poorly even on the data it has been trained on. Underfitting is a lot easier to identify than overfitting, so the data scientist building the model will very quickly spot the problem and try alternative algorithms to fix an underfit model. Because it is so easy to spot, underfitting is rarely discussed in machine learning.
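
A minimal sketch of underfitting: a straight line fitted to clearly quadratic data scores poorly even on its own training set, while a richer model does not. The synthetic data and model choices are assumptions for illustration:

```python
# Underfitting sketch: a linear model on a quadratic target performs
# poorly even on the data it was trained on (synthetic data assumed).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = X[:, 0] ** 2 + rng.normal(scale=0.3, size=200)  # quadratic target + noise

underfit = LinearRegression().fit(X, y)
better = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(X, y)

print("linear R^2 on training data:", round(underfit.score(X, y), 3))    # low
print("quadratic R^2 on training data:", round(better.score(X, y), 3))   # high
```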

How do you spot overfitting?

In a word, it is hard! For both statistical and machine learning models, good practice dictates setting aside a “test” or hold-out sample on which the model's results can be validated before the model is used for business purposes. If the model performs well on the data it has been trained on but poorly on the test set, it is usually a sign that the model is overfit.
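
A minimal sketch of this hold-out check is below; scikit-learn, the synthetic data, and the random-forest model are all assumptions for illustration:

```python
# Hold-out check sketch: set aside a test sample before training and
# compare train vs. test accuracy (synthetic data assumed).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, flip_y=0.1, random_state=1)
# Set aside 25% as the hold-out "test" sample before any training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1)

model = RandomForestClassifier(random_state=1).fit(X_train, y_train)
print(f"train accuracy: {model.score(X_train, y_train):.3f}")
print(f"test accuracy:  {model.score(X_test, y_test):.3f}")
# A large train-test gap is the usual warning sign of overfitting.
```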

The industry treats cross-validation as the gold standard for estimating accuracy on unseen data. K-fold cross-validation is especially common: the data is split into k folds, the model is trained k times with a different fold held out for evaluation each time, and the k scores are averaged. It is widely used as a check against overfitting.
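
A minimal sketch of 5-fold cross-validation with scikit-learn follows; the dataset and estimator are illustrative assumptions:

```python
# 5-fold cross-validation sketch (synthetic data and estimator assumed).
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, flip_y=0.1, random_state=2)

# Each of the 5 folds takes a turn as the held-out evaluation set.
scores = cross_val_score(DecisionTreeClassifier(random_state=2), X, y, cv=5)
print("fold accuracies:", scores.round(3))
print("mean:", scores.mean().round(3), "std:", scores.std().round(3))
# A low mean, or high variance across folds, suggests the model will
# not generalise well to unseen data.
```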

How do you identify the sweet spot of good fit?

Every data scientist will have a different opinion on, and solution for, the perfect fit. However, some suggest looking at the model's learning curves and checking at what point the model stops improving on the training and test datasets. The sweet spot of good fit is usually just before the error starts increasing on the test set. We should, however, caution that early stopping is part of the training algorithm itself. To maintain train-test hygiene, the algorithm should not have access to the test data during training when early stopping is used; the stopping decision should be based on a separate validation set instead. Run multiple iterations of your model on unseen data and make sure that it generalises well before using it for your business purposes.
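
A minimal sketch of early stopping driven by a validation slice carved out of the training data, so the final test set is never touched during training; gradient boosting and the synthetic data are illustrative assumptions:

```python
# Early-stopping sketch: the stopping decision uses a validation slice
# of the TRAINING data, never the held-out test set (synthetic data
# and gradient boosting assumed for illustration).
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=3000, flip_y=0.1, random_state=3)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=3)

# validation_fraction carves a validation set out of X_train; training
# halts once validation loss stops improving for n_iter_no_change rounds.
model = GradientBoostingClassifier(
    n_estimators=1000,        # upper bound; early stopping usually halts sooner
    validation_fraction=0.2,
    n_iter_no_change=10,
    random_state=3,
).fit(X_train, y_train)

print("boosting rounds actually used:", model.n_estimators_)
print("held-out test accuracy:", round(model.score(X_test, y_test), 3))
```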

If you are not a machine learning expert, how can you be sure that you are being sold a model that has a good fit?

There is no easy answer to this, and it is important that you work with a vendor you can trust. The vendor should use the right machine learning stack and algorithms, and understand the nuances of model fit and purpose, before selling you a model. Experience in building and deploying machine learning solutions also matters when choosing a vendor. Data science talent is scarce and non-standard in the industry, and you should make sure you get the best value for your money when working with a vendor. Ask them about generalisable models, and the replies you get will point you towards the calibre of the team you are dealing with.