How Do KPIs Measure the Success of an AI Project?
A machine-learning model's performance is first evaluated through its success rate; that rate must then be reconciled with business objectives. Although rarely discussed in business media, key performance indicators (KPIs) for machine-learning models are a critical element of launching an AI project. An IDC study from June 2020 found that 28% of AI projects fail.
According to the American research firm, there are three reasons for this: a lack of expertise, a lack of relevant data, and a lack of integrated development environments. To establish a continuous improvement process for machine learning and avoid hitting a wall, it is important to identify KPIs.
Upstream, data scientists define the models' technical performance indicators. These indicators vary depending on the algorithm used. For example, if a regression is used to predict a person's height as a function of their age, the coefficient of determination can be used.
This formula measures prediction quality: if the square of the correlation coefficient (R²) is 0, the regression line explains none of the variation in the data; if it is 1, it explains 100% of it, and the prediction is of high quality.
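To make the indicator concrete, here is a minimal sketch of the coefficient of determination in plain Python (the function name and sample values are illustrative, not from the article):

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: the share of variance in the
    observed values that the model's predictions explain."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual error
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total variance
    return 1 - ss_res / ss_tot

# A perfect prediction explains 100% of the variance:
print(r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 1.0
# Always predicting the mean explains 0% of it:
print(r_squared([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))  # 0.0
```
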
Deviation of the Prediction from Reality
The least-squares method is another indicator that can be used to evaluate a regression. It serves as the loss function: the error is quantified by squaring the deviation between the actual and predicted values (see graph below), and the model is fitted by minimizing that squared error. The mean absolute error method is an alternative, which calculates the average of the absolute deviations.
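The two loss measures mentioned above can be sketched in a few lines of plain Python (function names and the height example are illustrative):

```python
def mean_squared_error(y_true, y_pred):
    """Least-squares loss: average of squared deviations.
    Squaring penalizes large errors more heavily."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_absolute_error(y_true, y_pred):
    """Average of absolute deviations: every unit of error counts equally."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Predicted vs. actual heights in cm:
actual, predicted = [170, 180], [172, 177]
print(mean_squared_error(actual, predicted))   # (4 + 9) / 2 = 6.5
print(mean_absolute_error(actual, predicted))  # (2 + 3) / 2 = 2.5
```

Note how the 3 cm miss dominates the squared error but not the absolute one; that difference in sensitivity to outliers is the usual reason for choosing one loss over the other.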
“In all cases, this amounts to measuring a gap compared with what we are trying to predict,” says Charlotte Pierron Perles, head of strategy, data, and AI at Capgemini Invent, the consulting arm of the ESN Capgemini, in France.
In the case of spam detection algorithms, false positives and false negatives will need to be tracked. “We worked with a cosmetics company to develop a machine-learning solution to optimize the efficiency of their production lines,” explains Charlotte Pierron Perles. “The goal was to detect defective bottles early on the line, since a defect could cause a production stoppage. After talking with the plant manager and the factory operators, we tuned the model with the customer so that it fulfilled its purpose, even if that meant accepting some false negatives.”
Three additional indicators, based on false positives or false negatives, allow for the evaluation of classification models:
- Recall (R) measures the model’s sensitivity. It is the proportion of actual positives that are correctly identified (Covid tests rightly positive versus Covid tests wrongly negative): R = true positives / (true positives + false negatives)
- Precision (P) measures exactness. It is the proportion of true positives (Covid tests rightly positive) among all positive results (Covid tests rightly positive + Covid tests wrongly positive): P = true positives / (true positives + false positives)
- The F-score is the harmonic mean of precision and recall, summarizing the model’s ability to both find the positives and label them correctly: F = 2 × (Precision × Recall) / (Precision + Recall)
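The three indicators above all derive from the same confusion-matrix counts. A minimal sketch in plain Python (the function name and toy labels are illustrative, with 1 meaning "positive"):

```python
def classification_metrics(y_true, y_pred):
    """Precision, recall, and F-score from binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score

# Three real positives; the model finds two of them and raises one false alarm:
p, r, f = classification_metrics([1, 1, 1, 0, 0], [1, 1, 0, 1, 0])
print(p, r, f)  # all three equal 2/3 here
```
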
Generalization of the Model
David Tsang Hin Sun, senior data scientist at the French ESN Keyrus, says: “Once the model has been constructed, its ability to generalize becomes a key indicator.” How do you estimate it? By measuring the difference between predictions and actual outcomes, then tracking how that difference evolves over time. “After a while, a divergence may appear. It can be caused by under-learning (underfitting, editor’s note), due to a training data set that is insufficient in quality or quantity,” explains David Tsang Hin Sun.
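Tracking that divergence over time can be as simple as comparing the model's recent average error against its error at deployment. A minimal sketch, assuming errors are logged in chronological order (the function name and window size are illustrative):

```python
def error_drift(errors, window):
    """Difference between the mean absolute error over the most recent
    `window` predictions and over the first `window` predictions.
    A value well above zero suggests the model is diverging."""
    baseline = sum(errors[:window]) / window   # error level at deployment
    recent = sum(errors[-window:]) / window    # current error level
    return recent - baseline

# Error stable at 1.0 early on, now hovering around 3.0 -> drift of 2.0:
print(error_drift([1.0] * 5 + [3.0] * 5, window=5))  # 2.0
```

In practice this check would run on a schedule, with an alert threshold chosen with the business (for instance, flagging when drift exceeds some fraction of the baseline error).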
What is the solution? Improving the training set in quality or quantity is the first answer. For a classification algorithm, synthetic minority over-sampling (SMOTE) can also be applied: it increases the number of examples of the low-incidence class in the data by oversampling.
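To illustrate the idea of oversampling a rare class, here is a deliberately simplified sketch that duplicates minority examples at random. (SMOTE itself goes further by synthesizing new points between neighbouring minority examples; everything here, including the function name, is illustrative.)

```python
import random

def oversample_minority(samples, labels, minority_label, target_count, seed=0):
    """Randomly duplicate minority-class examples until that class
    reaches target_count. A simpler stand-in for SMOTE, which would
    interpolate new synthetic points instead of copying existing ones."""
    rng = random.Random(seed)
    minority = [(s, l) for s, l in zip(samples, labels) if l == minority_label]
    extra = [rng.choice(minority) for _ in range(target_count - len(minority))]
    new_samples = list(samples) + [s for s, _ in extra]
    new_labels = list(labels) + [l for _, l in extra]
    return new_samples, new_labels

# One defective bottle out of four; rebalance to three defective examples:
xs, ys = oversample_minority([1, 2, 3, 4], [0, 0, 0, 1], minority_label=1, target_count=3)
print(ys.count(1), ys.count(0))  # 3 3
```
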
Divergence can also result from over-learning (overfitting). In this configuration, the trained model is no longer limited to the expected correlations: because it is too specialized, it captures the noise in the field data, produces inconsistent results, and its error function deteriorates. “It will then become necessary to review the variables, and perhaps to regularize their weights,” says David Tsang Hin Sun.
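Weight regularization can be illustrated with one gradient-descent step on a squared-error loss with an L2 (ridge) penalty, which shrinks large weights and so discourages over-specialization. A minimal sketch in plain Python (function names, learning rate, and penalty strength are illustrative):

```python
def predict(weights, x):
    """Linear prediction: dot product of weights and features."""
    return sum(w * xi for w, xi in zip(weights, x))

def ridge_step(weights, xs, ys, lr=0.1, lam=0.5):
    """One gradient step on mean squared error plus an L2 penalty.
    The lam * w term pulls every weight toward zero; lam=0 recovers
    plain, unregularized gradient descent."""
    n = len(xs)
    new_weights = []
    for j, w in enumerate(weights):
        grad = sum((predict(weights, x) - y) * x[j] for x, y in zip(xs, ys)) / n
        new_weights.append(w - lr * (grad + lam * w))
    return new_weights

# With the penalty (lam=0.5) the weight shrinks faster than without it:
print(ridge_step([1.0], [[1.0]], [0.0], lam=0.5))  # ~0.85
print(ridge_step([1.0], [[1.0]], [0.0], lam=0.0))  # ~0.90
```
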
The economic KPIs remain. Stephane Roder, CEO of the French consulting firm AI Builders, says a company must ask whether the model’s error rate is appropriate for the business challenges. The insurer Lemonade, for example, has built a machine-learning brick that reimburses a customer within three minutes of a claim being submitted, based on the information provided, including photos. He concedes that this comes with an error rate, which has a cost. “It’s important to keep this measure in place throughout the lifecycle of the model, especially compared to its TCO, from development to maintenance,” says Stephane Roder.
Even within the same company, performance expectations can vary. “We built a consumption prediction engine for a French retailer of international standing,” notes Capgemini Invent’s Charlotte Pierron Perles. The model’s precision targets differed between new products and standard shelf products: the sales dynamics of the latter depend in particular on factors linked to market reactions. A different set of algorithms is therefore used for them, aiming at a lower target.
The last KPI is the level of adoption: however good a model is, it creates value only if it is used. Charlotte Pierron Perles insists that this requires building a product with an AI UX (a user-experience-oriented artificial intelligence) that is both accessible to business users and delivers on the promise of machine learning. Stephane Roder concludes: “This UX will also allow users to give feedback, which will qualitatively fuel the AI’s knowledge.”