[python] How to evaluate a model with something other than MSE in a machine learning regression problem
Introduction
When working on regression problems in machine learning, mse (mean squared error) is commonly used to evaluate the model. However, there are times when mse alone does not tell you how the model is actually behaving. So, this time, I will introduce ways to evaluate a regression model other than mse.
Weakness of mse
I think the weaknesses of mse are as follows.
It does not show the trend for individual data
It is pulled by a few outliers
Let's look at it concretely.
1. It does not show the trend for individual data
Since mse only represents the "mean" of the squared deviation between the prediction and the actual value, it does not tell you how accurate the prediction is for each data point. See Figure 1 below. Both have the same mse, but the deviation is spread evenly on the left side, while on the right side it is concentrated in specific data.
In another example, data in one category may be reasonably accurate, but in another category it may not be correct at all. If you look only at mse, such a situation will be buried in the "average".
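The category case above can be checked by computing mse separately for each category. The following is a minimal sketch with made-up numbers; the function name mse_by_category and the category labels are hypothetical and not part of the original code.

```python
import numpy as np

def mse_by_category(y_real, y_pred, categories):
    """Return {category: mse}, computed separately for each category label."""
    y_real = np.asarray(y_real, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    categories = np.asarray(categories)
    result = {}
    for cat in np.unique(categories):
        mask = categories == cat
        result[str(cat)] = float(np.mean((y_real[mask] - y_pred[mask]) ** 2))
    return result

# Made-up data: category "a" is predicted well, category "b" is not.
y_real = [10.0, 12.0, 11.0, 50.0, 55.0, 60.0]
y_pred = [10.5, 11.5, 11.0, 40.0, 65.0, 50.0]
categories = ["a", "a", "a", "b", "b", "b"]

overall_mse = float(np.mean((np.asarray(y_real) - np.asarray(y_pred)) ** 2))
per_category = mse_by_category(y_real, y_pred, categories)
print("overall mse:", overall_mse)
print("per category:", per_category)
```

The overall value sits between the two per-category values, so on its own it hides the fact that one category is predicted far worse than the other.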
2. It is pulled by a few outliers
This overlaps with point 1, but if you use mse, you cannot tell whether there are many small deviations or a small number of large deviations (in Fig. 2 below, mse is the same value for both).
In kaggle, outliers are often removed in advance to avoid this, but in business that is not always possible. You may also simply not notice that you forgot to remove the outliers.
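To make this concrete, here is a small sketch with made-up error values (not the data behind the figures). Both error sets give the same mse, but the median and the maximum of the absolute error are very different.

```python
import numpy as np

# Made-up errors (prediction minus real) for 100 data points.
even_errors = np.full(100, 1.0)   # every point is off by 1
spiky_errors = np.zeros(100)
spiky_errors[:4] = 5.0            # only 4 points are off, each by 5

def mse_from_errors(errors):
    """Mean of the squared errors."""
    return float(np.mean(errors ** 2))

mse_even = mse_from_errors(even_errors)
mse_spiky = mse_from_errors(spiky_errors)
median_even = float(np.median(np.abs(even_errors)))
median_spiky = float(np.median(np.abs(spiky_errors)))

print("mse:", mse_even, mse_spiky)
print("median abs error:", median_even, median_spiky)
print("max abs error:", np.abs(even_errors).max(), np.abs(spiky_errors).max())
```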
What is the problem?
You might think, "So what's wrong with that?" The point is that if you can see situations like the ones above, you understand the current state of the model you made, and it becomes easier to come up with improvement measures.
For example,
Are scale conversion and outlier handling appropriate?
Do the poorly predicted data have something in common that should be added to the features?
Considering the business use, can those data be ignored?
To supplement the third point, there may be some data for which accuracy is particularly important and some data for which accuracy is not so important. (For example, in an application that predicts the life of a machine, a slight deviation can be tolerated where there is sufficient margin, but the prediction needs to be accurate when the end of life is near.)
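One way to look at the third point is to compute the error only on the range where accuracy matters. The sketch below uses made-up remaining-life values; the function name mse_where and the threshold of 100 hours are assumptions chosen only for illustration.

```python
import numpy as np

def mse_where(y_real, y_pred, mask):
    """mse restricted to the data points selected by a boolean mask."""
    y_real = np.asarray(y_real, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    return float(np.mean((y_real[mask] - y_pred[mask]) ** 2))

# Made-up remaining life (hours) and predictions.
y_real = np.array([500.0, 400.0, 300.0, 50.0, 30.0, 10.0])
y_pred = np.array([420.0, 470.0, 250.0, 48.0, 33.0, 9.0])

near_end = y_real < 100.0   # hypothetical threshold for "life is near"

mse_all = mse_where(y_real, y_pred, np.ones_like(near_end))
mse_near = mse_where(y_real, y_pred, near_end)
mse_far = mse_where(y_real, y_pred, ~near_end)
print("all:", mse_all, "near end:", mse_near, "with margin:", mse_far)
```

Here the overall mse is dominated by the data with plenty of margin, while the part that matters for the application is predicted much more accurately.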
Solution
1. Graph of real data vs prediction result
The first possible solution is to draw a graph with real data on the x-axis and predicted values on the y-axis, as shown in Figures 1 and 2.
Below, y_real is the actual data and y_pred is the predicted value.
import matplotlib.pyplot as plt
plt.plot(y_real, y_pred, 'bo')
plt.xlabel('real', fontsize=12)
plt.ylabel('prediction', fontsize=12)
However, this graph does not give any index of the model's accuracy, so its purpose is only to give a bird's-eye view of the whole.
2. Graph of prediction error vs quantile
Since graph 1 does not tell you the accuracy as a number, sort the data by prediction error, find the error at the 50%, 75%, and 90% points counted from the smallest error, and graph those values.
First, find the prediction error for each data point.
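def get_pred_ratio(y_real, y_pred):
    """Return the relative prediction error of each data point."""
    ratios = []
    for real, pred in zip(y_real, y_pred):
        if real != 0:
            # Relative error; abs(real) keeps the ratio non-negative
            ratio = abs(real - pred) / abs(real)
        else:
            # Fall back to the absolute error to avoid dividing by zero
            ratio = abs(real - pred)
        ratios.append(ratio)
    return ratios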
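For large datasets the same calculation can be written without a Python loop. This is a sketch of an equivalent NumPy version; the name get_pred_ratio_np is hypothetical, and it divides by the absolute value of the real data so that the ratio stays non-negative even when the real value is negative.

```python
import numpy as np

def get_pred_ratio_np(y_real, y_pred):
    """Vectorized relative error; uses the absolute error where real == 0."""
    y_real = np.asarray(y_real, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    abs_err = np.abs(y_real - y_pred)
    # Where real is 0, divide by 1 so that the absolute error is returned.
    denom = np.where(y_real != 0, np.abs(y_real), 1.0)
    return abs_err / denom

ratios = get_pred_ratio_np([10, 0, -4], [12, 3, -5])
print(ratios)
```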
Find each quantile with numpy's percentile function.
import numpy as np
y_ratio = get_pred_ratio(y_real, y_pred)
percent = [50, 75, 90]
ratio_p = []
for p in percent:
    ratio_p.append(np.percentile(y_ratio, p))
plt.figure()
plt.plot(percent, ratio_p, 'bo')
plt.xlim([0, 100])
plt.ylim([0, 1])
plt.xlabel('Percent (%)', fontsize=12)
plt.ylabel('Prediction ratio', fontsize=12)
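As a small worked example of what the quantiles mean, the sketch below uses made-up error ratios. np.percentile also accepts a list of percentiles, so the loop can be replaced by a single call. With its default settings it interpolates linearly between neighboring data points.

```python
import numpy as np

# Made-up relative errors for five data points.
example_ratio = [0.0, 0.1, 0.2, 0.3, 0.4]
percent = [50, 75, 90]

# One call returns the value at each requested percentile.
ratio_p = np.percentile(example_ratio, percent)

for p, r in zip(percent, ratio_p):
    print('error {0}%: {1:.2f}'.format(p, r))
```

Reading the result: half of the data have an error ratio of 0.2 or less, and 90% of the data have an error ratio of about 0.36 or less.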
In the end, I use the two graphs above together as follows.
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import mean_squared_error as mse

def eval_regression(y_pred, y_real):
    # Relative prediction error and its 50%, 75%, 90% quantiles
    y_ratio = get_pred_ratio(y_real, y_pred)
    percent = [50, 75, 90]
    ratio_p = []
    for p in percent:
        ratio_p.append(np.percentile(y_ratio, p))

    print('mse: ', mse(y_real, y_pred))
    for p, r in zip(percent, ratio_p):
        print('error {0}%: {1}'.format(p, r))

    plt.figure(figsize=(10, 5))

    # Left: real vs prediction, with the ideal y = x line in red
    plt.subplot(1, 2, 1)
    plt.plot(y_real, y_pred, 'bo')
    plt.plot(y_real, y_real, 'r')
    plt.xlabel('real', fontsize=12)
    plt.ylabel('prediction', fontsize=12)

    # Right: prediction error vs quantile
    plt.subplot(1, 2, 2)
    plt.plot(percent, ratio_p, 'bo')
    plt.xlim([0, 100])
    plt.ylim([0, 1])
    plt.xlabel('Percent (%)', fontsize=12)
    plt.ylabel('Prediction ratio', fontsize=12)

    plt.subplots_adjust(wspace=0.7)

eval_regression(y_pred, y_real)
This prints the mse and the prediction error at the 50%, 75%, and 90% quantiles, and draws the real data vs prediction graph and the prediction error vs quantile graph.
Summary
Don't get me wrong, I'm not rejecting mse outright. It's just that in some cases it's better to also take the above approach.
The contents of this page are summarized below.
Looking at mse alone can obscure the predictive tendencies of individual data.
Looking at prediction trends for individual data may provide feedback on data processing and feature creation.
This is considered to be effective for data with a particularly large number of features and data whose target variable is widely distributed.
This is especially important in the early stages of training. Once the accuracy has improved to some extent, I think mse alone is fine.
In the case of business use, it is also effective when considering measures such as what range of data to target, how much AI will do and how much humans will judge.
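For the last point, one possible way to discuss how much to leave to the model is to look at the quantile graph from the opposite direction: fix a tolerated error and ask what fraction of the data falls within it. The following is only a sketch with made-up numbers; the function name fraction_within and the 10% tolerance are assumptions.

```python
import numpy as np

def fraction_within(y_real, y_pred, tolerance):
    """Fraction of data points whose relative error is at most `tolerance`."""
    y_real = np.asarray(y_real, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    abs_err = np.abs(y_real - y_pred)
    # Same convention as get_pred_ratio: absolute error where real == 0.
    denom = np.where(y_real != 0, np.abs(y_real), 1.0)
    return float(np.mean(abs_err / denom <= tolerance))

# Made-up data: three of the five predictions are within 10% of the real value.
y_real = [100.0, 200.0, 50.0, 80.0, 10.0]
y_pred = [105.0, 190.0, 75.0, 82.0, 20.0]

share = fraction_within(y_real, y_pred, tolerance=0.10)
print("share of data within 10% error:", share)
```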
The methods used in this article are uploaded on my GitHub.