
[Python] How to evaluate a model with something other than MSE in a machine learning regression problem


Introduction


When working on regression problems in machine learning, mse (mean squared error) is commonly used to evaluate the model. However, there are times when mse alone does not tell you how the model is actually behaving. So, this time, I will introduce ways to evaluate a regression model other than mse.



Weakness of mse


I think the weaknesses of mse are as follows.


  1. Does not show trends in individual data

  2. Pulled by a few outliers


Let's look at them concretely.


1. Does not show trends in individual data


Since mse only represents the "mean" of the squared deviation between the prediction and the actual value, it does not tell you how accurate the prediction is for each data point. See Figure 1 below. Both have the same mse, but on the left side the deviation is spread evenly across the data, while on the right side it is concentrated in specific data.

As another example, the predictions for data in one category may be reasonably accurate, while those for another category may be far off. If you look only at mse, such a situation is buried in the "average".


Figure 1 Both have the same mse
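As a minimal sketch of the category case above, computing mse per category shows what the overall value hides. The categories "A" and "B" and all numbers are made-up toy data, not the data behind Figure 1, and mse_value is a helper name introduced only for this illustration.

```python
import numpy as np

# Hypothetical toy data: two categories with four samples each.
category = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])
y_real = np.array([10.0, 12.0, 11.0, 13.0, 10.0, 12.0, 11.0, 13.0])
y_pred = np.array([10.0, 12.0, 11.0, 13.0, 12.0, 10.0, 13.0, 11.0])


def mse_value(y_true, y_hat):
    """Mean squared error of two arrays (illustrative helper)."""
    return float(np.mean((y_true - y_hat) ** 2))


# mse over all samples, then mse restricted to each category.
overall_mse = mse_value(y_real, y_pred)
mse_by_category = {
    str(c): mse_value(y_real[category == c], y_pred[category == c])
    for c in np.unique(category)
}

print('overall mse:', overall_mse)
for name, value in mse_by_category.items():
    print('mse for category {0}: {1}'.format(name, value))
```

With these numbers the overall mse is 2.0, while category A is predicted perfectly (0.0) and category B is not (4.0).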

2. Pulled by a few outliers


This overlaps with 1, but if you use mse alone, you cannot tell whether there are many small deviations or a small number of large deviations (in Figure 2 below, mse is the same value for both).

In Kaggle, outliers are typically removed in advance to avoid this, but in business that may not be possible. You may also simply not notice that you forgot to remove the outliers.


Figure 2 Both have the same mse
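The situation in Figure 2 can be reproduced with a small sketch. The data below are invented for illustration (100 samples with a constant actual value), and summarize is a hypothetical helper. It reports mse next to the median and maximum absolute error, which do distinguish the two cases.

```python
import numpy as np

# Hypothetical data: 100 samples whose actual value is always 50.
y_real = np.full(100, 50.0)

# Case 1: many small deviations (every prediction is off by 1).
pred_many_small = y_real + 1.0

# Case 2: a single large deviation (99 exact predictions, one off by 10).
pred_few_large = y_real.copy()
pred_few_large[0] += 10.0


def summarize(y_true, y_hat):
    """Return mse together with the median and maximum absolute error."""
    abs_err = np.abs(y_true - y_hat)
    return {
        'mse': float(np.mean(abs_err ** 2)),
        'median_abs_error': float(np.median(abs_err)),
        'max_abs_error': float(np.max(abs_err)),
    }


summary_many_small = summarize(y_real, pred_many_small)
summary_few_large = summarize(y_real, pred_few_large)
print('many small deviations:', summary_many_small)
print('few large deviations: ', summary_few_large)
```

Both cases give an mse of 1.0, but the median absolute error is 1.0 in the first case and 0.0 in the second, and the maximum error is 1.0 against 10.0.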

What is the problem?


You might think, "So what's wrong with that?" If you can see things like the above, you understand the current state of the model you built, and it becomes easier to come up with improvements.

For example, you can ask:

  • Are the scale conversion and outlier handling appropriate?

  • Do the poorly predicted data have something in common that should be added to the features?

  • Can this data be ignored considering its business use?

To supplement the third point, there may be some data for which accuracy is particularly important and some data for which accuracy is not so important. (For example, in an application that predicts the life of a machine, a slight deviation can be tolerated where there is sufficient margin, but predictions near the end of life need to be accurate.)
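One way to check this, sketched below under assumptions, is to evaluate the important range separately from the rest. The remaining-life values, the cut-off near_life_threshold, and the helper subset_report are all hypothetical and would have to come from the actual business requirement.

```python
import numpy as np

# Hypothetical remaining-life values and predictions (units are arbitrary).
y_real = np.array([5.0, 10.0, 20.0, 200.0, 400.0, 800.0])
y_pred = np.array([8.0, 6.0, 26.0, 205.0, 395.0, 805.0])

# Assumed cut-off below which the machine counts as "near the end of life".
near_life_threshold = 50.0


def subset_report(y_true, y_hat):
    """mse and mean relative error; assumes y_true contains no zeros."""
    err = y_true - y_hat
    return {
        'mse': float(np.mean(err ** 2)),
        'mean_relative_error': float(np.mean(np.abs(err) / np.abs(y_true))),
    }


# Boolean mask selecting the samples where accuracy matters most.
near_end = y_real < near_life_threshold
report_near = subset_report(y_real[near_end], y_pred[near_end])
report_far = subset_report(y_real[~near_end], y_pred[~near_end])

print('near end of life: ', report_near)
print('sufficient margin:', report_far)
```

With these made-up numbers the two groups have a similar mse (about 20.3 and 25.0), but the mean relative error is roughly 43% near the end of life against about 1.5% where there is margin.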



Solution


1. Graph of real data vs prediction result


The first possible solution is to draw a graph with real data on the x-axis and predicted values on the y-axis, as shown in Figures 1 and 2.


Below, y_real is the actual data and y_pred is the predicted value.


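import matplotlib.pyplot as plt

# Scatter plot of actual values (x-axis) against predicted values (y-axis)
plt.plot(y_real, y_pred, 'bo')
plt.xlabel('real', fontsize=12)
plt.ylabel('prediction', fontsize=12)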

However, this graph gives no index that quantifies the accuracy of the model, so its purpose is only to give a bird's-eye view of the whole.


2. Graph of prediction error vs quantile


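Since the graph in 1 does not quantify the accuracy, sort the prediction errors from smallest to largest, find the values at the 50%, 75%, and 90% points (quantiles), and plot them.


First, find the prediction error for each data point. Here the error is the deviation relative to the actual value (when the actual value is 0, the absolute deviation is used instead).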


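def get_pred_ratio(y_real, y_pred):
    """Return the relative prediction error of each sample."""
    ratios = []

    for real, pred in zip(y_real, y_pred):
        if real != 0:
            # Relative error; abs(real) keeps the ratio non-negative
            ratio = abs(real - pred) / abs(real)
        else:
            # Relative error is undefined at 0, so use the absolute error
            ratio = abs(real - pred)

        ratios.append(ratio)

    return ratios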
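For large arrays, the same calculation could also be written without a Python loop. The following is only a sketch: get_pred_ratio_vectorized is a name introduced here, not part of the original code, and it divides by the absolute value of the actual data so that the ratio is never negative. For positive targets it gives the same values as the loop above.

```python
import numpy as np


def get_pred_ratio_vectorized(y_real, y_pred):
    """Relative prediction error per sample, computed with numpy arrays."""
    y_real = np.asarray(y_real, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)

    abs_err = np.abs(y_real - y_pred)
    # Where the actual value is 0, divide by 1 so the absolute error is kept.
    denom = np.where(y_real != 0, np.abs(y_real), 1.0)
    return abs_err / denom


# Made-up values covering a positive, a zero, and a negative actual value.
ratios = get_pred_ratio_vectorized([10, 0, -4], [8, 0.5, -5])
print(ratios)
```

The result is a numpy array, so it can be passed to np.percentile directly.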

Find each quantile with numpy's percentile function.

import numpy as np
import matplotlib.pyplot as plt

y_ratio = get_pred_ratio(y_real, y_pred)
percent = [50, 75, 90]

# Prediction error at each percentile
ratio_p = [np.percentile(y_ratio, p) for p in percent]

plt.figure()
plt.plot(percent, ratio_p, 'bo')
plt.xlim([0, 100])
plt.ylim([0, 1])  # errors above 1 (100%) fall outside the plotted range
plt.xlabel('Percent (%)', fontsize=12)
plt.ylabel('Prediction ratio', fontsize=12)


In the end, I use the two above together as follows.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import mean_squared_error as mse

def eval_regression(y_pred, y_real):
    """Print mse and error percentiles, then draw both graphs side by side."""
    y_ratio = get_pred_ratio(y_real, y_pred)
    percent = [50, 75, 90]
    ratio_p = [np.percentile(y_ratio, p) for p in percent]

    print('mse: ', mse(y_real, y_pred))
    for p, r in zip(percent, ratio_p):
        print('error {0}%: {1}'.format(p, r))

    plt.figure(figsize=(10, 5))

    # Left: actual vs predicted, with the ideal line y = x in red
    plt.subplot(1, 2, 1)
    plt.plot(y_real, y_pred, 'bo')
    plt.plot(y_real, y_real, 'r')
    plt.xlabel('real', fontsize=12)
    plt.ylabel('prediction', fontsize=12)

    # Right: prediction error at each percentile
    plt.subplot(1, 2, 2)
    plt.plot(percent, ratio_p, 'bo')
    plt.xlim([0, 100])
    plt.ylim([0, 1])
    plt.xlabel('Percent (%)', fontsize=12)
    plt.ylabel('Prediction ratio', fontsize=12)
    plt.subplots_adjust(wspace=0.7)

eval_regression(y_pred, y_real)

This prints mse and the 50%, 75%, and 90% values of the prediction error, and draws the graph of actual data vs predicted data and the graph of prediction error vs quantile.



Summary


Don't get me wrong, I'm not rejecting mse entirely. It's just that in some cases it's better to also take the above approach.

The contents of this page are summarized below.


  • Looking at mse alone can obscure the prediction trends for individual data.

  • Looking at prediction trends for individual data may provide feedback on data processing and feature creation.

  • This is considered to be effective for data with a particularly large number of features and for data whose objective variable has a wide distribution.

  • This is especially important in the early stages of training. Once the accuracy has improved to some extent, I think mse is fine.

  • In the case of business use, it is also effective when considering measures such as what range of data to target, and how much to leave to AI and how much to human judgment.


The methods used in this article are uploaded to my GitHub.
