[python] How to evaluate a model with something other than MSE in a machine learning regression problem

M.R
Aug 20, 2021
3 min read

Updated: Aug 21, 2021

Introduction

When working on regression problems in machine learning, mse (mean squared error) is commonly used to evaluate the model. However, there are times when mse alone does not tell you how the model is actually performing. So, this time, I will introduce ways to evaluate a regression model other than mse.

Weakness of mse

I think the weaknesses of mse are as follows.

Does not show trends by data
Pulled by a few outliers

Let's look at it concretely.

1. Does not show trends by data

Since mse represents only the "mean" of the deviation between the predictions and the actual values, it does not tell you how accurate the prediction is for each data point. See Figure 1 below. Both cases have the same mse, but on the left side the deviation is spread evenly, while on the right side it is concentrated in specific data.
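As a minimal sketch of this situation (the numbers are made up for illustration and are not the data behind the figure), the two prediction sets below give the same mse even though their per-sample errors look very different:

```python
import numpy as np

# Hypothetical data for illustration only.
y_real = np.array([10.0, 20.0, 30.0, 40.0])

# Case A: every prediction is off by 1.
y_pred_even = y_real + np.array([1.0, -1.0, 1.0, -1.0])
# Case B: three predictions are exact, one is off by 2.
y_pred_biased = y_real + np.array([0.0, 0.0, 0.0, 2.0])


def mse(y_true, y_pred):
    """Mean squared error computed directly with numpy."""
    return float(np.mean((y_true - y_pred) ** 2))


mse_even = mse(y_real, y_pred_even)
mse_biased = mse(y_real, y_pred_biased)
print(mse_even, mse_biased)  # both are 1.0

# The per-sample absolute errors reveal the difference.
print(np.abs(y_real - y_pred_even))
print(np.abs(y_real - y_pred_biased))
```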

In another example, predictions for data in one category may be reasonably accurate, while predictions for another category may be completely off. If you look only at mse, such a situation is buried in the "average".
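One way to check for this, sketched here with hypothetical category labels and values, is to compute mse separately for each category and compare it with the overall value:

```python
import numpy as np

# Hypothetical data: category labels, actual values, and predictions.
category = np.array(["A", "A", "A", "B", "B", "B"])
y_real = np.array([10.0, 12.0, 14.0, 10.0, 12.0, 14.0])
y_pred = np.array([10.0, 12.0, 14.0, 7.0, 15.0, 11.0])

# mse over all data
overall_mse = float(np.mean((y_real - y_pred) ** 2))
print("overall mse:", overall_mse)

# mse computed separately for each category
mse_by_category = {}
for c in np.unique(category):
    mask = category == c
    mse_by_category[str(c)] = float(np.mean((y_real[mask] - y_pred[mask]) ** 2))
print(mse_by_category)
```

In this made-up data, category A is predicted exactly and all of the error comes from category B, which the overall mse does not show.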

2. Pulled by a few outliers

This overlaps with 1, but with mse alone you cannot tell whether there are many small deviations or a small number of large deviations (in Fig. 2 below, mse is the same value for both).

On kaggle, outliers are removed in advance to avoid this, but in business this may not be the case. You may not even realize that you have forgotten to remove the outliers.
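As a rough sketch with made-up errors, comparing mse with a statistic that is less sensitive to outliers (here the median absolute error, which is my choice for illustration) shows how much a single large deviation can pull mse:

```python
import numpy as np

# Hypothetical prediction errors: 99 small deviations and a single large one.
errors = np.full(100, 0.1)
errors[-1] = 10.0

mse_all = float(np.mean(errors ** 2))
mse_without_outlier = float(np.mean(errors[:-1] ** 2))
median_abs_error = float(np.median(np.abs(errors)))

print("mse with outlier:   ", mse_all)
print("mse without outlier:", mse_without_outlier)
print("median abs error:   ", median_abs_error)
```

A large gap between mse and the median absolute error is a hint that a few data points dominate the mse.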

What is the problem?

You might think, "So what's wrong with that?" If you can find out things like the above, you can understand the current state of the model you made, and it becomes easier to come up with improvement measures.

For example,

Are scale conversion and outlier handling appropriate?
Is there something in common among the poorly predicted data that should be added to the features?
Can the poorly predicted data be ignored considering the business use?

To supplement the third point, there may be some data for which accuracy is particularly important and some data for which accuracy is not so important. (For example, in an application that predicts the life of a machine, a slight deviation can be tolerated where there is sufficient margin, but prediction when the life is near needs to be accurate.)
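For the machine-life example, one possible check is to evaluate the error only in the range that matters. The sketch below uses hypothetical remaining-life values and an assumed threshold of 50; neither comes from real data.

```python
import numpy as np

# Hypothetical remaining-life values (e.g. in days) and their predictions.
y_real = np.array([500.0, 400.0, 300.0, 30.0, 20.0, 10.0])
y_pred = np.array([450.0, 430.0, 280.0, 45.0, 5.0, 25.0])

# Assumed threshold below which accuracy matters most.
threshold = 50.0
near_end = y_real < threshold

abs_err = np.abs(y_real - y_pred)

# Mean absolute error inside and outside the important range
mae_near_end = float(np.mean(abs_err[near_end]))
mae_rest = float(np.mean(abs_err[~near_end]))

# Mean relative error inside the important range
rel_err_near_end = float(np.mean(abs_err[near_end] / y_real[near_end]))

print("MAE near end of life:", mae_near_end)
print("MAE elsewhere:       ", mae_rest)
print("relative error near end of life:", rel_err_near_end)
```

Here the absolute error near the end of life is smaller than elsewhere, but relative to the remaining life it is very large, which is the kind of thing an overall mse would hide.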

Solution

1. Graph of real data vs prediction result

The first possible solution is to draw a graph with real data on the x-axis and predicted values on the y-axis, as shown in Figures 1 and 2.

Below, y_real is the actual data and y_pred is the predicted value.

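import matplotlib.pyplot as plt

# Scatter plot of actual values (x-axis) against predictions (y-axis)
plt.plot(y_real, y_pred, 'bo')
plt.xlabel('real', fontsize=12)
plt.ylabel('prediction', fontsize=12)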

However, this graph does not give an index of the model's accuracy; its purpose is to give a bird's-eye view of the whole.

2. Graph of prediction error vs quantile

Since the graph in 1 does not give a numerical measure of accuracy, sort the data by prediction error from lowest to highest, find the error values at the 50%, 75%, and 90% points, and graph them.

First, find the prediction error for each data.

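def get_pred_ratio(y_real, y_pred):
    """Return the relative prediction error for each data point."""
    ratios = []

    for real, pred in zip(y_real, y_pred):
        if real != 0:
            # Relative error; abs(real) keeps the ratio non-negative
            ratio = abs(real - pred) / abs(real)
        else:
            # Fall back to the absolute error to avoid dividing by zero
            ratio = abs(real - pred)

        ratios.append(ratio)

    return ratios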

Find each quantile with numpy's percentile function.

import numpy as np
import matplotlib.pyplot as plt

y_ratio = get_pred_ratio(y_real, y_pred)
percent = [50, 75, 90]
ratio_p = []

# Error value below which 50%, 75%, and 90% of the data fall
for p in percent:
    ratio_p.append(np.percentile(y_ratio, p))

plt.figure()
plt.plot(percent, ratio_p, 'bo')
plt.xlim([0, 100])
plt.ylim([0, 1])
plt.xlabel('Percent (%)', fontsize=12)
plt.ylabel('Prediction ratio', fontsize=12)

In the end, I use the two above together as follows.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import mean_squared_error as mse

def eval_regression(y_pred, y_real):
    """Print mse and error percentiles, then draw both graphs side by side."""
    y_ratio = get_pred_ratio(y_real, y_pred)
    percent = [50, 75, 90]
    ratio_p = []

    for p in percent:
        ratio_p.append(np.percentile(y_ratio, p))

    # mean_squared_error takes (y_true, y_pred)
    print('mse: ', mse(y_real, y_pred))
    for p, r in zip(percent, ratio_p):
        print('error {0}%: {1}'.format(p, r))

    plt.figure(figsize=(10, 5))

    # Left: actual vs prediction, with the ideal line y = x in red
    plt.subplot(1, 2, 1)
    plt.plot(y_real, y_pred, 'bo')
    plt.plot(y_real, y_real, 'r')
    plt.xlabel('real', fontsize=12)
    plt.ylabel('prediction', fontsize=12)

    # Right: prediction error at each percentile
    plt.subplot(1, 2, 2)
    plt.plot(percent, ratio_p, 'bo')
    plt.xlim([0, 100])
    plt.ylim([0, 1])
    plt.xlabel('Percent (%)', fontsize=12)
    plt.ylabel('Prediction ratio', fontsize=12)
    plt.subplots_adjust(wspace=0.7)

eval_regression(y_pred, y_real)

This prints mse and the 50th, 75th, and 90th percentiles of the prediction error, and draws the actual vs. prediction graph next to the prediction error vs. quantile graph.
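To try the percentile part on its own without plotting, a vectorized sketch could look like the following. The helper name relative_errors and the sample data are hypothetical; it is meant as a numpy counterpart of get_pred_ratio that divides by the absolute value of the actual data, not as a replacement for it.

```python
import numpy as np


def relative_errors(y_real, y_pred):
    """Hypothetical vectorized counterpart of get_pred_ratio."""
    y_real = np.asarray(y_real, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    abs_err = np.abs(y_real - y_pred)
    # Divide by |real| where it is non-zero; otherwise keep the absolute error.
    denom = np.where(y_real != 0, np.abs(y_real), 1.0)
    return abs_err / denom


# Made-up sample data, including one actual value of 0.
y_real = [100.0, 200.0, 50.0, 0.0, 80.0]
y_pred = [110.0, 150.0, 50.0, 0.5, 100.0]

ratios = relative_errors(y_real, y_pred)

# Error value at the 50th, 75th, and 90th percentiles
summary = {p: float(np.percentile(ratios, p)) for p in (50, 75, 90)}
for p, r in summary.items():
    print('error {0}%: {1}'.format(p, r))
```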

Summary

Don't get me wrong, I'm not totally denying mse. It's just that in some cases it's better to take the above approach.

The contents of this page are summarized below.

Looking at mse alone can obscure the predictive tendencies of individual data.
Looking at prediction trends for individual data may provide feedback on data processing and feature creation.
This is considered to be effective for data with a particularly large number of features and data with a wide distribution of the objective variable.
This is especially important in the early stages of training. If the accuracy improves to some extent, I think mse is fine.
In the case of business use, it is also effective when considering measures such as what range of data to target, how much AI will do and how much humans will judge.

The methods used in this article are uploaded on my GitHub.
