Calculating Accuracy of an ML Model.

Abhigyan
Published in Analytics Vidhya
5 min read · Mar 17, 2020


Before moving on to accuracy metrics, let’s review the types of learning involved in ML.

Supervised Learning: a learning technique in which the dependent variable (the label) is provided in the training data, and the model learns to predict it. Supervised learning is of two types:

  1. Regression: the dependent variable is a continuous feature. Example: house price data, in which we have to predict house prices.
  2. Classification: the dependent variable is a binary or multiclass feature. Example: Titanic data, in which we have to predict whether a person survived (1) or not (0).

Unsupervised Learning: a learning technique in which the dependent variable is not provided. It is commonly used on real-life problems at e-commerce, finance, and similar companies, for example to identify the most valuable customers.

Semi-Supervised Learning: as the name suggests, this technique combines supervised and unsupervised techniques. The data contains a small amount of labelled data and a much larger amount of unlabelled data. To learn more about semi-supervised learning, check out this link.


Now that we know the types of ML, let’s dive into the metrics used to calculate accuracy in supervised learning.

How to calculate Accuracy in Supervised learning?


Accuracy is calculated differently for regression and classification models.

For Regression Model:

  1. Squared error (SE), built from the per-observation error Ei = actual - predicted.
  2. Mean squared error (MSE).
  3. Root mean squared error (RMSE).
  4. Relative mean squared error (rMSE).
  5. Mean absolute percentage error (MAPE).
  6. R-squared.
  7. Absolute error (AE).
  8. Mean absolute error (MAE).
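The metrics listed above can be sketched in a few lines of plain Python. This is a minimal illustration using the usual textbook definitions, not necessarily the exact formulas intended here: the function name `regression_metrics` and the sample numbers are made up for the example, and rMSE in particular has more than one definition in the literature (the version below divides the squared error by the total squared deviation from the mean, which is an assumption).

```python
import math

def regression_metrics(actual, predicted):
    """Return common regression error metrics for two equal-length sequences."""
    n = len(actual)
    # Ei = actual - predicted, one error per observation
    errors = [a - p for a, p in zip(actual, predicted)]
    mean_actual = sum(actual) / n

    se = sum(e ** 2 for e in errors)          # sum of squared errors
    ae = sum(abs(e) for e in errors)          # sum of absolute errors
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)

    return {
        "SE": se,
        "MSE": se / n,
        "RMSE": math.sqrt(se / n),
        # Assumed definition: squared error relative to the total
        # squared deviation of the actual values from their mean.
        "rMSE": se / ss_tot,
        # MAPE is undefined if any actual value is 0.
        "MAPE": 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n,
        "R2": 1 - se / ss_tot,
        "AE": ae,
        "MAE": ae / n,
    }

# Hypothetical data, purely for illustration
metrics = regression_metrics([3, 5, 2, 7], [2, 5, 4, 8])
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

For every metric except R-squared, lower is better; for R-squared, values closer to 1 indicate a better fit.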

For Classification Model:

1. Confusion Matrix.

A confusion matrix is a table that helps visualise the performance of a classification model. It can be used to calculate precision, sensitivity (also known as recall), specificity, and accuracy.

Definition of the Terms:

  • True Positive (TP) : Observation is positive, and is predicted to be positive.
  • False Negative (FN) : Observation is positive, but is predicted negative.
  • True Negative (TN) : Observation is negative, and is predicted to be negative.
  • False Positive (FP) : Observation is negative, but is predicted positive.

Precision = TP/(TP+FP)

Sensitivity (recall) = TP/(TP+FN)

Specificity = TN/(TN+FP)

Accuracy = (TP+TN)/(TP+TN+FP+FN)
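The four counts and the ratios above can be sketched as follows. This is a minimal plain-Python illustration for the binary case; the helper name `confusion_counts` and the label lists are hypothetical, chosen only to show the arithmetic.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FN, TN, FP for a binary classification problem."""
    tp = fn = tn = fp = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == positive:
            if predicted == positive:
                tp += 1   # positive observation, predicted positive
            else:
                fn += 1   # positive observation, predicted negative
        else:
            if predicted == positive:
                fp += 1   # negative observation, predicted positive
            else:
                tn += 1   # negative observation, predicted negative
    return tp, fn, tn, fp

# Hypothetical labels: 4 positives and 6 negatives
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

tp, fn, tn, fp = confusion_counts(y_true, y_pred)

precision = tp / (tp + fp)
sensitivity = tp / (tp + fn)      # also called recall
specificity = tn / (tn + fp)
accuracy = (tp + tn) / (tp + tn + fp + fn)

print(tp, fn, tn, fp)
print(precision, sensitivity, specificity, accuracy)
```

Note that each ratio is undefined when its denominator is zero, for example precision when the model never predicts the positive class.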

2. ROC AUC.

AUC means Area Under the Curve, and here it is calculated for the ROC curve.

An ROC curve plots sensitivity (the true positive rate) against the false positive rate (1 - specificity) as the classification threshold varies. The closer the AUC is to 1, the better the model separates the two classes; an AUC of 0.5 corresponds to random guessing. It can be calculated using functions in both R and Python.

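For those who want a rudimentary understanding of the calculation, read on; otherwise skip this part.

Take a single threshold, so that the ROC curve runs from (0, 0) to the point (1 - SP, SE) and on to (1, 1), where SE is the sensitivity and SP is the specificity. The area under it splits into two triangles, T and U:

T = (1 × SE)/2 = SE/2 = TP/(2 × (TP+FN))

U = (SP × 1)/2 = SP/2 = TN/(2 × (TN+FP))

Adding the two gives the AUC:

AUC = T + U = (SE + SP)/2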
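As a rough sketch, the single-threshold formula can be coded directly. For comparison, the second function below computes the AUC from predicted scores using the standard pairwise interpretation (the fraction of positive/negative pairs in which the positive case gets the higher score, with ties counted as half); that interpretation is a well-known equivalent of the area under the full ROC curve, but it is an addition here, and both function names and all the numbers are hypothetical.

```python
def auc_single_threshold(tp, fn, tn, fp):
    """AUC of the two-segment ROC curve defined by one threshold."""
    se = tp / (tp + fn)   # sensitivity
    sp = tn / (tn + fp)   # specificity
    return (se + sp) / 2

def auc_from_scores(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))

# Hypothetical confusion-matrix counts and scores
auc_point = auc_single_threshold(tp=3, fn=1, tn=5, fp=1)
auc_rank = auc_from_scores([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])

print(auc_point)
print(auc_rank)
```

The pairwise version is quadratic in the number of observations, so it is only meant for understanding; on real data a library routine would normally be used instead.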

3. F1-Score.

The F measure (F1 score or F score) is a measure of a test’s accuracy, defined as the harmonic mean of the test’s precision and recall, with both weighted equally:

F1 = 2 × Precision × Recall/(Precision + Recall)

Because it uses both precision and recall, the F score can give a more realistic measure of a test’s performance than either one alone. It is often used in information retrieval for measuring search, document classification, and query classification performance.
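A minimal sketch of the F1 calculation from confusion-matrix counts, using the precision and recall definitions given earlier. The function name `f1_score_from_counts` and the counts are hypothetical.

```python
def f1_score_from_counts(tp, fp, fn):
    """F1 as the harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts: precision = 0.75, recall = 0.6
f1 = f1_score_from_counts(tp=3, fp=1, fn=2)
print(round(f1, 4))
```

Because the harmonic mean is pulled towards the smaller of the two values, a model cannot score a high F1 by doing well on precision alone or recall alone.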

4. Gini co-efficient.

Gini is most commonly used for imbalanced datasets, where the class probabilities alone make it difficult to predict an outcome.

For a model that performs at least as well as chance, Gini takes values between 0 and 1. A score of 1 means the model ranks every positive case above every negative case; in practice this is rarely achieved, and the closer the Gini is to 1, the better. A Gini of 0 (AUC = 0.5) means the model is no better than ascribing random values to every prediction. Values below 0 are possible and mean the model ranks cases worse than random.

Gini = 2 × AUC - 1
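The conversion is a one-liner; the sketch below only illustrates the relationship between the two scales, and the function name `gini_from_auc` and the AUC values are made up for the example.

```python
def gini_from_auc(auc):
    """Convert an ROC AUC value to the Gini coefficient."""
    return 2 * auc - 1

# Hypothetical AUC values
gini_random = gini_from_auc(0.5)    # model no better than chance
gini_decent = gini_from_auc(0.75)
gini_perfect = gini_from_auc(1.0)

print(gini_random, gini_decent, gini_perfect)
```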


