Confusion Matrix in Machine Learning


What is a Confusion Matrix in Machine Learning?

In the field of machine learning, and more generally in the problem of statistical classification, a confusion matrix (also known as an error matrix) is a standard tool for evaluating classifiers.


We can define a confusion matrix as a table that is often used to describe the performance of a classification model on a set of test data for which the true values are known. With the help of a confusion matrix we can also visualize the performance of an algorithm.
A confusion matrix makes it easy to identify confusion between classes, and most performance measures of a classification model are derived from it.

This article aims to answer:
1. What is a confusion matrix and why do we need it?
2. How can we calculate a confusion matrix for a 2-class classification problem from scratch?
3. How can we create a confusion matrix in Python?

Confusion Matrix:


A confusion matrix can be defined as a summary of the predicted results on a classification problem, and it is often used to describe the performance of a classification model.
It summarizes the number of correct and incorrect predictions with count values, broken down by each class. This is the key function of the confusion matrix.
It also shows the ways in which your classification model is confused when it makes predictions.
With a confusion matrix you not only gain insight into the errors being made by your classifier, but also into the types of errors being made.

 

 

                     Class 1 (Predicted)    Class 2 (Predicted)
Class 1 (Actual)     TP                     FN
Class 2 (Actual)     FP                     TN

 

Here,
  Class 1 : Positive
  Class 2 : Negative

Definition of the Terms:


• Positive (P) : The observation is positive (let's say, it is an apple).
• Negative (N) : The observation is negative (it is not an apple, so it must be some other fruit).
• True Positive (TP) : The observation is positive, and it is predicted to be positive.
• False Negative (FN) : The observation is positive, but it is predicted to be negative.
• True Negative (TN) : The observation is negative, and it is predicted to be negative.
• False Positive (FP) : The observation is negative, but it is predicted to be positive.
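
To make these terms concrete, here is a minimal from-scratch sketch in Python that counts TP, FN, FP and TN for a 2-class problem. The actual and predicted label lists are hypothetical values used purely for illustration.

# Count TP, FN, FP and TN for a 2-class problem (1 = positive, 0 = negative).
actual    = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]  # true labels (illustrative)
predicted = [1, 0, 0, 1, 0, 0, 1, 1, 1, 0]  # model predictions (illustrative)

tp = fn = fp = tn = 0
for a, p in zip(actual, predicted):
    if a == 1 and p == 1:
        tp += 1  # positive observation, predicted positive
    elif a == 1 and p == 0:
        fn += 1  # positive observation, predicted negative
    elif a == 0 and p == 1:
        fp += 1  # negative observation, predicted positive
    else:
        tn += 1  # negative observation, predicted negative

print("TP =", tp, " FN =", fn)  # TP = 3  FN = 1
print("FP =", fp, " TN =", tn)  # FP = 2  TN = 4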

Classification Rate or Accuracy:


It is expressed by the relation:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

However, there are problems with accuracy: it assumes equal costs for both types of errors. Depending on the problem, 99% accuracy can be excellent, good, poor or terrible.
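
As a quick sketch, accuracy follows directly from the four cells of the matrix; the counts below are the illustrative ones from the snippet above.

tp, fn, fp, tn = 3, 1, 2, 4  # illustrative counts from the earlier snippet

accuracy = (tp + tn) / (tp + tn + fp + fn)
print("Accuracy =", accuracy)  # 0.7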

Recall:

Recall is the ratio of the total number of True Positives (TP) to the sum of True Positives (TP) and False Negatives (FN). High recall means the positive class is correctly recognized (i.e. a large number of TP and a small number of FN).

Recall is expressed by the relation:

Recall = TP / (TP + FN)

Precision:


Precision is the ratio of the total number of correctly classified positive examples to the total number of predicted positive examples. High precision indicates that an example labeled as positive is indeed positive (i.e. a small number of FP).

Mathematically, precision can be expressed as:

Precision = TP / (TP + FP)

High recall, low precision: most of the positive examples are correctly recognized (low FN), but there are a large number of false positives (high FP).

Low recall, high precision: we miss a lot of positive examples (high FN), but those we predict as positive are indeed positive (low FP).
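
The same illustrative counts from the earlier snippet give recall and precision as follows:

tp, fn, fp, tn = 3, 1, 2, 4  # illustrative counts

recall = tp / (tp + fn)     # 3 / 4
precision = tp / (tp + fp)  # 3 / 5
print("Recall =", recall)        # 0.75
print("Precision =", precision)  # 0.6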

F-measure:

F-measure is the ratio of two times the product of recall and precision to the sum of recall and precision.
To calculate the F-measure we use the harmonic mean in place of the arithmetic mean, as it punishes extreme values more.
The value of the F-measure will always be close to the smaller of precision and recall.

F-measure = (2 * Recall * Precision) / (Recall + Precision)

Consider an example in which we have an effectively infinite number of data elements of class B and a single element of class A, and the model predicts class A for every instance in the test data.
Here,
Precision : 0.0
Recall : 1.0

Now:
Arithmetic mean : 0.5
Harmonic mean : 0.0

With the arithmetic mean the outcome is 50%, whereas with the harmonic mean the outcome is 0%, which better reflects how poor this degenerate model really is.
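
A short sketch of this comparison, with precision and recall hard-coded to the limiting values from the example:

precision, recall = 0.0, 1.0  # limiting values from the class A / class B example

arithmetic = (precision + recall) / 2
# Guard against 0/0 when both precision and recall are 0.
harmonic = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
print("Arithmetic mean =", arithmetic)  # 0.5
print("Harmonic mean   =", harmonic)    # 0.0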

An example of interpreting a confusion matrix:

N = 165        Predicted: NO    Predicted: YES
Actual: NO     50               10
Actual: YES    5                100

To make the above confusion matrix easier to read, I have added the terms (TN, FP, FN, TP) along with the row and column totals in the table below:

 

               Predicted: NO    Predicted: YES    Total
Actual: NO     TN = 50          FP = 10           60
Actual: YES    FN = 5           TP = 100          105
Total          55               110               165


Now,
Classification Rate/Accuracy:


Accuracy = (TP + TN) / Total = (100 + 50) / 165 = 0.91

Recall:

Recall tells us how often the model predicts yes when the actual value is yes.
Recall = TP / (TP + FN) = 100 / (100 + 5) = 0.95

Precision:

Precision tells us how often the model is correct when it predicts yes.
Precision = TP / (TP + FP) = 100 / (100 + 10) = 0.91

F-measure:


F-measure = (2 * Recall * Precision) / (Recall + Precision) = (2 * 0.95 * 0.91) / (0.95 + 0.91) = 0.93
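
As a sanity check, the short sketch below reproduces all four numbers from the table above:

tn, fp, fn, tp = 50, 10, 5, 100  # counts from the worked example (N = 165)

accuracy = (tp + tn) / (tp + tn + fp + fn)
recall = tp / (tp + fn)
precision = tp / (tp + fp)
f_measure = 2 * recall * precision / (recall + precision)

print(round(accuracy, 2))   # 0.91
print(round(recall, 2))     # 0.95
print(round(precision, 2))  # 0.91
print(round(f_measure, 2))  # 0.93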

Now let's demonstrate how to create a confusion matrix for a model's predictions in Python. First we need to import the confusion_matrix module from the sklearn.metrics library, which helps us generate the confusion matrix.
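
A minimal sketch follows; the actual and predicted label lists are the same illustrative values used earlier, chosen so that the script reproduces the output shown below. Note that scikit-learn sorts the class labels, so with labels 0 and 1 the matrix is laid out as [[TN, FP], [FN, TP]].

from sklearn.metrics import confusion_matrix, accuracy_score, classification_report

# Illustrative true labels and model predictions.
actual    = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
predicted = [1, 0, 0, 1, 0, 0, 1, 1, 1, 0]

results = confusion_matrix(actual, predicted)
print('Confusion Matrix :')
print(results)
print('Accuracy Score :', accuracy_score(actual, predicted))
print('Report : ')
print(classification_report(actual, predicted))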

 

 

OUTPUT

Confusion Matrix :

[[4 2]
 [1 3]]

Accuracy Score : 0.7
Report :

              precision    recall  f1-score   support

           0       0.80      0.67      0.73         6
           1       0.60      0.75      0.67         4

 avg / total       0.72      0.70      0.70        10

 

 
