What is a Confusion Matrix in Machine Learning?
In machine learning, and in statistical classification more generally, a confusion matrix is also known as an error matrix.
A confusion matrix is a table that is often used to describe the performance of a classification model. It is computed on a set of test data for which the true values are known, and it also lets us visualize the performance of an algorithm.
It makes confusion between classes easy to identify, and most performance measures of a classification model are derived from it.
This article covers:
1. What is a confusion matrix and why do we need it?
2. How can we calculate a confusion matrix for a 2-class classification problem from scratch?
3. How can we create a confusion matrix in Python?
Confusion Matrix:
A confusion matrix is a summary of the prediction results on a classification problem, and it is often used to describe the performance of a classification model.
It summarizes the number of correct and incorrect predictions as count values, broken down by class. This is the key function of the confusion matrix.
It also shows the ways in which your classification model is confused when it makes predictions.
With a confusion matrix you not only gain insight into the errors being made but can also distinguish the types of error.

                    Class 1 (Predicted)    Class 2 (Predicted)
Class 1 (Actual)    TP                     FN
Class 2 (Actual)    FP                     TN

Here,
Class 1 : Positive
Class 2 : Negative
Definition of the Terms:
• Positive (P) : The observation is positive (let's say it is an apple).
• Negative (N) : The observation is negative (it is not an apple, so it must be some other fruit).
• True Positive (TP) : The observation is positive and is predicted to be positive.
• False Negative (FN) : The observation is positive but is predicted to be negative.
• True Negative (TN) : The observation is negative and is predicted to be negative.
• False Positive (FP) : The observation is negative but is predicted to be positive.
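The four definitions above can be turned into a short counting routine. The sketch below is a minimal from-scratch version for a 2-class problem. The function name count_outcomes, the example label lists, and the convention that label 1 is the positive class are assumptions made for illustration; they do not come from any library.

```python
def count_outcomes(actual, predicted, positive=1):
    """Count TP, FN, FP and TN for a 2-class problem.

    `positive` is the label treated as the positive class (assumed to be 1).
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    tp = fn = fp = tn = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            tp += 1  # positive observation, predicted positive
        elif a == positive:
            fn += 1  # positive observation, predicted negative
        elif p == positive:
            fp += 1  # negative observation, predicted positive
        else:
            tn += 1  # negative observation, predicted negative
    return tp, fn, fp, tn


# Illustrative labels: 1 = apple (positive), 0 = other fruit (negative).
actual = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
predicted = [1, 0, 0, 1, 0, 0, 1, 1, 1, 0]

tp, fn, fp, tn = count_outcomes(actual, predicted)
print("TP =", tp, " FN =", fn)
print("FP =", fp, " TN =", tn)
```

The two printed rows follow the same layout as the table above: actual positives on the first row, actual negatives on the second.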
Classification Rate or Accuracy:
It is expressed by the relation:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
However, accuracy has its problems. It assumes equal costs for both types of error, so it can hide the error that matters most. A 99% accuracy can be excellent, good, poor or terrible depending on the problem.
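To see why a high accuracy can still be terrible, consider an imbalanced test set. The counts below are hypothetical and chosen only for illustration: 10 positive and 990 negative observations, scored against a model that always predicts negative.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical test set: 10 positive and 990 negative observations.
# A model that always predicts "negative" misses every positive example.
tp, fn = 0, 10
tn, fp = 990, 0

acc = accuracy(tp, tn, fp, fn)
positives_found = tp / (tp + fn)  # share of actual positives that were detected

print("Accuracy:", acc)
print("Share of positives found:", positives_found)
```

The model reaches 99% accuracy while detecting none of the positive observations, which is why the measures that follow look at each type of error separately.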
Recall:
Recall is the ratio of the number of True Positives (TP) to the total number of True Positives (TP) and False Negatives (FN), that is, to all observations that are actually positive. High recall means the positive class is correctly recognized (a small number of FN).
Recall is expressed by the relation:
Recall = TP / (TP + FN)
Precision:
Precision is the ratio of the number of correctly classified positive examples (TP) to the total number of predicted positive examples (TP + FP). High precision means that an example labeled as positive is indeed positive (a small number of FP).
Mathematically, precision can be expressed as:
Precision = TP / (TP + FP)
High recall, low precision: Most of the positive examples are correctly recognized (low FN), but there are many false positives.
Low recall, high precision: We miss a lot of the positive examples (high FN), but those we predict as positive are indeed positive (low FP).
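The two situations can be made concrete with a small sketch. The counts are hypothetical, picked only to produce one case of each kind; both cases have 100 actual positive examples.

```python
def recall(tp, fn):
    """Share of actual positives that the model recognized."""
    return tp / (tp + fn)


def precision(tp, fp):
    """Share of predicted positives that are actually positive."""
    return tp / (tp + fp)


# Hypothetical counts, with 100 actual positive examples in each case.
cases = {
    "predicts positive often": {"tp": 95, "fn": 5, "fp": 60},
    "predicts positive rarely": {"tp": 40, "fn": 60, "fp": 2},
}

for name, c in cases.items():
    r = recall(c["tp"], c["fn"])
    p = precision(c["tp"], c["fp"])
    print(f"{name}: recall = {r:.2f}, precision = {p:.2f}")
```

The first case has high recall and low precision, the second has low recall and high precision, which is the trade-off described above.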
F-measure:
The F-measure is the ratio of two times the product of recall and precision to the sum of recall and precision:
F-measure = (2 * Recall * Precision) / (Recall + Precision)
It uses the harmonic mean in place of the arithmetic mean because the harmonic mean punishes extreme values more.
The F-measure will always be nearer to the smaller of Precision and Recall.
Consider an example in which the test data contains an infinite number of elements of class B and a single element of class A, and the model predicts class A for every instance. Treating class A as the positive class:
Here,
Precision: 0.0
Recall : 1.0
Now:
Arithmetic mean: 0.5
Harmonic mean: 0.0
With arithmetic mean the outcome is 50% and with harmonic mean the outcome is 0%.
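The comparison above can be reproduced in a few lines. This is a minimal sketch; the function names are assumptions for illustration, and returning 0.0 when both inputs are zero is a convention chosen here to avoid dividing by zero.

```python
def arithmetic_mean(precision, recall):
    return (precision + recall) / 2


def f_measure(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values from the example above.
precision, recall = 0.0, 1.0
print("Arithmetic mean:", arithmetic_mean(precision, recall))
print("Harmonic mean:", f_measure(precision, recall))
```

Because the harmonic mean is pulled toward the smaller value, a model cannot obtain a good F-measure by maximizing recall alone.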
Example to interpret a confusion matrix:
N = 165        Predicted: NO    Predicted: YES
Actual: NO     40               10
Actual: YES    5                90
To make the confusion matrix easier to read, I have added the terms TN, FP, FN and TP along with the row and column totals in the table below:

               Predicted: NO    Predicted: YES    Total
Actual: NO     TN = 60          FP = 20           80
Actual: YES    FN = 5           TP = 100          105
Total          65               120

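Now, using the labeled counts (TP = 100, TN = 60, FP = 20, FN = 5):
Classification Rate/Accuracy:
Accuracy = (TP + TN) / (TP + TN + FP + FN) = (100 + 60) / (100 + 60 + 20 + 5) = 160 / 185 ≈ 0.86
Recall:
Recall tells us how often the model predicts yes when the actual value is yes.
Recall = TP / (TP + FN) = 100 / (100 + 5) = 100 / 105 ≈ 0.95
Precision:
Precision tells us how often the model is correct when it predicts yes.
Precision = TP / (TP + FP) = 100 / (100 + 20) = 100 / 120 ≈ 0.83
F-measure:
F-measure = (2 * Recall * Precision) / (Recall + Precision) = (2 * 0.95 * 0.83) / (0.95 + 0.83) ≈ 0.89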
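The four measures for this example can be checked with a short script. It is a sketch that assumes the counts from the labeled table (TP = 100, TN = 60, FP = 20, FN = 5); the variable names are chosen here for illustration.

```python
# Counts taken from the labeled table: TP = 100, TN = 60, FP = 20, FN = 5.
tp, tn, fp, fn = 100, 60, 20, 5

accuracy = (tp + tn) / (tp + tn + fp + fn)
recall = tp / (tp + fn)          # when it is actually yes, how often is yes predicted
precision = tp / (tp + fp)       # when yes is predicted, how often is it correct
f_measure = 2 * recall * precision / (recall + precision)

print(f"Accuracy  : {accuracy:.2f}")
print(f"Recall    : {recall:.2f}")
print(f"Precision : {precision:.2f}")
print(f"F-measure : {f_measure:.2f}")
```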
Now let's demonstrate how to create a confusion matrix for a set of predictions. First we need to import the confusion_matrix function from the sklearn library, which generates the confusion matrix for us.
# Python script for confusion matrix creation.
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report

# True labels and the labels predicted by the model.
actual = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
predicted = [1, 0, 0, 1, 0, 0, 1, 1, 1, 0]

# Rows are the actual classes, columns are the predicted classes.
results = confusion_matrix(actual, predicted)
print('Confusion Matrix :')
print(results)
print('Accuracy Score :', accuracy_score(actual, predicted))
print('Report : ')
print(classification_report(actual, predicted))
OUTPUT
Confusion Matrix :
[[4 2]
 [1 3]]
Accuracy Score : 0.7
Report :
             precision    recall  f1-score   support
          0       0.80      0.67      0.73         6
          1       0.60      0.75      0.67         4
avg / total       0.72      0.70      0.70        10
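The per-class rows of the report can be traced back to the matrix by hand. sklearn's confusion_matrix puts the actual classes on the rows and the predicted classes on the columns, with labels in sorted order, so with 1 as the positive class the layout is [[TN, FP], [FN, TP]], which differs from the table at the start of this article. The sketch below recomputes the scores in plain Python; the helper name per_class_scores is an assumption for illustration.

```python
# Confusion matrix from the output above: rows = actual class, columns = predicted class.
matrix = [[4, 2],
          [1, 3]]


def per_class_scores(matrix, cls):
    """Return precision, recall, F1 and support for class index `cls`."""
    correct = matrix[cls][cls]
    predicted_as_cls = sum(row[cls] for row in matrix)  # column total
    actually_cls = sum(matrix[cls])                     # row total (support)
    precision = correct / predicted_as_cls if predicted_as_cls else 0.0
    recall = correct / actually_cls if actually_cls else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1, actually_cls


for cls in range(len(matrix)):
    p, r, f1, support = per_class_scores(matrix, cls)
    print(f"{cls}  {p:.2f}  {r:.2f}  {f1:.2f}  {support}")
```

Each class is treated in turn as the positive class, which is how the report arrives at a separate precision and recall for 0 and for 1.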