Confusion Matrix: What is it and its Applications

Geetansh Sharma
6 min readJun 5, 2021

What is Confusion Matrix?

In the field of machine learning and specifically the problem of statistical classification, a confusion matrix, also known as an error matrix, is a specific table layout that allows visualization of the performance of an algorithm, typically a supervised learning one.

What is Machine learning?

Machine learning is the study of computer algorithms that improve automatically through experience and by the use of data. It is seen as a part of artificial intelligence.

What is Supervised Machine Learning?

Supervised learning is the machine learning task of learning a function that maps an input to an output based on example input-output pairs. It infers a function from labeled training data consisting of a set of training examples.

Confusion Matrix Explained through an Example

Given a sample of 12 pictures, 8 of cats and 4 of dogs, where cats belong to class 1 and dogs belong to class 0,

actual = [1,1,1,1,1,1,1,1,0,0,0,0],

assume that a classifier that distinguishes between cats and dogs is trained, and we take the 12 pictures and run them through the classifier, and the classifier makes 9 accurate predictions and misses 3: 2 cats wrongly predicted as dogs (first 2 predictions) and 1 dog wrongly predicted as a cat (last prediction).

prediction = [0,0,1,1,1,1,1,1,0,0,0,1]

With these two labelled sets (actual and predictions) we can create a confusion matrix that will summarize the results of testing the classifier:

In this confusion matrix, of the 8 cat pictures, the system judged that 2 were dogs, and of the 4 dog pictures, it predicted that 1 were cats. All correct predictions are located in the diagonal of the table (highlighted in bold), so it is easy to visually inspect the table for prediction errors, as they will be represented by values outside the diagonal.

In terms of sensitivity and specificity, the confusion matrix is as follows:

Let’s now define these terms based on the answer to the question “Do you see a Dog?”:

  • true positives (TP): These are cases in which we predicted yes (it is a dog), and it is actually a dog.
  • true negatives (TN): We predicted no, and it is actually a dog.
  • false positives (FP): We predicted yes, but it is not actually a dog. (Also known as a “Type I error.”)
  • false negatives (FN): We predicted no, but it is actually a dog. (Also known as a “Type II error.”)

Errors in Confusion Matrix

False Positive: (Type 1 Error)

Interpretation: You predicted positive and it’s false.

You predicted that a man is pregnant but he actually is not.

False Negative: (Type 2 Error)

Interpretation: You predicted negative and it’s false.

You predicted that a woman is not pregnant but she actually is.

Confusion Metrics

From our confusion matrix, we can calculate five different metrics measuring the validity of our model.

  1. Accuracy (all correct / all) = TP + TN / TP + TN + FP + FN
  2. Misclassification (all incorrect / all) = FP + FN / TP + TN + FP + FN
  3. Precision (true positives / predicted positives) = TP / TP + FP
  4. Sensitivity aka Recall (true positives / all actual positives) = TP / TP + FN
  5. Specificity (true negatives / all actual negatives) =TN / TN + FP

Applications of Confusion Matrix

Cyber Attack Detection and Classification using Parallel Support Vector Machine

Support Vector Machines (SVM) are the classifiers that were originally designed for binary c1assification. The classification applications can solve multi-class problems. The result shows that pSVM gives more detection accuracy for classes and comparable to the false alarm rate.

Cyberattack detection is a classification problem, in which we classify the normal pattern from the abnormal pattern (attack) of the system.

The SDF is a very powerful and popular data mining algorithm for decision-making and classification problems. It has been using in many real-life applications like medical diagnosis, radar signal classification, weather prediction, credit approval, and fraud detection, etc.

A parallel Support Vector Machine (pSVM) algorithm was proposed for the detection and classification of cyber attack datasets.

The performance of the support vector machine is greatly dependent on the kernel function used by SVM. Therefore, we modified the Gaussian kernel function in a data-dependent way in order to improve the efficiency of the classifiers. The relative results of both the classifiers are also obtained to ascertain the theoretical aspects. The analysis is also taken up to show that PSVM performs better than SDF.

The classification accuracy of PSVM remarkably improve (accuracy for Normal class as well as DOS class is almost 100%) and comparable to false alarm rate and training, testing times.

KDD CUP ‘’99 Data Set Description

This data set is prepared by Stolfo et al and is built based on the data captured in the DARPA’98 IDS evaluation program . DARPA’98 is about 4 gigabytes of compressed raw (binary) TCP dump data of 7 weeks of network traffic, which can be processed into about 5 million connection records, each with about 100 bytes.

For each TCP/IP connection, 41 various quantitative (continuous data type) and qualitative (discrete data type) features were extracted among the 41 features, 34 features (numeric), and 7 features (symbolic).

To analysis the different results, there are standard metrics that have been developed for evaluating network intrusion detections. Detection Rate (DR) and false alarm rate are the two most famous metrics that have already been used. DR is computed as the ratio between the number of correctly detected attacks and the total number of attacks, while the false alarm (false positive) rate is computed as the ratio between the number of normal connections that is incorrectly misclassified as attacks and the total number of normal connections.

In parallel SVM machine first we reduced non-classified features data by distance matrix of binary pattern. From this concept, the cascade structure is developed by initializing the problem with a number of independent smaller optimizations and the partial results are combined in later stages in a hierarchical way, supposing the training data subsets and are independent among each other.

  • True Positive (TP): The amount of attack detected when it is actually attack.
  • True Negative (TN): The amount of normal detected when it is actually normal.
  • False Positive (FP): The amount of attack detected when it is actually normal (False alarm).
  • False Negative (FN): The amount of normal detected when it is actually attack.

--

--