Classification in Machine learning

August 23, 2021, Learn eTutorial

1092

We already discussed supervised learning that is divided into regression and classification. We learned about regression and the methods of regression in previous tutorials. Now we are going to discuss classification algorithms.

What is a classification algorithm?

Classification algorithms work on the principle of supervised learning as it needs a training dataset for training the model. In the classification algorithm, we are classifying the data into different categories based on a training dataset.
For example, we want to classify the emails as spam or not spam or in simple words, we have to classify according to the sex as male and female, Alternatively Yes or No, etc.
When compared to regression the output of the classification algorithm is different which will be a category that uses the supervised method of train dataset to predict an output. The classification algorithm can be represented using the formula,

y=f(x), where y = categorical output

Using a simple picture we can learn more about the classification whose main objective is to classify the data into different categories. In the below picture there are two classes that have different properties and features. So our objective is to separate the data that have similar properties into one and other similar properties into another class.

Types of classifications

Classifications can be divided into two types and before that, we have to understand the term classifier. A classifier is nothing but the algorithm that is used in the dataset to classify the data.

Binary Classifier: As the name suggests if the classification has only two outcomes, it comes under binary classifier. For example men or women, yes or no, 0 or 1, etc.
Multi classifier: If the outcome of a classification algorithm has more than two outcomes then it is called the multi classifiers. For example colors, shades, shapes , etc.

Types of Machine learning classifications

Classification algorithms are broadly divided into two types

Linear models
Non-Linear models

Linear models

The linear model can be divided into

Logistic regression
Support Vector Machines

Nonlinear models

Nonlinear models can be are of different types which are

K-nearest Neighbours
Kernel SVM
Naïve Bayes
Decision Tree Classification
Random Forest classification

Learners in Classification

Lazy Learners
Eager Learners

Lazy Learners

As the name suggests in lazy learners the train dataset will be stored and that will wait until the real data enter and do the classification and training the model. Here classification is based on the related data stored in the training dataset. In lazy learners it will take more time for predictions but less time for training. Examples are the K-NN algorithm and case-based reasoning.

Eager Learners

Eager learners make the model as soon as they got the training dataset. It does not wait for the test dataset as we saw in the lazy learners. It will make the model before the test data arrives. For example Decision trees, Naïve Bayes, ANN, etc.

Applications of classification algorithms

Biometric identification
Drugs classification
Color and shape classification
Speech recognition
Spam detection
Fraud identification

Evaluation of classifier models

In supervised learning, both the classification and regression models need to get evaluated once it is completed. In the classification model, we have three types of evaluation which are

1. Holdout method

It is the most common method used in the supervised learning method for measuring the accuracy of a classifier. We will divide the dataset in to train the dataset and the test dataset.

After that, we provide the model with the training dataset which has the dataset and their corresponding category. Then the model will get learned with the training dataset. Then we provide the test data set to the model which has the dataset but not a corresponding category. The model has to predict the category of the test dataset accurately.

2 Confusion Matrix

Confusion matrix or the error matrix will get the output in the form of a matrix that will describe our model performance. The matrix rows and columns contain the result in short format, which contains how many correct and incorrect predictions. Check the matrix below to get an idea.

3.Log loss

Log loss is a method that is perfect for the binary classification model which means the output will be between the number zero and one. In this method, we assume that lower the log loss value means high accuracy because log loos value increases if the difference in the predicted value and real value is very large. In binary classification, the cross-entropy is calculated as
?(ylog(p)+(1?y)log(1?p))

Where the ‘y’ is actual output and ‘p’ is the predicted output

Learn Machine Learning

Introduction to Machine Learning

Applications of Machine learning

Why Machine Learning?

The Machine Learning Workflow

Data Visualization and Exploratory Data Analysis

Data processing

Trends in Machine Learning 2021

Machine learning models

Introduction to Supervised Learning

Regression Analysis

Introduction to cost functions

Linear Regression

Feature Selection

Outliers

Regularization

Polynomial Regression

Splines

Classification

Logistic Regression

Discriminant Analysis

Support Vector Machines

Naive Bayes

Clustering

K means Clustering

Ensemble Learning

Classification and Regression Trees

Bagging & Random Forests

Boosting

Unsupervised learning

Classification in Machine learning

What is a classification algorithm?

Types of classifications

Types of Machine learning classifications

Linear models

Nonlinear models

Learners in Classification

Lazy Learners

Eager Learners

Applications of classification algorithms

Evaluation of classifier models

1. Holdout method

2 Confusion Matrix

3.Log loss

Related Tutorials

OtherTutorials

Python

Python

C

C

Java

Java

Machine Learning

Machine Learning

R

R

PHP

PHP

Golang

Golang

Artificial Intelligence

Artificial Intelligence

HTML

HTML

Cyber Security

Cyber Security

C++

C++

Data Science

Data Science

Join Us