
Machine Learning Interview Questions

Machine learning lets systems learn automatically from experience without being explicitly programmed. It is an application of artificial intelligence. With machine learning, computer programs can access data and use it to learn for themselves, making sense of data in a way similar to humans.

Machine learning enables a system to make decisions from data, much as humans do. It can be used to perform real-world tasks, solve problems, and automate work.

1. Quality of data: Machine learning needs high-quality data. Low-quality data leads to wrong decisions.

2. Time lag: Machine learning can need a lot of time to process data and reach decisions.

3. Complexity: Machine learning uses complex algorithms, which makes it difficult to deploy.

  • Emotional analysis
  • Error detection and correction
  • Weather prediction
  • Market prediction
  • Speech recognition
  • Fraud detection
  • Recommendations in online shopping

1. Supervised learning
2. Unsupervised learning
3. Semi-supervised learning
4. Reinforcement learning
5. Online learning
6. Instance-based learning
7. Model-based learning

Supervised learning tries to learn the connection between the input data and the output result from each labeled data sample. It is the most common type of machine learning. Examples: decision tree, KNN, random forest.

In unsupervised learning there are no labels or guidelines, so no pattern is given in advance; the model itself has to find the hidden patterns in the data. In this sense it is the opposite of supervised learning.

Supervised learning needs labeled data to learn the relation between inputs and outputs and predict the result. In unsupervised learning there is no labeled data, so the machine has to find the hidden patterns in the input data.

A Type I error is a false positive: claiming that something has happened when in reality it has not.

A Type II error is a false negative: claiming that nothing has happened when in reality it has.

Data mining: The process of extracting patterns from data, often using machine learning algorithms.

Machine learning: The development of programs that let a system learn from data without being explicitly programmed.

The p-value is used to determine the significance of a statistical test. It lies between 0 and 1, and a small p-value means the observed result would be unlikely if the null hypothesis were true, which helps users draw conclusions.

The answer is 'no': in some cases the optimization reaches a local minimum, so the global minimum is not reached in all cases. It depends on the data and the starting conditions.
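
A minimal sketch of this behaviour, assuming the answer above refers to a gradient-based optimizer such as gradient descent. The function `f`, the learning rate, and the two starting points are illustrative choices, not taken from the original text.

```python
def f(x):
    # Illustrative non-convex function with two separate minima.
    return x**4 - 3 * x**2 + x


def grad_f(x):
    # Derivative of f with respect to x.
    return 4 * x**3 - 6 * x + 1


def gradient_descent(start, learning_rate=0.01, steps=2000):
    # Repeatedly step against the gradient from the given start.
    x = start
    for _ in range(steps):
        x -= learning_rate * grad_f(x)
    return x


x_left = gradient_descent(-2.0)   # settles in the minimum on the left
x_right = gradient_descent(2.0)   # settles in the minimum on the right
print(x_left, f(x_left))
print(x_right, f(x_right))
```

The two runs end at different points with different function values, so only one of them found the lower minimum.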

Python is good for text data analysis because it has the pandas library, which provides easy-to-use, fast, flexible, and powerful data analysis and manipulation tools.

Model selection is the process of choosing among different mathematical models that are used to describe the same data. It is used in statistics, data mining, and machine learning.

  1. Classification
  2. Speech Recognition
  3. Regression
  4. Predict Time Series
  5. Annotate Strings

Clustering is a technique used in unsupervised learning. Given a set of data points, a clustering algorithm groups the points so that similar points end up in the same group.
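
As one possible illustration, the sketch below groups one-dimensional points with a small k-means loop. K-means is only one clustering algorithm, and the data, the initial centroids, and the function name `k_means_1d` are assumptions made for this example.

```python
def k_means_1d(points, centroids, iterations=10):
    centroids = list(centroids)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: abs(p - centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                centroids[k] = sum(members) / len(members)
    return labels, centroids


points = [1.0, 1.2, 0.8, 9.0, 9.5, 10.1]
labels, centroids = k_means_1d(points, centroids=[0.0, 5.0])
print(labels)
print(centroids)
```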

  1. To find clusters of the data
  2. To find low-dimensional representations of the data
  3. To find interesting directions in data
  4. To find novel observations / database cleaning
  5. To find interesting coordinates and correlations

  1. Decision Trees
  2. Neural Networks (back propagation)
  3. Probabilistic networks
  4. Nearest Neighbor
  5. Support vector machines

Classification is used to predict discrete class labels. It involves identifying which group a value belongs to. An example of a classification problem is classifying an email as spam or non-spam.

Regression entails predicting a response value from a continuous set of outcomes, so a regression problem requires predicting a quantity. An example of a regression problem is predicting the price of a stock over a period of time.

PAC (Probably Approximately Correct) learning is a learning framework introduced to analyze learning algorithms and their statistical efficiency.

Overfitting is a type of modelling error in which the model fails to predict future observations or to fit additional data reliably. It happens when a function is fit too closely to a limited set of data points, often with more parameters than the data can justify.

Regularization is a form of regression that constrains, or shrinks, the coefficient estimates towards zero. To reduce the risk of overfitting, it limits flexibility and discourages learning an overly complex model. Regularization reduces model complexity and makes the model better at prediction.
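
The shrinking effect can be sketched with a one-feature linear model without an intercept. The L2 (ridge) penalty, the sample data, and the name `fit_slope` are assumptions for illustration; the text above does not name a particular penalty.

```python
def fit_slope(xs, ys, l2_penalty=0.0):
    # Minimizes sum((y - w * x)^2) + l2_penalty * w^2 over the slope w.
    # Setting the derivative to zero gives w = sum(x*y) / (sum(x*x) + l2_penalty).
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + l2_penalty)


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

w_plain = fit_slope(xs, ys)                   # no regularization
w_ridge = fit_slope(xs, ys, l2_penalty=10.0)  # penalized fit
print(w_plain, w_ridge)
```

The penalized slope is closer to zero than the unpenalized one, which is the loss of flexibility the paragraph describes.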

ILP (Inductive Logic Programming) is a part of machine learning that uses logic programming. It can be used to build predictive models by searching for patterns in data, where logic programs are treated as hypotheses.

Precision, also known as positive predictive value, is the fraction of retrieved instances that are relevant.

Recall, also known as sensitivity, is the fraction of relevant instances that have been retrieved out of the total number of relevant instances.
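
Both definitions can be sketched for a binary problem as follows, assuming label 1 marks a relevant (positive) instance. The label lists are made-up data.

```python
def precision_recall(y_true, y_pred):
    # Count true positives, false positives and false negatives.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Guard against division by zero when nothing is retrieved or relevant.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


y_true = [1, 1, 1, 0, 0, 1, 0, 0]
y_pred = [1, 0, 1, 1, 0, 1, 0, 0]
precision, recall = precision_recall(y_true, y_pred)
print(precision, recall)
```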

  1. Combining binary classifiers
  2. Modifying binary classifiers to incorporate multiclass learning

  • Bagging is a process in ensemble learning used to improve unstable estimation or classification schemes.
  • Boosting methods are used sequentially to reduce the bias of the combined model.

Cluster sampling randomly selects intact groups that share similar characteristics from within a defined population. It is a probability sample in which each sampling unit is a collection, or cluster, of elements.

Bayesian networks are graphical models that represent probabilistic relationships among a set of variables. They are also known as 'belief networks' or 'causal networks'. Dynamic Bayesian networks extend them by relating variables to each other across adjacent time steps.

  1. Logical: It has a set of Bayesian clauses, which capture the qualitative structure of the domain.
  2. Quantitative: It is used to encode quantitative information about the domain.

ARM (association rule mining) finds associations and relationships among large sets of data items. It helps discover patterns in data, such as features that occur together and features (dimensions) that are correlated. Market basket analysis, where the goal is to find item sets that frequently occur together in a transaction, is an example of where ARM is used.
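
Two standard measures used to rank such rules are support and confidence. The sketch below computes both on a made-up list of transactions; the item names and helper functions are assumptions for illustration.

```python
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]


def support(itemset, transactions):
    # Fraction of transactions that contain every item in the itemset.
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(antecedent, consequent, transactions):
    # Estimated probability of the consequent given the antecedent.
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


pair_support = support({"bread", "butter"}, transactions)
rule_confidence = confidence({"bread"}, {"butter"}, transactions)
print(pair_support, rule_confidence)
```

Here bread and butter appear together in 3 of 5 transactions, and 3 of the 4 transactions containing bread also contain butter.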

The curse of dimensionality is the situation where the data has too many features. If there are more features than observations, there is a risk of overfitting the model. The term is also used to express the difficulty of using brute force or grid search to optimize a function with too many inputs.

To measure the accuracy of a hypothesis function we use a cost function, which is denoted by J.
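
A minimal sketch of one common choice of J, the squared-error cost for a straight-line hypothesis. The form of the hypothesis, the 1/(2m) scaling, and the parameter names `theta0` and `theta1` are assumptions; the text above only names J.

```python
def hypothesis(theta0, theta1, x):
    # Straight-line hypothesis h(x) = theta0 + theta1 * x.
    return theta0 + theta1 * x


def cost_j(theta0, theta1, xs, ys):
    # J = (1 / (2m)) * sum((h(x_i) - y_i)^2) over the m training examples.
    m = len(xs)
    squared_errors = (
        (hypothesis(theta0, theta1, x) - y) ** 2 for x, y in zip(xs, ys)
    )
    return sum(squared_errors) / (2 * m)


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
print(cost_j(0.0, 2.0, xs, ys))  # hypothesis matches the data exactly
print(cost_j(0.0, 1.0, xs, ys))  # worse fit, larger cost
```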

Random forest is a popular supervised machine learning algorithm. It is called a forest because it is a collection of decision trees, each built from a sample of the data. For regression, the result of the random forest is the mean of the individual trees' predictions; for classification, it is the majority vote.

A local minimum is the smallest value of the function within a neighborhood of a point, which need not be the smallest value over the whole domain.

When the form of the hypothesis function (h) maps poorly to the trend of the data, this is called underfitting, or high bias.

Overfitting, or high variance, happens when a hypothesis function fits the available data but does not generalize well to new data.

1. Makes training faster.
2. Helps avoid getting stuck in local optima.
3. It gives a better error surface shape.
4. Weight decay and Bayes optimization can be done more conveniently.

Stemming is the process of reducing words to a root by removing unnecessary characters, which allows related words to be mapped to the same stem.
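
A deliberately naive sketch of the idea: strip a few common English suffixes so that related words map to one stem. The suffix list and the minimum stem length are assumptions; real stemmers such as the Porter stemmer use many more rules.

```python
SUFFIXES = ("ing", "ed", "s")  # checked in this order


def naive_stem(word):
    word = word.lower()
    for suffix in SUFFIXES:
        # Only strip when at least three characters of stem remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


words = ["connect", "connected", "connecting", "connects"]
stems = [naive_stem(w) for w in words]
print(stems)
```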

Dimensionality reduction reduces the number of random variables under consideration. It is divided into feature selection and feature extraction.

Reinforcement learning is a machine learning technique in which an agent searches for the best action or path to follow in a specific situation. It is used by a variety of software and machines, and it learns from the reward or penalty given for each action performed.

Both are errors. Errors due to erroneous or overly simplistic assumptions in the learning algorithm are called bias. Bias leads the model to underfit the data and makes high predictive accuracy very hard to achieve.

Error due to too much complexity in the learning algorithm is called variance. Variance leads the model to overfit the data.

Genetic programming is a subset of machine learning. It evolves programs to solve a user-defined task using random mutation, crossover, a fitness function, and multiple generations of evolution.

  • An array is a group of elements of the same data type, stored consecutively in memory. It supports random access.
  • A linked list is an ordered group of elements of the same type, connected using pointers. New elements can be stored anywhere in memory. It supports sequential access.

A confusion matrix, also known as an error matrix, is a table that summarizes the performance of a classification algorithm.
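
For a binary classifier the table has four cells. The sketch below counts them, assuming label 1 is the positive class; the label lists are made-up data.

```python
def confusion_matrix(y_true, y_pred):
    # Tally each (actual, predicted) pair into one of the four cells.
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            counts["TP"] += 1
        elif actual == 0 and predicted == 1:
            counts["FP"] += 1
        elif actual == 1 and predicted == 0:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts


y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
matrix = confusion_matrix(y_true, y_pred)
print(matrix)
```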

  1. Computer Vision
  2. Speech Recognition
  3. Data Mining
  4. Statistics
  5. Information Retrieval
  6. Bio-Informatics

  1. Platt Calibration
  2. Isotonic Regression

These are the two methods used to obtain well-calibrated probability predictions in supervised learning. They are designed for binary classification, and extending them beyond the binary case is not trivial.

The perceptron is an algorithm for supervised classification. In its basic form it is a binary classifier: it assigns an input to one of two classes using a linear function of the input features.
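
A minimal sketch of the perceptron learning rule for the two-class case, assuming a step activation and using the logical AND function as toy training data. The learning rate, epoch count, and function names are illustrative choices.

```python
def predict(weights, bias, x):
    # Step activation on the weighted sum of the inputs.
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(samples, labels, epochs=20, learning_rate=1.0):
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in zip(samples, labels):
            # error is 0 when the prediction is right, otherwise +1 or -1.
            error = target - predict(weights, bias, x)
            weights = [
                w + learning_rate * error * xi for w, xi in zip(weights, x)
            ]
            bias += learning_rate * error
    return weights, bias


samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 0, 1]  # logical AND, which is linearly separable
weights, bias = train_perceptron(samples, labels)
predictions = [predict(weights, bias, x) for x in samples]
print(predictions)
```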

Ensemble learning is the process of strategically generating and combining multiple models, such as classifiers or experts, to solve a particular computational problem.

  1. Sequential ensemble methods
  2. Parallel ensemble methods

Heat maps give a visual representation of data as a matrix of colours. Two dimensions of the data are captured by the location of a point, and a third dimension is represented by the colour of the point.

Feature extraction is a method to transform or project the data onto a new feature space. In the context of dimensionality reduction, it can be described as an approach to data compression with the goal of keeping the most relevant information.

  1. Agglomerative hierarchical clustering
  2. Divisive hierarchical clustering

Recurrent Neural Networks (RNNs) are neural networks with feedback loops, trained with backpropagation through time. Neurons fire for a limited amount of time before deactivating, and in doing so activate another set of neurons that fire at the next point in time.

Residual plots are a commonly used graphical tool for diagnosing regression models. They help to detect non-linearity and outliers and to check whether the errors are randomly distributed.

A hypothesis is a specific model that helps in mapping inputs to outputs. This can further be used for evaluation and prediction.

In machine learning, entropy measures the randomness in the data being processed. The more entropy there is in the data, the more difficult it is to draw a conclusion from it.

Logistic regression is a technique for predictive analysis. It is used to predict the probability of a categorical dependent variable. It also helps to explain the relationship between one binary dependent variable and one or more independent variables.
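
The probability in the binary case comes from passing a weighted sum of the inputs through the sigmoid function. In the sketch below the weights, bias, and feature values are hypothetical numbers chosen for illustration, not a fitted model.

```python
import math


def sigmoid(z):
    # Maps any real number to a value strictly between 0 and 1.
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(weights, bias, features):
    # Linear combination of the features followed by the sigmoid.
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)


weights = [0.8, -0.4]  # hypothetical coefficients
bias = -0.2            # hypothetical intercept
probability = predict_probability(weights, bias, [2.0, 1.0])
predicted_class = 1 if probability >= 0.5 else 0
print(probability, predicted_class)
```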

  1. Binary Logistic Regression: Only two outcomes are possible.
  2. Multinomial Logistic Regression: The output consists of three or more unordered categories.
  3. Ordinal Logistic Regression: The output consists of three or more ordered categories.

  1. Data Acquisition
  2. Ground Truth Acquisition
  3. Cross-Validation Technique
  4. Query Type
  5. Scoring Metric
  6. Significance Test

An epoch is one complete pass of the entire training dataset through the machine learning algorithm. When the dataset is large, it is divided into several batches; passing one of these batches through the model is known as an iteration.
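
The relationship between the two terms can be sketched with a bare training loop. The dataset, batch size, and helper names are assumptions, and no model is actually updated.

```python
import math


def iterations_per_epoch(num_samples, batch_size):
    # The last batch may be smaller, so round up.
    return math.ceil(num_samples / batch_size)


def batches(data, batch_size):
    # Yield consecutive slices of the dataset.
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size]


data = list(range(10))  # stand-in for 10 training samples
epochs = 3
iteration_count = 0
for epoch in range(epochs):
    for batch in batches(data, batch_size=4):
        iteration_count += 1  # a real loop would update the model here

print(iterations_per_epoch(len(data), 4), iteration_count)
```

With 10 samples and a batch size of 4, each epoch takes 3 iterations, so 3 epochs take 9.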

ROC (Receiver Operating Characteristic) curves are a graphical representation of the trade-off between the true positive rate and the false positive rate at different classification thresholds. They therefore give an idea of how well the model separates the classes.

  1. Inductive learning: The process of using observations to draw conclusions.
  2. Deductive learning: The process of using conclusions to form observations.

  1. Entropy: An indicator of how messy the data is. It decreases as we get closer to the leaf nodes.
  2. Information gain: The decrease in entropy after a dataset is split on an attribute. A decision tree prefers splits with higher information gain.
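
Entropy and information gain can be computed as in the sketch below, assuming Shannon entropy in bits. The labels and the split are made-up data.

```python
import math
from collections import Counter


def entropy(labels):
    # Shannon entropy in bits of the class labels.
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(parent, children):
    # Parent entropy minus the size-weighted entropy of the child nodes.
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


parent = ["yes", "yes", "no", "no"]
children = [["yes", "yes"], ["no", "no"]]  # a split that separates the classes
print(entropy(parent))
print(information_gain(parent, children))
```

An evenly mixed node has an entropy of 1 bit, and a split that produces pure child nodes recovers all of it as information gain.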

The learning rate is a tuning parameter that determines the step size at each iteration during model training. It controls how fast the model's weights are updated: with a high learning rate the weights change quickly, and with a low learning rate they change slowly.
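
The effect of the step size can be sketched on a one-weight problem. The objective f(w) = w^2, the starting weight, and the two learning rates are illustrative assumptions.

```python
def descend(learning_rate, start=1.0, steps=10):
    # Minimize f(w) = w^2, whose gradient is 2 * w.
    w = start
    for _ in range(steps):
        w -= learning_rate * 2 * w
    return w


fast = descend(learning_rate=0.1)   # larger steps per update
slow = descend(learning_rate=0.01)  # smaller steps per update
print(fast, slow)
```

After the same number of steps the run with the higher learning rate is much closer to the minimum at 0. A rate that is too high can overshoot instead, so the value usually has to be tuned.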

Precision: In pattern recognition, precision is the fraction of relevant instances among the retrieved instances. Precision is the metric to use when false positives matter most to the outcome.

Recall: In pattern recognition, recall is the fraction of relevant instances that were retrieved. Recall is the metric to use when false negatives matter most to the outcome.


  1. Train the model
  2. Test the model 
  3. Deploy the model

False positives: Cases that are wrongly classified as True but are actually False.

False negatives: Cases that are wrongly classified as False but are actually True.

Association is a type of unsupervised learning technique. It checks for dependencies between data items and maps the items according to those dependencies so that the result can be used more profitably. There are three main algorithms for association rule learning.

  1. Apriori
  2. Eclat
  3. F-P Growth Algorithm

  1. Predicting yes or no
  2. Estimating gender
  3. Breed of an animal
  4. Type of color