Data science is the process of extracting useful information from data gathered from different sources, using scientific methods. Many complex problems exist in the real world, and in data science the insights derived from collected data are used to solve the business problems a company faces. In this module, let us discuss in detail how to solve a problem in data science.
First principle thinking is an approach in which new solutions are created by identifying the assumptions behind a problem and then breaking the problem down. Innovative solutions can be built using the first principle thinking approach.
Suppose a company is facing a problem. The very first step is to identify the complex problem and break it down into smaller parts. The breakdown continues until the parts cannot be broken down any further. Finally, innovative solutions can be built from these parts to solve the data science problem.
Both the traditional approach and the first principle approach are used to solve data science problems. The first principle approach is generally regarded as the more appropriate and efficient of the two for data science problems. The traditional approach is also known as the analog approach, that is, reasoning by analogy with what already exists.
The traditional (analog) approach always begins with existing ideas and makes improvements to the options that are already available. Finally, the best option is chosen to solve the problem. The main drawback of the traditional approach is that it may not solve the core problem.
The first principle approach always identifies the assumptions first and then breaks the problem down into smaller components until they cannot be divided any further. Finally, a new solution is created to solve the data science problem. In the first principle approach, most of the time is spent on identifying and understanding the problem, because once the problem is clearly identified, a proper solution can be generated.
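|Traditional Approach|First principle Approach|
|Begins with existing ideas|Begins by identifying the assumptions|
|Improves the options that are already available|Breaks the problem down into components that cannot be divided further|
|Chooses the best available option|Creates a new solution|
|May not solve the core problem|Spends most of the time on identifying and understanding the problem|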
A data scientist takes several steps to solve a problem in data science.
Determining the problem is the very first step in solving a problem in data science. The problem must be defined properly before it can be solved. If it is unclear or poorly defined, every data scientist working on it will find it very difficult to reach a solution. So the identified problem should be defined clearly and precisely.
Data scientists mainly use two types of approach:
1. Traditional approach
2. First principle approach
Of these two approaches, the first principle approach is the more commonly used. This is because it always starts by identifying the assumptions, then divides the identified problem into small components, and finally creates new solutions.
Many data science algorithms are used to solve problems in data science. Linear regression, logistic regression, decision trees, naive Bayes, KNN (k-nearest neighbors), support vector machines, k-means clustering, and PCA (principal component analysis) are some of the most common.
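To make one of the algorithms listed above concrete, here is a minimal sketch of KNN (k-nearest neighbors) classification in plain Python. The function name `knn_predict`, the toy data, and the choice of `k=3` are assumptions made for illustration only; real projects would normally rely on a tested library implementation.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest points."""
    # Pair the distance from each training point to the query with its label.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest neighbours, ordered by distance only.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # The most frequent label among those neighbours is the prediction.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two small, well-separated clusters.
train_points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.5)]
train_labels = ["low", "low", "low", "high", "high", "high"]

print(knn_predict(train_points, train_labels, (1.2, 1.4)))
print(knn_predict(train_points, train_labels, (8.7, 8.6)))
```

The sketch uses Euclidean distance and an unweighted vote, which is the simplest form of the technique; other distance measures or weighted votes are common variations.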
Once a data scientist has identified a problem, defined it clearly, and determined a suitable approach, the next step is data collection. The collected data should be maintained properly, along with the dates on which it was collected.
The collected data should be examined and cleaned. Data cleaning is a time-consuming process, and data scientists spend much of their time on it. Cleaning data consists of removing missing values, identifying duplicate records, and making corrections where needed.
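The cleaning steps described above (missing values, duplicates, small corrections) can be sketched in plain Python as follows. The function `clean_records`, the field names, and the sample records are all hypothetical; the "correction" here is limited to trimming whitespace and lowercasing text, which is only one possible kind of fix.

```python
def clean_records(records, required_fields):
    """Drop records with missing required values, then drop duplicates."""
    cleaned = []
    seen = set()
    for record in records:
        # Treat None and empty strings as missing values.
        if any(record.get(field) in (None, "") for field in required_fields):
            continue
        # Small correction step: trim whitespace and lowercase text values.
        normalised = {
            key: value.strip().lower() if isinstance(value, str) else value
            for key, value in record.items()
        }
        # Two records are duplicates if all their normalised fields match.
        fingerprint = tuple(sorted(normalised.items()))
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        cleaned.append(normalised)
    return cleaned


# Hypothetical raw data collected from different sources.
raw = [
    {"customer": "Alice ", "city": "Pune", "spend": 120},
    {"customer": "alice", "city": "pune", "spend": 120},   # duplicate after correction
    {"customer": "Bob", "city": None, "spend": 80},        # missing city
    {"customer": "Carol", "city": "Delhi", "spend": 200},
]

cleaned = clean_records(raw, ["customer", "city", "spend"])
for row in cleaned:
    print(row)
```

With tabular data, the pandas methods `DataFrame.dropna` and `DataFrame.drop_duplicates` cover the same two removal steps.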
After data collection and data cleaning, the next step is data analysis. Many data science libraries are available for analysing data collected from different sources. If the selected approach is not working at this stage, a more suitable approach is selected instead.
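As a small illustration of the analysis step, the sketch below summarises a cleaned dataset and measures how strongly two columns move together, using only Python's standard library. The numbers, the variable names, and the helper `pearson` are assumptions for illustration, not part of any specific workflow described here.

```python
import math
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)


# Hypothetical cleaned data: monthly ad spend and the resulting sales.
ad_spend = [10, 20, 30, 40, 50]
sales = [25, 41, 62, 79, 103]

# Descriptive statistics give a first picture of the data.
summary = {
    "mean_sales": statistics.mean(sales),
    "median_sales": statistics.median(sales),
    "stdev_sales": statistics.stdev(sales),
}
correlation = pearson(ad_spend, sales)

print(summary)
print(f"correlation between ad spend and sales: {correlation:.3f}")
```

A correlation close to 1 suggests a strong linear relationship worth modelling further; a weak one is a signal that a different approach may be needed, which matches the advice above about reselecting the approach.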
Once the data has been analysed properly, the next step is interpreting the results. The four main steps in result interpretation are assembling all the information, generating the findings, developing conclusions, and finally developing recommendations.
Netflix: It is a subscription-based online platform used to watch movies, TV shows, and series over an internet connection. Netflix uses data science to solve its problems. It mainly uses a collaborative filtering algorithm to recommend movies to its users, based on the movies they have watched previously. Not only Netflix but also many other platforms such as YouTube, Hotstar, and Facebook use similar methods to satisfy customer needs.
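The idea behind collaborative filtering can be shown with a minimal user-based sketch. This is not Netflix's actual system: the users, titles, ratings, and the functions `cosine_similarity` and `recommend` are hypothetical, and cosine similarity is just one common choice for comparing users.

```python
import math

# Hypothetical ratings: user -> {title: rating on a 1-5 scale}.
ratings = {
    "asha": {"Movie A": 5, "Movie B": 4, "Movie C": 1},
    "ben": {"Movie A": 4, "Movie B": 5, "Movie D": 5},
    "chen": {"Movie C": 5, "Movie E": 4},
}


def cosine_similarity(a, b):
    """Cosine similarity between two users' rating dictionaries."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[title] * b[title] for title in shared)
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b)


def recommend(user, ratings, top_n=2):
    """Rank unseen titles by similarity-weighted ratings from other users."""
    scores = {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        similarity = cosine_similarity(ratings[user], other_ratings)
        if similarity <= 0:
            continue
        for title, rating in other_ratings.items():
            # Only score titles the user has not watched yet.
            if title not in ratings[user]:
                scores[title] = scores.get(title, 0.0) + similarity * rating
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


print(recommend("asha", ratings))
```

Titles rated highly by users with similar tastes rise to the top, which is the "based on what was watched previously" behaviour described above.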
Uber: It uses the user data available to it mainly to improve its customer service. Open the Uber app, and with a single tap a cab arrives at the location where you are standing. All of this happens smoothly because of the hard work done by the data scientists working behind the company. The application interacts with customers on one side and with drivers on the other. Here the data scientists mainly use deep learning, AI, and many other techniques to run the business smoothly and efficiently.