In this tutorial, you will learn the basic statistical concepts used in R programming. Statistics in R lets us analyze, review, and summarize data with tools such as the mean, median, mode, variance, and standard deviation. Most of these are available as built-in functions in R, such as mean(), median(), var(), and sd(). The exception is the statistical mode: the built-in mode() function reports an object's storage mode, not its most frequent value.
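As a quick illustration of the summary tools named above, here is a minimal sketch. The scores vector and the stat_mode name are hypothetical and chosen only for this example. Because base R has no function for the statistical mode, the sketch derives it from a frequency table, which is one common approach among several.

```r
# Hypothetical sample of exam scores (illustrative data, not from the tutorial)
scores <- c(72, 85, 85, 90, 64, 85, 78)

mean(scores)    # arithmetic mean
median(scores)  # middle value of the sorted data
var(scores)     # sample variance (divides by n - 1)
sd(scores)      # sample standard deviation, the square root of var(scores)

# Base R's mode() returns the storage mode ("numeric"), not the most
# frequent value, so the statistical mode is taken from a frequency table.
freq <- table(scores)
stat_mode <- as.numeric(names(freq)[which.max(freq)])
stat_mode
```

If several values tie for the highest frequency, which.max() returns only the first one, so this sketch reports a single mode.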
Statistics is a branch of mathematics that deals with the collection, organization, analysis, interpretation, and presentation of data. It is applied to complex real-world problems, where it helps analysts identify meaningful trends and changes. Statistical learning refers to a set of tools for modeling and understanding data. In short, statistics helps us collect data, analyze it, and draw conclusions from it.
Statistics includes the following procedures:
R is a popular language for data science and statistics, and it is often described as a language for statistical computing. Professionals and data experts use R for modeling, financial data, marketing trends, and other kinds of analysis. Its statistical support is one of the main reasons users switch to R: the language has a rich collection of statistical techniques and functions, along with sophisticated graphical and visualization capabilities for plots and graphs. Some of the reasons users favor R are:
Data analysis is required because we live in a data-rich world. Data is revolutionizing businesses and many other sectors, and analyzing it provides better insights. Analysts review data so that they can reach meaningful conclusions. To do so, they apply statistical functions, principles, and algorithms to analyze raw data, build statistical models, and infer or predict results.
The field of statistics influences most domains of everyday life, such as education, the stock market, life science, insurance, and retail.
There are a few statistical terms to be aware of before starting with statistics. They are:
Note: Statistics is a term used to summarize a process that an analyst uses to characterize a dataset.
In statistical analysis, the first step is to obtain data. An event can then be analyzed in either of two ways: quantitatively or qualitatively.
For example, consider a vector named weight, created with c(). It is a numeric vector, and its values are displayed in the output below.
weight=c(45.7,30.0,67.4,89.3)
print(weight)
[1] 45.7 30.0 67.4 89.3
So the vector weight stores quantitative data, which is numeric.
Now take the example of different shirt sizes. Create an object named shirts.
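Because weight holds numbers, arithmetic summaries can be applied to it directly. The following is a minimal sketch that reuses the weight vector from above. The choice of summaries is ours and is not part of the original example.

```r
# Same numeric vector as in the tutorial
weight <- c(45.7, 30.0, 67.4, 89.3)

is.numeric(weight)  # checks that the vector stores numeric data
mean(weight)        # average weight
range(weight)       # smallest and largest weight
summary(weight)     # minimum, quartiles, median, mean, and maximum
```

These calls would fail or be meaningless on category labels, which is the practical difference between quantitative and qualitative data.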
shirts = c("S","M","L","XL","XXL","S","L")
Every element is inside double quotes because together they form a character vector.
Let us display the elements of the vector shirts.
print(shirts)
[1] "S" "M" "L" "XL" "XXL" "S" "L"
Create another object, shirt_sizes, by passing shirts as the argument to the factor() function. This converts the character vector to a factor.
shirt_sizes = factor(shirts)
> shirt_sizes
[1] S M L XL XXL S L
Levels: L M S XL XXL
The output shows the factor and its levels, which are the unique values within the factor.
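A factor can be inspected and counted by category. The sketch below reuses shirts and shirt_sizes from above. The ordered_sizes object is a hypothetical addition that assumes the sizes have a natural order from S to XXL, which the original example does not state.

```r
shirts <- c("S", "M", "L", "XL", "XXL", "S", "L")
shirt_sizes <- factor(shirts)

levels(shirt_sizes)   # the unique values, sorted alphabetically by default
nlevels(shirt_sizes)  # number of distinct categories
table(shirt_sizes)    # how many shirts fall into each category

# Assumption: sizes run from smallest to largest, so declare that order
ordered_sizes <- factor(shirts,
                        levels = c("S", "M", "L", "XL", "XXL"),
                        ordered = TRUE)
ordered_sizes[1] < ordered_sizes[3]  # compares S with L using the declared order
```

Declaring the levels explicitly is worth considering whenever the alphabetical default, as in the Levels line above, does not match the meaning of the categories.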
Consider another example that summarizes both concepts. If you order a coffee at a restaurant, it is available in small, medium, or large, which is a qualitative description. But if a store sells 50 regular coffees in a week, that is quantitative data, because there is an exact count.
Other examples of qualitative data include gender (such as male or female) and product ratings. These are not measured. Instead, they are categorized on the basis of their properties, attributes, or labels.
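The coffee example can be sketched in R as follows. All values here are hypothetical: orders and daily_sales are illustrative names, and the daily figures were simply chosen so that they add up to the 50 coffees mentioned in the text.

```r
# Qualitative: each order is a label, stored as a factor (hypothetical orders)
orders <- factor(c("small", "large", "medium", "small", "medium", "medium"),
                 levels = c("small", "medium", "large"))
table(orders)  # labels can be counted per category, but not averaged

# Quantitative: coffees sold on each day of one week (hypothetical counts)
daily_sales <- c(8, 6, 9, 7, 5, 10, 5)
sum(daily_sales)   # total sold in the week
mean(daily_sales)  # average sold per day
```

Counting is the natural operation for the factor, while totals and averages only make sense for the numeric vector.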