R is an interpreted programming language used for statistical analysis and graphics, in areas such as regression analysis and data mining. It was created by Ross Ihaka and Robert Gentleman at the University of Auckland.
R does not require variables to be declared with a data type, as languages such as C and Java do. Instead, a variable is assigned an R data object, and calculations are performed on these objects.
R has six basic types of data objects: vectors, lists, matrices, arrays, factors, and data frames.
In R, a sequence of elements that all share the same data type is called a vector, and each element of the vector is called a component.
Unlike a vector, a list can hold elements of different types, such as strings, numbers, and vectors. A list can also contain another list.
A data frame can be thought of as a combination of a matrix and a list: it is a table of rows and columns in which each column can have a different data type.
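The three objects described above can be sketched as follows. The variable names and values are made up for illustration only.

```r
# A vector: every component has the same type.
scores <- c(90, 85, 72)

# A list: elements may differ in type, and a list can hold another list.
record <- list(name = "Ann", scores = scores, extra = list(passed = TRUE))

# A data frame: equal-length columns, each with its own type.
students <- data.frame(
  name   = c("Ann", "Bob", "Cy"),
  score  = scores,
  passed = c(TRUE, TRUE, FALSE)
)

str(students)
```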
A valid variable name consists of letters, digits, dots, and underscores. It must start with a letter or a period (.). If it starts with a period, the period cannot be followed by a digit. Finally, R has reserved words, such as if, else, and TRUE, that cannot be used as variable names.
A vector in R has a single dimension and holds elements of one data type. A matrix is two-dimensional, arranged in rows and columns. An array can have any number of dimensions.
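As a small illustration of the difference, dim() reports the dimensions of each object. The object names here are arbitrary.

```r
v <- 1:6                              # vector: no dim attribute
m <- matrix(1:6, nrow = 2, ncol = 3)  # matrix: two dimensions
a <- array(1:24, dim = c(2, 3, 4))    # array: here three dimensions

dim(v)  # NULL
dim(m)  # 2 3
dim(a)  # 2 3 4
```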
Some of the basic syntax elements of R include:
"#" starts a comment; the interpreter ignores the rest of the line.
Quotes (" " or ' ') delimit a string in R.
"\" The backslash is used to escape certain characters inside a string.
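In R's class systems that support it (for example reference classes and the R6 package), the initialize() method is called when an object is created and is used to initialize the object's fields, including private data members.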
We can read a CSV file with the read.csv() function in R. It returns the contents as a data frame.
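A minimal sketch of reading a CSV file follows. It writes a small, made-up file to a temporary location first so that it runs on its own; in practice the path would point at an existing file.

```r
# Write a small CSV to a temporary file so the example is self-contained.
path <- tempfile(fileext = ".csv")
writeLines(c("id,name,score", "1,Ann,90", "2,Bob,85"), path)

df <- read.csv(path)   # the result is a data frame
class(df)
str(df)
```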
The getwd() command returns the path of the current working directory in R. R is case-sensitive, so the name is written in lowercase.
We use the next statement if we want to skip an iteration in any loop in R.
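For example, the loop below uses next to skip even numbers. The vector name is arbitrary.

```r
odd_values <- c()
for (i in 1:6) {
  if (i %% 2 == 0) next            # skip the rest of this iteration for even i
  odd_values <- c(odd_values, i)
}
odd_values   # 1 3 5
```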
There are many functions we can use in R programming some of the built in functions include
The base package is loaded by default when an R session starts. It provides basic functionality such as input/output and arithmetic calculations.
Logistic regression models the probability of a binary outcome variable. In R, we can fit it with the glm() function by setting family = binomial.
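A minimal sketch of a logistic regression with glm() is shown below. It assumes the built-in mtcars data, where am is a 0/1 variable; the choice of wt as the predictor is only for illustration.

```r
# mtcars ships with R; 'am' is 0 (automatic) or 1 (manual transmission).
fit <- glm(am ~ wt, data = mtcars, family = binomial)

summary(fit)$coefficients

# Predicted probability of a manual transmission for a hypothetical
# car with wt = 3 (weight in 1000 lbs).
p <- predict(fit, newdata = data.frame(wt = 3), type = "response")
p
```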
Suppose two vectors of different lengths take part in an operation. The elements of the shorter vector are then reused, again and again, until the operation is complete. This is called vector recycling.
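A short illustration of recycling, with made-up values:

```r
long  <- c(1, 2, 3, 4, 5, 6)
short <- c(10, 20)

# 'short' is reused as 10 20 10 20 10 20 to match the length of 'long'.
recycled_sum <- long + short
recycled_sum   # 11 22 13 24 15 26
```

If the longer length is not a multiple of the shorter one, R still recycles but issues a warning.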
In R programming there are 3 methods to call the function which will be
With lazy evaluation, a function's arguments are not evaluated when the function is called. An argument is evaluated only when it is first used inside the function body. If the body never refers to an argument, that argument is never evaluated.
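Lazy evaluation can be observed with a small, hypothetical function that never uses its second argument:

```r
ignore_second <- function(a, b) {
  a * 2   # b is never referenced, so it is never evaluated
}

# The second argument would raise an error if it were evaluated,
# but the call succeeds because b is never used.
value <- ignore_second(5, stop("this is never run"))
value   # 10
```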
We can install a package in R with a single command, passing the package name as a string: install.packages("package_name").
A random walk is a time series model that is not stationary: it has no constant mean or variance over time, and it shows strong time dependence. In a random walk model, the changes (increments) between successive values are white noise.
For data analysis, R has built-in functionality, whereas Python does not include data analysis tools in the language itself.
To get that functionality in Python, we have to install packages such as pandas.
There are many companies and websites that are using R programming now and many are turning to R programming, some of them are
RStudio is an integrated development environment for R. It provides a user interface that makes writing and running R code easier. It is generally considered more user-friendly than Rgui, with many features and menus for customization and programming in R.
In R, subset() selects the observations and variables that meet given conditions, whereas sample() draws a random sample of a user-defined size.
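Both functions can be tried on the built-in mtcars data. The condition, the selected columns, and the sample size below are arbitrary choices for illustration.

```r
# Rows with wt > 4, keeping only the mpg and wt columns.
heavy <- subset(mtcars, wt > 4, select = c(mpg, wt))
heavy

# Five row numbers drawn at random (seed fixed only for repeatability).
set.seed(1)
picked <- sample(nrow(mtcars), 5)
mtcars[picked, ]
```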
As the name suggests, this command installs a package in R from a local directory by selecting the package file.
The hist() function is used to make a histogram in R.
Both are arithmetic operators in R: "%%" returns the remainder of a division, whereas "%/%" returns the integer quotient.
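A quick example of the two operators, with arbitrary operands:

```r
remainder <- 17 %% 5    # 2
quotient  <- 17 %/% 5   # 3

# The two results recombine into the original number.
quotient * 5 + remainder   # 17
```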
The rm() function removes objects, such as vectors, from the R workspace.
If we want to apply the same function to the rows or columns of a matrix or array, we can use the apply() function in R. Example: finding the mean of each row.
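The row-mean example can be sketched on a small made-up matrix. The second argument (MARGIN) selects rows (1) or columns (2).

```r
m <- matrix(1:6, nrow = 2)   # rows: (1, 3, 5) and (2, 4, 6)

row_means <- apply(m, 1, mean)   # MARGIN = 1: one result per row
col_sums  <- apply(m, 2, sum)    # MARGIN = 2: one result per column

row_means   # 3 4
col_sums    # 3 7 11
```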
In R programming we have a package called “XML” to read the XML file.
In R, we can add, update, or delete any element of a list. An element is deleted by assigning NULL to it.
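List elements can be updated by assignment, and assigning NULL removes an element at any position, not only at the end. A small sketch with made-up contents:

```r
items <- list(a = 1, b = "two", c = TRUE)

items$b <- "TWO"       # update an existing element
items$d <- 4           # add a new element at the end
items[["a"]] <- NULL   # delete an element (here the first one)

names(items)   # "b" "c" "d"
```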
The basic syntax for creating a matrix in R is "matrix(data, nrow, ncol, byrow, dimnames)".
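The following sketch uses all five arguments. The row and column names are invented for the example.

```r
m <- matrix(
  data     = 1:6,
  nrow     = 2,
  ncol     = 3,
  byrow    = TRUE,   # fill row by row instead of column by column
  dimnames = list(c("r1", "r2"), c("a", "b", "c"))
)

m
m["r2", "b"]   # 5
```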
R has a built-in function called boxplot() for creating box plots. It accepts a formula together with a data frame as input.
Converting a data object from one form to another by executing a sequence of commands is referred to as data reshaping. Example: forming a data frame by merging lists.
This single line of code generates random numbers between 0 and 5.
In R, the command "installed.packages()" displays the list of installed packages.
Both are used to load packages in R. If the package is not found, library() stops with an error message, whereas require() issues a warning and returns FALSE.
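The difference can be observed with a package name that does not exist. The name below is hypothetical and assumed not to be installed.

```r
pkg <- "notARealPackage123"   # hypothetical name, assumed not installed

# library() signals an error when the package is missing.
lib_result <- tryCatch(
  library(pkg, character.only = TRUE),
  error = function(e) "error"
)

# require() warns and returns FALSE instead of stopping.
req_result <- suppressWarnings(require(pkg, character.only = TRUE))

lib_result   # "error"
req_result   # FALSE
```

Because require() returns a logical value, it is sometimes used inside if() to check for optional packages.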
The t.test() function in R is used to test whether the means of two groups are equal.
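A minimal two-sample sketch, assuming the built-in sleep dataset (extra hours of sleep for two groups):

```r
# Compare the mean of 'extra' between group 1 and group 2.
result <- t.test(extra ~ group, data = sleep)

result$estimate   # the two group means
result$p.value    # a small p-value suggests the means differ
```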
The with() function evaluates an expression using the variables of a dataset, whereas the by() function applies a function to each subset of the data defined by a factor.
Both apply a function to each element of a list or vector; they differ in the form of the output. lapply() always returns a list, whereas sapply() simplifies the result to a vector or matrix when possible.
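The difference in output shape can be seen on a small made-up list:

```r
nums <- list(a = 1:3, b = 4:6)

as_list   <- lapply(nums, mean)    # a list with elements a and b
as_vector <- sapply(nums, mean)    # a named numeric vector
as_matrix <- sapply(nums, range)   # a matrix, one column per element

as_list
as_vector
as_matrix
```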
The aggregate() function splits the data into groups and computes a summary statistic for each group.
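For instance, assuming the built-in mtcars data, the mean miles per gallon for each cylinder count can be computed as follows:

```r
# One row per distinct value of cyl, with the mean of mpg in each group.
mpg_by_cyl <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
mpg_by_cyl
```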
The doBy package in R is used to compute groupwise summary tables, specified with a model formula and a function.
If we want to create a frequency table in R, we can use the table() function.
The strsplit() function splits each element of a character vector into substrings wherever the given separator occurs. In our problem, it splits the vector ‘x’ at each occurrence of “f”.
x <- "I Love R programming"
split.string <- strsplit(x, " ")      # a list holding one character vector
extract.words <- split.string[[1]]    # take the vector of words out of the list
result <- unique(tolower(extract.words))
unlist() command in R programming converts a list into a vector.
x <- pbinom(26,51,0.5)
Once JSON data has been read into R as a list, we can use the data.frame() or as.data.frame() function to convert it to a data frame.
In R, every matrix is an array, but not every array is a matrix.
A matrix always has two dimensions, whereas an array can have any number of dimensions.
In R, the fitdistr() function, defined in the MASS package, performs maximum-likelihood fitting of univariate distributions.
GGobi is an open-source program for visualizing high-dimensional data, and it can be used from R.
iplots is a package that provides interactive versions of familiar plots such as box plots, scatter plots, histograms, and parallel plots.
The lattice package improves on base graphics in R. It provides better defaults and makes it easy to display multivariate relationships.
The anova() function in R is used to compare nested models.
The cv.lm() function performs k-fold cross-validation in R. It is defined in the DAAG package.
The stepAIC() function, defined in the MASS package, performs stepwise model selection.
The leaps() function, defined in the leaps package, performs all-subsets regression.
The relaimpo package in R is used to assess the relative importance of each predictor in a model.
MANOVA stands for Multivariate Analysis of Variance. It is used to test more than one dependent variable simultaneously.
The robust package in R provides a library of robust methods, including robust regression.
Exponential smoothing and ARIMA models are commonly used for forecasting in R. The forecast package provides functions for selecting these models.
The qda() function, defined in the MASS package, fits a quadratic discriminant analysis in R.
If the discriminant functions are to be based on centered variables, we use lda() in R to display them.
ARIMA models in R can be seasonal or non-seasonal. The auto.arima() function handles both.
If we want to rotate or extract the principal components in R language we are supposed to use the
It is an R package that handles both qualitative and quantitative variables. It also supports supplementary observations and variables.
We have a command like “?NA” for finding the help page on missing values.
A single line of code computes the standard deviation while ignoring missing values: "sd(n, na.rm = TRUE)". The mean is obtained in the same way with "mean(n, na.rm = TRUE)".
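Base R provides the command "max.col(x)", which returns, for each row of a matrix x, the position of the column holding the maximum value.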
Use the command "data(package = "package_name")" to list the datasets available in a given package.
The command "data(package = .packages(all.available = TRUE))" lists the datasets in all installed packages.
pairs( formula, data)
Here, formula specifies the variables to be plotted against each other in pairs, and data is the dataset from which the variables are taken.
If is.matrix(object) returns TRUE, the object is a matrix.
The t() function returns the transpose of a matrix. The matrix name is passed inside the parentheses.
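A short example of transposing a made-up matrix:

```r
m  <- matrix(1:6, nrow = 2)   # 2 rows, 3 columns
tm <- t(m)                    # 3 rows, 2 columns

dim(tm)      # 3 2
m[1, 3]      # 5
tm[3, 1]     # 5: rows and columns have swapped places
```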
SEM in R programming indicates Structural Equation Modeling.
CFA in R programming indicates Confirmatory Factor Analysis.
Using the cluster.stats() function, we can compare the similarity of two cluster solutions using a variety of validation criteria. This function is defined in the fpc package in R.
It is used to assess hierarchical clustering by computing p-values for the clusters. It is defined in the pvclust package in R.
It provides wrapper functions and variables that emulate MATLAB function calls.
The chi-square test checks whether there is a relationship between the categories of two variables. It is applied to a frequency table.
It is also called a decision tree forest. It is a classification and regression method that is explained in detail in machine learning tutorials.
A pie chart is a circular graph in which the components are shown as slices of the circle in various colors. In R, pie charts are drawn with the pie() function, one of the many plotting functions for graphs and charts.
In R, comments can be written anywhere in the code, but each one must begin with "#".
We can create a table in R language using
myTable = data.frame()
The limit depends on whether the system is 32-bit or 64-bit. On a 32-bit operating system, the memory limit is 2 or 3 GB. On a 64-bit system, it can be up to 8 TB.
We can create a variable in R programming using the assignment operator ‘ <- ’.
It is used in experimental design to determine the effect of a given sample size. The pwr package is used in R for power analysis.
We have different methods to export data using the r programming, some of them are
R uses "NaN" (Not a Number) to represent impossible values, such as the result of 0/0.
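The relationship between NaN and the missing value NA can be checked with is.nan() and is.na():

```r
impossible <- 0 / 0

impossible          # NaN
is.nan(impossible)  # TRUE
is.na(impossible)   # TRUE: NaN also counts as missing
is.nan(NA)          # FALSE: NA is missing, but it is not NaN
```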
To save an object to a file in R, we use the "save()" command.
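The order() function is used for sorting in R. It returns the positions that arrange its argument in ascending order, which can then be used to index a vector or the rows of a data frame.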
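A brief sketch of how order() is typically used, with made-up values. sort() is shown for comparison, since it returns the sorted values directly.

```r
x <- c(30, 10, 20)

idx <- order(x)   # 2 3 1: positions of the values in ascending order
x[idx]            # 10 20 30
sort(x)           # 10 20 30

# Sorting the rows of a data frame by one column, largest first.
df <- data.frame(name = c("c", "a", "b"), score = x)
sorted_df <- df[order(df$score, decreasing = TRUE), ]
sorted_df
```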