In this tutorial, you will familiarize yourself with some of the most useful R packages. R is a popular language for data science. In the previous tutorial, we learned about repositories and how to install packages from them. More than 16,000 packages are available on CRAN. We cannot explain all of them, so this tutorial covers some of the libraries that data scientists use most in their everyday work.
R packages let data scientists manipulate datasets, visualize data, work with data types, object types, and structures, create reports, interactive applications, and models, and apply machine learning.
The R packages used for data science are as follows:
dplyr is part of the Tidyverse family of packages and covers most everyday data manipulation tasks. Some of the functions provided by the dplyr package are listed in the table below.
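As a rough illustration of how dplyr functions chain together, here is a minimal sketch. It assumes dplyr is installed and uses the built-in mtcars dataset; the column name hp_per_mpg and the filter threshold are made up for the example.

```r
library(dplyr)

# mtcars ships with base R, so no external data is needed.
summary_by_cyl <- mtcars %>%
  filter(mpg > 15) %>%                      # keep rows that match a condition
  select(mpg, cyl, hp) %>%                  # keep only some columns
  mutate(hp_per_mpg = hp / mpg) %>%         # add a derived column (hypothetical name)
  group_by(cyl) %>%                         # group rows by cylinder count
  summarise(mean_mpg = mean(mpg), n = n()) %>%  # one summary row per group
  arrange(desc(mean_mpg))                   # sort by the summary column

print(summary_by_cyl)
```

Each verb takes a data frame and returns a data frame, which is why the steps can be piped one after another.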
The data.table package introduces a new structure called a data table. Its syntax differs from that of the Tidyverse. When working with big datasets, data.table is often more convenient than dplyr.
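To give a feel for the data.table syntax, the following sketch assumes the data.table package is installed and again uses the built-in mtcars data. The column name kpl and the threshold of 100 horsepower are only illustrative.

```r
library(data.table)

# Convert a regular data frame; row names become a "model" column.
dt <- as.data.table(mtcars, keep.rownames = "model")

# General form is dt[i, j, by]: filter rows, compute on columns, group.
strong_cars <- dt[hp > 100, .(mean_mpg = mean(mpg), n = .N), by = cyl]

# := adds or updates a column by reference, without copying the table.
dt[, kpl := mpg * 0.425144]  # miles per US gallon to kilometres per litre

print(strong_cars)
```

Filtering, aggregation, and grouping all happen inside a single pair of square brackets, which is the main difference from the verb-by-verb style of dplyr.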
The tidyr package is also part of the Tidyverse, but it has a different purpose from dplyr: its key focus is getting data into a tidy format. A tidy dataset satisfies three conditions: every column is a variable, every row is an observation, and every cell is a single value.
The key functions are pivot_longer() and pivot_wider(). pivot_longer() reshapes data from many columns into many rows, and pivot_wider() does the reverse. There are other functions too, for separating or uniting columns and for dealing with explicit and implicit missing data.
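The two pivot functions can be sketched on a tiny made-up table. This assumes the tidyr package is installed; the data frame and its column names (country, y2019, y2020) are invented for the example.

```r
library(tidyr)

# A small "wide" table: one column per year (hypothetical data).
wide <- data.frame(
  country = c("A", "B"),
  y2019 = c(10, 20),
  y2020 = c(11, 22)
)

# Many columns -> many rows: one row per country/year pair.
long <- pivot_longer(wide, cols = -country,
                     names_to = "year", values_to = "value")

# Many rows -> many columns: back to one column per year.
wide_again <- pivot_wider(long, names_from = year, values_from = value)

print(long)
print(wide_again)
```

The long table here has one observation per row, which is the tidy form described above.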
gg stands for grammar of graphics. ggplot2 is an essential framework that lets you build almost any graph from a few basic components: the data, a mapping of variables onto a coordinate system, and geometric objects.
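The components mentioned above map onto ggplot2 code fairly directly. The sketch below assumes ggplot2 is installed and uses the built-in mtcars data; the choice of variables and labels is arbitrary.

```r
library(ggplot2)

p <- ggplot(data = mtcars,                          # the data
            mapping = aes(x = wt, y = mpg,          # variables mapped to axes
                          colour = factor(cyl))) +  # and to colour
  geom_point() +                                    # geometric object: points
  coord_cartesian() +                               # coordinate system (the default)
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon", colour = "Cylinders")

print(p)
```

Swapping geom_point() for another geom, such as geom_line() or geom_boxplot(), changes the kind of graph while the data and mapping stay the same.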
plotly has implementations in both Python and R. Unlike ggplot2, whose output is static, plotly takes visualization to the next level by producing dynamic, interactive charts.
The purrr package does a couple of different things. It helps you apply a function to each element of a structure, as an alternative to writing a for loop that does the same thing many times or to the built-in apply() family. Its primary use is working with lists: filtering, reshaping, summarizing, and so on.
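Assuming the package described here is purrr from the Tidyverse, a minimal sketch of replacing a loop with a mapping function could look like this. The list scores and its contents are made up for the example.

```r
library(purrr)

scores <- list(a = c(1, 2, 3), b = c(10, 20), c = c(5, 5, 5, 5))

# Apply a function to every element; map_dbl() returns a numeric vector.
means <- map_dbl(scores, mean)

# Filter a list: keep only the elements with more than two values.
long_ones <- keep(scores, ~ length(.x) > 2)

# Summarize: sum each element, then combine the sums into one number.
total <- reduce(map(scores, sum), `+`)

print(means)
print(names(long_ones))
print(total)
```

The typed variants such as map_dbl() and map_chr() fail loudly if an element does not return the expected type, which is one reason to prefer them over sapply().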
The stringr package deals with strings. It is most commonly used for string manipulation, such as detecting matches, subsetting strings, managing string lengths, mutating strings, joining them, and so on.
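Each of the tasks listed above has a corresponding str_ function. The following sketch assumes stringr is installed; the vector fruits is just sample data.

```r
library(stringr)

fruits <- c("apple", "banana", "cherry")

has_an   <- str_detect(fruits, "an")          # detect matches
with_e   <- str_subset(fruits, "e")           # subset strings that match
lengths  <- str_length(fruits)                # length of each string
padded   <- str_pad(fruits, width = 8, side = "right")  # manage length
upper    <- str_to_upper(fruits)              # mutate
joined   <- str_c(fruits, collapse = ", ")    # join

print(has_an)
print(with_e)
print(joined)
```

All of these functions take the string as their first argument, so they work well with the pipe.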
The lubridate package deals with dates and times: getting and setting components, and extracting the various components of date-times.
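Assuming the package described here is lubridate from the Tidyverse, getting and setting date-time components might look like the sketch below. The timestamp is an arbitrary example value.

```r
library(lubridate)

# Parse a date-time string (year-month-day hour:minute:second).
stamp <- ymd_hms("2024-03-15 14:30:00", tz = "UTC")

# Getting components.
yr  <- year(stamp)
mon <- month(stamp)
dow <- wday(stamp, label = TRUE)   # day of the week as a labelled factor
hr  <- hour(stamp)

# Setting a component: the same accessor works on the left-hand side.
hour(stamp) <- 9

# Date arithmetic with periods.
later <- stamp + days(30)

print(stamp)
print(later)
```

The accessor functions double as setters, so there is no separate set of names to remember for changing a component.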
The forcats package deals with factors. A factor stores its values internally as integer codes, and the levels are the categorical labels built on top of those codes. There are some functions like
R Markdown is a similar concept to the Jupyter notebook. It lets you combine code, its output, and narrative text in a single document. The package helps you create analysis documents, and it also supports collaborating and sharing code with others. You can install the package from CRAN as follows:
install.packages("rmarkdown")
The digest package in R computes cryptographic hash digests of R objects, using what are also known as digest algorithms. Some of the functions available in the digest package are
Function | Description |
sha1() | for numerically stable hash sums |
hmac() | for hashed message authentication codes based on a key |
AES() | for Advanced Encryption Standard block ciphers |
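A short sketch of hashing with the digest package follows. It assumes digest is installed; the message and key strings are placeholders, and AES() is left out to keep the example small.

```r
library(digest)

msg <- "hello world"

# serialize = FALSE hashes the raw string rather than R's serialized object.
md5_hash    <- digest(msg, algo = "md5", serialize = FALSE)
sha256_hash <- digest(msg, algo = "sha256", serialize = FALSE)

# sha1() is meant for hashing R objects such as floating-point vectors.
stable_hash <- sha1(c(0.1 + 0.2, 0.3))

# hmac() combines a secret key with the message (placeholder key).
mac <- hmac(key = "secret-key", object = msg, algo = "sha256")

print(md5_hash)
print(sha256_hash)
print(mac)
```

Each result is a hexadecimal string whose length depends on the algorithm: 32 characters for MD5, 40 for SHA-1, and 64 for SHA-256.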
The MASS package provides statistical functions and datasets that accompany the book Modern Applied Statistics with S by Venables and Ripley.
The caret package supports classification and regression tasks.
The e1071 package provides functions for data analysis, such as naive Bayes, Fourier transforms, SVMs, clustering, and other miscellaneous functions.
The sentimentr package provides functions for sentiment analysis. It supports aggregating by rows and calculating the polarity of sentences.
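As one example of what e1071 offers, the sketch below fits a naive Bayes classifier. It assumes e1071 is installed and uses the built-in iris dataset. The model is evaluated on its own training data purely to keep the example short, so the accuracy it reports is optimistic.

```r
library(e1071)

# Fit a naive Bayes classifier predicting species from the four measurements.
model <- naiveBayes(Species ~ ., data = iris)

# Predict on the training data (for illustration only, not a real evaluation).
predicted <- predict(model, newdata = iris)

confusion <- table(predicted = predicted, actual = iris$Species)
accuracy  <- mean(predicted == iris$Species)

print(confusion)
print(accuracy)
```

In practice you would hold out a test set, or use a resampling helper such as those in caret, before trusting the accuracy.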
Shiny is another R package, used for building interactive web applications that can embed visualizations such as charts, plots, and graphs.
The dygraphs package charts time series and makes those charts interactive. It also offers highly configurable series and axis display, with interactive features like zoom/pan and series/point highlighting.
Installation
You can install the dygraphs package from CRAN as follows:
install.packages("dygraphs")
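Once the package is installed, a minimal interactive chart can be sketched as follows. The example uses ldeaths, a monthly time series that ships with base R; the title text is just a label chosen for the example.

```r
library(dygraphs)

# ldeaths is a built-in monthly time series (class "ts").
chart <- dygraph(ldeaths, main = "Monthly deaths from lung diseases")

# Add a range selector below the chart for zooming and panning.
chart <- dyRangeSelector(chart)

# Highlight the point under the cursor with a larger circle.
chart <- dyHighlight(chart, highlightCircleSize = 4)

chart  # in an interactive session this opens the chart in the viewer
```

The result is an HTML widget, so it can also be embedded in an R Markdown document or a Shiny application.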
The leaflet package lets you create and customize interactive maps. It is built on Leaflet, one of the most popular open-source JavaScript libraries for interactive maps.
Installation
To install this R package, run this command at your R prompt:
install.packages("leaflet")
ggmap is an R package that produces static maps and is an extension of ggplot2. It lets you combine spatial information with static maps from Google Maps, OpenStreetMap, Stamen Maps, and similar sources for visualization.
The glue package evaluates R expressions placed inside curly braces {} and inserts the results into the surrounding string. You can install it from the CRAN repository using install.packages("glue"). The same functionality is available through the stringr package too: if stringr is already installed, you can use str_glue(), which wraps glue().
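A minimal sketch of string interpolation with glue follows. It assumes the glue package is installed; the variables name and n_pkgs are made up for the example.

```r
library(glue)

name   <- "Ada"
n_pkgs <- 3

# Anything inside {} is evaluated as R code and pasted into the string.
msg <- glue("{name} loaded {n_pkgs} packages; doubled, that is {n_pkgs * 2}.")

print(msg)
```

Because the braces hold ordinary R expressions, you can call functions or do arithmetic inside them, not just refer to variables.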