Packages in R for data science


April 14, 2022, Learn eTutorial

In this tutorial, you will get familiar with some of the most useful R packages. R is a popular language for data science. In the previous tutorial, we learned about package repositories and how to install packages from them. More than 16,000 packages are available on CRAN, so we cannot explain all of them. Instead, this tutorial covers some of the libraries that data scientists use most in their everyday work.

R packages for data scientists

R packages help data scientists manipulate datasets and visualize data. They also help them work with data types, object types, and structures, create reports and interactive applications, and build models, including machine learning models.

The R packages commonly used for data science are as follows:

  1. dplyr

    dplyr is part of the Tidyverse collection of packages. It covers most everyday data manipulation tasks, such as filtering rows, selecting columns, creating new variables, sorting, and summarizing. Some of the functions provided by the dplyr package are listed in the table below.
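    Here is a minimal sketch of a typical dplyr pipeline. It uses the built-in mtcars data set and the core verbs filter(), group_by(), summarise(), and arrange(), chained with the %>% pipe that dplyr re-exports. The data set and columns were chosen only for illustration.

    ```r
    library(dplyr)

    result <- mtcars %>%
      filter(cyl %in% c(4, 6)) %>%                # keep 4- and 6-cylinder cars
      group_by(cyl) %>%                           # one group per cylinder count
      summarise(mean_mpg = mean(mpg), n = n()) %>%
      arrange(desc(mean_mpg))                     # highest mean mileage first
    print(result)
    ```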

  2. data.table

    data.table is an R package that provides an enhanced data frame structure called a data table. Its syntax differs from the Tidyverse style. On big datasets it is often faster and more memory-efficient than dplyr.
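    The sketch below computes a grouped summary in data.table's DT[i, j, by] syntax. Here i filters rows, j computes results, and by groups them. The mtcars data set and the column names are only illustrative choices.

    ```r
    library(data.table)

    dt <- as.data.table(mtcars)
    # i: rows with 4 or 6 cylinders; j: summary columns; by: grouping column
    result <- dt[cyl %in% c(4, 6), .(mean_mpg = mean(mpg), n = .N), by = cyl]
    setorder(result, -mean_mpg)   # sort in place, highest mean mileage first
    print(result)
    ```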

  3. tidyr

    tidyr is also part of the Tidyverse, but it has a different job from dplyr: its key focus is getting data into a tidy format. A tidy dataset satisfies three conditions:

    Every column is a variable.
    Every row is an observation.
    Every cell is a single value.

    The key functions are pivot_longer() and pivot_wider(). pivot_longer() reshapes data from wide to long format by turning several columns into rows. pivot_wider() does the reverse. tidyr also has functions for separating or uniting columns, such as separate() and unite(). Others deal with explicit and implicit missing data, such as drop_na() and complete().
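    A minimal round trip between the two formats might look like this. The small wide data frame, with one column per year, is made-up example data. pivot_longer() turns the year columns into rows, and pivot_wider() restores the original layout.

    ```r
    library(tidyr)

    # Made-up wide data: one column per year
    wide <- data.frame(
      country = c("A", "B"),
      `2020` = c(10, 20),
      `2021` = c(15, 25),
      check.names = FALSE
    )

    # Wide -> long: the year columns become rows
    long <- pivot_longer(wide, cols = -country, names_to = "year", values_to = "value")
    print(long)

    # Long -> wide: back to one column per year
    wide_again <- pivot_wider(long, names_from = year, values_from = value)
    print(wide_again)
    ```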

  4. ggplot2

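    gg stands for grammar of graphics. ggplot2 is an essential framework that simplifies building almost any graph from a few basic components. These are the data, an aesthetic mapping that links variables to visual properties such as the x and y positions, geometric objects (geoms) such as points, lines, or bars, and a coordinate system.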
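    The scatter plot below is a minimal sketch that combines these components. It uses the built-in mtcars data and an aes() mapping of car weight to x and fuel economy to y. geom_point() is the geometric object. The plot keeps ggplot2's default Cartesian coordinate system, and the axis labels are illustrative.

    ```r
    library(ggplot2)

    # data + aesthetic mapping + geometric object
    p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
      geom_point() +
      labs(x = "Weight (1000 lbs)", y = "Miles per gallon")
    print(p)   # draws the plot on the current graphics device
    ```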

  5. plotly

    plotly has implementations in both Python and R. ggplot2 produces static images. plotly takes visualization to the next level with dynamic, interactive charts that you can zoom, pan, and hover over.
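    The sketch below creates an interactive scatter plot with plot_ly(), where the ~ formula syntax maps data columns to chart properties. It then converts an existing ggplot2 plot with ggplotly(). The mtcars columns are only illustrative. The charts render only when printed in RStudio, a notebook, or a browser, so the print step is left as a comment.

    ```r
    library(plotly)
    library(ggplot2)

    # Native plotly chart: ~ maps columns of mtcars to the x and y axes
    fig <- plot_ly(mtcars, x = ~wt, y = ~mpg, type = "scatter", mode = "markers")

    # Convert a static ggplot2 plot into an interactive plotly chart
    fig2 <- ggplotly(ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point())

    # fig   # print in RStudio or a browser to zoom, pan, and hover
    ```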

  6. purrr

    purrr provides tools for applying a function to each element of a list or vector. It is an alternative both to writing a for loop that repeats the same operation many times and to the built-in apply() family of functions. Its main purpose is working with lists: filtering, reshaping, and summarizing them.
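    Here is a minimal sketch using a made-up named list of numeric vectors. map_dbl() applies mean() to every element in place of a for loop or sapply(), and keep() filters the list with a predicate function.

    ```r
    library(purrr)

    nums <- list(a = 1:3, b = 4:6, c = 7:9)

    # map() would return a list; map_dbl() returns a numeric (double) vector
    means <- map_dbl(nums, mean)
    print(means)   # a = 2, b = 5, c = 8

    # keep() retains only the elements for which the predicate is TRUE
    big <- keep(nums, function(x) mean(x) > 4)
    print(names(big))   # "b" "c"
    ```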

  7. stringr

    The stringr package provides functions for working with strings. Its most common uses are detecting pattern matches, subsetting strings, measuring and managing string length, modifying strings, and joining them.
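    The short example below uses a made-up vector of fruit names and covers each kind of operation listed above. It detects matches with str_detect(), subsets with str_sub(), measures length with str_length(), changes case with str_to_upper(), and joins with str_c().

    ```r
    library(stringr)

    fruits <- c("apple", "banana", "cherry")
    str_detect(fruits, "an")          # detect matches: FALSE TRUE FALSE
    str_sub(fruits, 1, 3)             # subset: "app" "ban" "che"
    str_length(fruits)                # length: 5 6 6
    str_to_upper(fruits)              # mutate: "APPLE" "BANANA" "CHERRY"
    str_c(fruits, collapse = ", ")    # join: "apple, banana, cherry"
    ```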

  8. lubridate

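    lubridate makes working with dates and date-times easier. It parses dates from strings and extracts the various components of a date-time, such as the year, month, or day. It also lets you get and set those components.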
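    As a small illustration, the example below uses this tutorial's publication date as its value. ymd() parses a year-month-day string, year() and month() extract components, and the day() replacement function sets one.

    ```r
    library(lubridate)

    d <- ymd("2022-04-14")    # parse a year-month-day string into a Date
    year(d)                   # get a component: 2022
    month(d)                  # get a component: 4
    wday(d, label = TRUE)     # day of the week as a labelled factor
    day(d) <- 1               # set a component: d is now 2022-04-01
    print(d)
    ```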

  9. forcats

    The forcats package deals with factors, R's data type for categorical variables. Internally, a factor stores its values as integer codes, and each code maps to a level, which is the label of a category. forcats provides functions for tasks such as reordering, renaming, and combining factor levels.
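    The sketch below uses a made-up factor of sizes. It shows the integer codes behind the levels and three common forcats functions. fct_relevel() sets the level order by hand, fct_infreq() orders levels by frequency, and fct_recode() renames them.

    ```r
    library(forcats)

    sizes <- factor(c("small", "large", "medium", "small", "large", "small"))
    levels(sizes)        # alphabetical by default: "large" "medium" "small"
    as.integer(sizes)    # underlying integer codes: 3 1 2 3 1 3

    sizes2 <- fct_relevel(sizes, "small", "medium", "large")   # meaningful order
    levels(sizes2)       # "small" "medium" "large"

    levels(fct_infreq(sizes))   # most frequent first: "small" "large" "medium"

    sizes3 <- fct_recode(sizes, S = "small", M = "medium", L = "large")  # new = old
    levels(sizes3)       # "L" "M" "S"
    ```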

  10. R Markdown

    R Markdown is a similar concept to the Jupyter Notebook: a single document combines R code, its output, and narrative text. The rmarkdown package renders such documents into reports in formats such as HTML, PDF, and Word. This makes it easy to share analyses and collaborate on code with others. You can install the package from CRAN as follows:

    install.packages("rmarkdown")

    If you want to use the development version of the rmarkdown package (either with or without RStudio), you can install it from GitHub via the remotes package:

    remotes::install_github('rstudio/rmarkdown')
  11. digest

    The digest package computes cryptographic hash functions, also known as digest algorithms (for example MD5, SHA-1, and SHA-256), for arbitrary R objects. It supports other cryptographic applications as well. Some of the functions available in the digest package are:

    Function   Description
    sha1()     Numerically stable hash sums
    hmac()     Hashed message authentication codes based on a key
    AES()      Advanced Encryption Standard block ciphers
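    Here is a minimal sketch of the main entry point, digest(), together with hmac() and sha1(). Setting serialize = FALSE hashes the characters of the string itself rather than R's serialized representation of the object. This makes the results comparable with other hashing tools. The key and message passed to hmac() are made-up values.

    ```r
    library(digest)

    # serialize = FALSE hashes the string's characters, not R's serialized object
    sha256_hello <- digest("hello", algo = "sha256", serialize = FALSE)
    print(sha256_hello)
    # "2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824"

    md5_hello <- digest("hello", algo = "md5", serialize = FALSE)
    print(md5_hello)   # "5d41402abc4b2a76b9719d911017c592"

    # hmac(): keyed hash; the key and message are made-up example values
    mac <- hmac("my-secret-key", "message to authenticate", algo = "sha256")
    print(mac)

    # sha1(): hash of a numeric value, rounded first for numerical stability
    print(sha1(pi))
    ```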
  12. MASS

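    The MASS package (Modern Applied Statistics with S) provides statistical functions and data sets that accompany the book of the same name by Venables and Ripley. Examples include fitdistr() for fitting distributions, lda() for linear discriminant analysis, and rlm() for robust regression.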
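    The small sketch below uses fitdistr() to fit a normal distribution by maximum likelihood to simulated data. It then trains a linear discriminant analysis classifier on the built-in iris data with lda(). The simulation parameters are arbitrary.

    ```r
    library(MASS)

    set.seed(42)
    x <- rnorm(500, mean = 10, sd = 2)

    # Maximum-likelihood fit of a normal distribution
    fit <- fitdistr(x, "normal")
    print(fit$estimate)   # estimates should be close to mean = 10, sd = 2

    # Linear discriminant analysis: predict Species from the four measurements
    model <- lda(Species ~ ., data = iris)
    pred <- predict(model, iris)$class
    mean(pred == iris$Species)   # accuracy on the training data
    ```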

  13. caret

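    The caret (Classification And REgression Training) package provides a unified interface for training and evaluating classification and regression models. It also covers related steps such as data splitting, preprocessing, and resampling.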
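    The sketch below shows a typical caret workflow on the built-in iris data. createDataPartition() makes a stratified train/test split, and trainControl() sets up 5-fold cross-validation. train() then fits a k-nearest-neighbours model (method = "knn"). The split ratio, model, and fold count are illustrative choices, not recommendations.

    ```r
    library(caret)

    set.seed(1)
    # Stratified 80% / 20% split by Species
    idx <- createDataPartition(iris$Species, p = 0.8, list = FALSE)
    train_set <- iris[idx, ]
    test_set  <- iris[-idx, ]

    # 5-fold cross-validation is used to tune k
    ctrl  <- trainControl(method = "cv", number = 5)
    model <- train(Species ~ ., data = train_set, method = "knn", trControl = ctrl)

    pred <- predict(model, newdata = test_set)
    table(predicted = pred, actual = test_set$Species)   # confusion table
    mean(pred == test_set$Species)                       # test accuracy
    ```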

  14. e1071

    The e1071 package provides functions for data analysis such as naive Bayes classifiers, Fourier transforms, support vector machines (SVMs), clustering, and other miscellaneous functions.
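    The minimal sketch below trains two of the models mentioned above: a support vector machine with svm() and a naive Bayes classifier with naiveBayes(). Both use a random 100-row training sample of the built-in iris data, and accuracy is measured on the remaining rows.

    ```r
    library(e1071)

    set.seed(7)
    idx <- sample(nrow(iris), 100)   # random 100-row training set
    train_set <- iris[idx, ]
    test_set  <- iris[-idx, ]

    # Support vector machine (radial kernel by default)
    svm_model <- svm(Species ~ ., data = train_set)
    svm_pred  <- predict(svm_model, test_set)

    # Naive Bayes classifier
    nb_model <- naiveBayes(Species ~ ., data = train_set)
    nb_pred  <- predict(nb_model, test_set)

    mean(svm_pred == test_set$Species)   # test accuracy of the SVM
    mean(nb_pred == test_set$Species)    # test accuracy of naive Bayes
    ```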

  15. sentimentr

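    The sentimentr package provides functions for sentiment analysis. It calculates the polarity of text at the sentence level and can aggregate these scores by rows. It also takes valence shifters such as negations ("not bad") into account.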
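    Here is a small sketch with two made-up reviews. get_sentences() splits the text into sentences, sentiment() scores the polarity of each sentence, and sentiment_by() aggregates the scores for each element (row) of the input. Exact scores depend on the package's default sentiment lexicon.

    ```r
    library(sentimentr)

    reviews <- c("I love this package. It is not bad at all.",
                 "This was a terrible, slow experience.")

    sentences <- get_sentences(reviews)   # split each review into sentences
    print(sentiment(sentences))           # one polarity score per sentence

    res <- sentiment_by(sentences)        # average polarity per review (row)
    print(res)
    ```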

  16. shiny

    shiny is an R package for building interactive web applications directly from R. A Shiny app can display charts, plots, graphs, and tables that update automatically when the user changes an input.
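    A minimal Shiny app is sketched below. The UI defines a slider input and a plot output, and the server redraws a histogram of random numbers whenever the slider moves. The input and output IDs (n and hist) are arbitrary names chosen for this example. Launching the app opens a browser, so the runApp() call is left as a comment to run in an interactive session.

    ```r
    library(shiny)

    # UI: one slider input and one plot output
    ui <- fluidPage(
      sliderInput("n", "Number of observations:", min = 10, max = 500, value = 100),
      plotOutput("hist")
    )

    # Server: the histogram is redrawn whenever input$n changes
    server <- function(input, output) {
      output$hist <- renderPlot({
        hist(rnorm(input$n), main = paste("Histogram of", input$n, "random values"))
      })
    }

    app <- shinyApp(ui = ui, server = server)
    # runApp(app)   # launch the app in an interactive R session
    ```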

  17. dygraphs

    The dygraphs package is an R interface to the dygraphs JavaScript charting library and plots time series as interactive charts. It also includes highly configurable series and axis display, along with interactive features such as zoom/pan and series/point highlighting.

    Installation

    You can install the dygraphs package from CRAN as follows:

    install.packages("dygraphs")
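    As a minimal sketch, the example below plots the built-in ldeaths time series (monthly deaths from lung diseases in the UK) and adds a range selector for zooming with dyRangeSelector(). As with other htmlwidgets, the chart appears when the object is printed in RStudio or a browser.

    ```r
    library(dygraphs)

    # ldeaths is a monthly ts object from R's built-in datasets package
    g <- dygraph(ldeaths, main = "Monthly deaths from lung diseases, UK")
    g <- dyRangeSelector(g)   # add a range selector below the chart for zooming
    # g   # print in RStudio or a browser to see the interactive chart
    ```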

  18. leaflet

    The leaflet package lets you create and customize interactive maps. It is an R interface to Leaflet, one of the most popular open-source JavaScript libraries for interactive maps.

    Installation

    To install this R package, run this command at your R prompt:

    install.packages("leaflet")
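    Here is a minimal sketch. leaflet() creates a map widget, addTiles() adds the default OpenStreetMap tile layer, and addMarkers() places a marker with a popup. The Eiffel Tower coordinates are only an example location.

    ```r
    library(leaflet)

    m <- leaflet()
    m <- addTiles(m)   # default OpenStreetMap tiles
    m <- addMarkers(m, lng = 2.2945, lat = 48.8584, popup = "Eiffel Tower")
    # m   # print in RStudio or a browser to see the interactive map
    ```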

  19. ggmap

    ggmap is an R package that extends ggplot2 to produce static maps. It combines spatial information with static map tiles from sources such as Google Maps, OpenStreetMap, and Stamen Maps, so that you can draw ggplot2 layers on top of them.

  20. glue

    The glue package interpolates strings. Any R expression written inside curly braces {} in a string is evaluated, and its value is inserted into the string. You can install it from the CRAN repository with install.packages("glue"). stringr depends on glue, so if stringr is already installed, you can also use its str_glue() function.
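    The minimal sketch below shows interpolation with glue() and the equivalent stringr::str_glue() call. The variables name and n_pkgs are made up for this example. The expression inside the braces can be any R code, such as the format() call used here to add a thousands separator.

    ```r
    library(glue)

    name <- "R"
    n_pkgs <- 16000

    # Expressions inside {} are evaluated and pasted into the string
    msg <- glue("{name} has more than {format(n_pkgs, big.mark = ',')} packages on CRAN.")
    print(msg)   # R has more than 16,000 packages on CRAN.

    # The same idea through stringr's wrapper around glue
    msg2 <- stringr::str_glue("{name} has more than {n_pkgs} packages on CRAN.")
    print(msg2)  # R has more than 16000 packages on CRAN.
    ```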

  21. reshape2

  22. dichromat