R Program to compare two data frames to find the row(s) in the first data frame that are not in the second


December 28, 2022, Learn eTutorial

How to compare two data frames to find the rows that are in the first data frame but not in the second

A data frame stores a data table as a list of vectors of equal length, one vector per column. To write an R program for comparing data frames, we use the data.frame() function, a built-in function that creates a data frame. The function setdiff() calculates the (nonsymmetric) set difference, that is, the elements of its first argument that do not occur in its second, and this is what we use for the comparison. For data frames, the row-by-row method of setdiff() comes from the dplyr package; base R's own setdiff() compares a data frame column by column. The syntax of this function is,


setdiff(x, y, ...)

Where

  • x, y are vectors or data frames containing a sequence of items.
  • dots (...) indicates further arguments to be passed to or from other methods.
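To make "nonsymmetric" concrete, here is a minimal sketch on plain vectors. The vectors a and b are made-up values for illustration, not part of the program below.

```r
# Two small numeric vectors (made-up values for illustration)
a <- c(1, 2, 3, 4)
b <- c(2, 4, 5)

only_in_a <- setdiff(a, b)  # elements of a that do not occur in b
only_in_b <- setdiff(b, a)  # swapping the arguments gives a different answer

print(only_in_a)
print(only_in_b)
```

The first call keeps 1 and 3, the second keeps only 5, so the order of the arguments matters.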

How to implement data frame comparison logic in an R program

In this R program, we pass the data frames directly to the function. The variables DF1 and DF2 hold the two data frames, each created by calling data.frame(). Finally, we compare the two data frames by calling setdiff(DF1, DF2).
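One caveat is worth checking before relying on this call. The sketch below assumes a session where only base R is used, and calls base::setdiff() explicitly on two small made-up data frames; the name differing is just an illustrative variable. It suggests that base R treats a data frame as a list of columns, so what comes back is the column that differs, not the rows.

```r
# Two small data frames (made-up values); only the Jan column differs
DF1 <- data.frame(item = c("item1", "item2"), Jan = c(12, 14))
DF2 <- data.frame(item = c("item1", "item2"), Jan = c(12, 15))

# base::setdiff() compares the columns of the two data frames
differing <- base::setdiff(DF1, DF2)

# Only the column that is not identical in DF2 is kept
print(names(differing))
```

Here the item column is identical in both data frames and is dropped, while Jan is kept. A row-by-row comparison needs a method written for data frames, such as the one dplyr provides.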

ALGORITHM

STEP 1: Assign the two data frames to the variables DF1 and DF2

STEP 2: Print the original data frames

STEP 3: Compare them by calling setdiff(DF1, DF2)

STEP 4: Print the final result
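The same four steps can also be sketched with base R alone. This is one possible approach, not the method used in the program: the helper rows_only_in_first() is a hypothetical name, it assumes both data frames have the same columns in the same order, and it compares values as text. The data is a shortened, two-column version of the example.

```r
# Hypothetical helper: rows of `first` that do not occur in `second`.
# Assumes both data frames have the same columns in the same order.
rows_only_in_first <- function(first, second) {
  # Collapse every row into one string key; "\r" is an unlikely separator
  key_first  <- do.call(paste, c(unname(as.list(first)),  sep = "\r"))
  key_second <- do.call(paste, c(unname(as.list(second)), sep = "\r"))
  # Keep the rows of `first` whose key never appears in `second`
  first[!(key_first %in% key_second), , drop = FALSE]
}

# STEP 1: assign the data frames
DF1 <- data.frame(item = c("item1", "item2", "item3"), Mar = c(12, 14, 15))
DF2 <- data.frame(item = c("item1", "item2", "item3"), Mar = c(12, 15, 18))

# STEP 2: print the original data frames
print(DF1)
print(DF2)

# STEP 3: compare row by row
result <- rows_only_in_first(DF1, DF2)

# STEP 4: print the final result
print(result)
```

Unlike setdiff(), this sketch does not remove duplicate rows from the result, and it keeps the original row names of the first data frame.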

R Source Code

# dplyr supplies the data-frame method of setdiff() that compares row by row;
# base R's setdiff() would compare the columns instead
suppressPackageStartupMessages(library(dplyr))

# Monthly figures for three items
DF1 <- data.frame(
  item = c("item1", "item2", "item3"),
  Jan = c(12, 14, 12),
  Feb = c(11, 12, 15),
  Mar = c(12, 14, 15)
)
# Same items; only the Mar values of item2 and item3 differ
DF2 <- data.frame(
  item = c("item1", "item2", "item3"),
  Jan = c(12, 14, 12),
  Feb = c(11, 12, 15),
  Mar = c(12, 15, 18)
)
print("Original Dataframes:")
print(DF1)
print(DF2)
print("Row(s) in first data frame that are not present in second data frame:")
print(setdiff(DF1, DF2))

OUTPUT

[1] "Original Dataframes:"
   item Jan Feb Mar
1 item1  12  11  12
2 item2  14  12  14
3 item3  12  15  15
   item Jan Feb Mar
1 item1  12  11  12
2 item2  14  12  15
3 item3  12  15  18
[1] "Row(s) in first data frame that are not present in second data frame:"
   item Jan Feb Mar
1 item2  14  12  14
2 item3  12  15  15