R Program to compare two data frames to find the row(s) in the first data frame that are not in the second

How to compare two data frames to find the rows in the first data frame that are not in the second data frame

Here we explain how to write an R program to compare two data frames to find the rows in the first data frame that are not in the second data frame. The data frames are created with the built-in function data.frame(). A data frame stores a data table as a list of vectors of equal length, one vector per column. The comparison uses the function setdiff(), which calculates the (nonsymmetric) set difference of its two arguments, that is, the items of x that are not in y. The syntax of this function is,

setdiff(x, y)

where x and y are vectors (or lists, such as data frames) containing a sequence of items. Note that base R's setdiff() treats a data frame as a list of columns, so with base R alone the comparison is made column by column. A row-by-row comparison needs a method written for data frames, such as the one provided by the dplyr package.
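As a quick illustration of the nonsymmetric behaviour, the sketch below applies setdiff() to two small vectors and then checks that a data frame is stored as a list of columns. The names a, b, only_in_a, only_in_b and df are made up for this illustration and are not part of the program discussed in this article.

```r
# Hypothetical example vectors, chosen only for illustration
a <- c(1, 2, 3, 4)
b <- c(3, 4, 5)

# Items of a that are not in b
only_in_a <- setdiff(a, b)
print(only_in_a)

# Swapping the arguments gives a different result,
# which is why the difference is called nonsymmetric
only_in_b <- setdiff(b, a)
print(only_in_b)

# A data frame is a list of equal-length column vectors
df <- data.frame(item = c("item1", "item2"), Jan = c(12, 14))
print(is.list(df))   # TRUE: the data frame is a list
print(length(df))    # number of columns
print(nrow(df))      # number of rows
```

Because length() counts columns here, functions that work on the elements of a list, including base R's setdiff(), see the columns of a data frame as its items.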

How to compare two data frames in an R program to find the rows in the first data frame that are not in the second data frame

Below are the steps used in the R program to compare the two data frames. The variables DF1 and DF2 hold the two data frames, each created by calling the function data.frame(). The data frames are then passed directly to the built-in function: the comparison is made by calling setdiff(DF1, DF2).

ALGORITHM

STEP 1: Create the data frames DF1 and DF2 with data.frame()

STEP 2: Print the original data frames

STEP 3: Compare the two data frames by calling setdiff(DF1, DF2)

STEP 4: Print the final data frame

R Source Code

# Create the two data frames to compare
DF1 <- data.frame(
  item = c("item1", "item2", "item3"),
  Jan = c(12, 14, 12),
  Feb = c(11, 12, 15),
  Mar = c(12, 14, 15)
)
DF2 <- data.frame(
  item = c("item1", "item2", "item3"),
  Jan = c(12, 14, 12),
  Feb = c(11, 12, 15),
  Mar = c(12, 15, 18)
)
# Print the original data frames
print("Original Dataframes:")
print(DF1)
print(DF2)
# Compare the data frames; base R's setdiff() treats them as lists of columns
print("Row(s) in first data frame that are not present in second data frame:")
print(setdiff(DF1, DF2))

OUTPUT

[1] "Original Dataframes:"
   item Jan Feb Mar
1 item1  12  11  12
2 item2  14  12  14
3 item3  12  15  15
   item Jan Feb Mar
1 item1  12  11  12
2 item2  14  12  15
3 item3  12  15  18
[1] "Row(s) in first data frame that are not present in second data frame:"
  Mar
1  12
2  14
3  15
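
The output above contains only the Mar column because base R's setdiff() compares the data frames column by column. A minimal sketch of a row-wise comparison in base R, assuming the same DF1 and DF2 as in the program, pastes each row into one string key and keeps the rows of DF1 whose key does not occur in DF2. The helper row_key() and the variable rows_only_in_DF1 are hypothetical names introduced for this illustration; they are not part of R.

```r
# Same example data as in the program above
DF1 <- data.frame(
  item = c("item1", "item2", "item3"),
  Jan = c(12, 14, 12),
  Feb = c(11, 12, 15),
  Mar = c(12, 14, 15)
)
DF2 <- data.frame(
  item = c("item1", "item2", "item3"),
  Jan = c(12, 14, 12),
  Feb = c(11, 12, 15),
  Mar = c(12, 15, 18)
)

# Hypothetical helper: collapse every row into a single string key.
# The separator "\r" is an assumption that it never occurs in the data.
row_key <- function(df) {
  do.call(paste, c(unname(as.list(df)), sep = "\r"))
}

# Keep the rows of DF1 whose key is not among the keys of DF2
rows_only_in_DF1 <- DF1[!(row_key(DF1) %in% row_key(DF2)), ]
print(rows_only_in_DF1)
```

With the example data this should keep the item2 and item3 rows of DF1, since their Mar values differ from those in DF2. The key-based approach compares values after conversion to text, so it is only a sketch for small, simple tables. If the dplyr package is installed, its data frame method for setdiff() is another option for a row-wise comparison.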