A Beginner's Introduction to R Through the Tidyverse

Filtering Rows

Learn how to filter rows using dplyr.

Course Sections

What is Filtering?

Datasets often contain a large amount of data, and it can be challenging to work with all of it at once. Filtering is a way to reduce the size of the dataset by selecting only the rows that meet specific criteria.

Why do we need to filter data?

There are several reasons why we might want to filter data. Some reasons include:

  • To focus on a specific subset of the data.
  • To remove rows that are not relevant to our analysis.
  • To remove missing or invalid data.
  • To prepare the data for further analysis.
  • What we’ll learn in this section

    Goals In this section, we'll learn how to:
    • Filter rows using the filter function in dplyr.
    • Filter rows based on a single condition.
    • Filter rows based on multiple conditions.
    • Filter rows using logical operators.

    Basic Syntax for Filtering

    To use the filter function in dplyr, we use the following syntax:

     data |> 
      filter(condition) 

    We specify the dataset we want to filter, followed by the |> operator, and then the filter function. Inside the filter function, we specify the condition that we want to filter on.

    The condition can be any logical expression that evaluates to TRUE or FALSE . Rows where the condition is TRUE are kept, and rows where the condition is FALSE are removed.

    For example, here is an example condition:

     column_name > 10 

    This code will keep only the rows where the value in the column column_name is greater than 10.

    If we need to filter rows that are equal to a specific value, we use the == operator. It has two equal signs, not one. For a value less than, we use < , for greater than, we use > , for less than or equal to, we use <= , and for greater than or equal to, we use >= .

    We can also use != to filter rows that are not equal to a specific value.

    OperatorDescription
    ==Equal to
    !=Not equal to
    >Greater than
    <Less than
    >=Greater than or equal to
    <=Less than or equal to

    Remember that you have to use two equal signs == for comparison, not one equal sign = .

    Filtering with a Single Condition

    Example

      Let’s say we want to filter the flowers dataset to keep only the rows where the season is “Summer”.

      nameheightseasonsunlightgrowth
      Poppy75Spring8.3fast
      Rose150Summer6.4slow
      Zinnia60Summer8.7fast
      Peony90Spring7.2slow

      To filter the dataset, we can use the following code:

       flowers |> 
        filter(season == "Summer") 
      nameheightseasonsunlightgrowth
      Rose150Summer6.4slow
      Zinnia60Summer8.7fast

      This code will return only the flowers that bloom in the summer. Since the column has character values, we use quotes around the value we are comparing it to. Use "Summer" with quotes. Don’t use Summer without quotes.

    Exercise 4.1

    Try filtering the flowers dataset to keep only the rows where growth is “fast”. Remember to use == for comparison, not = .

    nameheightseasonsunlightgrowth
    Poppy75Spring8.3fast
    Rose150Summer6.4slow
    Zinnia60Summer8.7fast
    Peony90Spring7.2slow

    Filtering with Multiple Conditions

    You can also filter rows based on multiple conditions. To do this, you can use the & operator to combine conditions.

    Syntax for when both conditions must be met

    You can use a comma , or the & operator to filter rows that meet both conditions.

     data |> 
      filter(condition1, condition2) 

    or

     data |> 
      filter(condition1 & condition2) 
    Example

      For example, let’s filter the flowers dataset to keep only the rows where season is “Spring” and sunlight is greater than 8.

      nameheightseasonsunlightgrowth
      Poppy75Spring8.3fast
      Rose150Summer6.4slow
      Zinnia60Summer8.7fast
      Peony90Spring7.2slow
       flowers |> 
        filter(season == "Spring", sunlight > 8) 

      or

       flowers |>
        filter(season == "Spring" & sunlight > 8) 

      This code will return only the flowers that bloom in spring and receive more than 8 hours of sunlight.

      nameheightseasonsunlightgrowth
      Poppy75Spring8.3fast
    Exercise 4.2

    Try filtering the flowers dataset to keep only the rows where season is “Summer” and growth is “fast”.

    nameheightseasonsunlightgrowth
    Poppy75Spring8.3fast
    Rose150Summer6.4slow
    Zinnia60Summer8.7fast
    Peony90Spring7.2slow

    Syntax for when at least one condition must be met

    You can use the | operator to filter rows that meet either of two conditions.

     data |> 
      filter(condition1 | condition2) 

    This code will return rows that satisfy either condition1 or condition2 .

    Example

      Let’s filter the flowers dataset to keep only the rows where season is “Spring” or growth is “slow”.

      nameheightseasonsunlightgrowth
      Poppy75Spring8.3fast
      Rose150Summer6.4slow
      Zinnia60Summer8.7fast
      Peony90Spring7.2slow
       flowers |> 
        filter(season == "Spring" | growth == "slow") 

      This code will return only the flowers that bloom in spring or have slow growth.

      nameheightseasonsunlightgrowth
      Poppy75Spring8.3fast
      Rose150Summer6.4slow
      Peony90Spring7.2slow
    Exercise 4.3

    Run this code to keep only the rows where height is greater than 70 or growth is “slow”.

    nameheightseasonsunlightgrowth
    Poppy75Spring8.3fast
    Rose150Summer6.4slow
    Zinnia60Summer8.7fast
    Peony90Spring7.2slow
    Exercise 4.4

    Type the code to filter the flowers dataset to keep only the rows where the height is less than 91 or the sunlight is greater than 8.5.

    nameheightseasonsunlightgrowth
    Poppy75Spring8.3fast
    Rose150Summer6.4slow
    Zinnia60Summer8.7fast
    Peony90Spring7.2slow

    Filtering rows that do not meet a condition

    You can use the ! operator to filter rows that do not meet a condition.

     data |> 
      filter(!condition) 
    Example

      Let’s filter the flowers dataset to keep only the rows where season is not “Summer”.

      nameheightseasonsunlightgrowth
      Poppy75Spring8.3fast
      Rose150Summer6.4slow
      Zinnia60Summer8.7fast
      Peony90Spring7.2slow
       flowers |> 
        filter(!(season == "Summer")) 

      You can also write it as:

       flowers |> 
        filter(season != "Summer") 

      This code will return only the flowers that do not bloom in summer.

      nameheightseasonsunlightgrowth
      Poppy75Spring8.3fast
      Peony90Spring7.2slow
    Exercise 4.5

    Filter the flowers dataset to keep only the rows where growth is not “slow”.

    nameheightseasonsunlightgrowth
    Poppy75Spring8.3fast
    Rose150Summer6.4slow
    Zinnia60Summer8.7fast
    Peony90Spring7.2slow

    Review

    Quiz

      Loading...

      Loading...

      Loading...

    Review You've learned how to:
    • Filter rows using the filter function in dplyr.
    • Filter rows based on a single condition.
    • Filter rows based on multiple conditions.
    • Filter rows using logical operators.