What is Filtering?
Datasets often contain a large amount of data, and it can be challenging to work with all of it at once. Filtering is a way to reduce the size of the dataset by selecting only the rows that meet specific criteria.
Why do we need to filter data?
There are several reasons why we might want to filter data. Some reasons include:
To focus on a specific subset of the data.
To remove rows that are not relevant to our analysis.
To remove missing or invalid data.
To prepare the data for further analysis.
What we’ll learn in this section
Goals In this section, we'll learn how to: Filter rows using the filter
function in dplyr. Filter rows based on a single condition. Filter rows based on multiple conditions. Filter rows using logical operators.
Basic Syntax for Filtering
To use the filter function in dplyr, we use the following syntax:
data |>
filter (condition)
copied = true)" @mouseleave.debounce.1000ms="copied && (copied = false)" @keydown.enter.debounce.1000ms="copied && (copied = false)" @keydown.space.debounce.1000ms="copied && (copied = false)" @touchstart.debounce.1000ms="copied && (copied = false)" class="group flex select-none items-center justify-between gap-2 rounded p-2 hover:bg-3 hover:text-brand focus-visible:outline-none focus-visible:ring-1 motion-safe:transition-colors absolute right-0 top-0 text-2">
We specify the dataset we want to filter, followed by the |>
operator, and then the filter
function. Inside the filter
function, we specify the condition that we want to filter on.
The condition can be any logical expression that evaluates to TRUE
or FALSE
. Rows where the condition is TRUE
are kept, and rows where the condition is FALSE
are removed.
For example, here is an example condition:
column_name > 10
copied = true)" @mouseleave.debounce.1000ms="copied && (copied = false)" @keydown.enter.debounce.1000ms="copied && (copied = false)" @keydown.space.debounce.1000ms="copied && (copied = false)" @touchstart.debounce.1000ms="copied && (copied = false)" class="group flex select-none items-center justify-between gap-2 rounded p-2 hover:bg-3 hover:text-brand focus-visible:outline-none focus-visible:ring-1 motion-safe:transition-colors absolute right-0 top-0 text-2">
This code will keep only the rows where the value in the column column_name
is greater than 10.
If we need to filter rows that are equal to a specific value, we use the ==
operator. It has two equal signs, not one. For a value less than, we use <
, for greater than, we use >
, for less than or equal to, we use <=
, and for greater than or equal to, we use >=
.
We can also use !=
to filter rows that are not equal to a specific value.
Operator Description == Equal to != Not equal to > Greater than < Less than >= Greater than or equal to <= Less than or equal to
Remember that you have to use two equal signs ==
for comparison, not one equal sign =
.
Filtering with a Single Condition
Example Let’s say we want to filter the flowers dataset to keep only the rows where the season
is “Summer”.
name height season sunlight growth Poppy 75 Spring 8.3 fast Rose 150 Summer 6.4 slow Zinnia 60 Summer 8.7 fast Peony 90 Spring 7.2 slow
To filter the dataset, we can use the following code:
flowers |>
filter (season == "Summer" )
copied = true)" @mouseleave.debounce.1000ms="copied && (copied = false)" @keydown.enter.debounce.1000ms="copied && (copied = false)" @keydown.space.debounce.1000ms="copied && (copied = false)" @touchstart.debounce.1000ms="copied && (copied = false)" class="group flex select-none items-center justify-between gap-2 rounded p-2 hover:bg-3 hover:text-brand focus-visible:outline-none focus-visible:ring-1 motion-safe:transition-colors absolute right-0 top-0 text-2">
name height season sunlight growth Rose 150 Summer 6.4 slow Zinnia 60 Summer 8.7 fast
This code will return only the flowers that bloom in the summer. Since the column has character values, we use quotes around the value we are comparing it to. Use "Summer"
with quotes. Don’t use Summer
without quotes.
Exercise 4.1 Try filtering the flowers dataset to keep only the rows where growth
is “fast”. Remember to use ==
for comparison, not =
.
name height season sunlight growth Poppy 75 Spring 8.3 fast Rose 150 Summer 6.4 slow Zinnia 60 Summer 8.7 fast Peony 90 Spring 7.2 slow
show solution flowers |>
filter (growth == "fast" )
copied = true)" @mouseleave.debounce.1000ms="copied && (copied = false)" @keydown.enter.debounce.1000ms="copied && (copied = false)" @keydown.space.debounce.1000ms="copied && (copied = false)" @touchstart.debounce.1000ms="copied && (copied = false)" class="group flex select-none items-center justify-between gap-2 rounded p-2 hover:bg-3 hover:text-brand focus-visible:outline-none focus-visible:ring-1 motion-safe:transition-colors absolute right-0 top-0 text-2">
Filtering with Multiple Conditions
You can also filter rows based on multiple conditions. To do this, you can use the &
operator to combine conditions.
Syntax for when both conditions must be met
You can use a comma ,
or the &
operator to filter rows that meet both conditions.
data |>
filter (condition1, condition2)
copied = true)" @mouseleave.debounce.1000ms="copied && (copied = false)" @keydown.enter.debounce.1000ms="copied && (copied = false)" @keydown.space.debounce.1000ms="copied && (copied = false)" @touchstart.debounce.1000ms="copied && (copied = false)" class="group flex select-none items-center justify-between gap-2 rounded p-2 hover:bg-3 hover:text-brand focus-visible:outline-none focus-visible:ring-1 motion-safe:transition-colors absolute right-0 top-0 text-2">
or
data |>
filter (condition1 & condition2)
copied = true)" @mouseleave.debounce.1000ms="copied && (copied = false)" @keydown.enter.debounce.1000ms="copied && (copied = false)" @keydown.space.debounce.1000ms="copied && (copied = false)" @touchstart.debounce.1000ms="copied && (copied = false)" class="group flex select-none items-center justify-between gap-2 rounded p-2 hover:bg-3 hover:text-brand focus-visible:outline-none focus-visible:ring-1 motion-safe:transition-colors absolute right-0 top-0 text-2">
Example For example, let’s filter the flowers dataset to keep only the rows where season
is “Spring” and sunlight
is greater than 8.
name height season sunlight growth Poppy 75 Spring 8.3 fast Rose 150 Summer 6.4 slow Zinnia 60 Summer 8.7 fast Peony 90 Spring 7.2 slow
flowers |>
filter (season == "Spring" , sunlight > 8 )
copied = true)" @mouseleave.debounce.1000ms="copied && (copied = false)" @keydown.enter.debounce.1000ms="copied && (copied = false)" @keydown.space.debounce.1000ms="copied && (copied = false)" @touchstart.debounce.1000ms="copied && (copied = false)" class="group flex select-none items-center justify-between gap-2 rounded p-2 hover:bg-3 hover:text-brand focus-visible:outline-none focus-visible:ring-1 motion-safe:transition-colors absolute right-0 top-0 text-2"> or
flowers |>
filter (season == "Spring" & sunlight > 8 )
copied = true)" @mouseleave.debounce.1000ms="copied && (copied = false)" @keydown.enter.debounce.1000ms="copied && (copied = false)" @keydown.space.debounce.1000ms="copied && (copied = false)" @touchstart.debounce.1000ms="copied && (copied = false)" class="group flex select-none items-center justify-between gap-2 rounded p-2 hover:bg-3 hover:text-brand focus-visible:outline-none focus-visible:ring-1 motion-safe:transition-colors absolute right-0 top-0 text-2"> This code will return only the flowers that bloom in spring and receive more than 8 hours of sunlight.
name height season sunlight growth Poppy 75 Spring 8.3 fast
Exercise 4.2 Try filtering the flowers dataset to keep only the rows where season
is “Summer” and growth
is “fast”.
name height season sunlight growth Poppy 75 Spring 8.3 fast Rose 150 Summer 6.4 slow Zinnia 60 Summer 8.7 fast Peony 90 Spring 7.2 slow
show solution flowers |>
filter (season == "Summer" , growth == "fast" )
copied = true)" @mouseleave.debounce.1000ms="copied && (copied = false)" @keydown.enter.debounce.1000ms="copied && (copied = false)" @keydown.space.debounce.1000ms="copied && (copied = false)" @touchstart.debounce.1000ms="copied && (copied = false)" class="group flex select-none items-center justify-between gap-2 rounded p-2 hover:bg-3 hover:text-brand focus-visible:outline-none focus-visible:ring-1 motion-safe:transition-colors absolute right-0 top-0 text-2">
Syntax for when at least one condition must be met
You can use the |
operator to filter rows that meet either of two conditions.
data |>
filter (condition1 | condition2)
copied = true)" @mouseleave.debounce.1000ms="copied && (copied = false)" @keydown.enter.debounce.1000ms="copied && (copied = false)" @keydown.space.debounce.1000ms="copied && (copied = false)" @touchstart.debounce.1000ms="copied && (copied = false)" class="group flex select-none items-center justify-between gap-2 rounded p-2 hover:bg-3 hover:text-brand focus-visible:outline-none focus-visible:ring-1 motion-safe:transition-colors absolute right-0 top-0 text-2">
This code will return rows that satisfy either condition1
or condition2
.
Example Let’s filter the flowers dataset to keep only the rows where season
is “Spring” or growth
is “slow”.
name height season sunlight growth Poppy 75 Spring 8.3 fast Rose 150 Summer 6.4 slow Zinnia 60 Summer 8.7 fast Peony 90 Spring 7.2 slow
flowers |>
filter (season == "Spring" | growth == "slow" )
copied = true)" @mouseleave.debounce.1000ms="copied && (copied = false)" @keydown.enter.debounce.1000ms="copied && (copied = false)" @keydown.space.debounce.1000ms="copied && (copied = false)" @touchstart.debounce.1000ms="copied && (copied = false)" class="group flex select-none items-center justify-between gap-2 rounded p-2 hover:bg-3 hover:text-brand focus-visible:outline-none focus-visible:ring-1 motion-safe:transition-colors absolute right-0 top-0 text-2"> This code will return only the flowers that bloom in spring or have slow growth.
name height season sunlight growth Poppy 75 Spring 8.3 fast Rose 150 Summer 6.4 slow Peony 90 Spring 7.2 slow
Exercise 4.3 Run this code to keep only the rows where height
is greater than 70 or growth
is “slow”.
name height season sunlight growth Poppy 75 Spring 8.3 fast Rose 150 Summer 6.4 slow Zinnia 60 Summer 8.7 fast Peony 90 Spring 7.2 slow
Exercise 4.4 Type the code to filter the flowers dataset to keep only the rows where the height is less than 91 or the sunlight is greater than 8.5.
name height season sunlight growth Poppy 75 Spring 8.3 fast Rose 150 Summer 6.4 slow Zinnia 60 Summer 8.7 fast Peony 90 Spring 7.2 slow
show solution flowers |>
filter (height < 91 | sunlight > 8.5 )
copied = true)" @mouseleave.debounce.1000ms="copied && (copied = false)" @keydown.enter.debounce.1000ms="copied && (copied = false)" @keydown.space.debounce.1000ms="copied && (copied = false)" @touchstart.debounce.1000ms="copied && (copied = false)" class="group flex select-none items-center justify-between gap-2 rounded p-2 hover:bg-3 hover:text-brand focus-visible:outline-none focus-visible:ring-1 motion-safe:transition-colors absolute right-0 top-0 text-2">
Filtering rows that do not meet a condition
You can use the !
operator to filter rows that do not meet a condition.
data |>
filter ( ! condition)
copied = true)" @mouseleave.debounce.1000ms="copied && (copied = false)" @keydown.enter.debounce.1000ms="copied && (copied = false)" @keydown.space.debounce.1000ms="copied && (copied = false)" @touchstart.debounce.1000ms="copied && (copied = false)" class="group flex select-none items-center justify-between gap-2 rounded p-2 hover:bg-3 hover:text-brand focus-visible:outline-none focus-visible:ring-1 motion-safe:transition-colors absolute right-0 top-0 text-2">
Example Let’s filter the flowers dataset to keep only the rows where season
is not “Summer”.
name height season sunlight growth Poppy 75 Spring 8.3 fast Rose 150 Summer 6.4 slow Zinnia 60 Summer 8.7 fast Peony 90 Spring 7.2 slow
flowers |>
filter ( ! (season == "Summer" ))
copied = true)" @mouseleave.debounce.1000ms="copied && (copied = false)" @keydown.enter.debounce.1000ms="copied && (copied = false)" @keydown.space.debounce.1000ms="copied && (copied = false)" @touchstart.debounce.1000ms="copied && (copied = false)" class="group flex select-none items-center justify-between gap-2 rounded p-2 hover:bg-3 hover:text-brand focus-visible:outline-none focus-visible:ring-1 motion-safe:transition-colors absolute right-0 top-0 text-2"> You can also write it as:
flowers |>
filter (season != "Summer" )
copied = true)" @mouseleave.debounce.1000ms="copied && (copied = false)" @keydown.enter.debounce.1000ms="copied && (copied = false)" @keydown.space.debounce.1000ms="copied && (copied = false)" @touchstart.debounce.1000ms="copied && (copied = false)" class="group flex select-none items-center justify-between gap-2 rounded p-2 hover:bg-3 hover:text-brand focus-visible:outline-none focus-visible:ring-1 motion-safe:transition-colors absolute right-0 top-0 text-2"> This code will return only the flowers that do not bloom in summer.
name height season sunlight growth Poppy 75 Spring 8.3 fast Peony 90 Spring 7.2 slow
Exercise 4.5 Filter the flowers dataset to keep only the rows where growth
is not “slow”.
name height season sunlight growth Poppy 75 Spring 8.3 fast Rose 150 Summer 6.4 slow Zinnia 60 Summer 8.7 fast Peony 90 Spring 7.2 slow
show solution flowers |>
filter ( ! (growth == "slow" ))
# or
flowers |>
filter (growth != "slow" )
copied = true)" @mouseleave.debounce.1000ms="copied && (copied = false)" @keydown.enter.debounce.1000ms="copied && (copied = false)" @keydown.space.debounce.1000ms="copied && (copied = false)" @touchstart.debounce.1000ms="copied && (copied = false)" class="group flex select-none items-center justify-between gap-2 rounded p-2 hover:bg-3 hover:text-brand focus-visible:outline-none focus-visible:ring-1 motion-safe:transition-colors absolute right-0 top-0 text-2">
Review
Quiz Loading...
Loading...
Loading...
Review You've learned how to: Filter rows using the filter
function in dplyr. Filter rows based on a single condition. Filter rows based on multiple conditions. Filter rows using logical operators.