Working with Rows | The School of Data

Data Wrangling in R with dplyr and tidyr Working with Rows

Row Operations

One way to approach data wrangling in R is to work with rows. This involves ordering rows, keeping only unique rows, filtering rows based on conditions, and subsetting rows based on their positions or values.

In this section, we’ll learn how to perform these operations using the dplyr package in R.

What we’ll learn in this section

Goals In this section, we will learn how to:

Order rows using the arrange() function
Keep only unique rows using the distinct() function
Filter rows based on single and multiple conditions using the filter() function
Use logical operators in filtering
Subset rows based on their positions or values using the slice() family of functions

Example Dataset

We’ll create and use this sample dataset. It contains 5 columns and 4 rows.

 students <- data.frame(
  id = 1:4,
  name = c("Alia", "Bala", "Cara", "Dana"),
  section = c("A", "B", "A", "B"),
  study = c(2, 8, NA, 4),
  play = c(5, 5, 10, 10)
)

This is how the dataset looks like:

id	name	section	study	play
1	Alia	A	2	5
2	Bala	B	8	5
3	Cara	A	NA	10
4	Dana	B	4	10

Arrange Rows

The arrange() function is used to sort rows based on one or more columns.

Syntax

By default, the arrange() function sorts in ascending order.

 dataset |> 
  arrange(column_name)

To sort in descending order, use the desc() function.

 dataset |> 
  arrange(desc(column_name))

To arrange by multiple columns, list the column names in the arrange() function.

 dataset |> 
  arrange(column_name1, column_name2)

Let’s try this with some examples.

Example

To sort the students data frame by the section column in descending order, we can:

 students |> 
  arrange(desc(section))

Output:

 # A tibble: 4 × 5
     id name  section study  play
  <int> <chr> <chr>   <dbl> <dbl>
1     2 Bala  B           8     5
2     4 Dana  B           4    10
3     1 Alia  A           2     5
4     3 Cara  A          NA    10

To sort the data frame first by the play column in ascending order and then by the study column in descending order, we can:

 students |> 
  arrange(play, desc(study))

Output:

 # A tibble: 4 × 5
     id name  section study  play
  <int> <chr> <chr>   <dbl> <dbl>
1     2 Bala  B           8     5
2     1 Alia  A           2     5
3     4 Dana  B           4    10
4     3 Cara  A          NA    10

Exercise 2.1

Arrange the students data frame by the section column in descending order.

id	name	section	study	play
1	Alia	A	2	5
2	Bala	B	8	5
3	Cara	A	NA	10
4	Dana	B	4	10

 students |> 
  arrange(desc(section))

Output:

 # A tibble: 4 × 5
     id name  section study  play
  <int> <chr> <chr>   <dbl> <dbl>
1     2 Bala  B           8     5
2     4 Dana  B           4    10
3     1 Alia  A           2     5
4     3 Cara  A          NA    10

Exercise 2.2

Sort the students data frame by the section column in descending order and id in descending order.

id	name	section	study	play
1	Alia	A	2	5
2	Bala	B	8	5
3	Cara	A	NA	10
4	Dana	B	4	10

 students |> 
  arrange(desc(section), desc(id))

Output:

 # A tibble: 4 x 5
     id name  section study  play
  <int> <chr> <chr>   <dbl> <dbl>
1     4 Dana  B           4    10
2     2 Bala  B           8     5
3     3 Cara  A          NA    10
4     1 Alia  A           2     5

Keep Only Unique Rows

The distinct() function is used to keep only unique/distinct rows from a data frame.

Syntax

To keep distinct rows based on all columns:

 dataset |> 
  distinct()

To keep distinct rows based on specific columns:

 dataset |> 
  distinct(column_name1, column_name2)

The .keep_all = TRUE argument keeps all columns in the result.

 dataset |> 
  distinct(column_name1, column_name2, .keep_all = TRUE)

Example

To keep only unique rows based on the section column:

 students |> 
  distinct(section)

Output:

 # A tibble: 2 × 1
  section
  <chr>  
1 A      
2 B

Exercise 3.1

Keep only unique rows based on the play column.

id	name	section	study	play
1	Alia	A	2	5
2	Bala	B	8	5
3	Cara	A	NA	10
4	Dana	B	4	10

Filter Rows

The filter() function is used to subset rows based on column values.

Syntax

To filter rows based on a single condition:

 dataset |> 
  filter(condition)

Example

To filter rows where study is greater than 3:

 students |> 
  filter(study > 3)

Output:

 # A tibble: 2 × 5
     id name  section study  play
  <int> <chr> <chr>   <dbl> <dbl>
1     2 Bala  B           8     5
2     4 Dana  B           4    10

Exercise 4.1

Filter the students data frame to show only rows where id is less than 3.

id	name	section	study	play
1	Alia	A	2	5
2	Bala	B	8	5
3	Cara	A	NA	10
4	Dana	B	4	10

 students |> 
  filter(id < 3)

Output:

 # A tibble: 2 × 5
     id name  section study  play
  <int> <chr> <chr>   <dbl> <dbl>
1     1 Alia  A           2     5
2     2 Bala  B           8     5

To filter rows with two or more conditions, we can write conditional statements using logical operators. The common logical operators are & (and), | (or), and ! (not).

Syntax

Common logical operators:

& (and)
| (or)
! (not)

Example usage:

 dataset |> 
  filter(condition1 & condition2)
 
dataset |>
  filter(condition1 | condition2)
 
dataset |> 
  filter(!condition)

Example

To filter rows where section is “A” or play is greater than 5:

 students |> 
  filter(section == "A" | play > 5)

Output:

 # A tibble: 3 × 5
     id name  section study  play
  <int> <chr> <chr>   <dbl> <dbl>
1     1 Alia  A           2     5
2     3 Cara  A          NA    10
3     4 Dana  B           4    10

Exercise 5.1

Filter the students data frame to show rows where section is “B” and study is not NA.

id	name	section	study	play
1	Alia	A	2	5
2	Bala	B	8	5
3	Cara	A	NA	10
4	Dana	B	4	10

 students |> 
  filter(section == "B" & !is.na(study))

Output:

 # A tibble: 2 × 5
     id name  section study  play
  <int> <chr> <chr>   <dbl> <dbl>
1     2 Bala  B           8     5
2     4 Dana  B           4    10

Subset Rows Based on Positions or Values

The slice() family of functions is used to subset rows based on their positions or values.

Syntax

To select rows by position:

 dataset |> 
  slice(row_numbers)

To select top N rows:

 dataset |> 
  slice_head(n = N)

To select bottom N rows:

 dataset |> 
  slice_tail(n = N)

To select random N rows:

 set.seed(seed_value) # this can be any integer
dataset |> 
  slice_sample(n = N)

Example

To select the first two rows:

 students |> 
  slice(1:2)

Output:

 # A tibble: 2 × 5
     id name  section study  play
  <int> <chr> <chr>   <dbl> <dbl>
1     1 Alia  A           2     5
2     2 Bala  B           8     5

Example

To select the last two rows:

 students |> 
  slice_tail(2)

Output:

 # A tibble: 2 × 5
     id name  section study  play
  <int> <chr> <chr>   <dbl> <dbl>
1     3 Cara  A          NA    10
2     4 Dana  B           4    10

Exercise 6.1

Select the first three rows from the students data frame.

id	name	section	study	play
1	Alia	A	2	5
2	Bala	B	8	5
3	Cara	A	NA	10
4	Dana	B	4	10

 students |> 
  slice_head(n = 3)
 
# or
 
students |> 
  slice(1:3)

Output:

 # A tibble: 3 × 5
     id name  section study  play
  <int> <chr> <chr>   <dbl> <dbl>
1     1 Alia  A           2     5
2     2 Bala  B           8     5
3     3 Cara  A          NA    10

Exercise 6.2

Select a sample of two rows from the students data frame. Add set.seed(123) before the command to ensure reproducibility.

id	name	section	study	play
1	Alia	A	2	5
2	Bala	B	8	5
3	Cara	A	NA	10
4	Dana	B	4	10

 set.seed(123)
students |> 
  slice_sample(n = 2)

Output:

 # A tibble: 2 x 5
     id name  section study  play
  <int> <chr> <chr>   <dbl> <dbl>
1     3 Cara  A          NA    10
2     4 Dana  B           4    10

Review

In this section, we learned how different dplyr functions can be used to manipulate rows in R. Let’s review with a quiz and summarize what we’ve learned so far.

Quiz

Summary In this section, we learned how to:

Order rows using the arrange() function
Keep only unique rows using the distinct() function
Filter rows based on single and multiple conditions using the filter() function
Use logical operators in filtering
Subset rows based on their positions or values using the slice() family of functions

In the next section, we’ll learn how to apply column operations in R using dplyr functions.

Previous Introduction

Go to Next SectionWorking with Columns →

On this page

- Row Operations
- What we’ll learn in this section
- Example Dataset
- Arrange Rows
- Keep Only Unique Rows
- Filter Rows
- Subset Rows Based on Positions or Values
- Review