Another crucial aspect of data wrangling in R is working with grouped data. This involves aggregating data, performing calculations within groups, and applying operations to subsets of your data based on specific criteria.
What we’ll learn in this section
In this section, we’ll explore how to effectively work with grouped data using the dplyr package in R.
GoalsIn this section, we'll learn how to:
Group data using group_by(), ungroup(), and rowwise()
Perform per-operation grouping using .by/by
Summarize grouped data using summarise(), reframe(), count(), and tally()
Understand the difference between adding counts with add_count() and add_tally()
Apply group-wise operations using mutate() and filter()
Grouping Data
The group_by() function is used to group your data by one or more variables.
Syntax
You can group by multiple columns:
When you group data using group_by() without any function after that, you won’t notice any difference. The grouping is still applied. Only when you use a function after grouping, you will notice the effect of grouping.
Example
Group students by section:
Output:
In the output, you can see the message Groups: section [2] . This indicates that the data is grouped by the section column.
If you use a function after grouping, you will see the effect of grouping. For example, calculating the mean study time by section.
Output:
Exercise 3.1
Group the students by section and calculate the mean play time for each section.
id
name
section
study
play
1
Alia
A
2
5
2
Bala
B
8
5
3
Cara
A
NA
10
4
Dana
B
4
10
Output:
Ungrouping Data
The ungroup() function removes grouping from your data. It is a common mistake to forget to ungroup our data after grouping, which can lead to unexpected results.
Example
Group by section, then ungroup:
Calculating Values for Each Row
The rowwise() function groups data frame by rows, allowing operations to be performed row by row.
Example
Calculate total time for each student:
Output:
In this case, the total time is calculated for each row. For example, the total time for Alia is 2 (study) + 5 (play) = 7.
If we do not use rowwise(), the sum would be calculated for each column separately and this would not be the total time for each student.
For example, if we use the following code:
We get the following output:
In this case, the number 44 is the sum of study column (2 + 8 + 4 = 14) and play column (5 + 5 + 10 + 10 = 30). It is not the sum of each row. To avoid this, we must use rowwise() to calculate the sum for each row.
Exercise 3.2
Calculate the average of study and play time for each student.
id
name
section
study
play
1
Alia
A
2
5
2
Bala
B
8
5
3
Cara
A
NA
10
4
Dana
B
4
10
Output:
Group-wise Operations
After grouping your data, dplyr allows you to perform powerful operations on these groups. This lets you apply calculations to each group separately, enabling complex data analysis and transformation.
We’ll cover two key combinations:
group_by() + mutate() : Create or modify columns based on group calculations.
group_by() + filter() : Filter rows using conditions evaluated within each group.
These techniques will help you manipulate grouped data effectively and gain valuable insights. Let’s explore how they work with some practical examples.
group_by() + mutate()
This combination allows you to create or modify columns based on group-level calculations.
Example
Add column with mean study time for each section:
Output:
Exercise 3.3
Group the students by section and add a column with the total study time for each section.
id
name
section
study
play
1
Alia
A
2
5
2
Bala
B
8
5
3
Cara
A
NA
10
4
Dana
B
4
10
Output:
group_by() + filter()
This combination allows you to filter rows based on group-level conditions.
Example
Keep only students with play time less than 10 in each section:
Output:
Exercise 3.4
Group the students by section and filter out students who have a study time greater than 5 hours in each section.
id
name
section
study
play
1
Alia
A
2
5
2
Bala
B
8
5
3
Cara
A
NA
10
4
Dana
B
4
10
Output:
Review
We’ve learned how to work with groups. Let’s review what we’ve covered in this section:
Quiz
Loading...
Loading...
Loading...
SummaryIn this section, you've learned how to:
Group data using group_by() , ungroup() , and rowwise()
Summarize grouped data using summarise()
Apply group-wise operations using mutate() and filter()
In the next section, we’ll learn how to join data frames in R using dplyr functions.