In data analysis, you often need to create new columns in your dataset to perform calculations, apply transformations, or add information. In the tidyverse, you can create new columns using the mutate() function from the dplyr package.
What we’ll learn in this section
Goals
In this section, we'll learn how to:
create new columns using the mutate function in dplyr.
create columns with constant values.
create columns by performing calculations.
create columns by combining other columns.
create columns based on conditions.
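Before looking at the individual cases, the general shape of a mutate() call can be sketched as follows. The data frame `data` and its column `x` are hypothetical placeholders, and `x * 2` simply stands in for any expression.

```r
library(dplyr)

# Hypothetical data frame, used only to make the pattern runnable
data <- tibble(x = c(1, 2, 3))

# General pattern: mutate(new_column = expression)
result <- data %>%
  mutate(new_column = x * 2)  # `x * 2` stands in for any expression
```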
Here, new_column is the name of the new column you want to create, and expression is the calculation or logic used to define the values in the new column.
Let’s explore different ways to create columns using mutate().
Creating a Column with a Constant Value
You can create a new column with the same value for all rows. This could be any data type, such as a string, number, or logical value.
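As a rough sketch of the three cases, the snippet below adds one constant column of each type. The data frame `data` and the column names `label`, `count`, and `checked` are hypothetical and chosen only for illustration.

```r
library(dplyr)

# Hypothetical data frame, used only for illustration
data <- tibble(id = 1:3)

result <- data %>%
  mutate(
    label = "example",  # string constant, repeated for every row
    count = 10,         # numeric constant
    checked = TRUE      # logical constant
  )
```

A single value is recycled to the length of the data frame, so every row receives the same entry.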
Example: Adding a Constant Column
Let’s add a new column called country with the value “Netherlands” for all flowers in our dataset.
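One way to write this is sketched below. It assumes the flowers data frame has the columns shown in this section; the data is rebuilt by hand here so the snippet runs on its own, and the name `flowers_country` is only illustrative.

```r
library(dplyr)

# The flowers dataset, rebuilt by hand from the table in this section
flowers <- tibble(
  name = c("Poppy", "Rose", "Zinnia", "Peony"),
  height = c(75, 150, 60, 90),
  season = c("Spring", "Summer", "Summer", "Spring"),
  sunlight = c(8.3, 6.4, 8.7, 7.2),
  growth = c("fast", "slow", "fast", "slow")
)

# Add a constant string column; the single value is recycled to all rows
flowers_country <- flowers %>%
  mutate(country = "Netherlands")
```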
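| name | height | season | sunlight | growth | country |
| --- | --- | --- | --- | --- | --- |
| Poppy | 75 | Spring | 8.3 | fast | Netherlands |
| Rose | 150 | Summer | 6.4 | slow | Netherlands |
| Zinnia | 60 | Summer | 8.7 | fast | Netherlands |
| Peony | 90 | Spring | 7.2 | slow | Netherlands |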
Exercise 6.1
Using the flowers dataset, create a new column called color with the value “Red” for all flowers.
| name | height | season | sunlight | growth |
| --- | --- | --- | --- | --- |
| Poppy | 75 | Spring | 8.3 | fast |
| Rose | 150 | Summer | 6.4 | slow |
| Zinnia | 60 | Summer | 8.7 | fast |
| Peony | 90 | Spring | 7.2 | slow |
Creating Columns by Calculating with a Value
You can create new columns by performing calculations on existing columns.
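A minimal sketch of the pattern, assuming a hypothetical data frame `data` with a numeric column `x`. The particular calculation, `sqrt(x) + 1`, is arbitrary and only shows that functions and arithmetic can be mixed.

```r
library(dplyr)

# Hypothetical data frame, used only to make the pattern runnable
data <- tibble(x = c(1, 4, 9))

# The expression is evaluated on the whole column at once
result <- data %>%
  mutate(new_column = sqrt(x) + 1)
```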
Here, expression can be any mathematical operation or function that uses existing columns.
Example: Calculating Height in Meters
Let’s create a new column height_m that converts the height from centimeters to meters.
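This could be written as in the sketch below, assuming height is stored in centimeters as the text states. The flowers data is rebuilt by hand so the snippet is self-contained, and `flowers_m` is an illustrative name.

```r
library(dplyr)

# The flowers dataset, rebuilt by hand from the table in this section
flowers <- tibble(
  name = c("Poppy", "Rose", "Zinnia", "Peony"),
  height = c(75, 150, 60, 90),
  season = c("Spring", "Summer", "Summer", "Spring"),
  sunlight = c(8.3, 6.4, 8.7, 7.2),
  growth = c("fast", "slow", "fast", "slow")
)

# 100 centimeters per meter, so divide the height column by 100
flowers_m <- flowers %>%
  mutate(height_m = height / 100)
```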
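| name | height | season | sunlight | growth | height_m |
| --- | --- | --- | --- | --- | --- |
| Poppy | 75 | Spring | 8.3 | fast | 0.75 |
| Rose | 150 | Summer | 6.4 | slow | 1.5 |
| Zinnia | 60 | Summer | 8.7 | fast | 0.6 |
| Peony | 90 | Spring | 7.2 | slow | 0.9 |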
Exercise 6.2
Create a new column called sunlight_hours that converts the sunlight value from decimal to whole hours by rounding down.
Hint: You can use the floor() function to round down.
| name | height | season | sunlight | growth |
| --- | --- | --- | --- | --- |
| Poppy | 75 | Spring | 8.3 | fast |
| Rose | 150 | Summer | 6.4 | slow |
| Zinnia | 60 | Summer | 8.7 | fast |
| Peony | 90 | Spring | 7.2 | slow |
Creating Columns by Combining Other Columns
You can create new columns by combining or manipulating multiple existing columns.
For example, you can use the paste() function to combine strings from different columns.
Example: Combining Columns
Let’s create a new column description that combines the name and growth columns.
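A possible version using paste() is sketched below. The wording of the sentence is inferred from the output table in this section, the flowers data is rebuilt by hand, and `flowers_desc` is an illustrative name.

```r
library(dplyr)

# The flowers dataset, rebuilt by hand from the table in this section
flowers <- tibble(
  name = c("Poppy", "Rose", "Zinnia", "Peony"),
  height = c(75, 150, 60, 90),
  season = c("Spring", "Summer", "Summer", "Spring"),
  sunlight = c(8.3, 6.4, 8.7, 7.2),
  growth = c("fast", "slow", "fast", "slow")
)

# paste() joins its arguments row by row, separated by a space by default
flowers_desc <- flowers %>%
  mutate(description = paste(name, "is a", growth, "growing flower"))
```

If no separator is wanted between the pieces, paste0() or the `sep` argument of paste() can be used instead.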
| name | height | season | sunlight | growth | description |
| --- | --- | --- | --- | --- | --- |
| Poppy | 75 | Spring | 8.3 | fast | Poppy is a fast growing flower |
| Rose | 150 | Summer | 6.4 | slow | Rose is a slow growing flower |
| Zinnia | 60 | Summer | 8.7 | fast | Zinnia is a fast growing flower |
| Peony | 90 | Spring | 7.2 | slow | Peony is a slow growing flower |
Exercise 6.3
Create a new column called flower_info that combines the name, season, and height columns.
| name | height | season | sunlight | growth |
| --- | --- | --- | --- | --- |
| Poppy | 75 | Spring | 8.3 | fast |
| Rose | 150 | Summer | 6.4 | slow |
| Zinnia | 60 | Summer | 8.7 | fast |
| Peony | 90 | Spring | 7.2 | slow |
Creating Columns Based on Conditions
You can create new columns based on conditions using case_when().
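The general pattern might look like the sketch below. The data frame `data`, its column `x`, and the labels are hypothetical; they only fill the roles of the conditions and values in the pattern.

```r
library(dplyr)

# Hypothetical data frame, used only to make the pattern runnable
data <- tibble(x = c(-2, 0, 5))

result <- data %>%
  mutate(
    new_column = case_when(
      x < 0  ~ "negative",  # condition1 ~ value1
      x == 0 ~ "zero",      # condition2 ~ value2
      TRUE   ~ "positive"   # TRUE ~ value3 (fallback)
    )
  )
```

Conditions are checked in order, and each row takes the value of the first condition that is true for it.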
Here, condition1, condition2, etc., are logical conditions, and value1, value2, etc., are the values assigned to the new column when the corresponding condition is true. The final TRUE ~ value3 branch supplies the default value used when none of the earlier conditions are met.
Example: Conditional Column
Let’s create a new column height_category based on the height of the flowers.
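A sketch of how this might be done is shown below. The cutoffs (below 70 cm is "Short", up to 100 cm is "Medium", anything taller is "Tall") are assumptions picked so that the result agrees with the table in this section; the original thresholds are not stated. The flowers data is rebuilt by hand, and `flowers_cat` is an illustrative name.

```r
library(dplyr)

# The flowers dataset, rebuilt by hand from the table in this section
flowers <- tibble(
  name = c("Poppy", "Rose", "Zinnia", "Peony"),
  height = c(75, 150, 60, 90),
  season = c("Spring", "Summer", "Summer", "Spring"),
  sunlight = c(8.3, 6.4, 8.7, 7.2),
  growth = c("fast", "slow", "fast", "slow")
)

# Thresholds of 70 and 100 cm are assumed, not taken from the original
flowers_cat <- flowers %>%
  mutate(
    height_category = case_when(
      height < 70   ~ "Short",
      height <= 100 ~ "Medium",
      TRUE          ~ "Tall"
    )
  )
```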
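| name | height | season | sunlight | growth | height_category |
| --- | --- | --- | --- | --- | --- |
| Poppy | 75 | Spring | 8.3 | fast | Medium |
| Rose | 150 | Summer | 6.4 | slow | Tall |
| Zinnia | 60 | Summer | 8.7 | fast | Short |
| Peony | 90 | Spring | 7.2 | slow | Medium |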
Exercise 6.4
Create a new column called sunlight_category based on the sunlight value of the flowers. Use the following categories:
“Low” for sunlight less than 7 hours
“Medium” for sunlight between 7 and 8 hours (inclusive)
“High” for sunlight more than 8 hours
| name | height | season | sunlight | growth |
| --- | --- | --- | --- | --- |
| Poppy | 75 | Spring | 8.3 | fast |
| Rose | 150 | Summer | 6.4 | slow |
| Zinnia | 60 | Summer | 8.7 | fast |
| Peony | 90 | Spring | 7.2 | slow |
Review
Summary
You've learned how to:
Create new columns using the mutate function in dplyr.