In data analysis, you often need to create new columns in your dataset to perform calculations, apply transformations, or add information. In the tidyverse, you create new columns with the mutate function from the dplyr package.
What we’ll learn in this section
Goals
In this section, we'll learn how to:
create new columns using the mutate function in dplyr.
create columns with constant values.
create columns by performing calculations.
create columns by combining other columns.
create columns based on conditions.
The basic syntax of mutate is:
dataset |> mutate(new_column = expression)
Here, new_column is the name of the new column you want to create, and expression is the calculation or logic used to define the values in the new column.
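To run the examples in this section yourself, the flowers dataset can be recreated from the values shown in the tables of this section. This is a minimal sketch: the column types are an assumption, and height_doubled is a hypothetical column used only to show that mutate returns a new data frame rather than changing the original.

```r
library(dplyr)

# Recreate the flowers dataset from the tables in this section.
# Assumption: text columns are character, measurements are numeric.
flowers <- tibble(
  name     = c("Poppy", "Rose", "Zinnia", "Peony"),
  height   = c(75, 150, 60, 90),          # centimeters
  season   = c("Spring", "Summer", "Summer", "Spring"),
  sunlight = c(8.3, 6.4, 8.7, 7.2),       # hours
  growth   = c("fast", "slow", "fast", "slow")
)

# mutate() returns a modified copy; flowers keeps its original five columns
# unless the result is assigned back to it.
with_double <- flowers |> mutate(height_doubled = height * 2)

names(flowers)
names(with_double)
```

To keep a new column, assign the result to a name, for example flowers <- flowers |> mutate(...).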
Let’s explore different ways to create columns using mutate.
Creating a Column with a Constant Value
You can create a new column with the same value for all rows. This could be any data type, such as a string, number, or logical value.
String
dataset |> mutate(new_column = "value")
Number
dataset |> mutate(new_column = 7)
Logical
dataset |> mutate(new_column = TRUE)
Example: Adding a Constant Column
Let’s add a new column called country with the value “Netherlands” for all flowers in our dataset.
flowers |> mutate(country = "Netherlands")
name    height  season  sunlight  growth  country
Poppy   75      Spring  8.3       fast    Netherlands
Rose    150     Summer  6.4       slow    Netherlands
Zinnia  60      Summer  8.7       fast    Netherlands
Peony   90      Spring  7.2       slow    Netherlands
Exercise 6.1
Using the flowers dataset, create a new column called color with the value “Red” for all flowers.
name    height  season  sunlight  growth
Poppy   75      Spring  8.3       fast
Rose    150     Summer  6.4       slow
Zinnia  60      Summer  8.7       fast
Peony   90      Spring  7.2       slow
flowers |> mutate(color = "Red")
Creating Columns by Calculating with a Value
You can create new columns by performing calculations on existing columns.
dataset |> mutate(new_column = expression)
Here, expression can be any mathematical operation or function that uses existing columns.
Example: Calculating Height in Meters
Let’s create a new column height_m that converts the height from centimeters to meters.
flowers |> mutate(height_m = height / 100)
name    height  season  sunlight  growth  height_m
Poppy   75      Spring  8.3       fast    0.75
Rose    150     Summer  6.4       slow    1.5
Zinnia  60      Summer  8.7       fast    0.6
Peony   90      Spring  7.2       slow    0.9
Exercise 6.2
Create a new column called sunlight_hours that converts the sunlight value from decimal to whole hours by rounding down.
Hint: You can use the floor() function to round down.
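One possible solution, sketched under the assumption that flowers holds the values shown in the tables of this section (the dataset is rebuilt here so the snippet runs on its own):

```r
library(dplyr)

# Rebuild the flowers dataset from the tables in this section
flowers <- tibble(
  name     = c("Poppy", "Rose", "Zinnia", "Peony"),
  height   = c(75, 150, 60, 90),
  season   = c("Spring", "Summer", "Summer", "Spring"),
  sunlight = c(8.3, 6.4, 8.7, 7.2),
  growth   = c("fast", "slow", "fast", "slow")
)

# floor() rounds each value down to the nearest whole number,
# so 8.3 becomes 8 and 6.4 becomes 6
result <- flowers |> mutate(sunlight_hours = floor(sunlight))

result
```

Because floor() always rounds down, 8.7 becomes 8 here, whereas round() would give 9.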
Create a new column called flower_info that combines the name, season, and height columns.
name    height  season  sunlight  growth
Poppy   75      Spring  8.3       fast
Rose    150     Summer  6.4       slow
Zinnia  60      Summer  8.7       fast
Peony   90      Spring  7.2       slow
flowers |> mutate(flower_info = paste(name, "is a", season, "flower with a height of", height, "cm"))
# output
  name    height  season  sunlight  growth  flower_info
1 Poppy   75      Spring  8.3       fast    Poppy is a Spring flower with a height of 75 cm
2 Rose    150     Summer  6.4       slow    Rose is a Summer flower with a height of 150 cm
3 Zinnia  60      Summer  8.7       fast    Zinnia is a Summer flower with a height of 60 cm
4 Peony   90      Spring  7.2       slow    Peony is a Spring flower with a height of 90 cm
Creating Columns Based on Conditions
You can create new columns based on conditions using case_when().
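The general pattern can be sketched as follows. The dataset, its column x, and the labels are hypothetical and exist only to make the snippet runnable; the comments map each line to the condition and value names used in the explanation of this pattern.

```r
library(dplyr)

# Hypothetical one-column dataset, used only to illustrate the pattern
dataset <- tibble(x = c(1, 5, 10))

result <- dataset |>
  mutate(
    new_column = case_when(
      x < 3 ~ "low",       # condition1 ~ value1
      x < 8 ~ "medium",    # condition2 ~ value2
      TRUE  ~ "high"       # TRUE ~ value3: used when no earlier condition matches
    )
  )

result
```

Conditions are checked from top to bottom, and each row receives the value of the first condition it satisfies, so the order of the conditions matters.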
Here, condition1, condition2, etc., are logical conditions, and value1, value2, etc., are the values assigned to the new column when the corresponding condition is met. The final line, TRUE ~ value3, supplies the default value used when none of the conditions are met.
Example: Conditional Column
Let’s create a new column height_category based on the height of the flowers.
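A minimal sketch of such a column follows. The cutoffs (70 cm and 100 cm) and the labels "short", "medium", and "tall" are assumptions chosen for illustration, not values taken from this section; the dataset is rebuilt from the tables so the snippet runs on its own.

```r
library(dplyr)

# Rebuild the flowers dataset from the tables in this section
flowers <- tibble(
  name     = c("Poppy", "Rose", "Zinnia", "Peony"),
  height   = c(75, 150, 60, 90),          # centimeters
  season   = c("Spring", "Summer", "Summer", "Spring"),
  sunlight = c(8.3, 6.4, 8.7, 7.2),
  growth   = c("fast", "slow", "fast", "slow")
)

# Hypothetical cutoffs: below 70 cm is "short", below 100 cm is "medium",
# everything else falls through to the default "tall"
result <- flowers |>
  mutate(
    height_category = case_when(
      height < 70  ~ "short",
      height < 100 ~ "medium",
      TRUE         ~ "tall"
    )
  )

result |> select(name, height, height_category)
```

With these assumed cutoffs, Zinnia (60 cm) is "short", Poppy (75 cm) and Peony (90 cm) are "medium", and Rose (150 cm) is "tall".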