A Beginner's Introduction to R Through the Tidyverse

Creating Columns

Learn how to create new columns using dplyr.

How to Create Columns

In data analysis, you often need to add new columns to a dataset to hold calculations, transformations, or extra information. In the tidyverse, you create new columns with the mutate function from the dplyr package.

What we’ll learn in this section

Goals: In this section, we'll learn how to:
  • create new columns using the mutate function in dplyr.
  • create columns with constant values.
  • create columns by performing calculations.
  • create columns by combining other columns.
  • create columns based on conditions.

The general pattern for mutate is:

 dataset |> 
  mutate(new_column = expression) 

Here, new_column is the name of the new column you want to create, and expression is the calculation or logic used to define the values in the new column.
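
The examples in this section use a flowers dataset that the section itself does not define. A minimal sketch for recreating it is shown here, with values copied from the tables printed in the examples. The column types and the height_mm column are assumptions made for illustration; the course may load the data differently.

```r
library(dplyr)

# Recreate the flowers dataset from the printed tables.
# Column types are an assumption, not taken from the course.
flowers <- tibble(
  name     = c("Poppy", "Rose", "Zinnia", "Peony"),
  height   = c(75, 150, 60, 90),       # centimeters, per the height_m example
  season   = c("Spring", "Summer", "Summer", "Spring"),
  sunlight = c(8.3, 6.4, 8.7, 7.2),    # hours, per the exercises
  growth   = c("fast", "slow", "fast", "slow")
)

# mutate() returns a new data frame. flowers itself stays unchanged
# unless the result is assigned to a name.
flowers_mm <- flowers |>
  mutate(height_mm = height * 10)      # height_mm is a hypothetical column
```

Here flowers still has five columns, while flowers_mm has six.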

Let’s explore different ways to create columns using mutate.

Creating a Column with a Constant Value

You can create a new column with the same value for all rows. This could be any data type, such as a string, number, or logical value.

String

 dataset |> 
  mutate(new_column = "value") 

Number

 dataset |>
  mutate(new_column = 7) 

Logical

 dataset |>
  mutate(new_column = TRUE) 

Example: Adding a Constant Column

    Let’s add a new column called country with the value “Netherlands” for all flowers in our dataset.

     flowers |> 
      mutate(country = "Netherlands") 

    name    height  season  sunlight  growth  country
    Poppy   75      Spring  8.3       fast    Netherlands
    Rose    150     Summer  6.4       slow    Netherlands
    Zinnia  60      Summer  8.7       fast    Netherlands
    Peony   90      Spring  7.2       slow    Netherlands

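
The same pattern works for numbers and logical values, and a single mutate call can take several name = value pairs. The sketch below uses a hypothetical pets dataset and made-up column names, purely for illustration.

```r
library(dplyr)

# Hypothetical two-row dataset, used only for illustration.
pets <- tibble(name = c("Rex", "Tom"))

pets_tagged <- pets |>
  mutate(
    source   = "survey",  # character constant
    year     = 2024,      # numeric constant
    verified = TRUE       # logical constant
  )

# Each single value is repeated for every row of the dataset.
pets_tagged
```
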
Exercise 6.1

Using the flowers dataset, create a new column called color with the value “Red” for all flowers.

name    height  season  sunlight  growth
Poppy   75      Spring  8.3       fast
Rose    150     Summer  6.4       slow
Zinnia  60      Summer  8.7       fast
Peony   90      Spring  7.2       slow

Creating Columns by Calculating with a Value

You can create new columns by performing calculations on existing columns.

 dataset |> 
  mutate(new_column = expression) 

Here, expression can be any mathematical operation or function that uses existing columns.
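
As a rough sketch of what "any mathematical operation or function" can look like, the example below mixes plain arithmetic with a function call. The plants dataset and its column names are hypothetical and exist only for this illustration.

```r
library(dplyr)

# Hypothetical dataset, used only for illustration.
plants <- tibble(
  name      = c("Fern", "Cactus", "Bamboo"),
  height_cm = c(45, 30, 250)
)

plants_converted <- plants |>
  mutate(
    height_doubled = height_cm * 2,               # plain arithmetic
    height_in      = round(height_cm / 2.54, 1)   # arithmetic wrapped in a function
  )

plants_converted
```

The calculation runs once per row, so every flower or plant gets its own result.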

Example: Calculating Height in Meters

    Let’s create a new column height_m that converts the height from centimeters to meters.

     flowers |> 
      mutate(height_m = height / 100) 

    name    height  season  sunlight  growth  height_m
    Poppy   75      Spring  8.3       fast    0.75
    Rose    150     Summer  6.4       slow    1.5
    Zinnia  60      Summer  8.7       fast    0.6
    Peony   90      Spring  7.2       slow    0.9

Exercise 6.2

Create a new column called sunlight_hours that converts the sunlight value from decimal to whole hours by rounding down. Hint: You can use the floor() function to round down.

name    height  season  sunlight  growth
Poppy   75      Spring  8.3       fast
Rose    150     Summer  6.4       slow
Zinnia  60      Summer  8.7       fast
Peony   90      Spring  7.2       slow

Creating Columns by Combining Other Columns

You can create new columns by combining or manipulating multiple existing columns.

For example, you can use the paste() function to combine strings from different columns.

 dataset |> 
  mutate(new_column = paste(column1, column2, sep = " ")) 

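
To show how the sep argument changes the result, here is a small sketch on a hypothetical people dataset (the dataset and column names are made up for illustration). It also uses paste0(), the base R variant that joins values with no separator.

```r
library(dplyr)

# Hypothetical dataset, used only for illustration.
people <- tibble(
  first = c("Ada", "Alan"),
  last  = c("Lovelace", "Turing")
)

people_named <- people |>
  mutate(
    full_name = paste(first, last),              # default separator is a space
    sort_name = paste(last, first, sep = ", "),  # custom separator
    username  = paste0(first, last)              # no separator at all
  )

people_named
```
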
Example: Combining Columns

    Let’s create a new column description that combines the name and growth columns.

     flowers |> 
      mutate(description = paste(name, "is a", growth, "growing flower")) 

    name    height  season  sunlight  growth  description
    Poppy   75      Spring  8.3       fast    Poppy is a fast growing flower
    Rose    150     Summer  6.4       slow    Rose is a slow growing flower
    Zinnia  60      Summer  8.7       fast    Zinnia is a fast growing flower
    Peony   90      Spring  7.2       slow    Peony is a slow growing flower

Exercise 6.3

Create a new column called flower_info that combines the name, season, and height columns.

name    height  season  sunlight  growth
Poppy   75      Spring  8.3       fast
Rose    150     Summer  6.4       slow
Zinnia  60      Summer  8.7       fast
Peony   90      Spring  7.2       slow

Creating Columns Based on Conditions

You can create new columns based on conditions using case_when().

 dataset |> 
  mutate(new_column = case_when(
    condition1 ~ value1,
    condition2 ~ value2,
    TRUE ~ value3
  )) 

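Here, condition1, condition2, etc., are logical conditions, and value1, value2, etc., are the values assigned to the new column when the matching condition is true. Conditions are checked in order, and the first one that is true determines the value. The final TRUE ~ value3 line supplies the default value when none of the earlier conditions is met.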
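
The order of the conditions matters, because each row takes the value of the first condition it satisfies. The sketch below illustrates this on a hypothetical scores dataset (the dataset, column names, and thresholds are made up for illustration).

```r
library(dplyr)

# Hypothetical dataset, used only for illustration.
scores <- tibble(
  student = c("A", "B", "C"),
  score   = c(95, 85, 40)
)

graded <- scores |>
  mutate(grade = case_when(
    score >= 90 ~ "high",
    score >= 50 ~ "pass",  # also true for 95, but that row already matched above
    TRUE        ~ "fail"   # default for everything else
  ))

graded
```

If the first two conditions were swapped, the score of 95 would be labeled "pass", since score >= 50 would match first.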

Example: Conditional Column

    Let’s create a new column height_category based on the height of the flowers.

     flowers |> 
      mutate(height_category = case_when(
        height < 70 ~ "Short",
        height < 100 ~ "Medium",
        TRUE ~ "Tall"
      )) 

    name    height  season  sunlight  growth  height_category
    Poppy   75      Spring  8.3       fast    Medium
    Rose    150     Summer  6.4       slow    Tall
    Zinnia  60      Summer  8.7       fast    Short
    Peony   90      Spring  7.2       slow    Medium

Exercise 6.4

Create a new column called sunlight_category based on the sunlight value of the flowers. Use the following categories:

  • “Low” for sunlight less than 7 hours
  • “Medium” for sunlight between 7 and 8 hours
  • “High” for sunlight more than 8 hours

name    height  season  sunlight  growth
Poppy   75      Spring  8.3       fast
Rose    150     Summer  6.4       slow
Zinnia  60      Summer  8.7       fast
Peony   90      Spring  7.2       slow

Review

Summary: You've learned how to:
  • Create new columns using the mutate function in dplyr.
  • Create columns with constant values.
  • Create columns by performing calculations.
  • Create columns by combining other columns.
  • Create columns based on conditions.