Data Wrangling in R with dplyr and tidyr

Reshaping Data

Learn how to reshape data between wide and long formats using tidyr functions

What is reshaping data?

Reshaping data means changing the layout of a data set, that is, which values sit in rows and which sit in columns, without changing the values themselves. The same data can be stored in a wide or a long layout, and some analyses are easier in one layout than in the other. We can convert data from wide to long or from long to wide.

What we’ll learn in this section

Goals

In this section, we'll:
  • understand the concepts of wide and long data formats
  • use pivot_longer() to reshape data from wide to long format
  • use pivot_wider() to reshape data from long to wide format

Wide Format

In wide format, each subject (for example, a student) occupies a single row, and each measured variable has its own column. Adding a new kind of measurement means adding a new column.

    Consider the following data frame students in wide format:

     # A tibble: 4 × 5
         id name  section study  play
      <int> <chr> <chr>   <dbl> <dbl>
    1     1 Alia  A           2     5
    2     2 Bala  B           8     5
    3     3 Cara  A          NA    10
    4     4 Dana  B           4    10 

    In this wide format, each student’s study and play hours are stored in separate columns.
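
The lesson prints students but does not show how it is created. The following is a minimal sketch, assuming the tibble package is installed; the values are copied from the printed output above, and 1:4 is used so that id is an integer column, matching the <int> type shown.

```r
library(tibble)

# Values copied from the printed tibble; 1:4 yields an integer id column.
students <- tibble(
  id      = 1:4,
  name    = c("Alia", "Bala", "Cara", "Dana"),
  section = c("A", "B", "A", "B"),
  study   = c(2, 8, NA, 4),   # Cara's study hours are missing
  play    = c(5, 5, 10, 10)
)

students
```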

Long Format

In long format, each row holds a single measurement: one column names what was measured (for example, the activity) and another column holds the value (for example, the hours). A subject with several measurements therefore spans several rows.

    Consider the following data frame activity_hours in long format:

     # A tibble: 8 × 5
         id name  section activity hours
      <dbl> <chr> <chr>   <chr>    <dbl>
    1     1 Alia  A       study        2
    2     1 Alia  A       play         5
    3     2 Bala  B       study        8
    4     2 Bala  B       play         5
    5     3 Cara  A       study       NA
    6     3 Cara  A       play        10
    7     4 Dana  B       study        4
    8     4 Dana  B       play        10 

    In this long format, each student’s study and play hours are stored in separate rows.
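
As with students, the lesson does not show how activity_hours is built. One possible way, sketched here under the assumption that the tibble package is available, is tribble(), which lets the data be typed row by row in the same layout as the printout. Plain numeric literals give a double id column, matching the <dbl> type shown above.

```r
library(tibble)

# Row-wise construction; values copied from the printed tibble.
activity_hours <- tribble(
  ~id, ~name,  ~section, ~activity, ~hours,
    1, "Alia", "A",      "study",        2,
    1, "Alia", "A",      "play",         5,
    2, "Bala", "B",      "study",        8,
    2, "Bala", "B",      "play",         5,
    3, "Cara", "A",      "study",       NA,
    3, "Cara", "A",      "play",        10,
    4, "Dana", "B",      "study",        4,
    4, "Dana", "B",      "play",        10
)

activity_hours
```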

Reshaping Data with tidyr

The tidyr package in R provides functions to reshape data between wide and long formats. Two key functions for reshaping data are pivot_longer() and pivot_wider().

Pivot Longer

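The pivot_longer() function reshapes data from wide to long format. The columns selected by cols are stacked: their names go into the column named by names_to, and their values go into the column named by values_to.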

Syntax
     pivot_longer(data, cols, names_to, values_to) 
Example

    Let’s use the pivot_longer() function to reshape the students data frame from wide to long format.

     library(tidyr)

     # Stack the study and play columns into activity/hours pairs
     students |>
       pivot_longer(cols = c(study, play),
                    names_to = "activity",
                    values_to = "hours")

    Output:

     # A tibble: 8 × 5
         id name  section activity hours
      <int> <chr> <chr>   <chr>    <dbl>
    1     1 Alia  A       study        2
    2     1 Alia  A       play         5
    3     2 Bala  B       study        8
    4     2 Bala  B       play         5
    5     3 Cara  A       study       NA
    6     3 Cara  A       play        10
    7     4 Dana  B       study        4
    8     4 Dana  B       play        10 

    Before (Wide Format)

    id  name  section  study  play
     1  Alia  A            2     5
     2  Bala  B            8     5
     3  Cara  A           NA    10
     4  Dana  B            4    10

    After (Long Format)

    id  name  section  activity  hours
     1  Alia  A        study         2
     1  Alia  A        play          5
     2  Bala  B        study         8
     2  Bala  B        play          5
     3  Cara  A        study        NA
     3  Cara  A        play         10
     4  Dana  B        study         4
     4  Dana  B        play         10
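
The cols argument accepts tidyselect expressions, so the same two columns can be picked in more than one way. The sketch below rebuilds students from the printed values (an assumption, since the lesson does not show its construction) and compares three selections. It also shows values_drop_na = TRUE, an optional pivot_longer() argument not covered in this section, which drops rows whose value is NA.

```r
library(tibble)
library(tidyr)

# Rebuilt from the printed output so the example is self-contained.
students <- tibble(
  id      = 1:4,
  name    = c("Alia", "Bala", "Cara", "Dana"),
  section = c("A", "B", "A", "B"),
  study   = c(2, 8, NA, 4),
  play    = c(5, 5, 10, 10)
)

# 1. Name the columns explicitly.
long_a <- students |>
  pivot_longer(cols = c(study, play), names_to = "activity", values_to = "hours")

# 2. Select a range of adjacent columns.
long_b <- students |>
  pivot_longer(cols = study:play, names_to = "activity", values_to = "hours")

# 3. Select everything except the identifying columns.
long_c <- students |>
  pivot_longer(cols = !c(id, name, section), names_to = "activity", values_to = "hours")

# All three selections should produce the same long table.
all.equal(long_a, long_b)
all.equal(long_a, long_c)

# Drop the row for Cara's missing study hours while pivoting.
long_no_na <- students |>
  pivot_longer(cols = c(study, play), names_to = "activity", values_to = "hours",
               values_drop_na = TRUE)
nrow(long_no_na)
```

Excluding the identifier columns (the third form) tends to be convenient when there are many measurement columns, since only the few columns to keep need to be listed.
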
Exercise 5.1: Pivot Longer

Use pivot_longer() to reshape the following grades data frame from wide to long format. The new columns should be named “subject” and “score”.

student  math  science  history
Alice      85       92       78
Bob        91       85       89
Charlie    76       88       95
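
To try the exercise locally, the grades data frame can be typed in from the table above. This is only a starting point, assuming the tibble package is available; the pivot_longer() call itself is left to you. With three students and three subjects, a correct result should have 9 rows and 3 columns.

```r
library(tibble)

# Exercise data copied from the table; the reshaping step is left as the exercise.
grades <- tibble(
  student = c("Alice", "Bob", "Charlie"),
  math    = c(85, 91, 76),
  science = c(92, 85, 88),
  history = c(78, 89, 95)
)

grades
```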

Pivot Wider

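The pivot_wider() function reshapes data from long to wide format. Each distinct value in the names_from column becomes a new column, and those columns are filled with the matching values from the values_from column.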

Syntax
     pivot_wider(data, names_from, values_from) 
Example

    Let’s use the pivot_wider() function to reshape a long format data frame into a wide format.

     library(tidyr)
     # Each distinct value of activity becomes a column filled from hours
     activity_hours |>
       pivot_wider(names_from = activity, values_from = hours)

    Output:

     # A tibble: 4 × 5
         id name  section study  play
      <dbl> <chr> <chr>   <dbl> <dbl>
    1     1 Alia  A           2     5
    2     2 Bala  B           8     5
    3     3 Cara  A          NA    10
    4     4 Dana  B           4    10 

    Before (Long Format)

    id  name  section  activity  hours
     1  Alia  A        study         2
     1  Alia  A        play          5
     2  Bala  B        study         8
     2  Bala  B        play          5
     3  Cara  A        study        NA
     3  Cara  A        play         10
     4  Dana  B        study         4
     4  Dana  B        play         10

    After (Wide Format)

    id  name  section  study  play
     1  Alia  A            2     5
     2  Bala  B            8     5
     3  Cara  A           NA    10
     4  Dana  B            4    10
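
Because the two functions are inverses of each other, pivoting wider and then longer again should return the table we started with. The sketch below checks this on activity_hours, rebuilt here from the printed values (an assumption, since the lesson does not show its construction). It also shows what happens when a row is absent from the long data rather than recorded as NA: pivot_wider() fills the empty cell with NA.

```r
library(tibble)
library(tidyr)

# Rebuilt from the printed output so the example is self-contained.
activity_hours <- tribble(
  ~id, ~name,  ~section, ~activity, ~hours,
    1, "Alia", "A",      "study",        2,
    1, "Alia", "A",      "play",         5,
    2, "Bala", "B",      "study",        8,
    2, "Bala", "B",      "play",         5,
    3, "Cara", "A",      "study",       NA,
    3, "Cara", "A",      "play",        10,
    4, "Dana", "B",      "study",        4,
    4, "Dana", "B",      "play",        10
)

# Long to wide.
wide <- activity_hours |>
  pivot_wider(names_from = activity, values_from = hours)

# Wide back to long; this should reproduce the original table.
back <- wide |>
  pivot_longer(cols = c(study, play), names_to = "activity", values_to = "hours")
all.equal(back, activity_hours)

# Remove the row with the missing value, so Cara has no "study" row at all.
sparse <- activity_hours[!is.na(activity_hours$hours), ]

# The missing id/activity combination still appears as NA in the wide table.
sparse_wide <- sparse |>
  pivot_wider(names_from = activity, values_from = hours)
sparse_wide
```
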
Exercise 5.2: Pivot Wider

Use pivot_wider() to reshape the following grades_long data frame from long to wide format. The new columns should be named “math”, “science”, and “history”.

student  subject  score
Alice    math        85
Alice    science     92
Alice    history     78
Bob      math        91
Bob      science     85
Bob      history     89
Charlie  math        76
Charlie  science     88
Charlie  history     95
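
As a starting point for this exercise, grades_long can be built from the table above. The sketch assumes the tibble package is available and uses rep() to avoid typing the repeated student and subject labels; the pivot_wider() call is left to you. A correct result should have 3 rows (one per student) and 4 columns.

```r
library(tibble)

# Exercise data copied from the table; the reshaping step is left as the exercise.
grades_long <- tibble(
  student = rep(c("Alice", "Bob", "Charlie"), each = 3),
  subject = rep(c("math", "science", "history"), times = 3),
  score   = c(85, 92, 78,   # Alice
              91, 85, 89,   # Bob
              76, 88, 95)   # Charlie
)

grades_long
```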

Review

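In this section, we’ve learned about reshaping data between wide and long formats using the tidyr package in R. pivot_longer() stacks several columns into a pair of name and value columns, and pivot_wider() spreads such a pair back out into separate columns.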

Summary

In this section, we learned how to:
    • Reshape data from wide to long format using pivot_longer()
    • Reshape data from long to wide format using pivot_wider()

We are almost done with the course! In the next section, we will conclude our learning journey with a summary of what we’ve covered and some self-assessment tasks. 🎉