A Beginner's Introduction to R Through the Tidyverse

Section 2 of 5

Working with Data

The tidyverse is a collection of R packages whose functions make it easier to work with data.

Installing and loading the tidyverse package

To install the tidyverse package, we need to run this code. You only need to do this once on your computer.

install.packages("tidyverse")

Now, whenever you are using R and need the tidyverse, load the package by adding this line at the beginning of your script and running it.

library(tidyverse)

A step-by-step approach

R has a feature called the pipe operator (built into R since version 4.1). It is written with two characters typed together, like this: |>

Whenever you see |>, say “and then”.

This allows us to think about code in a human-readable, step-by-step way. We will use this approach to write some code.
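
To see what the pipe does, the short sketch below writes the same step twice, once as an ordinary function call and once with |>. It uses only base R (head() and the built-in mtcars dataset); the variable names nested and piped are chosen here for illustration.

```r
# Ordinary call: the data goes inside the parentheses.
nested <- head(mtcars, 3)

# Pipe: take mtcars, and then keep the first 3 rows.
piped <- mtcars |> head(3)

# Both styles produce exactly the same result.
identical(nested, piped)
```

The pipe passes whatever is on its left as the first argument of the function on its right, which is why the two versions match.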

Filtering rows

Let’s answer a question: In the mtcars dataset, which cars have mileage greater than 30 miles per gallon (mpg)?

To do this, let’s use the filter function from the tidyverse package.

You can read the following line of code as:

Take the mtcars dataset, and then filter the rows where miles per gallon is greater than 30.

mtcars |> filter(mpg > 30)

In the output, we can see that the rows with mpg greater than 30 are displayed.

Your turn! Try filtering the rows where mpg is less than 20. In the code editor, replace the filter() function with filter(mpg < 20) and run the code.


Selecting columns

To select columns, we use the select function.

For example, to select the mpg column, we pass the column name inside select() like this:

mtcars |> select(mpg)

We can also select multiple columns by passing their names inside select() like this:

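
As a sketch of selecting several columns at once: the columns picked here (mpg, cyl, and wt) are only an illustration, as is the variable name selected. The example loads dplyr directly, which is the tidyverse package that provides select(); library(tidyverse) attaches it as well.

```r
library(dplyr)  # part of the tidyverse; provides select()

# Take mtcars, and then keep only these three columns.
# The choice of cyl and wt is illustrative, not prescribed by this section.
selected <- mtcars |> select(mpg, cyl, wt)

# Show the first few rows of the result.
head(selected)
```

The columns appear in the result in the order they are listed inside select().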

Your turn! Try selecting the mpg and hp columns. In the code editor, replace the select() function with select(mpg, hp) and run the code.


Combining filter and select

We can also combine the filter and select functions by chaining them with the pipe.

For example, if we want to select the mpg and hp columns from the mtcars dataset where the mpg is greater than 30, we can do this:

mtcars |> filter(mpg > 30) |> select(mpg, hp)

Your turn! Try selecting the mpg and hp columns from the mtcars dataset where the mpg is less than 20. In the code editor, replace the filter() function with filter(mpg < 20) and run the code.


Creating a new column

To create a new column with the tidyverse, we use a function called mutate.

For example, to create a kilometers per liter column from the miles per gallon column, we can do this:
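
A minimal sketch of that conversion follows. The column name kpl, the helper value mpg_to_kpl, and the result name with_kpl are assumptions made here for illustration. The conversion factor assumes US gallons, which is the unit the mtcars documentation gives for mpg.

```r
library(dplyr)  # part of the tidyverse; provides mutate()

# 1 mile per US gallon is about 0.425 kilometers per liter:
# 1.609344 km per mile divided by 3.785411784 liters per US gallon.
mpg_to_kpl <- 1.609344 / 3.785411784

# Take mtcars, and then add a kpl column computed from mpg.
with_kpl <- mtcars |>
  mutate(kpl = mpg * mpg_to_kpl)

# Compare the original and the new column.
with_kpl |> select(mpg, kpl) |> head()
```

mutate() keeps all existing columns and adds the new one at the end.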

We can also create a new column using the existing columns based on some condition.

For example, if we want to create a new column called mpg_category based on the miles per gallon column, we can do this using the case_when function:
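
A sketch of that call, using the thresholds this section describes. The result name with_category is an assumption made here, and dplyr is loaded directly (library(tidyverse) attaches it as well).

```r
library(dplyr)  # part of the tidyverse; provides mutate() and case_when()

# Take mtcars, and then add a category column based on mpg.
with_category <- mtcars |>
  mutate(
    mpg_category = case_when(
      mpg > 30 ~ "high",    # checked first
      mpg > 20 ~ "medium",  # only reached when mpg is not above 30
      TRUE ~ "low"          # default for every remaining row
    )
  )

# Show mpg next to its new category.
with_category |> select(mpg, mpg_category) |> head()
```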

In the above code, we are creating a new column called mpg_category based on the miles per gallon column. If the mpg is greater than 30, the category will be “high”; if the mpg is greater than 20, the category will be “medium”; otherwise, the category will be “low”. case_when checks the conditions in order and uses the first one that is true, so a car with mpg above 30 is labeled “high” and never reaches the second condition. The TRUE ~ "low" part handles the default case. The call can also be written on one line, but it is easier to read when it is split across multiple lines.

Exercise

Your turn! Try creating a new column called hp_category based on the horsepower (hp) column. If the horsepower is greater than 200, the category should be “high”; if the horsepower is greater than 100, the category should be “medium”; otherwise, the category should be “low”. In the code editor, replace the mutate() function with the following and run the code:

mutate(hp_category = case_when(hp > 200 ~ "high", hp > 100 ~ "medium", TRUE ~ "low"))

Review

In this section, we learned about the filter, select, and mutate functions from the tidyverse package. We used these functions to filter rows, select columns, and create new columns in the mtcars dataset.

Which function is used to filter rows in the tidyverse?

Which function is used to select columns in the tidyverse?

Which function is used to create a new column in the tidyverse?

