A Beginner's Introduction to R Through the Tidyverse
Section 2 of 5
Working with Data
The tidyverse offers wonderful functions that make it easy to work with data.
Installing and loading the tidyverse package
To install the tidyverse package, we need to run this code. You only need to do this once on your computer.
install.packages("tidyverse")
Now, whenever you use R and need the tidyverse, load the package by adding this line at the beginning of your script and running it.
library(tidyverse)
A step by step approach
R has a handy feature called the pipe operator. It is written as two keyboard characters placed together, like this: |> (this built-in form of the pipe requires R 4.1 or newer).
Whenever you see |>, say “and then”.
This allows us to think about code in a human-readable, step-by-step way. We will use this approach to write some code.
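As a small illustration of the “and then” reading, here is a minimal sketch. It uses base R's head() function, which is not covered in this lesson, and the name first_rows is made up for the example.

```r
# Without the pipe: the code is read from the inside out.
head(mtcars, 3)

# With the pipe: take mtcars, and then show the first 3 rows.
first_rows <- mtcars |>
  head(3)

first_rows
```

Both versions give the same result; the piped version simply reads in the order the steps happen.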
Filtering rows
Let’s answer a question: in the mtcars dataset, which cars have a mileage greater than 30 miles per gallon (mpg)?
To do this, let’s use the filter function, which is loaded with the tidyverse package (it comes from dplyr, one of the tidyverse packages).
You can read the following line of code as:
Take the mtcars dataset, and then filter the rows where miles per gallon is greater than 30.
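A minimal sketch of that line of code, assuming the tidyverse has been installed as described earlier. The name high_mpg is only an illustrative choice for storing the result.

```r
library(tidyverse)

# Take the mtcars dataset, and then keep the rows where mpg is greater than 30.
high_mpg <- mtcars |>
  filter(mpg > 30)

high_mpg
```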
In the output, we can see that the rows with mpg greater than 30 are displayed.
Your turn! Try filtering the rows where mpg is less than 20. In the code editor, replace the filter() call with filter(mpg < 20) and run the code.
Selecting columns
To select columns, we use the select function.
For example, if we want to select the mpg column, we can do this by passing the column name inside select() like this:
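A minimal sketch of selecting a single column, assuming the tidyverse is installed. The name mpg_only is made up for the example.

```r
library(tidyverse)

# Take the mtcars dataset, and then keep only the mpg column.
mpg_only <- mtcars |>
  select(mpg)

mpg_only
```

The result is still a data frame with one row per car; it just has a single column.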
We can also select multiple columns by passing their names, separated by commas, inside select() like this:
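A minimal sketch of selecting more than one column. The choice of mpg and cyl (number of cylinders) is an assumption made for illustration, as is the name mpg_cyl.

```r
library(tidyverse)

# Take the mtcars dataset, and then keep the mpg and cyl columns, in that order.
mpg_cyl <- mtcars |>
  select(mpg, cyl)

mpg_cyl
```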
Your turn! Try selecting the mpg and hp columns. In the code editor, replace the select() call with select(mpg, hp) and run the code.
Combining filter and select
We can also combine the filter and select functions in one pipeline, one step after the other.
For example, if we want to select the mpg and hp columns from the mtcars dataset where mpg is greater than 30, we can do this:
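A minimal sketch of chaining the two steps with the pipe, assuming the tidyverse is installed. The name high_mpg_hp is made up for the example.

```r
library(tidyverse)

# Take the mtcars dataset,
# and then keep the rows where mpg is greater than 30,
# and then keep only the mpg and hp columns.
high_mpg_hp <- mtcars |>
  filter(mpg > 30) |>
  select(mpg, hp)

high_mpg_hp
```

Filtering comes first here because the filter condition uses mpg; in general, filter before dropping any column the condition depends on.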
Your turn! Try selecting the mpg and hp columns from the mtcars dataset where mpg is less than 20. In the code editor, replace the filter() call with filter(mpg < 20) and run the code.
Creating a new column
To create a new column with the tidyverse, we will use a function called mutate.
For example, if we want to create a kilometers per liter column from the miles per gallon column, we can do this:
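A minimal sketch of the conversion, assuming US gallons (the mtcars documentation gives mpg in miles per US gallon). The column name kpl and the helper names km_per_mile, liters_per_gallon, and cars_kpl are made up for the example.

```r
library(tidyverse)

# Conversion constants: 1 mile = 1.609344 km, 1 US gallon = 3.785411784 liters.
km_per_mile <- 1.609344
liters_per_gallon <- 3.785411784

# Take the mtcars dataset, and then add a kpl column computed from mpg.
cars_kpl <- mtcars |>
  mutate(kpl = mpg * km_per_mile / liters_per_gallon)

# Show the old and new columns side by side for the first few cars.
cars_kpl |>
  select(mpg, kpl) |>
  head()
```

mutate keeps all existing columns and adds the new one at the end.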
We can also create a new column from existing columns based on some condition.
For example, if we want to create a new column called mpg_category based on the miles per gallon column, we can do this using the case_when function:
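A minimal sketch of this step, assuming the tidyverse is installed. The name cars_categorized is made up for the example.

```r
library(tidyverse)

# Take the mtcars dataset, and then add an mpg_category column.
# case_when checks the conditions from top to bottom and uses the first match.
cars_categorized <- mtcars |>
  mutate(
    mpg_category = case_when(
      mpg > 30 ~ "high",
      mpg > 20 ~ "medium",
      TRUE ~ "low"
    )
  )

cars_categorized |>
  select(mpg, mpg_category)
```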
In the above code, we are creating a new column called mpg_category based on the miles per gallon column. If mpg is greater than 30, the category will be “high”; if mpg is greater than 20, the category will be “medium”; otherwise, the category will be “low”. The conditions are checked in order and the first one that matches is used, which is why a car with mpg above 30 is labeled “high” and not “medium”. The TRUE ~ "low" part handles the default case. This can also be written on one line, but it is easier to read when it is spread over multiple lines.
Exercise
Your turn! Try creating a new column called hp_category based on the horsepower column. If the horsepower is greater than 200, the category should be “high”; if the horsepower is greater than 100, the category should be “medium”; otherwise, the category should be “low”. In the code editor, replace the mutate() call with mutate(hp_category = case_when(hp > 200 ~ "high", hp > 100 ~ "medium", TRUE ~ "low")) and run the code.
Review
In this section, we learned about the filter, select, and mutate functions from the tidyverse package. We used these functions to filter rows, select columns, and create new columns in the mtcars dataset.
Which function is used to filter rows in the tidyverse?
Which function is used to select columns in the tidyverse?
Which function is used to create a new column in the tidyverse?