Why select columns?
It is good practice to use only the information that is relevant to your analysis. Selecting only the columns you need makes your work more efficient and easier to manage, reduces complexity, improves performance, and helps protect sensitive data. You can always go back and choose other columns if necessary.
What we’ll learn in this section
- select columns using the select function
- select columns based on their positions
- select a range of columns
- unselect columns using the - operator
- select columns based on their data type using the where function
A step-by-step way to think about data
Before we learn how to select columns in R, we’ll quickly learn about an operator known as the pipe operator. It is represented using two characters on the keyboard placed together like this: |>
The pipe operator connects previous steps to the next step.
Whenever you see |>, say “and then”.
This allows us to think about code in a human-readable, step-by-step way. We will use this approach to write some code.
On a standard US keyboard, you’ll find the | key above the Enter key. To type it, press Shift + \ (backslash).
The > character is on the key just before the ? key. To type it, press Shift + . (period).
By combining them, we can type: |>
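To see the “and then” reading in action, here is a minimal sketch using only base R. It assumes R 4.1 or later, where the |> operator is available, and the numbers are made up for illustration.

```r
# Requires R 4.1 or later, where the native pipe |> is available.
# The numbers are made up purely for illustration.
result <- c(1, 4, 9) |>  # take these numbers, and then
  sqrt() |>              # take the square root of each, and then
  sum()                  # add them up

result  # same value as sum(sqrt(c(1, 4, 9)))
```

Each line hands its result to the next step, so the code reads top to bottom in the order the work happens.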
Selecting columns with column names
In the tidyverse, we can select columns from a dataset using the select function.
Syntax
You can read the following syntax like “take the dataset, and then select column1 and column2”.
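A minimal sketch of this pattern, assuming the dplyr package (part of the tidyverse) is installed. The data frame called dataset and its columns column1, column2, and column3 are hypothetical and exist only to illustrate the syntax.

```r
library(dplyr)  # part of the tidyverse; provides select()

# Hypothetical data frame, made up only to illustrate the syntax
dataset <- data.frame(
  column1 = c(1, 2, 3),
  column2 = c("a", "b", "c"),
  column3 = c(TRUE, FALSE, TRUE)
)

# Take the dataset, and then select column1 and column2
selected <- dataset |>
  select(column1, column2)

selected
```

The result keeps every row but only the two columns named inside select.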
In our flowers dataset, we have these columns: name, height, season, sunlight, and growth.
name | height | season | sunlight | growth |
---|---|---|---|---|
Poppy | 75 | Spring | 8.3 | fast |
Rose | 150 | Summer | 6.4 | slow |
Zinnia | 60 | Summer | 8.7 | fast |
Peony | 90 | Spring | 7.2 | slow |
If we want to select the name and height columns, we can do this using the select function.
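One way to write this is sketched below. It assumes dplyr is installed and rebuilds the flowers dataset by hand from the table above, since how the lesson loads the data is not shown here.

```r
library(dplyr)

# The flowers dataset from the table above, rebuilt by hand as a data frame
flowers <- data.frame(
  name     = c("Poppy", "Rose", "Zinnia", "Peony"),
  height   = c(75, 150, 60, 90),
  season   = c("Spring", "Summer", "Summer", "Spring"),
  sunlight = c(8.3, 6.4, 8.7, 7.2),
  growth   = c("fast", "slow", "fast", "slow")
)

# Take flowers, and then select the name and height columns
name_height <- flowers |>
  select(name, height)

name_height
```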
This code will select the name and height columns from the flowers dataset.
name | height |
---|---|
Poppy | 75 |
Rose | 150 |
Zinnia | 60 |
Peony | 90 |
Now it is your turn to practice. Let’s try the next two exercises. For the first exercise, you simply have to click the run button. For the second exercise, you’ll type in the column names separated by a comma.
Run the code below to see the output of selecting the name and height columns from the flowers dataset.
Select the season and sunlight columns from the flowers dataset.
Good work! Next, we’ll learn how to select columns using column positions.
Selecting columns using column positions
We can also select columns using numbers to indicate their positions. For example, to choose the first column, we can use select(1). To select more columns, we can separate them with commas.
In the flowers dataset, we can select the first and third columns using their positions.
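A minimal sketch of selecting by position, assuming dplyr is installed and with the flowers dataset rebuilt by hand from the table shown earlier.

```r
library(dplyr)

# The flowers dataset, rebuilt by hand as a data frame
flowers <- data.frame(
  name     = c("Poppy", "Rose", "Zinnia", "Peony"),
  height   = c(75, 150, 60, 90),
  season   = c("Spring", "Summer", "Summer", "Spring"),
  sunlight = c(8.3, 6.4, 8.7, 7.2),
  growth   = c("fast", "slow", "fast", "slow")
)

# Positions count from the left, starting at 1:
# column 1 is name and column 3 is season
first_and_third <- flowers |>
  select(1, 3)

first_and_third
```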
This code will select the first and third columns from the flowers dataset.
Run the code below to see the output of selecting the first and third columns from the flowers dataset.
Try selecting the second and fourth columns from the flowers dataset.
Next, we’ll learn how to select a range of columns.
Selecting a range of columns
We can select a range of columns using the colon symbol, also known as the : operator.
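A minimal sketch of a range selection, assuming dplyr is installed and with the flowers dataset rebuilt by hand from the table shown earlier.

```r
library(dplyr)

# The flowers dataset, rebuilt by hand as a data frame
flowers <- data.frame(
  name     = c("Poppy", "Rose", "Zinnia", "Peony"),
  height   = c(75, 150, 60, 90),
  season   = c("Spring", "Summer", "Summer", "Spring"),
  sunlight = c(8.3, 6.4, 8.7, 7.2),
  growth   = c("fast", "slow", "fast", "slow")
)

# name:season means "every column from name through season, inclusive"
name_to_season <- flowers |>
  select(name:season)

name_to_season
```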
This code will select the columns from name to season from the flowers dataset.
Run the code below to see the output of selecting the columns from name to season from the flowers dataset.
This will select the name, height, and season columns.
You can also select a range based on column numbers. Try selecting the columns from the first to the third column.
Hint: 1:3
Unselecting columns
If you want to unselect columns, you can use the - operator.
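A minimal sketch of dropping a single column, assuming dplyr is installed and with the flowers dataset rebuilt by hand from the table shown earlier.

```r
library(dplyr)

# The flowers dataset, rebuilt by hand as a data frame
flowers <- data.frame(
  name     = c("Poppy", "Rose", "Zinnia", "Peony"),
  height   = c(75, 150, 60, 90),
  season   = c("Spring", "Summer", "Summer", "Spring"),
  sunlight = c(8.3, 6.4, 8.7, 7.2),
  growth   = c("fast", "slow", "fast", "slow")
)

# The minus sign means "everything except this column"
without_name <- flowers |>
  select(-name)

without_name
```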
This code will unselect the name column from the flowers dataset.
To unselect multiple columns, we wrap them with the c() function.
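A minimal sketch of dropping two columns at once, assuming dplyr is installed and with the flowers dataset rebuilt by hand. Pairing name with height here is an arbitrary choice made for illustration; any other column would work the same way.

```r
library(dplyr)

# The flowers dataset, rebuilt by hand as a data frame
flowers <- data.frame(
  name     = c("Poppy", "Rose", "Zinnia", "Peony"),
  height   = c(75, 150, 60, 90),
  season   = c("Spring", "Summer", "Summer", "Spring"),
  sunlight = c(8.3, 6.4, 8.7, 7.2),
  growth   = c("fast", "slow", "fast", "slow")
)

# c() groups the columns, and the minus sign drops the whole group.
# Dropping height alongside name is an assumption made for this example.
without_two <- flowers |>
  select(-c(name, height))

without_two
```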
This code will unselect the name column and a second column from the flowers dataset.
Selecting columns based on data type
You can select columns based on their data type. For example, to select all columns that are numeric, you can use the where function and wrap it around is.numeric.
Selecting numeric columns
Selecting character columns
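Both patterns can be sketched on a small made-up data frame. This assumes dplyr 1.0.0 or later is installed; the data frame called dataset and its columns are hypothetical.

```r
library(dplyr)

# Hypothetical data frame with one numeric, one character,
# and one logical column
dataset <- data.frame(
  column1 = c(1, 2, 3),
  column2 = c("a", "b", "c"),
  column3 = c(TRUE, FALSE, TRUE)
)

# Selecting numeric columns: where() keeps each column for which
# is.numeric() returns TRUE
numeric_cols <- dataset |>
  select(where(is.numeric))

# Selecting character columns: same idea with is.character()
character_cols <- dataset |>
  select(where(is.character))

numeric_cols
character_cols
```

Note that is.numeric and is.character are passed without parentheses, because where needs the function itself rather than its result.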
In the flowers dataset, we select only the numeric columns using the where function.
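A minimal sketch for the flowers dataset, assuming dplyr 1.0.0 or later is installed and with the data rebuilt by hand from the table shown earlier.

```r
library(dplyr)

# The flowers dataset, rebuilt by hand as a data frame
flowers <- data.frame(
  name     = c("Poppy", "Rose", "Zinnia", "Peony"),
  height   = c(75, 150, 60, 90),
  season   = c("Spring", "Summer", "Summer", "Spring"),
  sunlight = c(8.3, 6.4, 8.7, 7.2),
  growth   = c("fast", "slow", "fast", "slow")
)

# Keep only the columns that hold numbers
numeric_flowers <- flowers |>
  select(where(is.numeric))

numeric_flowers
```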
This code will select only the height and sunlight columns from the flowers dataset.
height | sunlight |
---|---|
75 | 8.3 |
150 | 6.4 |
60 | 8.7 |
90 | 7.2 |
Run the code below to see the output of selecting only the numeric columns from the flowers dataset.
Select only the character columns from the flowers dataset.
You have reached the end of this section. Let’s review what we’ve learned!
Review
Great job! We have made our first steps in selecting columns in R. Selecting columns is an essential skill in data analysis.
- Select columns using the select function
- Select columns based on their positions
- Select a range of columns
- Unselect columns using the - operator
- Select columns based on their data type using the where function