Why select columns?
It is good practice to use only the information that is relevant to your analysis. Selecting specific columns makes your work more efficient and easier to manage, reduces complexity, can improve performance, and helps protect sensitive data. You can always go back and choose other columns if necessary.
What we’ll learn in this section
- select columns using the select function
- select columns based on their positions
- select a range of columns
- unselect columns using the - operator
- select columns based on their data type using the where function
A step-by-step way to think about data
Before we learn how to select columns in R, we’ll quickly learn about the pipe operator. It is written with two keyboard characters placed together, like this: |>. The pipe operator passes the result of the previous step on to the next step.
Whenever you see |> , say “and then”.
This allows us to think about code in a human readable and step-by-step way. We will use this approach to write some code.
On a standard US keyboard, the | character shares a key with \ , just above the Enter key. To type it, press Shift + \ .
The > character shares a key with . , just before the ? key. To type it, press Shift + . .
By combining them, we can type: |>
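To make the “and then” reading concrete, here is a minimal sketch in base R. It assumes R 4.1 or later, where the |> operator is available. The heights vector simply reuses the height values from the flowers table shown later in this section, and the names nested and piped are made up for illustration.

```r
# Heights reused from the flowers table (illustrative only).
heights <- c(75, 150, 60, 90)

# Nested call: you have to read it from the inside out.
nested <- head(sort(heights), 2)

# Piped call: take heights, and then sort them, and then keep the first two.
piped <- heights |>
  sort() |>
  head(2)

# Both forms produce the same result.
identical(nested, piped)
```

The piped version lists the steps in the order they happen, which is why the rest of this section writes code this way.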
Selecting columns with column names
In the tidyverse, we can select columns from a dataset using the select function.
Syntax
You can read the following syntax as “take the dataset, and then select column1 and column2”.
dataset |>
  select(column1, column2)

In our flowers dataset, we have these columns: name, height, season, sunlight, and growth.
| name | height | season | sunlight | growth |
|---|---|---|---|---|
| Poppy | 75 | Spring | 8.3 | fast |
| Rose | 150 | Summer | 6.4 | slow |
| Zinnia | 60 | Summer | 8.7 | fast |
| Peony | 90 | Spring | 7.2 | slow |
If we want to select the name and height columns, we can do this using the select function.
flowers |>
  select(name, height)

This code will select the name and height columns from the flowers dataset.
| name | height |
|---|---|
| Poppy | 75 |
| Rose | 150 |
| Zinnia | 60 |
| Peony | 90 |
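If you want to try this outside the tutorial, the sketch below rebuilds the flowers table by hand and runs the same selection. It assumes the dplyr package is installed and R 4.1 or later. The data frame is reconstructed from the table above, so the tutorial's own flowers object may differ in type (for example, it may be a tibble), and the name name_height is made up for illustration.

```r
library(dplyr)

# Rebuild the flowers table shown above as a plain data frame.
flowers <- data.frame(
  name     = c("Poppy", "Rose", "Zinnia", "Peony"),
  height   = c(75, 150, 60, 90),
  season   = c("Spring", "Summer", "Summer", "Spring"),
  sunlight = c(8.3, 6.4, 8.7, 7.2),
  growth   = c("fast", "slow", "fast", "slow"),
  stringsAsFactors = FALSE
)

# Take flowers, and then select the name and height columns.
name_height <- flowers |>
  select(name, height)

names(name_height)  # "name" "height"

# select() returns a new data frame; flowers still has all five columns.
ncol(flowers)       # 5
```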
Now it is your turn to practice. Let’s try the next two exercises. For the first exercise, you simply have to click the run button. For the second exercise, you’ll type in the column names separated by a comma.
Run the code below to see the output of selecting the name and height columns from the flowers dataset.
Select the season and sunlight columns from the flowers dataset.
| name | height | season | sunlight | growth |
|---|---|---|---|---|
| Poppy | 75 | Spring | 8.3 | fast |
| Rose | 150 | Summer | 6.4 | slow |
| Zinnia | 60 | Summer | 8.7 | fast |
| Peony | 90 | Spring | 7.2 | slow |
Good work! Next, we’ll learn how to select columns using column positions.
Selecting columns using column positions
We can also select columns using numbers to indicate their positions. For example, to choose the first column, we can use select(1) . To select more columns, we can separate them with commas.
In the flowers dataset, we can select the first and third columns using their positions.
flowers |>
  select(1, 3)

This code will select the first and third columns from the flowers dataset.
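The sketch below shows which column names the positions 1 and 3 correspond to. It assumes dplyr is installed and rebuilds flowers by hand from the table in this section. The name by_position is made up for illustration.

```r
library(dplyr)

# Rebuild the flowers table as a plain data frame (reconstructed by hand).
flowers <- data.frame(
  name     = c("Poppy", "Rose", "Zinnia", "Peony"),
  height   = c(75, 150, 60, 90),
  season   = c("Spring", "Summer", "Summer", "Spring"),
  sunlight = c(8.3, 6.4, 8.7, 7.2),
  growth   = c("fast", "slow", "fast", "slow"),
  stringsAsFactors = FALSE
)

# Positions are counted from the left, starting at 1.
by_position <- flowers |>
  select(1, 3)

names(by_position)  # "name" "season"
```

Positions depend on the current column order, so a selection by position can pick different columns if the dataset is rearranged. Selecting by name avoids that.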
Run the code below to see the output of selecting the first and third columns from the flowers dataset.
| name | height | season | sunlight | growth |
|---|---|---|---|---|
| Poppy | 75 | Spring | 8.3 | fast |
| Rose | 150 | Summer | 6.4 | slow |
| Zinnia | 60 | Summer | 8.7 | fast |
| Peony | 90 | Spring | 7.2 | slow |
Try selecting the second and fourth columns from the flowers dataset.
| name | height | season | sunlight | growth |
|---|---|---|---|---|
| Poppy | 75 | Spring | 8.3 | fast |
| Rose | 150 | Summer | 6.4 | slow |
| Zinnia | 60 | Summer | 8.7 | fast |
| Peony | 90 | Spring | 7.2 | slow |
Next, we’ll learn how to select a range of columns.
Selecting a range of columns
We can select a range of columns using the colon symbol, also known as the : operator.
flowers |>
  select(name:season)

This code will select every column from name through season in the flowers dataset.
Run the code below to see the output of selecting the columns from name to season from the flowers dataset.
| name | height | season | sunlight | growth |
|---|---|---|---|---|
| Poppy | 75 | Spring | 8.3 | fast |
| Rose | 150 | Summer | 6.4 | slow |
| Zinnia | 60 | Summer | 8.7 | fast |
| Peony | 90 | Spring | 7.2 | slow |
This will select the name , height , and season columns.
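As a quick check, the sketch below runs a name-based range on a hand-built copy of flowers. It assumes dplyr is installed. The names name_to_season and height_to_sunlight are made up for illustration.

```r
library(dplyr)

# Rebuild the flowers table as a plain data frame (reconstructed by hand).
flowers <- data.frame(
  name     = c("Poppy", "Rose", "Zinnia", "Peony"),
  height   = c(75, 150, 60, 90),
  season   = c("Spring", "Summer", "Summer", "Spring"),
  sunlight = c(8.3, 6.4, 8.7, 7.2),
  growth   = c("fast", "slow", "fast", "slow"),
  stringsAsFactors = FALSE
)

# The range includes both end points and everything between them.
name_to_season <- flowers |>
  select(name:season)

names(name_to_season)      # "name" "height" "season"

# A range does not have to start at the first column.
height_to_sunlight <- flowers |>
  select(height:sunlight)

names(height_to_sunlight)  # "height" "season" "sunlight"
```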
You can also select a range based on column numbers. Try selecting the columns from the first to the third column.
Hint: 1:3
| name | height | season | sunlight | growth |
|---|---|---|---|---|
| Poppy | 75 | Spring | 8.3 | fast |
| Rose | 150 | Summer | 6.4 | slow |
| Zinnia | 60 | Summer | 8.7 | fast |
| Peony | 90 | Spring | 7.2 | slow |
Unselecting columns
If you want to unselect columns, you can use the - operator.
flowers |>
  select(-name)

This code will unselect the name column from the flowers dataset.
To unselect multiple columns, we wrap them with the c() function.
flowers |>
  select(-c(name, season))

This code will unselect the name and season columns from the flowers dataset.
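The sketch below shows which columns remain after each of the two calls above. It assumes dplyr is installed and uses a hand-built copy of flowers. The names without_name and without_two are made up for illustration.

```r
library(dplyr)

# Rebuild the flowers table as a plain data frame (reconstructed by hand).
flowers <- data.frame(
  name     = c("Poppy", "Rose", "Zinnia", "Peony"),
  height   = c(75, 150, 60, 90),
  season   = c("Spring", "Summer", "Summer", "Spring"),
  sunlight = c(8.3, 6.4, 8.7, 7.2),
  growth   = c("fast", "slow", "fast", "slow"),
  stringsAsFactors = FALSE
)

# Drop one column: everything except name is kept.
without_name <- flowers |>
  select(-name)

names(without_name)  # "height" "season" "sunlight" "growth"

# Drop several columns by wrapping them in c().
without_two <- flowers |>
  select(-c(name, season))

names(without_two)   # "height" "sunlight" "growth"
```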
Selecting columns based on data type
You can select columns based on their data type. For example, to select all columns that are numeric, you can use the where function and wrap it around is.numeric .
Selecting numeric columns
dataset |>
  select(where(is.numeric))

Selecting character columns
dataset |>
  select(where(is.character))

In the flowers dataset, we can select only the numeric columns using the where function.
flowers |>
  select(where(is.numeric))

This code will select only the height and sunlight columns from the flowers dataset.
| height | sunlight |
|---|---|
| 75 | 8.3 |
| 150 | 6.4 |
| 60 | 8.7 |
| 90 | 7.2 |
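One way to picture what where does: it applies a test such as is.numeric to every column and keeps the columns for which the test returns TRUE. The sketch below compares this with a base R check, assuming dplyr 1.0.0 or later is installed. The flowers data frame is rebuilt by hand, and the names is_num and numeric_cols are made up for illustration.

```r
library(dplyr)

# Rebuild the flowers table as a plain data frame (reconstructed by hand).
flowers <- data.frame(
  name     = c("Poppy", "Rose", "Zinnia", "Peony"),
  height   = c(75, 150, 60, 90),
  season   = c("Spring", "Summer", "Summer", "Spring"),
  sunlight = c(8.3, 6.4, 8.7, 7.2),
  growth   = c("fast", "slow", "fast", "slow"),
  stringsAsFactors = FALSE
)

# Base R view: test each column and get one TRUE or FALSE per column.
is_num <- sapply(flowers, is.numeric)
names(flowers)[is_num]  # "height" "sunlight"

# where() runs the same kind of test inside select().
numeric_cols <- flowers |>
  select(where(is.numeric))

names(numeric_cols)     # "height" "sunlight"
```

Note that is.numeric is passed without parentheses: where receives the function itself and calls it once per column.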
Run the code below to see the output of selecting only the numeric columns from the flowers dataset.
| name | height | season | sunlight | growth |
|---|---|---|---|---|
| Poppy | 75 | Spring | 8.3 | fast |
| Rose | 150 | Summer | 6.4 | slow |
| Zinnia | 60 | Summer | 8.7 | fast |
| Peony | 90 | Spring | 7.2 | slow |
Select only the character columns from the flowers dataset.
| name | height | season | sunlight | growth |
|---|---|---|---|---|
| Poppy | 75 | Spring | 8.3 | fast |
| Rose | 150 | Summer | 6.4 | slow |
| Zinnia | 60 | Summer | 8.7 | fast |
| Peony | 90 | Spring | 7.2 | slow |
You have reached the end of this section. Let’s review what we’ve learned!
Review
Great job! We have taken our first steps in selecting columns in R. Selecting columns is an essential skill in data analysis.
- Select columns using the select function
- Select columns based on their positions
- Select a range of columns
- Unselect columns using the - operator
- Select columns based on their data type using the where function