A Beginner's Introduction to R Through the Tidyverse

Selecting columns

Learn how to select columns.

Why select columns?

It is good practice to use only the information that is relevant to your analysis. Selecting specific columns can make your work more efficient and easier to manage, reduce complexity, improve performance, and help protect sensitive data. You can always go back and select other columns if necessary.

What we’ll learn in this section

GOALS

In this section, we'll learn how to:
  • select columns using the select function
  • select columns based on their positions
  • select a range of columns
  • unselect columns using the - operator
  • select columns based on their data type using the where function

A step-by-step way to think about data

Before we learn how to select columns in R, we’ll take a quick look at the pipe operator. It is written as two characters typed together, like this: |>. The pipe operator passes the result of the previous step on to the next step.

Whenever you see |> , say “and then”.

This allows us to think about code in a human-readable, step-by-step way. We will use this approach throughout this section.

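On a US keyboard layout, you’ll find the | key above the Enter key, where it shares a key with \ . To type it, press Shift + \ .

The > symbol shares a key with the period, just to the left of the ? key. To type it, press Shift + . .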

By combining them, we can type: |>
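
To see the “and then” reading in action, here is a small sketch that uses only base R. The heights vector is made up for illustration (it reuses the four flower heights that appear later in this section), and the |> pipe assumes R 4.1 or later.

```r
# Illustrative values only: the four flower heights used later in this section.
heights <- c(75, 150, 60, 90)

# Without the pipe, the calls are nested and read from the inside out.
nested <- head(sort(heights), 2)

# With the pipe: take heights, and then sort them, and then keep the first two.
piped <- heights |>
  sort() |>
  head(2)

print(piped)

# Both versions give the same result.
print(identical(nested, piped))
```

The pipe passes the value on its left as the first argument of the function on its right, so heights |> sort() means the same thing as sort(heights).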

Selecting columns with column names

In the tidyverse, we can select columns from a dataset using the select function, which comes from the dplyr package.

Syntax

You can read the following syntax as “take the dataset, and then select column1 and column2”.

 dataset |>
  select(column1, column2)

Example

    In our flowers dataset, we have these columns: name, height, season, sunlight, and growth.

    name    height  season  sunlight  growth
    Poppy   75      Spring  8.3       fast
    Rose    150     Summer  6.4       slow
    Zinnia  60      Summer  8.7       fast
    Peony   90      Spring  7.2       slow

    If we want to select the name and height columns, we can do this using the select function.

     flowers |>
      select(name, height) 

    This code will select the name and height columns from the flowers dataset.

    name    height
    Poppy   75
    Rose    150
    Zinnia  60
    Peony   90
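
If you would like to try this outside the course, the sketch below rebuilds the flowers dataset by hand so that it runs on its own. Building it with data.frame is an assumption made for illustration; the course itself provides flowers ready to use. It also assumes the dplyr package is installed.

```r
library(dplyr)

# Assumption: flowers is rebuilt here from the table in this section,
# because the course normally preloads it.
flowers <- data.frame(
  name     = c("Poppy", "Rose", "Zinnia", "Peony"),
  height   = c(75, 150, 60, 90),
  season   = c("Spring", "Summer", "Summer", "Spring"),
  sunlight = c(8.3, 6.4, 8.7, 7.2),
  growth   = c("fast", "slow", "fast", "slow")
)

# Take flowers, and then select the name and height columns.
name_height <- flowers |>
  select(name, height)

print(name_height)

# The result keeps the columns in the order they are listed in select().
height_name <- flowers |>
  select(height, name)

print(names(height_name))
```

Note that select does not change flowers itself. It returns a new, smaller dataset, which is why the result is stored under a new name here.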

Now it is your turn to practice. Let’s try the next two exercises. For the first exercise, you simply have to click the run button. For the second exercise, you’ll type in the column names separated by a comma.

Exercise 2.1

Run the code below to see the output of selecting the name and height columns from the flowers dataset.

Exercise 2.2

Select the season and sunlight columns from the flowers dataset.


Good work! Next, we’ll learn how to select columns using column positions.

Selecting columns using column positions

We can also select columns by position, using numbers. Positions start at 1, so select(1) chooses the first column. To select more columns, separate their positions with commas.

Example

    In the flowers dataset, we can select the first and third columns using their positions.

     flowers |>
      select(1, 3) 

    This code will select the first and third columns (name and season) from the flowers dataset.
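
As a quick check of which columns the positions refer to, the sketch below compares selecting by position with selecting by name. The flowers data frame is rebuilt by hand here as an assumption, so that the example runs on its own.

```r
library(dplyr)

# Assumption: flowers is rebuilt here from the table in this section.
flowers <- data.frame(
  name     = c("Poppy", "Rose", "Zinnia", "Peony"),
  height   = c(75, 150, 60, 90),
  season   = c("Spring", "Summer", "Summer", "Spring"),
  sunlight = c(8.3, 6.4, 8.7, 7.2),
  growth   = c("fast", "slow", "fast", "slow")
)

# Select the first and third columns by position.
by_position <- flowers |>
  select(1, 3)

# Select the same two columns by name.
by_name <- flowers |>
  select(name, season)

print(names(by_position))

# Both approaches pick the same columns.
print(identical(names(by_position), names(by_name)))
```

Positions depend on the current column order, so if the dataset is rearranged, select(1, 3) may pick different columns. Column names are usually the safer choice when you know them.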

Exercise 2.3

Run the code below to see the output of selecting the first and third columns from the flowers dataset.

Exercise 2.4

Try selecting the second and fourth columns from the flowers dataset.

Next, we’ll learn how to select a range of columns.

Selecting a range of columns

We can select a range of columns using the colon symbol, also known as the : operator.

Example

     flowers |>
      select(name:season)

    This code will select every column from name through season, inclusive, from the flowers dataset.
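
The sketch below shows that a range includes both of its end points, and that a range can be written with names or with positions. As before, rebuilding flowers by hand is an assumption made so that the example runs on its own.

```r
library(dplyr)

# Assumption: flowers is rebuilt here from the table in this section.
flowers <- data.frame(
  name     = c("Poppy", "Rose", "Zinnia", "Peony"),
  height   = c(75, 150, 60, 90),
  season   = c("Spring", "Summer", "Summer", "Spring"),
  sunlight = c(8.3, 6.4, 8.7, 7.2),
  growth   = c("fast", "slow", "fast", "slow")
)

# A range written with column names: name, height, and season.
range_by_name <- flowers |>
  select(name:season)

# The same range written with column positions.
range_by_position <- flowers |>
  select(1:3)

print(names(range_by_name))
print(identical(names(range_by_name), names(range_by_position)))
```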

Exercise 2.5

Run the code below to see the output of selecting the columns from name to season from the flowers dataset.

This will select the name , height , and season columns.

Exercise 2.6

You can also select a range based on column numbers. Try selecting the columns from the first to the third column. Hint: 1:3

Unselecting columns

If you want to unselect columns, you can use the - operator.

Example

     flowers |>
      select(-name)

    This code will drop the name column and keep every other column in the flowers dataset.

To unselect multiple columns, we wrap them with the c() function.

Example

     flowers |>
      select(-c(name, season))

    This code will unselect the name and season columns from the flowers dataset.
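
To make it clear which columns remain after unselecting, the sketch below prints the column names left over in each case. The hand-built flowers data frame is an assumption made so that the example runs on its own.

```r
library(dplyr)

# Assumption: flowers is rebuilt here from the table in this section.
flowers <- data.frame(
  name     = c("Poppy", "Rose", "Zinnia", "Peony"),
  height   = c(75, 150, 60, 90),
  season   = c("Spring", "Summer", "Summer", "Spring"),
  sunlight = c(8.3, 6.4, 8.7, 7.2),
  growth   = c("fast", "slow", "fast", "slow")
)

# Unselect one column: everything except name is kept.
without_name <- flowers |>
  select(-name)

print(names(without_name))

# Unselect several columns by wrapping them in c().
without_name_season <- flowers |>
  select(-c(name, season))

print(names(without_name_season))
```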

Selecting columns based on data type

You can select columns based on their data type. For example, to select all numeric columns, wrap the where function around is.numeric. Note that is.numeric is written without parentheses, because we pass the function itself to where.

Selecting numeric columns

 dataset |>
  select(where(is.numeric)) 

Selecting character columns

 dataset |>
  select(where(is.character))

Example

    In the flowers dataset, we select only the numeric columns using the where function.

     flowers |>
      select(where(is.numeric)) 

    This code will select only the height and sunlight columns from the flowers dataset.

    height  sunlight
    75      8.3
    150     6.4
    60      8.7
    90      7.2
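
The sketch below applies both type checks to the flowers dataset. It assumes a recent version of dplyr (1.0.0 or later, where the where helper is available inside select) and rebuilds flowers by hand, with text columns stored as character, so that the example runs on its own.

```r
library(dplyr)

# Assumption: flowers is rebuilt here from the table in this section.
# Text columns are stored as character, number columns as numeric.
flowers <- data.frame(
  name     = c("Poppy", "Rose", "Zinnia", "Peony"),
  height   = c(75, 150, 60, 90),
  season   = c("Spring", "Summer", "Summer", "Spring"),
  sunlight = c(8.3, 6.4, 8.7, 7.2),
  growth   = c("fast", "slow", "fast", "slow")
)

# Keep only the columns for which is.numeric returns TRUE.
numeric_columns <- flowers |>
  select(where(is.numeric))

print(names(numeric_columns))

# Keep only the columns for which is.character returns TRUE.
character_columns <- flowers |>
  select(where(is.character))

print(names(character_columns))
```

Here where runs the check on each column in turn and keeps the columns that pass it.
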
Exercise 2.7

Run the code below to see the output of selecting only the numeric columns from the flowers dataset.

Exercise 2.8

Select only the character columns from the flowers dataset.

You have reached the end of this section. Let’s review what we’ve learned!

Review

Great job! We have taken our first steps in selecting columns in R. Selecting columns is an essential skill in data analysis.

Summary

In this section, we've learned how to:
  • Select columns using the select function
  • Select columns based on their positions
  • Select a range of columns
  • Unselect columns using the - operator
  • Select columns based on their data type using the where function