Introduction to Python for Data Analysis

Selecting columns

Learn how to select columns in Python using pandas, with visible output.

Goals

In this section, we'll learn how to:
  • Select columns using pandas
  • Select columns based on their positions
  • Select a range of columns
  • Drop columns
  • Select columns based on their data type

A step by step approach using method chaining

In Python with pandas, we can use method chaining to perform operations step by step. Most DataFrame methods return a new DataFrame, so the next method can be attached directly with a dot (.). This plays the same role as the pipe operator in R.
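As a minimal sketch of chaining, the snippet below builds a small flowers DataFrame by hand (an assumption; the course does not show how the dataset is loaded) and applies two steps in a row. The variable name chained is only illustrative.

```python
import pandas as pd

# Stand-in for the flowers dataset used in this section (built inline as an assumption).
flowers = pd.DataFrame({
    "name": ["Poppy", "Rose", "Zinnia", "Peony"],
    "height": [75, 150, 60, 90],
    "season": ["Spring", "Summer", "Summer", "Spring"],
    "sunlight": [8.3, 6.4, 8.7, 7.2],
    "growth": ["fast", "slow", "fast", "slow"],
})

# Each step returns a new DataFrame, so the next step can follow after a dot.
chained = (
    flowers
    .drop("growth", axis=1)         # step 1: drop the growth column
    .loc[:, ["name", "height"]]     # step 2: keep name and height
)
print(chained)
```

Wrapping the chain in parentheses lets each step sit on its own line, which keeps longer chains readable.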

Selecting columns with column names

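In pandas, we can select columns from a DataFrame by passing a list of column names inside square brackets [] or by using the loc accessor. Both return a new DataFrame containing only the listed columns.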

Syntax

You can read the following syntax as “take the dataset, and then select column1 and column2”.

df = dataset[['column1', 'column2']]
print(df)
# or
df = dataset.loc[:, ['column1', 'column2']]
print(df)

Example

    In our flowers dataset, we have these columns: name, height, season, sunlight, and growth.

    name    height  season  sunlight  growth
    Poppy   75      Spring  8.3       fast
    Rose    150     Summer  6.4       slow
    Zinnia  60      Summer  8.7       fast
    Peony   90      Spring  7.2       slow

    If we want to select the name and height columns, we can do this:

    df = flowers[['name', 'height']]
    print(df)
    # or
    df = flowers.loc[:, ['name', 'height']]
    print(df)

    Both forms select the name and height columns from the flowers dataset, store the result in df, and print it.
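To make the example runnable on its own, here is a sketch that builds the flowers data inline (an assumption about how the dataset is created) and compares the two selection styles. The names by_brackets, by_loc, and heights are illustrative.

```python
import pandas as pd

# Stand-in for the flowers dataset (built inline as an assumption).
flowers = pd.DataFrame({
    "name": ["Poppy", "Rose", "Zinnia", "Peony"],
    "height": [75, 150, 60, 90],
    "season": ["Spring", "Summer", "Summer", "Spring"],
    "sunlight": [8.3, 6.4, 8.7, 7.2],
    "growth": ["fast", "slow", "fast", "slow"],
})

# A list of names inside the brackets returns a DataFrame.
by_brackets = flowers[["name", "height"]]

# loc takes a row selector (":" means all rows) and a column selector.
by_loc = flowers.loc[:, ["name", "height"]]

print(by_brackets)
print(by_brackets.equals(by_loc))   # both approaches give the same table

# A single name without the inner list returns a Series, not a DataFrame.
heights = flowers["height"]
print(type(heights).__name__)
```

The double brackets matter: the inner pair is a Python list of column names, and the outer pair is the selection itself.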

Exercise 2.1

Run the code from the example above to see the output of selecting the name and height columns from the flowers dataset.

Exercise 2.2

Select the season and sunlight columns from the flowers dataset.


Selecting columns using column positions

We can also select columns using their positions with the iloc accessor.

Example

    In the flowers dataset, we can select the first and third columns using their positions.

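    df = flowers.iloc[:, [0, 2]]
    print(df)

    This code selects the first and third columns (name and season) from the flowers dataset and prints the result. Positions start at 0, so position 0 is the first column and position 2 is the third.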
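The following sketch shows how positions map to column names before selecting with iloc. The flowers DataFrame is built inline as an assumption, and the variable names are illustrative.

```python
import pandas as pd

# Stand-in for the flowers dataset (built inline as an assumption).
flowers = pd.DataFrame({
    "name": ["Poppy", "Rose", "Zinnia", "Peony"],
    "height": [75, 150, 60, 90],
    "season": ["Spring", "Summer", "Summer", "Spring"],
    "sunlight": [8.3, 6.4, 8.7, 7.2],
    "growth": ["fast", "slow", "fast", "slow"],
})

# List each column's position next to its name (positions start at 0).
positions = list(enumerate(flowers.columns))
print(positions)

# Positions 0 and 2 are the first and third columns.
first_and_third = flowers.iloc[:, [0, 2]]
print(first_and_third)

# Negative positions count from the end, so -1 is the last column.
first_and_last = flowers.iloc[:, [0, -1]]
print(first_and_last)
```

Selecting by position is convenient for quick exploration, but it depends on column order, so selecting by name is usually safer when the layout of the data may change.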

Exercise 2.3

Run the code from the example above to see the output of selecting the first and third columns from the flowers dataset.

Exercise 2.4

Try selecting the second and fourth columns from the flowers dataset.

Selecting a range of columns

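We can select a range of columns using slice notation, start:stop. With iloc, the start position is included and the stop position is excluded.

Example

    df = flowers.iloc[:, 0:3]
    print(df)

    This code selects the first three columns (positions 0, 1, and 2) from the flowers dataset and prints the result. Position 3 is not included.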
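As a sketch of how slices behave, the snippet below compares a position slice with iloc to a label slice with loc. The flowers DataFrame is built inline as an assumption, and the variable names are illustrative.

```python
import pandas as pd

# Stand-in for the flowers dataset (built inline as an assumption).
flowers = pd.DataFrame({
    "name": ["Poppy", "Rose", "Zinnia", "Peony"],
    "height": [75, 150, 60, 90],
    "season": ["Spring", "Summer", "Summer", "Spring"],
    "sunlight": [8.3, 6.4, 8.7, 7.2],
    "growth": ["fast", "slow", "fast", "slow"],
})

# Position slice: includes position 0, stops before position 3.
by_position = flowers.iloc[:, 0:3]
print(by_position)

# Label slice: with loc, both the start and the end label are included.
by_label = flowers.loc[:, "name":"season"]
print(by_label)

print(by_position.equals(by_label))  # the two slices pick the same columns
```

The difference in endpoints is easy to miss: iloc follows Python's usual half-open slicing, while loc includes the end label.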

Exercise 2.5

Run the code from the example above to see the output of selecting the first three columns from the flowers dataset.

Exercise 2.6

Try selecting the second through fourth columns. Hint: Use 1:4 in the slice notation.

Dropping columns

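If you want to remove columns instead of selecting them, you can use the drop method. Passing axis=1 tells pandas to drop columns rather than rows.

Example

    df = flowers.drop('name', axis=1)
    print(df)

    This code returns a copy of the flowers dataset without the name column and prints it. The original flowers DataFrame is left unchanged.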
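A short sketch of dropping more than one column follows. It uses the columns keyword, which is an alternative to axis=1, and builds the flowers DataFrame inline as an assumption. The name trimmed is illustrative.

```python
import pandas as pd

# Stand-in for the flowers dataset (built inline as an assumption).
flowers = pd.DataFrame({
    "name": ["Poppy", "Rose", "Zinnia", "Peony"],
    "height": [75, 150, 60, 90],
    "season": ["Spring", "Summer", "Summer", "Spring"],
    "sunlight": [8.3, 6.4, 8.7, 7.2],
    "growth": ["fast", "slow", "fast", "slow"],
})

# columns=[...] drops the listed columns; it is equivalent to axis=1.
trimmed = flowers.drop(columns=["name", "growth"])
print(trimmed)

# drop returns a new DataFrame, so the original still has all five columns.
print(list(flowers.columns))
```

Dropping is often shorter than selecting when you want to keep most of the columns and remove only a few.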

Selecting columns based on data type

You can select columns based on their data type using the select_dtypes method.

Selecting numeric columns

df = dataset.select_dtypes(include=['number'])
print(df)

Selecting object (string) columns

df = dataset.select_dtypes(include=['object'])
print(df)

Example

    In the flowers dataset, we can select only the numeric columns:

    df = flowers.select_dtypes(include=['number'])
    print(df)

    This code selects only the numeric columns, height and sunlight, from the flowers dataset and prints the result.
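The sketch below checks each column's data type first and then splits the columns into numeric and non-numeric groups. The flowers DataFrame is built inline as an assumption. Because the dtype pandas assigns to text columns can depend on the pandas version and settings, the second selection uses exclude=['number'] rather than naming a text dtype.

```python
import pandas as pd

# Stand-in for the flowers dataset (built inline as an assumption).
flowers = pd.DataFrame({
    "name": ["Poppy", "Rose", "Zinnia", "Peony"],
    "height": [75, 150, 60, 90],
    "season": ["Spring", "Summer", "Summer", "Spring"],
    "sunlight": [8.3, 6.4, 8.7, 7.2],
    "growth": ["fast", "slow", "fast", "slow"],
})

# Inspect the data type of every column before selecting.
print(flowers.dtypes)

# Keep only numeric columns (integers and floats).
numeric = flowers.select_dtypes(include=["number"])
print(numeric)

# Keep everything that is not numeric.
non_numeric = flowers.select_dtypes(exclude=["number"])
print(non_numeric)
```

Printing dtypes first is a useful habit, since a column that looks numeric may have been read in as text.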

Exercise 2.7

Run the code from the example above to see the output of selecting only the numeric columns from the flowers dataset.

Exercise 2.8

Select only the object (string) columns from the flowers dataset.

Summary

In this section, we've learned how to:
  • Select columns using pandas
  • Select columns based on their positions
  • Select a range of columns
  • Drop columns
  • Select columns based on their data type