Introduction to Python for Data Analysis

Summarizing Data

Learn how to summarize data in Python using the pandas library.

Goals

In this section, you will learn how to:
  • Calculate basic statistics of a DataFrame
  • Group data and perform aggregate functions
  • Use different methods to summarize data

Sample DataFrame

Let’s start with a sample DataFrame that we’ll use throughout this section:

id  name   age  city           salary
1   Alice  25   New York       50000
2   Bob    30   San Francisco  75000
3   Carol  35   Chicago        60000
4   David  40   New York       80000
5   Emily  28   Chicago        55000
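The table above can be built directly in pandas. The following is a minimal sketch that assumes the DataFrame is stored in a variable named df, matching the name used in the syntax examples in this section:

```python
import pandas as pd

# Build the sample table: each dictionary key becomes a column.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "name": ["Alice", "Bob", "Carol", "David", "Emily"],
    "age": [25, 30, 35, 40, 28],
    "city": ["New York", "San Francisco", "Chicago", "New York", "Chicago"],
    "salary": [50000, 75000, 60000, 80000, 55000],
})

print(df)
```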

Basic Statistics

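Pandas provides several methods to calculate basic statistics of a DataFrame. describe() reports the count, mean, standard deviation, minimum, quartiles, and maximum of each numerical column, while mean() and median() return a single statistic for one column.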

Syntax

# Basic statistics
df.describe()

# Mean of a column
df['column_name'].mean()

# Median of a column
df['column_name'].median()

Exercise 3.1: Basic Statistics

Run the code below to get basic statistics of the numerical columns:

Basic Statistics
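The interactive code cell for this exercise is not reproduced here. As a stand-in, this sketch rebuilds the sample DataFrame and calls describe() on it; the variable name stats is chosen only for illustration:

```python
import pandas as pd

# Sample data from the start of this section.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "name": ["Alice", "Bob", "Carol", "David", "Emily"],
    "age": [25, 30, 35, 40, 28],
    "city": ["New York", "San Francisco", "Chicago", "New York", "Chicago"],
    "salary": [50000, 75000, 60000, 80000, 55000],
})

# describe() summarizes only the numerical columns by default,
# so name and city are left out of the result.
stats = df.describe()
print(stats)
```

Note that id is treated as a numerical column too, so its statistics appear in the output even though they carry little meaning.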

Exercise 3.2: Specific Statistics

Modify the code below to calculate the mean age and median salary:

Calculate mean age and median salary
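One possible solution, sketched under the assumption that the sample DataFrame is available as df (the names mean_age and median_salary are illustrative):

```python
import pandas as pd

# Sample data from the start of this section.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "name": ["Alice", "Bob", "Carol", "David", "Emily"],
    "age": [25, 30, 35, 40, 28],
    "city": ["New York", "San Francisco", "Chicago", "New York", "Chicago"],
    "salary": [50000, 75000, 60000, 80000, 55000],
})

# Select a single column, then apply the statistic to it.
mean_age = df["age"].mean()
median_salary = df["salary"].median()

print("Mean age:", mean_age)
print("Median salary:", median_salary)
```

For the five sample rows, the mean age works out to 31.6 and the median salary to 60000.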

Grouping and Aggregation

Pandas allows you to group data by one or more columns and perform aggregate functions on other columns.

Syntax

# Group by one column and calculate mean
df.groupby('column_name')['other_column'].mean()

# Group by multiple columns and calculate multiple aggregations
df.groupby(['column1', 'column2']).agg({'column3': 'mean', 'column4': 'sum'})

Exercise 3.3: Grouping by One Column

Modify the code below to calculate the average salary for each city:

Average salary by city
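A possible solution might look like the sketch below, which groups the sample rows by city and averages the salary column. The result name avg_salary_by_city is an illustrative choice:

```python
import pandas as pd

# Sample data from the start of this section.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "name": ["Alice", "Bob", "Carol", "David", "Emily"],
    "age": [25, 30, 35, 40, 28],
    "city": ["New York", "San Francisco", "Chicago", "New York", "Chicago"],
    "salary": [50000, 75000, 60000, 80000, 55000],
})

# Group rows that share a city, then average the salary within each group.
# The result is a Series indexed by city name.
avg_salary_by_city = df.groupby("city")["salary"].mean()
print(avg_salary_by_city)
```

With the sample data this gives 57500 for Chicago, 65000 for New York, and 75000 for San Francisco.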

Exercise 3.4: Multiple Aggregations

Modify the code below to calculate the average age and total salary for each city:

Average age and total salary by city
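As one way to approach this exercise, the sketch below passes a dictionary to agg() so that each column gets its own aggregate function. The name city_summary is hypothetical:

```python
import pandas as pd

# Sample data from the start of this section.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "name": ["Alice", "Bob", "Carol", "David", "Emily"],
    "age": [25, 30, 35, 40, 28],
    "city": ["New York", "San Francisco", "Chicago", "New York", "Chicago"],
    "salary": [50000, 75000, 60000, 80000, 55000],
})

# Map each column to the aggregation it should receive:
# mean for age, sum for salary.
city_summary = df.groupby("city").agg({"age": "mean", "salary": "sum"})
print(city_summary)
```

The result is a DataFrame with one row per city and one column per aggregated field.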

Exercise 3.5: Advanced Summarization

Modify the code below to get the following summary:

  • Total number of employees
  • Average age of employees
  • Highest salary
  • City with the most employees

Advanced summary
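The four items above could be computed along these lines. This is a sketch rather than the course's reference solution, and all result names are illustrative. Because New York and Chicago each have two employees in the sample data, the sketch collects every city tied for the highest count instead of picking one arbitrarily:

```python
import pandas as pd

# Sample data from the start of this section.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "name": ["Alice", "Bob", "Carol", "David", "Emily"],
    "age": [25, 30, 35, 40, 28],
    "city": ["New York", "San Francisco", "Chicago", "New York", "Chicago"],
    "salary": [50000, 75000, 60000, 80000, 55000],
})

total_employees = len(df)           # one row per employee
average_age = df["age"].mean()
highest_salary = df["salary"].max()

# Count employees per city, then keep every city that reaches the top count.
city_counts = df["city"].value_counts()
top_cities = sorted(city_counts[city_counts == city_counts.max()].index)

print("Total employees:", total_employees)
print("Average age:", average_age)
print("Highest salary:", highest_salary)
print("City or cities with the most employees:", top_cities)
```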

Summary

We've learned how to:
  • Calculate basic statistics using methods like describe(), mean(), and median()
  • Group data using groupby() and perform aggregate functions
  • Combine multiple aggregations in a single operation
  • Create custom summaries using various pandas functions