What is Exploratory Data Analysis?
Exploratory Data Analysis (EDA) is an approach to analyzing data sets to summarize their main characteristics, often with visual methods. We try to understand the data, discover patterns, and identify relationships between variables. EDA is an essential step in the data analysis process as it provides us with preliminary insights that guide further analysis and modeling.
What you’ll learn in this course
In this course, you’ll learn how to explore and analyze data using R. You’ll learn how to summarize data, visualize data, and identify patterns in data. You’ll also learn how to ask questions about the data and answer them using data analysis techniques.
Datasets used in this course
In this course, we’ll be using two datasets: ice_cream and monthly_sales.
Ice Cream Dataset
The ice_cream dataset has 6 rows and 5 columns: id, name, category, rating, price.
id | name | category | rating | price |
---|---|---|---|---|
1 | Vanilla | Classic | 4.1 | 2.5 |
2 | Chocolate | Classic | 4.3 | 3 |
3 | Strawberry | Fruit | 4.2 | 3.25 |
4 | Mango | Fruit | 4.7 | 3.25 |
5 | Cookie Dough | Specialty | 4 | 3.5 |
6 | Blueberry | Fruit | 3.8 | 3 |
Here is the code to create the ice_cream dataset:
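The original listing is not reproduced here, so the following is a minimal sketch that rebuilds the table above by hand. It assumes a base R data.frame; the course itself may use a tibble or another constructor.

```r
# Sketch: recreate the six rows of the ice_cream table shown above.
# Assumption: a plain base R data.frame is an acceptable container.
ice_cream <- data.frame(
  id       = 1:6,
  name     = c("Vanilla", "Chocolate", "Strawberry",
               "Mango", "Cookie Dough", "Blueberry"),
  category = c("Classic", "Classic", "Fruit",
               "Fruit", "Specialty", "Fruit"),
  rating   = c(4.1, 4.3, 4.2, 4.7, 4.0, 3.8),
  price    = c(2.50, 3.00, 3.25, 3.25, 3.50, 3.00),
  stringsAsFactors = FALSE  # keep name and category as character columns
)

dim(ice_cream)  # number of rows and columns (6 and 5)
str(ice_cream)  # column types at a glance
```

Calling summary(ice_cream) on the result is a quick first look at the numeric columns rating and price.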
Monthly Sales Dataset
The monthly_sales dataset has 288 rows and 4 columns: flavor_id, date, region, sales.
These are the first 10 rows of the monthly_sales dataset:
flavor_id | date | region | sales |
---|---|---|---|
1 | 2021-01-01 | North | 100 |
2 | 2021-01-01 | North | 150 |
3 | 2021-01-01 | North | 200 |
4 | 2021-01-01 | North | 250 |
5 | 2021-01-01 | North | 300 |
6 | 2021-01-01 | North | 350 |
1 | 2021-01-01 | South | 100 |
2 | 2021-01-01 | South | 150 |
3 | 2021-01-01 | South | 200 |
4 | 2021-01-01 | South | 250 |
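To experiment with just the rows displayed above, they can be typed in directly. This is only a sketch of the preview: the object name monthly_sales_head is hypothetical, a base R data.frame is assumed, and the remaining 278 rows are not reconstructed because the rule that generates them is not shown.

```r
# Sketch: the ten preview rows of monthly_sales, entered by hand.
# Assumption: monthly_sales_head is an illustrative name, not one used by the course.
monthly_sales_head <- data.frame(
  flavor_id = c(1:6, 1:4),
  date      = as.Date(rep("2021-01-01", 10)),
  region    = c(rep("North", 6), rep("South", 4)),
  sales     = c(100, 150, 200, 250, 300, 350,
                100, 150, 200, 250),
  stringsAsFactors = FALSE  # keep region as a character column
)

# Total sales per region within these ten rows only.
aggregate(sales ~ region, data = monthly_sales_head, FUN = sum)
```

The totals cover only the preview rows, so they say nothing about the full 288-row dataset.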
Here is the code to create the full monthly_sales dataset: