Introduction | The School of Data

Exploratory Data Analysis with R: Introduction

What is Exploratory Data Analysis?

Exploratory Data Analysis (EDA) is an approach to analyzing data sets to summarize their main characteristics, often with visual methods. We try to understand the data, discover patterns, and identify relationships between variables. EDA is an essential step in the data analysis process as it provides us with preliminary insights that guide further analysis and modeling.

What you’ll learn in this course

In this course, you’ll learn how to explore and analyze data using R: how to summarize and visualize data, identify patterns in it, and ask questions about the data that you then answer with data analysis techniques.

Datasets used in this course

In this course, we’ll be using two datasets: ice_cream and monthly_sales.

Ice Cream Dataset

The ice_cream dataset has 6 rows and 5 columns: id, name, category, rating, price.

id	name	category	rating	price
1	Vanilla	Classic	4.1	2.5
2	Chocolate	Classic	4.3	3
3	Strawberry	Fruit	4.2	3.25
4	Mango	Fruit	4.7	3.25
5	Cookie Dough	Specialty	4	3.5
6	Blueberry	Fruit	3.8	3

Here is the code to create the ice_cream dataset:

 ice_cream <- data.frame(id = 1:6, 
                    name = c("Vanilla", "Chocolate", "Strawberry", "Mango", "Cookie Dough", "Blueberry"), 
                    category = c("Classic", "Classic", "Fruit", "Fruit", "Specialty", "Fruit"), 
                    rating = c(4.1, 4.3, 4.2, 4.7, 4.0, 3.8),
                    price = c(2.5, 3.0, 3.25, 3.25, 3.5, 3.0))
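
Once the data frame exists, a quick first look might go as follows. This is a minimal sketch using only base R; the name avg_by_category is an illustrative choice, not something defined by the course.

```r
# Recreate the ice_cream data frame so this block runs on its own
ice_cream <- data.frame(
  id = 1:6,
  name = c("Vanilla", "Chocolate", "Strawberry", "Mango", "Cookie Dough", "Blueberry"),
  category = c("Classic", "Classic", "Fruit", "Fruit", "Specialty", "Fruit"),
  rating = c(4.1, 4.3, 4.2, 4.7, 4.0, 3.8),
  price = c(2.5, 3.0, 3.25, 3.25, 3.5, 3.0)
)

# Shape and column types
dim(ice_cream)   # 6 rows, 5 columns
str(ice_cream)

# Five-number summary plus mean for the numeric columns
summary(ice_cream[, c("rating", "price")])

# Mean rating and price per category (avg_by_category is a hypothetical name)
avg_by_category <- aggregate(cbind(rating, price) ~ category,
                             data = ice_cream, FUN = mean)
avg_by_category
```

Grouping by category is only one possible first question; the same aggregate() call works for any other grouping column.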

Monthly Sales Dataset

The monthly_sales dataset has 288 rows and 4 columns: flavor_id, date, region, sales.

These are the first 10 rows of the monthly_sales dataset (the sales values are generated randomly, so yours will differ):

flavor_id	date	region	sales
1	2021-01-01	North	100
2	2021-01-01	North	150
3	2021-01-01	North	200
4	2021-01-01	North	250
5	2021-01-01	North	300
6	2021-01-01	North	350
1	2021-01-01	South	100
2	2021-01-01	South	150
3	2021-01-01	South	200
4	2021-01-01	South	250

Here is the code to create the full monthly_sales dataset:

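 # region needs exactly 288 values: blocks of 6 rows (one per flavor) alternate North/South.
 # sales is sampled at random, so the values change on every run.
 monthly_sales <- data.frame(flavor_id = rep(1:6, 48), 
                             date = rep(seq(as.Date("2021-01-01"), by = "month", length.out = 12), each = 24), 
                             region = rep(rep(c("North", "South"), each = 6), 24), 
                             sales = sample(100:500, 288, replace = TRUE))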
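
Because flavor_id in monthly_sales refers to id in ice_cream, the two datasets can be combined. The sketch below shows one way to do that in base R, under a few assumptions that are not part of the course text: the seed value 42 is arbitrary and only makes the random sales reproducible, the region column is built in blocks of 6 rows so that the frame has the stated 288 rows, and the names sales_named and totals are illustrative.

```r
set.seed(42)  # assumption: arbitrary seed, only for reproducible random sales

# Only id and name are needed from ice_cream for this example
ice_cream <- data.frame(
  id = 1:6,
  name = c("Vanilla", "Chocolate", "Strawberry", "Mango", "Cookie Dough", "Blueberry")
)

monthly_sales <- data.frame(
  flavor_id = rep(1:6, 48),
  date = rep(seq(as.Date("2021-01-01"), by = "month", length.out = 12), each = 24),
  region = rep(rep(c("North", "South"), each = 6), 24),  # assumption: blocks of 6
  sales = sample(100:500, 288, replace = TRUE)
)

# Confirm the shape and look at the first rows
dim(monthly_sales)   # 288 rows, 4 columns
head(monthly_sales, 10)

# Attach the flavor name to every sales row
sales_named <- merge(monthly_sales, ice_cream, by.x = "flavor_id", by.y = "id")

# Total sales per flavor and region, largest first
totals <- aggregate(sales ~ name + region, data = sales_named, FUN = sum)
totals[order(-totals$sales), ]
```

Checking dim() right after building a data frame is a cheap way to catch length mismatches, since data.frame() silently recycles shorter vectors when the longest column is a whole multiple of their length.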
