
Exploratory Data Analysis with R

Learn how to explore and analyze data using R


What is Exploratory Data Analysis?

Exploratory Data Analysis (EDA) is an approach to analyzing data sets to summarize their main characteristics, often with visual methods. We try to understand the data, discover patterns, and identify relationships between variables. EDA is an essential step in the data analysis process as it provides us with preliminary insights that guide further analysis and modeling.
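As a minimal sketch of a first EDA pass in base R, the code below rebuilds the ice_cream data described in the datasets section of this course, summarizes it, and draws a histogram of its ratings. Choosing `str()`, `summary()`, and `hist()` is our own illustration of "summarize, often with visual methods". They are a common starting point, not the only one.

```r
# Recreate the small ice_cream data frame used in this course
ice_cream <- data.frame(
  id = 1:6,
  name = c("Vanilla", "Chocolate", "Strawberry", "Mango", "Cookie Dough", "Blueberry"),
  category = c("Classic", "Classic", "Fruit", "Fruit", "Specialty", "Fruit"),
  rating = c(4.1, 4.3, 4.2, 4.7, 4.0, 3.8),
  price = c(2.5, 3.0, 3.25, 3.25, 3.5, 3.0)
)

# Summarize: column names and types, then per-column summary statistics
str(ice_cream)
print(summary(ice_cream))

# Visualize: the distribution of ratings
hist(ice_cream$rating, main = "Ice cream ratings", xlab = "Rating")
```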

What you’ll learn in this course

In this course, you'll learn how to explore and analyze data using R: how to summarize and visualize data, identify patterns in it, and ask questions about the data that you can answer with data analysis techniques.
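For example, a question such as "Which ice cream category has the highest average rating?" can be answered in a few lines of base R. The question is our own illustration, and this sketch uses `aggregate()` with a formula, which is one of several ways to compute group means.

```r
ice_cream <- data.frame(
  id = 1:6,
  name = c("Vanilla", "Chocolate", "Strawberry", "Mango", "Cookie Dough", "Blueberry"),
  category = c("Classic", "Classic", "Fruit", "Fruit", "Specialty", "Fruit"),
  rating = c(4.1, 4.3, 4.2, 4.7, 4.0, 3.8),
  price = c(2.5, 3.0, 3.25, 3.25, 3.5, 3.0)
)

# Question: which category has the highest average rating?
avg_rating <- aggregate(rating ~ category, data = ice_cream, FUN = mean)
avg_rating <- avg_rating[order(-avg_rating$rating), ]  # sort from highest to lowest
print(avg_rating)

best <- as.character(avg_rating$category[1])
cat("Highest average rating:", best, "\n")
```

With the six flavors in this course, Fruit (mean of about 4.23) narrowly beats Classic (4.20), and Specialty (4.00) comes last.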

Datasets used in this course

In this course, we’ll be using two datasets: ice_cream and monthly_sales.

Ice Cream Dataset

The ice_cream dataset has 6 rows and 5 columns: id, name, category, rating, and price.

id  name          category   rating  price
1   Vanilla       Classic    4.1     2.50
2   Chocolate     Classic    4.3     3.00
3   Strawberry    Fruit      4.2     3.25
4   Mango         Fruit      4.7     3.25
5   Cookie Dough  Specialty  4.0     3.50
6   Blueberry     Fruit      3.8     3.00

Here is the code to create the ice_cream dataset:

 ice_cream <- data.frame(id = 1:6, 
                    name = c("Vanilla", "Chocolate", "Strawberry", "Mango", "Cookie Dough", "Blueberry"), 
                    category = c("Classic", "Classic", "Fruit", "Fruit", "Specialty", "Fruit"), 
                    rating = c(4.1, 4.3, 4.2, 4.7, 4.0, 3.8),
                    price = c(2.5, 3.0, 3.25, 3.25, 3.5, 3.0)) 
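To look for patterns and relationships in this table, one option is to count rows per category and compare two numeric columns with a scatter plot and a correlation coefficient. This sketch rebuilds ice_cream so it runs on its own and uses `cor()` with its default Pearson method. With only six rows, the coefficient is a description of this sample, not evidence of a general trend.

```r
ice_cream <- data.frame(
  id = 1:6,
  name = c("Vanilla", "Chocolate", "Strawberry", "Mango", "Cookie Dough", "Blueberry"),
  category = c("Classic", "Classic", "Fruit", "Fruit", "Specialty", "Fruit"),
  rating = c(4.1, 4.3, 4.2, 4.7, 4.0, 3.8),
  price = c(2.5, 3.0, 3.25, 3.25, 3.5, 3.0)
)

# How many flavors fall into each category?
counts <- table(ice_cream$category)
print(counts)

# Is there a relationship between price and rating?
plot(ice_cream$price, ice_cream$rating,
     xlab = "Price", ylab = "Rating", main = "Rating vs. price")
r <- cor(ice_cream$price, ice_cream$rating)  # Pearson correlation (the default method)
print(r)
```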

Monthly Sales Dataset

The monthly_sales dataset has 288 rows and 4 columns: flavor_id, date, region, and sales.

These are the first 10 rows of the monthly_sales dataset:

flavor_id  date        region  sales
1          2021-01-01  North   100
2          2021-01-01  North   150
3          2021-01-01  North   200
4          2021-01-01  North   250
5          2021-01-01  North   300
6          2021-01-01  North   350
1          2021-01-01  South   100
2          2021-01-01  South   150
3          2021-01-01  South   200
4          2021-01-01  South   250

Here is the code to create the full monthly_sales dataset:

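 set.seed(123)  # fix the random seed so the simulated sales are reproducible
 monthly_sales <- data.frame(
   flavor_id = rep(1:6, 48),                                                         # flavors 1-6, repeated
   date = rep(seq(as.Date("2021-01-01"), by = "month", length.out = 24), each = 12), # 24 months, 12 rows per month
   region = rep(rep(c("North", "South"), each = 6), 24),                             # each month: 6 North rows, then 6 South rows
   sales = sample(100:500, 288, replace = TRUE)                                      # random values, so they differ from the table above
 )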
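A typical next step is to aggregate monthly_sales and attach flavor names from ice_cream. This sketch assumes that flavor_id in monthly_sales refers to id in ice_cream. The column names and the shared values 1 to 6 suggest this, but the course text does not state it. To run on its own, the sketch rebuilds both tables: 12 rows per month (flavors 1 to 6 for North, then for South), matching the first rows shown above. It also sets an arbitrary seed, 123, because the sales values are random.

```r
set.seed(123)  # arbitrary seed so the random sales are reproducible

# Only the ice_cream columns needed for the join
ice_cream <- data.frame(
  id = 1:6,
  name = c("Vanilla", "Chocolate", "Strawberry", "Mango", "Cookie Dough", "Blueberry")
)

monthly_sales <- data.frame(
  flavor_id = rep(1:6, 48),
  date = rep(seq(as.Date("2021-01-01"), by = "month", length.out = 24), each = 12),
  region = rep(rep(c("North", "South"), each = 6), 24),
  sales = sample(100:500, 288, replace = TRUE)
)

# Total sales per region
by_region <- aggregate(sales ~ region, data = monthly_sales, FUN = sum)
print(by_region)

# Total sales per month and region, plotted over time for North
by_month <- aggregate(sales ~ date + region, data = monthly_sales, FUN = sum)
north <- by_month[by_month$region == "North", ]
north <- north[order(north$date), ]  # make sure the line is drawn in date order
plot(north$date, north$sales, type = "l",
     xlab = "Month", ylab = "Total sales", main = "North: monthly sales")

# Attach flavor names (assumes flavor_id matches ice_cream$id)
with_names <- merge(monthly_sales, ice_cream, by.x = "flavor_id", by.y = "id")
by_flavor <- aggregate(sales ~ name, data = with_names, FUN = sum)
print(by_flavor[order(-by_flavor$sales), ])
```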