A Beginner's Introduction to R Through the Tidyverse

Visualizing Data

Learn how to create simple charts using ggplot2

Why visualize data?

Data visualization is a graphical way to communicate key insights and share information quickly. The ggplot2 package in the tidyverse offers a way to create beautiful, customizable charts. Creating charts with code might seem slow and challenging at first. However, a chart written as code can be rerun and reused on new data, which simplifies and automates the data analysis process and ultimately saves time and effort.

What we’ll learn in this section

In this section, we'll learn how to:
  • create a simple bar chart
  • create a simple scatter plot
  • customize the appearance of the charts
  • add titles and labels to the charts
  • change the color of the bars and points

By the end of this section, we’ll be able to create the following two charts:

    A bar chart with flower names and heights.
    A scatter plot with height and sunlight.

What is ggplot2?

ggplot2 is a data visualization package that is part of the tidyverse. It is based on a concept called the grammar of graphics, which is a way of thinking about creating graphics in a structured and consistent way.

If you don’t have tidyverse or ggplot2 installed, you can install either one using:

 install.packages("tidyverse")
 
# or 
 
install.packages("ggplot2") 

And then, you can load the library using:

 library(tidyverse)
 
# or
 
library(ggplot2) 
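
In a script that is run more than once, reinstalling the package on every run is unnecessary. A minimal sketch, assuming we only need ggplot2 and that the CRAN mirror URL below is an acceptable download source (the repos value is an assumption, not part of the instructions above):

```r
# Install ggplot2 only if it is not already available.
# The repos URL is an assumed CRAN mirror; any mirror works.
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2", repos = "https://cloud.r-project.org")
}

# Load the package for the current session.
library(ggplot2)
```

Installing happens once per machine, while library() has to be called in every new R session.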

Making a Bar Chart

To create a ggplot, we pass the name of the dataset and then the x and y axis columns inside the aes() function.

 ggplot(data, aes(x = x_variable, y = y_variable))

We can add a new layer using + and specify the type of chart we want to create. In this case, we are creating a bar chart using the geom_bar() function.

 ggplot(data, aes(x = x_variable, y = y_variable)) +
  geom_bar() 

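By default, geom_bar() counts the rows in each group, so it expects only an x column. Because we also supply a y column, we specify inside the geom_bar() function which statistic to use instead.

In this case, we’ll use stat = "identity", which draws each bar from the value we already have in a column.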

 ggplot(data, aes(x = x_variable, y = y_variable)) +
  geom_bar(stat = "identity") 
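
To make the role of the statistic concrete, here is a minimal sketch comparing three variants. The toy data frame and its column names are hypothetical and used only for illustration. geom_col() is a ggplot2 function that behaves like geom_bar(stat = "identity").

```r
library(ggplot2)

# Hypothetical two-row dataset, used only for illustration.
toy <- data.frame(item = c("a", "b"), value = c(3, 5))

# Bar heights are taken directly from the `value` column.
p_bar <- ggplot(toy, aes(x = item, y = value)) +
  geom_bar(stat = "identity")

# geom_col() is a shortcut that also uses the values as they are.
p_col <- ggplot(toy, aes(x = item, y = value)) +
  geom_col()

# With no y column, geom_bar() counts the rows in each x group instead.
p_count <- ggplot(toy, aes(x = item)) +
  geom_bar()
```

The first two plots look the same; the third shows one row per item rather than the stored values.
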
Example

    Let’s create a simple bar chart using the ggplot function with the flowers dataset.

    name    height  season  sunlight  growth
    Poppy   75      Spring  8.3       fast
    Rose    150     Summer  6.4       slow
    Zinnia  60      Summer  8.7       fast
    Peony   90      Spring  7.2       slow

     ggplot(flowers, aes(x = name, y = height)) + 
      geom_bar(stat = "identity") 
    A Bar chart with height and name of flower
Exercise 3.1

Run this code to create a bar chart using the flowers dataset.
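
The interactive code cell for this exercise is not reproduced here. A self-contained sketch, assuming the flowers data frame is built the same way as in the scatter plot example later in this section (the object name bar_chart is hypothetical):

```r
library(ggplot2)

# Sample data, copied from the flowers table shown in this section.
flowers <- data.frame(
  name = c("Poppy", "Rose", "Zinnia", "Peony"),
  height = c(75, 150, 60, 90),
  season = c("Spring", "Summer", "Summer", "Spring"),
  sunlight = c(8.3, 6.4, 8.7, 7.2),
  growth = c("fast", "slow", "fast", "slow")
)

# One bar per flower; bar height comes from the height column.
bar_chart <- ggplot(flowers, aes(x = name, y = height)) +
  geom_bar(stat = "identity")

# Printing the object draws the chart.
bar_chart
```

Because name is a text column, ggplot2 places the bars in alphabetical order rather than in the order of the rows.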

Exercise 3.2

Replace y = height with y = sunlight in the bar chart code and see what happens.

Adding Text

A chart is more informative when we add text to it. We can add text to the chart using the labs() function.

 labs(alt = "Alternative text for the chart",
     title = "Title of the chart",
     subtitle = "Subtitle of the chart",
     x = "Label for the x-axis",
     y = "Label for the y-axis",
     caption = "Caption for the chart") 

The alt argument adds alternative text to the chart. Screen readers read this text aloud, so it is essential for readers with vision impairments. The title and subtitle give context to the chart. The x and y arguments label the x and y axes. The caption argument is a good place to add the source of the data.

argument   description
alt        Alternative text for the chart
title      Title of the chart
subtitle   Subtitle of the chart
x          Label for the x-axis
y          Label for the y-axis
caption    Caption for the chart

Example

    If we add the following arguments to the labs() function:

     ggplot(flowers, aes(x = name, y = height)) + 
      geom_bar(stat = "identity") + 
      labs(alt = "A bar chart with the height of flowers (peony: 90, rose: 150, zinnia: 60, poppy: 75)",
           title = "Flower Heights",
           subtitle = "Height of flowers in inches",
           x = NULL, y = NULL,
           caption = "Source: The School of Data") 

    We’ll get this chart with the text added.

    A bar chart with the height of flowers (peony: 90, rose: 150, zinnia: 60, poppy: 75)

    The alt-text will not be visible in the chart but is important for screen readers.
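
Since the alt text never shows up in the rendered chart, it can be useful to check that it was stored. A minimal sketch, assuming a reasonably recent ggplot2 release that provides get_alt_text(); the shortened alt string and the object name p are illustrative only:

```r
library(ggplot2)

# Only the two columns needed for the bar chart.
flowers <- data.frame(
  name = c("Poppy", "Rose", "Zinnia", "Peony"),
  height = c(75, 150, 60, 90)
)

p <- ggplot(flowers, aes(x = name, y = height)) +
  geom_bar(stat = "identity") +
  labs(alt = "A bar chart with the height of four flowers",
       title = "Flower Heights")

# get_alt_text() returns the alt text stored in the plot object.
get_alt_text(p)
```

Whether a screen reader actually receives this text depends on how the chart is published, for example through a document format that writes alt text into the output.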

Exercise 3.3

Run the example code above to add labels to the plot.

Formatting Theme Elements

We can change the appearance of the chart by adding themes. We will use the built-in theme_minimal() function, which removes the default gray background and keeps light grid lines.

We’ll then modify the theme by removing the vertical grid lines (the ones that belong to the x-axis) and aligning the title with the left edge of the whole plot rather than with the plotting panel.

 theme_minimal() +
 theme(panel.grid.major.x = element_blank(),
       panel.grid.minor.x = element_blank(),
       plot.title.position = "plot")

Example

    If we add the following arguments to the theme() function:

    ggplot(flowers, aes(x = name, y = height)) +
      geom_bar(stat = "identity") +
      labs(alt = "A bar chart with the height of flowers (peony: 90, rose: 150, zinnia: 60, poppy: 75)",
           title = "Flower Heights",
           subtitle = "Height of flowers in inches",
           x = NULL, y = NULL,
           caption = "Source: The School of Data") +
      theme_minimal() +
      theme(panel.grid.major.x = element_blank(),
            panel.grid.minor.x = element_blank(),
            plot.title.position = "plot")

    We’ll get this chart with the theme modifications added.

    A bar chart with the height of flowers (peony: 90, rose: 150, zinnia: 60, poppy: 75)
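
When several charts should share the same look, the theme settings can be stored once and reused. A minimal sketch; the object names flower_theme and bar_chart are hypothetical:

```r
library(ggplot2)

# Combine the built-in theme and the tweaks into one reusable object.
flower_theme <- theme_minimal() +
  theme(panel.grid.major.x = element_blank(),
        panel.grid.minor.x = element_blank(),
        plot.title.position = "plot")

flowers <- data.frame(
  name = c("Poppy", "Rose", "Zinnia", "Peony"),
  height = c(75, 150, 60, 90)
)

# Add the stored theme like any other layer.
bar_chart <- ggplot(flowers, aes(x = name, y = height)) +
  geom_bar(stat = "identity") +
  labs(title = "Flower Heights") +
  flower_theme
```

The same flower_theme object can then be added to any other chart, so the styling only has to be changed in one place.
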
Exercise 3.4

Run the example code above to add a theme to the plot.

Making a Scatter Plot

A scatter plot shows the relationship between two numerical variables.

In this section, we’ll make this plot:

A scatter plot with height and sunlight

We can create a scatter plot using the geom_point() function.

The geom_point() function needs two aesthetics: x and y. We map them to columns inside aes(), which we pass to ggplot() as the mapping. geom_point() then inherits that mapping.

 ggplot(data, aes(x = x_variable, y = y_variable)) +
  geom_point() 
Example

    Let’s create a simple scatter plot using the ggplot function with the flowers dataset.

    name    height  season  sunlight  growth
    Poppy   75      Spring  8.3       fast
    Rose    150     Summer  6.4       slow
    Zinnia  60      Summer  8.7       fast
    Peony   90      Spring  7.2       slow

    Syntax to create sample data:

     flowers <- data.frame(
      name = c("Poppy", "Rose", "Zinnia", "Peony"),
      height = c(75, 150, 60, 90),
      season = c("Spring", "Summer", "Summer", "Spring"),
      sunlight = c(8.3, 6.4, 8.7, 7.2),
      growth = c("fast", "slow", "fast", "slow")
    )
     
    flowers

     ggplot(flowers, aes(x = sunlight, y = height)) + 
      geom_point() 
    A scatter plot with height and sunlight

    By default, ggplot will zoom in on the dots by adjusting the x and y axis limits.

    We can change the limits by adding the xlim() and ylim() functions. The first value before the comma is the lower limit and the second value is the upper limit.

     xlim(0, 10) + ylim(0, 200)

     ggplot(flowers, aes(x = sunlight, y = height)) + 
      geom_point() +
      xlim(0, 10) + ylim(0, 200) 
    A scatter plot with height and sunlight
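
One detail worth knowing about xlim() and ylim(): points that fall outside the limits are dropped rather than just hidden. A minimal sketch of the difference, using coord_cartesian() as an alternative that only zooms the view. The upper limit of 100 is chosen here purely to push one flower out of range and is not part of the example above; the object names p_scale and p_zoom are hypothetical.

```r
library(ggplot2)

flowers <- data.frame(
  name = c("Poppy", "Rose", "Zinnia", "Peony"),
  height = c(75, 150, 60, 90),
  sunlight = c(8.3, 6.4, 8.7, 7.2)
)

# ylim(0, 100) sets the scale limits. Rose (height 150) falls outside,
# so it is removed with a warning when the plot is drawn.
p_scale <- ggplot(flowers, aes(x = sunlight, y = height)) +
  geom_point() +
  xlim(0, 10) + ylim(0, 100)

# coord_cartesian() zooms the view instead. All four rows stay in the
# data; Rose is simply outside the visible area.
p_zoom <- ggplot(flowers, aes(x = sunlight, y = height)) +
  geom_point() +
  coord_cartesian(xlim = c(0, 10), ylim = c(0, 100))
```

With the limits used in the example (0 to 10 and 0 to 200), every flower is inside the range, so both approaches show the same four points.
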
Exercise 3.5

Run the example code above to create a scatter plot using the flowers dataset.

Adding Titles and Labels

Example

    We’ll add alt text, a title, and a subtitle to our scatter plot using the labs() function.

     labs(alt = "A scatter plot of sunlight and height (75, 8.3; 150, 6.4; 60, 8.7; 90, 7.2)",
         title = "Height vs Sunlight",
         subtitle = "A scatter plot of sunlight and height",
         caption = "Source: The School of Data") 

    Modifying Theme Elements

    We can change the appearance of the chart by adding themes. Here we will use only the built-in theme_minimal() function, which removes the default gray background and keeps light grid lines.

    The theme() modifications from the bar chart can be added in the same way if we want them.

     theme_minimal() 
    A scatter plot of sunlight and height (75, 8.3; 150, 6.4; 60, 8.7; 90, 7.2)
Exercise 3.6

Run this code to add labels to the scatter plot.
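
The interactive code cell for this exercise is not reproduced here. A self-contained sketch that puts the pieces from the example together, assuming the labels and theme are added straight onto the basic scatter plot without axis limits (the object name scatter_plot is hypothetical):

```r
library(ggplot2)

flowers <- data.frame(
  name = c("Poppy", "Rose", "Zinnia", "Peony"),
  height = c(75, 150, 60, 90),
  season = c("Spring", "Summer", "Summer", "Spring"),
  sunlight = c(8.3, 6.4, 8.7, 7.2),
  growth = c("fast", "slow", "fast", "slow")
)

# Points first, then the text, then the theme.
scatter_plot <- ggplot(flowers, aes(x = sunlight, y = height)) +
  geom_point() +
  labs(alt = "A scatter plot of sunlight and height (75, 8.3; 150, 6.4; 60, 8.7; 90, 7.2)",
       title = "Height vs Sunlight",
       subtitle = "A scatter plot of sunlight and height",
       caption = "Source: The School of Data") +
  theme_minimal()

# Printing the object draws the chart.
scatter_plot
```

Since x and y are not set in labs(), the axes keep their default labels, which are the column names sunlight and height.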

Review

Nice work! In this section, we learned how to create simple charts using ggplot2. We learned how to create a bar chart and a scatter plot. We also learned how to customize the appearance of the charts, add titles and labels, and change the color of the bars and points.
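
Changing the color of the bars and points is the one item above without an example in this section. A minimal sketch, assuming a single fixed color set directly in each geom; the color names "steelblue" and "darkorange" and the size value are arbitrary choices for illustration, not taken from the course:

```r
library(ggplot2)

flowers <- data.frame(
  name = c("Poppy", "Rose", "Zinnia", "Peony"),
  height = c(75, 150, 60, 90),
  sunlight = c(8.3, 6.4, 8.7, 7.2)
)

# fill sets the inside color of the bars.
blue_bars <- ggplot(flowers, aes(x = name, y = height)) +
  geom_bar(stat = "identity", fill = "steelblue")

# color sets the color of the points; size makes them easier to see.
orange_points <- ggplot(flowers, aes(x = sunlight, y = height)) +
  geom_point(color = "darkorange", size = 3)
```

These arguments sit outside aes() because they apply one fixed value to every bar or point instead of mapping a column to color.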

We've learned how to
  • create a simple bar chart with geom_bar()
  • create a simple scatter plot with geom_point()
  • add titles and labels with labs()
  • customize the appearance of the charts with theme_minimal() and theme()