GGPlot and the Grammar of Graphics

Programming for Data Science

The graphics package associated with the Tidyverse is ggplot2, often referred to simply as GGPlot.

In this notebook, we will present a quick overview of the design principles behind the package.

As with Dplyr, GGPlot2 is an entirely new system that supplants the older graphics functions built into R.

It is founded on a principled analysis of its domain and from this develops a basic grammar which can then be expressed in R.

Just as Dplyr is built on a grammar of data consisting of pipable verbs that correspond to the logic of data transformation, GGPlot2 is built on a grammar of graphics consisting of nouns that correspond to the architecture of a graphic (also known as a plot).
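
As a rough illustration of the contrast between verbs and nouns, the sketch below uses R's built-in `mtcars` data set. The data set, the filter threshold, and the variable names `heavy_cars` and `p` are assumptions chosen for illustration; they do not come from the notebook. The first half chains verbs that transform the data, and the second half names the parts of a graphic.

```r
library(dplyr)
library(ggplot2)

# Grammar of data: a pipeline of verbs (filter, mutate)
heavy_cars <- mtcars %>%
  filter(wt > 3) %>%              # keep cars heavier than 3,000 lbs
  mutate(cyl = factor(cyl))       # treat cylinder count as a category

# Grammar of graphics: nouns that describe the plot (data, aesthetics, geometry)
p <- ggplot(data = heavy_cars,
            mapping = aes(x = wt, y = mpg, color = cyl)) +
  geom_point()

p
```

Note that the verbs are joined with the pipe `%>%`, while the components of the graphic are joined with `+`.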

The phrase “grammar of graphics” comes from the book of that name by statistician and computer scientist Leland Wilkinson, first published in 1999 and later revised:

Image of book cover

It is worth reading if you want to get a solid grounding in visualization, which belongs to the design area of data science.

The Grammar of Graphics takes an object-oriented approach to visualization and formalizes two main principles:

  1. Graphics are built out of distinct layers.

  2. In each layer, meaningful plots are constructed through mappings of data onto aesthetics.
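
These two principles can be sketched in GGPlot2 itself. The example below is a minimal illustration that assumes the built-in `mtcars` data set; the object names `base`, `with_points`, and `with_trend` are hypothetical. Each `+` adds a distinct layer, and each `aes()` call maps columns of the data onto aesthetics.

```r
library(ggplot2)

# Data plus the default mappings shared by every layer: wt -> x, mpg -> y
base <- ggplot(mtcars, aes(x = wt, y = mpg))

# First geometry layer: points, with one more mapping (cyl -> color)
with_points <- base + geom_point(aes(color = factor(cyl)))

# Second geometry layer: a straight-line fit drawn on top of the points
with_trend <- with_points +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

with_trend
```

Because each step returns a plot object, intermediate results such as `with_points` can be displayed on their own to see what each layer contributes.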

The primary layers are these:

According to Wickham, who adopted these principles and applied them to R,

“A grammar of graphics is a tool that enables us to concisely describe the components of a graphic. Such a grammar allows us to move beyond named graphics (e.g., the ‘scatterplot’) and gain insight into the deep structure that underlies statistical graphics” (Wickham 2012).

Wickham takes this idea and develops it into this:

Source (see also ScienceCraft).

In this model, everything starts with data.

Then data are mapped onto aesthetics within geometries.

Also note that aesthetics make use of visual channels, such as position, color, size, and shape, to signify features and their properties.

These compose the primary layers.

The other layers apply downstream modifications that add more information and style to the graph.
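
The split between primary layers and downstream modifications might look like the sketch below. It assumes the built-in `mtcars` data set, and it assumes that facets, statistics, coordinates, labels, and themes are the kind of downstream layers meant here; the object names `primary` and `full` are hypothetical.

```r
library(ggplot2)

# Primary layers: data, aesthetics, and a geometry
primary <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(aes(color = factor(cyl)))

# Downstream layers: each one adds information or style to the same plot
full <- primary +
  facet_wrap(~ gear) +                                       # one panel per gear count
  stat_smooth(method = "lm", formula = y ~ x, se = FALSE) +  # statistical summary
  coord_cartesian(ylim = c(10, 35)) +                        # coordinate system and limits
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon",
       color = "Cylinders") +                                # axis and legend labels
  theme_minimal()                                            # overall styling

full
```

Removing any of the downstream lines still leaves a valid plot, which is one way to see that only the data, aesthetics, and geometry are essential.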