R Visualization Exercises¶

Programming for Data Science Bootcamp

Import Libraries¶

In [2]:
library(vctrs)
In [3]:
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.4     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::data_frame() masks tibble::data_frame(), vctrs::data_frame()
✖ dplyr::filter()     masks stats::filter()
✖ dplyr::lag()        masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Get Data¶

Preview the first rows of the mpg data set (run ?mpg for the full documentation).

In [4]:
head(mpg)
A tibble: 6 × 11
manufacturer  model  displ  year   cyl    trans       drv    cty    hwy    fl     class
<chr>         <chr>  <dbl>  <int>  <int>  <chr>       <chr>  <int>  <int>  <chr>  <chr>
audi          a4     1.8    1999   4      auto(l5)    f      18     29     p      compact
audi          a4     1.8    1999   4      manual(m5)  f      21     29     p      compact
audi          a4     2.0    2008   4      manual(m6)  f      20     31     p      compact
audi          a4     2.0    2008   4      auto(av)    f      21     30     p      compact
audi          a4     2.8    1999   6      auto(l5)    f      16     26     p      compact
audi          a4     2.8    1999   6      manual(m5)  f      18     26     p      compact

Exercise 1¶

Run mpg %>% ggplot(). What do you see?

In [6]:
mpg %>% ggplot()
The result is a blank gray panel: ggplot() has received the data but no aesthetic mappings or geoms, so there is nothing to draw.

Exercise 2¶

Make a scatter plot of hwy vs. cyl in the mpg data set.

In [7]:
mpg %>%
    ggplot(aes(x = cyl, y = hwy)) +
      geom_point()

aes() supplies the value of the mapping argument, which can also be named explicitly:

In [ ]:
mpg %>%
    ggplot(mapping = aes(x = cyl, y = hwy)) +
      geom_point()

aes() can also be added to the plot as a separate component with +:

In [ ]:
mpg %>%
    ggplot() +
    aes(x = cyl, y = hwy) +
    geom_point()

Exercise 3¶

What happens if you make a scatter plot of class vs drv?

Why is the plot not useful?

In [ ]:
mpg %>% 
    ggplot(aes(x = class, y = drv)) + 
    geom_point()

The resulting scatter plot has only a few points, because many observations share the same position and are drawn on top of each other.

A scatter plot is not a useful display of these variables since both drv and class are categorical variables.

👉 Categorical variables typically take a small number of values, so there are a limited number of unique combinations of $(x, y)$ values that can be displayed.

👉 In this data, drv takes $3$ values and class takes $7$ values, so there are only $21$ possible combinations that could be plotted on a scatter plot of drv vs. class.

👉 Of these, only $12$ combinations of (drv, class) are observed.
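
The number of observed combinations can be checked by counting rows per pair. This is a minimal sketch, assuming dplyr and ggplot2 are installed; combos is a name introduced here for illustration.

```r
library(dplyr)
library(ggplot2)  # provides the mpg data set

# One row per observed (drv, class) pair, with the number of cars in each
combos <- mpg %>% count(drv, class)

# Number of distinct pairs that appear in the data
nrow(combos)
```

Replacing geom_point() with geom_count() in the plot above sizes each point by the number of observations it represents, which makes the overplotting visible.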

Exercise 4¶

Plot the mathematical function $\sin(x)/x$.

Hint: Use this to create your data and convert to a data frame or tibble:

x <- seq(-6 * pi, 6 * pi, length.out = 100)
In [8]:
x <- seq(-6 * pi, 6 * pi, length.out = 100)
dat <- data.frame(x = x, y = sin(x)/x)
head(dat)
A data.frame: 6 × 2
   x          y
   <dbl>      <dbl>
1  -18.84956  -3.898172e-17
2  -18.46876  -2.012385e-02
3  -18.08796  -3.815130e-02
4  -17.70716  -5.137086e-02
5  -17.32636  -5.765016e-02
6  -16.94556  -5.576687e-02
In [9]:
ggplot(data = dat, 
    mapping = aes(x = x, y = y)) + 
    geom_line()
In [10]:
dat %>% 
    ggplot(aes(x = x, y = y)) + 
    geom_line()
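
As an alternative sketch, ggplot2 (version 3.3.0 or later) can evaluate the function itself with geom_function(), so no data frame is needed. The helper name sinc, the handling of x = 0, and the value of n are choices made here for illustration, not part of the exercise.

```r
library(ggplot2)

# sin(x)/x is undefined at x = 0; its limit there is 1
sinc <- function(x) ifelse(x == 0, 1, sin(x) / x)

p <- ggplot() +
    xlim(-6 * pi, 6 * pi) +
    # n sets how many points the function is evaluated at
    geom_function(fun = sinc, n = 200)
p
```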

Exercise 5¶

Plot the cars data set as a scatter plot using speed vs dist.

In [11]:
cars %>% 
    ggplot(aes(x = speed, y = dist)) +
    geom_point()

Exercise 6¶

Create the same plot, this time using color to distinguish data points whose stopping distance is greater than $80$.

In [ ]:
head(cars)
A data.frame: 6 × 2
   speed  dist
   <dbl>  <dbl>
1  4      2
2  4      10
3  7      4
4  7      22
5  8      16
6  9      10
In [ ]:
cars %>%
    ggplot(aes(x = speed, y = dist)) +
    geom_point(mapping = aes(color = dist > 80))
In [ ]:
cars %>%
    ggplot() +
    aes(x = speed, y = dist) +
    geom_point() +
    aes(color = dist > 80)
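
In both versions the legend title is the raw expression dist > 80. One possible variation, sketched here with a hypothetical column name long_stop, computes the indicator first with mutate() and maps that column instead.

```r
library(dplyr)
library(ggplot2)

p <- cars %>%
    # long_stop is an illustrative name, not part of the cars data set
    mutate(long_stop = dist > 80) %>%
    ggplot(aes(x = speed, y = dist, color = long_stop)) +
    geom_point()
p
```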

Exercise 7¶

Change the plot so that values $> 80$ are in red and the others are in blue.

Hint: Define the colors using a manual color scale in scale_color_manual().

In [ ]:
cars %>%
    ggplot(aes(x = speed, y = dist)) + 
    geom_point(mapping = aes(color = dist > 80)) + 
    scale_color_manual(values = c("blue", "red"))
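
The unnamed vector above depends on the order of the levels: dist > 80 yields FALSE and then TRUE, so FALSE receives "blue" and TRUE receives "red". A sketch that makes the mapping explicit with a named vector follows; threshold_colors and the legend title are assumptions added for illustration.

```r
library(ggplot2)

# Names match the values of the mapped logical expression
threshold_colors <- c("TRUE" = "red", "FALSE" = "blue")

p <- ggplot(cars, aes(x = speed, y = dist)) +
    geom_point(aes(color = dist > 80)) +
    scale_color_manual(values = threshold_colors, name = "dist > 80")
p
```

With named values, reordering the vector does not change which points are red.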

Another way, using ifelse() to set the colors directly (outside aes(), so no legend is drawn):

In [ ]:
cars %>%
    ggplot(aes(speed, dist)) + 
    geom_point(color = ifelse(cars$dist > 80, 'red', 'blue'))

Exercise 8¶

Add a second geom that produces a smoothed line.

Use lm as your smoothing method.

Hint: Add geom_smooth() to your graphic.

In [ ]:
cars %>%
    ggplot(aes(x = speed, y = dist)) + 
      geom_point(aes(color = dist > 80)) + 
      scale_color_manual(values = c("black", "red")) +
      geom_smooth(method = 'lm')
`geom_smooth()` using formula = 'y ~ x'

Smoothing methods that can be passed to geom_smooth() include lm, glm, gam, loess, and MASS::rlm.

loess performs locally weighted smoothing:

In [ ]:
cars %>%
    ggplot(aes(x = speed, y = dist)) + 
      geom_point(aes(color = dist > 80)) + 
      scale_color_manual(values = c("black", "red")) +
      geom_smooth(method = 'loess')
`geom_smooth()` using formula = 'y ~ x'
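
The message `geom_smooth()` using formula = 'y ~ x' appears because the formula argument was left at its default. As a sketch, the formula can be stated explicitly; the quadratic term and se = FALSE below are arbitrary choices for illustration, not part of the exercise.

```r
library(ggplot2)

p <- ggplot(cars, aes(x = speed, y = dist)) +
    geom_point() +
    # Quadratic fit; giving the formula explicitly also silences the message
    geom_smooth(method = "lm", formula = y ~ poly(x, 2), se = FALSE)
p
```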

Exercise 9¶

Plot histograms for speed and dist in cars.

In [ ]:
cars %>%
    ggplot(aes(x = speed)) + 
    geom_histogram(bins = 10)
In [ ]:
cars %>%
    ggplot(aes(x = dist)) + 
    geom_histogram(bins = 10)
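
Both histograms can also be drawn in one figure by reshaping the data to long form and faceting. This is a sketch, assuming tidyr is installed; cars_long, variable, and value are names introduced here.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Stack speed and dist into one value column with a variable label
cars_long <- cars %>%
    pivot_longer(everything(), names_to = "variable", values_to = "value")

p <- cars_long %>%
    ggplot(aes(x = value)) +
    geom_histogram(bins = 10) +
    # Free x scales because speed and dist cover different ranges
    facet_wrap(~variable, scales = "free_x")
p
```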

Exercise 10¶

Create a faceted scatter plot of hwy and cty from the mpg data set, with drv as rows and cyl as columns.

What do the empty cells mean?

In [ ]:
mpg %>% 
    ggplot() +
      geom_point(aes(x = hwy, y = cty)) +
      facet_grid(drv ~ cyl)

The empty cells (facets) in this plot are combinations of drv and cyl that have no observations.

These are the same locations in the scatter plot of drv and cyl that have no points.

In [ ]:
mpg %>% 
    ggplot() +
      geom_point(aes(y = drv, x = cyl))
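
The empty facets can also be listed by cross-tabulating the two variables. This is a sketch using base R's table(); drv_cyl and empty_cells are names chosen here for illustration.

```r
library(ggplot2)  # provides the mpg data set

# Long-form cross-tabulation: one row per (drv, cyl) pair with its count in Freq
drv_cyl <- as.data.frame(table(drv = mpg$drv, cyl = mpg$cyl))

# Pairs with a count of zero correspond to the empty facets
empty_cells <- subset(drv_cyl, Freq == 0)
empty_cells
```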

Without faceting:

In [ ]:
mpg %>% 
    ggplot() +
      geom_point(aes(x = hwy, y = cty, 
                     color=drv, 
                     shape=as.factor(cyl)))

Exercise 11¶

Reproduce this graphic from the iris dataset:

Hint: This graphic uses two geometries, one title, and one theme function.

One of the geometries is geom_density2d() and the theme function is theme_light().

In [ ]:
iris %>%
  ggplot(aes(x = Sepal.Length, y = Sepal.Width, color = Species, shape = Species)) +
  geom_point() +
  geom_density2d() +
  ggtitle('IRIS') +
  theme_light()

Exercise 12¶

Reproduce this graphic from the iris dataset:

Hints:

(1) Preprocess your data as follows:

iris %>%
  mutate(Species = 'ALL') %>%   # Create a copy of iris where Species has only 'ALL'
  bind_rows(iris)               # Concatenate the copy with the original iris

(2) This graphic uses faceting with facet_wrap() and theme function theme_bw().
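
To see what the preprocessing in hint (1) produces, one can count the rows per Species value. This is a sketch; iris_all is a name introduced here for illustration.

```r
library(dplyr)

iris_all <- iris %>%
    mutate(Species = "ALL") %>%   # copy of iris labelled 'ALL'
    bind_rows(iris)               # followed by the original rows

# Rows per label: the 'ALL' copy plus the three original species
iris_all %>% count(Species)
```

The copy labelled 'ALL' contains every flower a second time, which is why faceting on Species yields a fourth panel holding all the observations.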

In [ ]:
iris %>%
  mutate(Species = 'ALL') %>%
  bind_rows(iris) %>%
  ggplot(aes(x = Petal.Length, y = Petal.Width, color = Species)) +
      geom_point() +
      geom_smooth(method = 'loess') +
      xlab('Petal Length') +
      ylab('Petal Width') +
      facet_wrap(~Species, scales = 'free') +
      theme_bw()
`geom_smooth()` using formula = 'y ~ x'

Exercise 13¶

Reproduce this graphic using the mtcars dataset:

In [ ]:
mtcars %>%
  rownames_to_column() %>%
  mutate(rowname = forcats::fct_reorder(rowname, mpg)) %>%
  ggplot(aes(rowname, mpg, label = rowname)) +
  geom_point() +
  geom_text(nudge_y = .3, hjust = 'left') +
  coord_flip() +
  ylab('Miles per gallon fuel consumption') +
  ylim(10, 40) +
  theme_classic() +
  theme(plot.title = element_text(hjust = 0, size = 16),
        axis.title.x = element_text(face = 'bold'),
        axis.title.y = element_blank(),
        axis.text.y = element_blank(),
        axis.ticks.y = element_blank(),
        axis.line.y = element_blank())

Exercise 14¶

Reproduce this graphic using the mtcars dataset:

In [ ]:
mtcars %>%
  ggplot(aes(x = mpg, y = qsec, size = disp, color = as.factor(am))) +
  geom_point() +
  # In mtcars, am is the transmission type: 0 = automatic, 1 = manual
  scale_colour_discrete(name = "Transmission",
                        breaks = c("0", "1"),
                        labels = c("Automatic", "Manual")) +
  scale_size_continuous(name = 'Displacement') +
  xlab('Miles per gallon') +
  ylab('1/4 mile time') +
  theme_light()

Exercise 15¶

Reproduce this image using the diamonds dataset:

https://www.r-exercises.com/wp-content/uploads/2018/02/ggplot-exercises-5.png

Process the data.

In [34]:
diamonds2plot <- diamonds %>%
  group_by(cut, color) %>%
  # .groups = 'drop' returns an ungrouped result, so no ungroup() is needed
  summarize(price = mean(price), .groups = 'drop') %>% 
  arrange(color, price) %>%
  mutate(id = row_number(), 
         angle = 90 - 360 * (id - 0.5) / n())

Note that we add the .groups = 'drop' argument to summarize() to suppress this message (it is informational, not an error):

  • summarise() has grouped output by 'cut'. You can override using the .groups argument.
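
The effect of .groups can be checked directly. This is a sketch, assuming dplyr and ggplot2 are installed; by_default and dropped are names introduced here, and "drop_last" is written out because it is what summarize() does by default in this situation.

```r
library(dplyr)
library(ggplot2)  # provides the diamonds data set

# Default behaviour: only the last grouping level is dropped, so 'cut' remains
by_default <- diamonds %>%
    group_by(cut, color) %>%
    summarize(price = mean(price), .groups = "drop_last")

# .groups = "drop" returns an ungrouped tibble
dropped <- diamonds %>%
    group_by(cut, color) %>%
    summarize(price = mean(price), .groups = "drop")

group_vars(by_default)   # "cut"
is_grouped_df(dropped)   # FALSE
```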

Build the visualization.

In [33]:
diamonds2plot  %>%
  ggplot(aes(factor(id), price, fill = color, group = cut, label = cut)) +
  geom_bar(stat = 'identity', position = 'dodge') +
  geom_text(hjust = 0, angle = diamonds2plot$angle, alpha = .5) +
  coord_polar() +
  ggtitle('Mean diamond price') +
  ylim(-3000, 7000) +
  theme_void() +
  theme(plot.title = element_text(hjust = 0.5, size = 16, face = 'bold'))