# 14 Build a plot layer by layer

## 14.1 Exercises

1. The first two arguments to ggplot are `data` and `mapping`. The first two arguments to all layer functions are `mapping` and `data`. Why does the order of the arguments differ? (Hint: think about what you set most commonly.)

• Commonly, you first set the data in `ggplot()` and then set aesthetics inside your layer functions, like `geom_point()`, `geom_boxplot()`, or `geom_histogram()`.

2.

``````library(dplyr)
class <- mpg %>%
group_by(class) %>%
summarise(n = n(), hwy = mean(hwy))``````
``````mpg %>%
ggplot(aes(class, hwy)) +
geom_jitter(width = 0.15, height = 0.35) +
geom_point(data = class, aes(class, hwy),
color = "red",
size = 6) +
geom_text(data = class, aes(y = 10, x = class, label = paste0("n = ", n)))``````
• I plotted 3 different layers: jittered points, red point for the summary measure, mean, and text for the sample size (n).

## 14.2 Exercises

1. Simplify the following plot specifications:

``````####################################
####################################
# ggplot(mpg) +
#   geom_point(aes(mpg\$displ, mpg\$hwy))

# The above can be simplified:
# ggplot(mpg) +
#   geom_point(aes(displ, hwy))
####################################
####################################

####################################
####################################
# ggplot() +
#  geom_point(mapping = aes(y = hwy, x = cty),
#             data = mpg) +
#  geom_smooth(data = mpg,
#              mapping = aes(cty, hwy))

# The above can be simplified:
# ggplot(mpg, aes(cty, hwy)) +
#  geom_point() +
#  geom_smooth()
####################################
####################################

####################################
####################################
# ggplot(diamonds, aes(carat, price)) +
#   geom_point(aes(log(brainwt), log(bodywt)),
#              data = msleep)

# The above can be simplified:
# msleep_processed <- msleep %>%
#   mutate(brainwt_log = log(brainwt),
#          bodywt_log = log(bodywt))

# ggplot(diamonds, aes(carat, price)) +
#   geom_point(aes(brainwt_log, bodywt_log),
#              data = msleep_processed)
####################################
####################################``````

2. What does the following code do? Does it work? Does it make sense? Why/why not?

``````ggplot(mpg) +
geom_point(aes(class, cty)) +
geom_boxplot(aes(trans, hwy))``````
• It plots points of `class` vs `cty` and then a boxplot of `trans` vs `hwy`. It doesn’t make sense to plot layers with different `x` and `y` variables.

3. What happens if you try to use a continuous variable on the x axis in one layer, and a categorical variable in another layer? What happens if you do it in the opposite order?

• Not sure

## 14.3 Exercises

1,2,3 omitted.

1. Starting from top left, clockwise direction:
• `geom_violin()`, `geom_point()`, `geom_point()`, `geom_path()`, `geom_area()`, `geom_hex()`.

## 14.4 Exercises

1.

``````mod <- loess(hwy ~ displ, data = mpg)
smoothed <- data.frame(displ = seq(1.6, 7, length = 50))
pred <- predict(mod, newdata = smoothed, se = TRUE)
smoothed\$hwy <- pred\$fit
smoothed\$hwy_lwr <- pred\$fit - 1.96 * pred\$se.fit
smoothed\$hwy_upr <- pred\$fit + 1.96 * pred\$se.fit

smoothed %>%
ggplot(aes(displ, hwy)) +
geom_line(color = "dodgerblue1") +
geom_ribbon(aes(ymin = hwy_lwr,
ymax = hwy_upr),
alpha = 0.4)``````

2. From left to right,

`stat_ecdf()`, `stat_qq()`, `stat_function()`

3.

``````mpg %>%
ggplot(aes(drv, trans)) +
geom_count(aes(size = after_stat(prop), group = 1)) ``````

## 14.5 Exercises

1. According to the help page, `position_nudge()` is generally useful for adjusting the position of items on discrete scales by a small amount. Nudging is built in to geom_text() because it’s so useful for moving labels a small distance from what they’re labelling.

2. Not sure

3. `geom_jitter()` adds a small amount of random variation to the location of each point. It is useful for looking at all the overplotted points. On the other hand, `geom_count()` counts the number of overlapping observations at each location. It is useful for understanding the number of points in a location.

4. Stacked area plot seems useful when you want to portray an area whereas a line plot seems useful when you just need a line.