library(tidyverse)
library(scales) # For nicer scales
set.seed(1234) # Make all random draws the same
<- tibble(x1 = rnorm(10000),
example_data x2 = rnorm(10000),
y1 = sample(1:100, 10000, replace = TRUE),
y2 = sample(LETTERS[1:4], 10000, replace = TRUE),
y3 = sample(LETTERS[10:11], 10000, replace = TRUE),
year = sample(2010:2017, 10000, replace = TRUE)) |>
arrange(y2, year)
# write_csv(example_data, "data/example_data.csv")
Analysis of life in the Good Place
Final project— PMAP 8551, Spring 2025
Executive summary
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
Sed ut perspiciatis unde omnis iste natus error sit voluptatem accusantium doloremque laudantium, totam rem aperiam, eaque ipsa quae ab illo inventore veritatis et quasi architecto beatae vitae dicta sunt explicabo. Nemo enim ipsam voluptatem quia voluptas sit aspernatur aut odit aut fugit, sed quia consequuntur magni dolores eos qui ratione voluptatem sequi nesciunt. Neque porro quisquam est, qui dolorem ipsum quia dolor sit amet, consectetur, adipisci velit, sed quia non numquam eius modi tempora incidunt ut labore et dolore magnam aliquam quaerat voluptatem. Ut enim ad minima veniam, quis nostrum exercitationem ullam corporis suscipit laboriosam, nisi ut aliquid ex ea commodi consequatur? Quis autem vel eum iure reprehenderit qui in ea voluptate velit esse quam nihil molestiae consequatur, vel illum qui dolorem eum fugiat quo voluptas nulla pariatur?
Data background
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
Data cleaning
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
To make life easier, I created a custom ggplot theme that I can use in all my figures:
<- theme_minimal(base_family = "Source Sans Pro") +
my_beautiful_fancy_theme theme(legend.position = "bottom",
panel.background = element_rect(fill = "transparent", colour = NA),
plot.background = element_rect(fill = "transparent", colour = NA),
axis.title.x = element_text(margin = margin(t = 15)),
axis.title.y = element_text(margin = margin(r = 15)),
strip.text = element_text(family = "Source Sans Pro", face = "bold",
size = rel(1.3)))
Individual figures
Figure 1: Lollipop chart
First, I was interested in blah because blah, so I created a lollipop chart to show blah. Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
<- example_data |>
example_data_summarized group_by(y2, y3) |>
summarize(n = n())
<- ggplot(example_data_summarized, aes(x = n, y = fct_rev(y2), color = y3)) +
figure1 geom_pointrange(aes(xmin = 0, xmax = n), position = position_dodge(width = 0.5),
linewidth = 1, fatten = 5) +
labs(x = "Total number of things", y = NULL) +
guides(color = guide_legend(title = NULL)) +
scale_color_manual(values = c("#FF4266", "#82B09C")) +
+
my_beautiful_fancy_theme theme(panel.grid.minor = element_blank(),
panel.grid.major.y = element_blank())
figure1
ggsave(figure1, filename = "output/figure1.pdf", device = cairo_pdf,
width = 6, height = 4, units = "in", bg = "transparent")
Figure 2: Changes over time
Next, I wanted to see how things have changed over time, so I created a blah because blah. I found blah. Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
<- example_data |>
example_data_time pivot_longer(cols = c(x1, x2), names_to = "x_names", values_to = "value") |>
group_by(x_names, year, y2) |>
summarize(x_avg = mean(value),
error = sd(value) / sqrt(length(value))) |>
ungroup() |>
mutate(upper = x_avg + (1.96 * error),
lower = x_avg - (1.96 * error)) |>
mutate(x_names = recode(x_names,
x1 = "X1 (average)",
x2 = "X2 (average)"))
<- ggplot(example_data_time, aes(x = year, y = x_avg, color = x_names)) +
figure2 geom_hline(yintercept = 0, linewidth = 0.75, color = "#CC3340", linetype = "dotted") +
geom_ribbon(aes(ymin = lower, ymax = upper, fill = x_names, color = NULL), alpha = 0.2) +
geom_line(linewidth = 1) +
scale_color_manual(values = c("#FA6900", "#69D1E8")) +
scale_y_continuous(labels = label_percent()) +
guides(color = guide_legend(title = NULL), fill = "none") +
labs(x = NULL, y = "Whatever this is measuring") +
facet_wrap(~ y2, nrow = 1) +
+
my_beautiful_fancy_theme theme(panel.grid.minor = element_blank())
figure2
ggsave(figure2, filename = "output/figure2.pdf", device = cairo_pdf,
width = 16, height = 3, units = "in", bg = "transparent")
Figure 3: Relationships
I was also interested in the relationship between blah and blah, so I blahed. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
# There are a lot of points here and they're all random and pointless, so I
# simplify this graphic by just taking a subset of them
<- example_data |>
example_data_subset sample_n(500)
<- ggplot(example_data_subset, aes(x = y1, y = x2, color = y2)) +
figure3 geom_point(size = 1, alpha = 0.75) +
geom_smooth(method = "lm", color = "#85144A", linewidth = 2) +
labs(x = "Some variable", y = "Some other variable") +
guides(color = guide_legend(title = NULL)) +
scale_color_manual(values = c("#188146", "#004259", "#B00DC9", "#FFE01C")) +
facet_wrap(~ y3) +
+
my_beautiful_fancy_theme theme(panel.grid.minor = element_blank())
figure3
ggsave(figure3, filename = "output/figure3.pdf", device = cairo_pdf,
width = 6, height = 4, units = "in", bg = "transparent")
Final figure
I took these three graphs and combined them and enhanced them in Illustrator. I chose the colors, fonts, alignment, etc. because blah and the final figure represents truth because of blah.