Lecture 2
Duke University
STA 199 Spring 2026
2026-01-14
What to expect:
Last time:
We introduced you to the course toolkit;
You cloned your ae repositories and rendered your first Quarto document;
Today:
We will finish the application exercise, and get an introduction to Git, GitHub, and Quarto;
We will introduce data visualization;
Time permitting, we’ll start a new AE to practice.
Option 2:
Go to RStudio and open the document ae-01-income-inequality.qmd.

Once we made changes to our Quarto document, we
went to the Git pane in RStudio
staged our changes by clicking the checkboxes next to the relevant files
committed our changes with an informative commit message
pulled from GitHub to make sure we had the latest version of our repo
pushed our changes to our application exercise repos
confirmed on GitHub that we could see our changes pushed from RStudio
Remember this visualization from the code along video – what was it about?
Are you asking a question that your data could actually answer?
This class is about technique bolstered by good taste.
If a reader has to squint at your picture for more than 30 seconds (and that’s being generous) in order to understand it, you need to start over.
how the sausage is made!
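# Assumes these packages are loaded: tidyverse (pipeline verbs),
# lubridate (year()), and unvotes (the un_* data frames)
library(tidyverse)
library(lubridate)
library(unvotes)

us_uk_tr_votes <- un_votes |>
  inner_join(un_roll_calls, by = "rcid") |>
  inner_join(un_roll_call_issues, by = "rcid", relationship = "many-to-many") |>
  filter(country %in% c("United Kingdom", "United States", "Turkey")) |>
  mutate(year = year(date)) |>
  group_by(country, year, issue) |>
  summarize(percent_yes = mean(vote == "yes"), .groups = "drop")
Note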
Let’s leave these details aside for a bit; we’ll revisit this code later in the semester. For now, let’s agree that we need to do some “data wrangling” to get the data into the right format for the plot we want to create. Just note that we called the data frame we’ll visualize us_uk_tr_votes.
# A tibble: 1,212 × 4
   country  year issue                        percent_yes
   <chr>   <dbl> <fct>                              <dbl>
 1 Turkey   1946 Colonialism                        0.8
 2 Turkey   1946 Economic development               0.6
 3 Turkey   1946 Human rights                       0
 4 Turkey   1947 Colonialism                        0.222
 5 Turkey   1947 Economic development               0.5
 6 Turkey   1947 Palestinian conflict               0.143
 7 Turkey   1948 Colonialism                        0.417
 8 Turkey   1948 Arms control and disarmament       0
 9 Turkey   1948 Economic development               0.375
10 Turkey   1948 Human rights                       0.167
# ℹ 1,202 more rows
Each line of code adds an element to the plot.
Let’s take it one line at a time.
Map year to the x aesthetic
Map percent_yes to the y aesthetic
Aesthetics are visual properties of a plot
In the grammar of graphics, variables from the data frame are mapped to aesthetics
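A minimal sketch of these two mappings. To keep it runnable on its own, it uses toy_votes, a hypothetical data frame with invented values that only borrows the column names of us_uk_tr_votes.

```r
library(ggplot2)

# Made-up stand-in for us_uk_tr_votes: same column names, invented values
toy_votes <- data.frame(
  country = rep(c("Country A", "Country B"), each = 20),
  year = rep(2001:2020, times = 2),
  percent_yes = c(seq(0.2, 0.6, length.out = 20), seq(0.9, 0.5, length.out = 20))
)

# Map year to the x aesthetic and percent_yes to the y aesthetic.
# There is no geom yet, so the plot shows axes but no data.
p <- ggplot(data = toy_votes, mapping = aes(x = year, y = percent_yes))
p
```

Swapping toy_votes for us_uk_tr_votes should give the starting point for the plot in these slides.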
It’s common practice in R to omit the names of the first two arguments of a function.
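For ggplot(), the first two arguments are data and mapping. The sketch below writes the same call both ways; toy_votes is again a hypothetical data frame with invented values, not the real voting data.

```r
library(ggplot2)

# Made-up stand-in for us_uk_tr_votes: same column names, invented values
toy_votes <- data.frame(
  country = rep(c("Country A", "Country B"), each = 20),
  year = rep(2001:2020, times = 2),
  percent_yes = c(seq(0.2, 0.6, length.out = 20), seq(0.9, 0.5, length.out = 20))
)

# Argument names spelled out
p_named <- ggplot(data = toy_votes, mapping = aes(x = year, y = percent_yes))

# Same plot: data and mapping are matched by position
p_short <- ggplot(toy_votes, aes(x = year, y = percent_yes))
```

Both objects describe the same plot; the shorter form is simply what you will see most often in practice.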
with a geom
Map country to the color aesthetic
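A sketch of these two steps together, assuming the geom is geom_point() (the text above does not name it) and using the hypothetical toy_votes data frame with invented values:

```r
library(ggplot2)

# Made-up stand-in for us_uk_tr_votes: same column names, invented values
toy_votes <- data.frame(
  country = rep(c("Country A", "Country B"), each = 20),
  year = rep(2001:2020, times = 2),
  percent_yes = c(seq(0.2, 0.6, length.out = 20), seq(0.9, 0.5, length.out = 20))
)

# color = country goes inside aes() because it maps a variable to an aesthetic;
# geom_point() then draws one point per row, colored by country.
p <- ggplot(toy_votes, aes(x = year, y = percent_yes, color = country)) +
  geom_point()
p
```

Because color is mapped inside aes(), ggplot2 picks one color per country and adds a legend automatically.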
with another geom
Adding geom_smooth() produced the following message (it is informational, not an error, and the plot is still drawn):
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'
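One way to see where this message comes from is to state the smoothing method and formula yourself. The sketch below assumes the first geom is geom_point() and uses the hypothetical toy_votes data frame (invented values); with both arguments given explicitly, geom_smooth() has nothing left to report.

```r
library(ggplot2)

# Made-up stand-in for us_uk_tr_votes: same column names, invented values
toy_votes <- data.frame(
  country = rep(c("Country A", "Country B"), each = 20),
  year = rep(2001:2020, times = 2),
  percent_yes = c(seq(0.2, 0.6, length.out = 20), seq(0.9, 0.5, length.out = 20))
)

base <- ggplot(toy_votes, aes(x = year, y = percent_yes, color = country)) +
  geom_point()

# Defaults: ggplot2 chooses the method and formula, and says so when the plot is drawn
p_default <- base + geom_smooth()

# Explicit: same smoother, stated by us, so no message is needed
p_explicit <- base + geom_smooth(method = "loess", formula = y ~ x)
```

The message is ggplot2 telling you which choices it made on your behalf, which is why it disappears once you make them yourself.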
with alpha
with se = FALSE
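A sketch of these last two tweaks, assuming alpha is set on the points and se = FALSE on the smoother. The value 0.5 is only an illustration, and toy_votes is a hypothetical data frame with invented values.

```r
library(ggplot2)

# Made-up stand-in for us_uk_tr_votes: same column names, invented values
toy_votes <- data.frame(
  country = rep(c("Country A", "Country B"), each = 20),
  year = rep(2001:2020, times = 2),
  percent_yes = c(seq(0.2, 0.6, length.out = 20), seq(0.9, 0.5, length.out = 20))
)

p <- ggplot(toy_votes, aes(x = year, y = percent_yes, color = country)) +
  geom_point(alpha = 0.5) +                                    # semi-transparent points
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE)   # curve only, no error band
p
```

Note that alpha = 0.5 sits outside aes(): it sets every point to the same transparency rather than mapping a variable.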

The commands are the layers of sponge, and the plus signs are the icing. Don’t forget the icing!
We built a plot layer-by-layer
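The layering can be made explicit by saving the plot after each step. This is a sketch with the hypothetical toy_votes data frame (invented values) and the intermediate names base, with_points, and with_smooth chosen only for illustration.

```r
library(ggplot2)

# Made-up stand-in for us_uk_tr_votes: same column names, invented values
toy_votes <- data.frame(
  country = rep(c("Country A", "Country B"), each = 20),
  year = rep(2001:2020, times = 2),
  percent_yes = c(seq(0.2, 0.6, length.out = 20), seq(0.9, 0.5, length.out = 20))
)

# Each + adds one layer on top of what is already there
base        <- ggplot(toy_votes, aes(x = year, y = percent_yes, color = country))
with_points <- base + geom_point(alpha = 0.5)
with_smooth <- with_points +
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE)

# The + goes at the END of a line. If a line starts with +, R treats the
# previous line as a finished command and the new layer is not added.
with_smooth
```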


ae-02-penguin-peekaboo.qmd;