Compare the average state sales tax rates of swing states (Arizona, Georgia, Michigan, Nevada, North Carolina, Pennsylvania, and Wisconsin) vs. non-swing states.
How would you approach this task?
Create a new variable called swing_state with levels "Swing" and "Non-swing"
Group by swing_state
Summarize to find the mean sales tax in each type of state
ae-07-taxes-join
Go to your ae project in RStudio.
If you haven’t yet done so, make sure all of your changes up to this point are committed and pushed, i.e., there’s nothing left in your Git pane.
If you haven’t yet done so, click Pull to get today’s application exercise file: ae-07-taxes-join.qmd.
Work through the application exercise in class, and render, commit, and push your edits by the end of class.
mutate() with if_else()
Create a new variable called swing_state with levels "Swing" and "Non-swing".
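Below is a minimal sketch of the three steps (create `swing_state` with `if_else()`, group, summarize), assuming `dplyr` is installed. The real `sales_taxes` data comes from the application exercise. Here it is replaced by a hypothetical four-row stand-in. The column name `sales_tax` is made up, and the rates are approximate state-level rates used only for illustration. Check the column names in your own data before reusing this.

```r
library(dplyr)

# Hypothetical stand-in for the exercise's `sales_taxes` data:
# the column name `sales_tax` is assumed, rates are illustrative only
sales_taxes <- tibble(
  state     = c("Arizona", "Georgia", "Ohio", "Oregon"),
  sales_tax = c(5.6, 4, 5.75, 0)
)

swing_states <- c(
  "Arizona", "Georgia", "Michigan", "Nevada",
  "North Carolina", "Pennsylvania", "Wisconsin"
)

swing_summary <- sales_taxes |>
  # if_else(condition, true, false) labels each state by list membership
  mutate(swing_state = if_else(state %in% swing_states, "Swing", "Non-swing")) |>
  # one group per level of swing_state
  group_by(swing_state) |>
  # mean rate within each group
  summarize(mean_sales_tax = mean(sales_tax))

swing_summary
```

Because `if_else()` has only two outcomes, it fits a variable with exactly two levels. A variable with more levels calls for `case_when()`.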
Compare the average state sales tax rates of states in various regions (Midwest - 12 states, Northeast - 9 states, South - 16 states, West - 13 states).
How would you approach this task?
Create a new variable called region with levels "Midwest", "Northeast", "South", and "West".
Group by region
Summarize to find the mean sales tax in each region
mutate() with case_when()
Who feels like filling in the blanks in the lists of states in each region? Who feels like it's simply too tedious to write out the names of all the states?
list_of_midwest_states <- c(___)
list_of_northeast_states <- c(___)
list_of_south_states <- c(___)
list_of_west_states <- c(___)

sales_taxes <- sales_taxes |>
  mutate(
    region = case_when(
      state %in% list_of_midwest_states ~ "Midwest",
      state %in% list_of_northeast_states ~ "Northeast",
      state %in% list_of_south_states ~ "South",
      state %in% list_of_west_states ~ "West"
    )
  )
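If typing out every state feels too tedious, base R can help. Its built-in `datasets` package has `state.name` (the 50 state names) and `state.region` (a factor aligned with it). Together they sort the states into Census regions, with the Midwest labelled "North Central". The sketch below assumes these regions match the ones intended here; their counts do line up with the 12/9/16/13 split given for this task. It builds the four lists from those vectors and applies `case_when()` to all 50 states as a check.

```r
library(dplyr)

# Built-in base R data: state.region labels the Midwest "North Central"
list_of_midwest_states   <- state.name[state.region == "North Central"]
list_of_northeast_states <- state.name[state.region == "Northeast"]
list_of_south_states     <- state.name[state.region == "South"]
list_of_west_states      <- state.name[state.region == "West"]

# Assign a region to every state with case_when()
regions <- tibble(state = state.name) |>
  mutate(
    region = case_when(
      state %in% list_of_midwest_states   ~ "Midwest",
      state %in% list_of_northeast_states ~ "Northeast",
      state %in% list_of_south_states     ~ "South",
      state %in% list_of_west_states      ~ "West"
    )
  )

# Number of states per region
region_counts <- count(regions, region)
region_counts
```

After this step, grouping by `region` and summarizing works exactly as in the swing-state comparison.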
Joining data
Why join?
Suppose we want to answer questions like:
Is there a relationship between
- number of QS courses taken
- having scored a 4 or 5 on the AP stats exam
- motivation for taking course
- …
and performance in this course?
Each of these would require joining class performance data with an outside data source so we can have all relevant information (columns) in a single data frame.
Why join?
Suppose we want to answer questions like:
Compare the average state sales tax rates of states in various regions (Midwest - 12 states, Northeast - 9 states, South - 16 states, West - 13 states).
This can also be solved by joining region information with the state-level sales tax data.
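Below is a sketch of the join approach, assuming `dplyr` is installed. It turns base R's `state.name` and `state.region` into a lookup table, renaming "North Central" to "Midwest". It then attaches the region to each state with `left_join()` before grouping and summarizing. Here `sales_taxes` is a hypothetical four-row stand-in, not the exercise data. The column name `sales_tax` is made up, and the rates are approximate state-level rates used only for illustration.

```r
library(dplyr)

# Lookup table: one row per state with its Census region
state_regions <- tibble(
  state  = state.name,
  region = as.character(state.region)
) |>
  mutate(region = if_else(region == "North Central", "Midwest", region))

# Hypothetical stand-in for `sales_taxes` (column name and rates illustrative)
sales_taxes <- tibble(
  state     = c("Ohio", "Michigan", "Texas", "Oregon"),
  sales_tax = c(5.75, 6, 6.25, 0)
)

region_summary <- sales_taxes |>
  # add the `region` column by matching on `state`
  left_join(state_regions, by = "state") |>
  group_by(region) |>
  summarize(mean_sales_tax = mean(sales_tax))

region_summary
```

The join replaces the four hand-typed lists with a single lookup table. Any state missing from that table would get `NA` for `region`, which makes gaps easy to spot.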
Setup
For the next few slides…
library(tidyverse)

x <- tibble(
  id = c(1, 2, 3),
  value_x = c("x1", "x2", "x3")
)

x
# A tibble: 3 × 2
     id value_x
  <dbl> <chr>
1     1 x1
2     2 x2
3     3 x3
y <- tibble(
  id = c(1, 2, 4),
  value_y = c("y1", "y2", "y4")
)

y
# A tibble: 3 × 2
     id value_y
  <dbl> <chr>
1     1 y1
2     2 y2
3     4 y4
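Here is a sketch of three `dplyr` joins on `x` and `y`, matching rows by `id`. It redefines both tibbles so it runs on its own.

```r
library(dplyr)

x <- tibble(id = c(1, 2, 3), value_x = c("x1", "x2", "x3"))
y <- tibble(id = c(1, 2, 4), value_y = c("y1", "y2", "y4"))

# Keep every row of x; id 3 has no match in y, so its value_y is NA
left <- left_join(x, y, by = "id")

# Keep only ids present in both tables (1 and 2)
inner <- inner_join(x, y, by = "id")

# Keep ids from either table (1, 2, 3, 4), filling gaps with NA
full <- full_join(x, y, by = "id")

left
inner
full
```

Only `id`s 1 and 2 appear in both tables, so those are the only rows where both `value_x` and `value_y` are filled in, whichever join is used.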