AE 07: Sales taxes + data joining

Suggested answers

Important

These are suggested answers. This document should be used as a reference only; it’s not designed to be an exhaustive key.

Getting started

Packages

We’ll use the tidyverse package for this analysis.

Data

The data are available in the data folder:

  • sales-taxes.csv

  • us-regions.csv

sales_taxes <- read_csv("data/sales-taxes-25.csv")
us_regions <- read_csv("data/us-regions.csv")

And let’s take a look at the data.

glimpse(sales_taxes)
Rows: 51
Columns: 7
$ state              <chr> "Alabama", "Alaska", "Arizona", "Arkansas", "Califo…
$ state_tax_rate     <dbl> 4.000, 0.000, 5.600, 6.500, 7.250, 2.900, 6.350, 0.…
$ state_tax_rank     <dbl> 40, 46, 28, 9, 1, 45, 12, 46, 17, 40, 40, 17, 13, 2…
$ avg_local_tax_rate <dbl> 5.44, 1.82, 2.92, 2.98, 1.73, 4.96, 0.00, 0.00, 1.0…
$ max_local          <dbl> 11.000, 7.850, 5.300, 6.125, 5.250, 8.300, 0.000, 0…
$ combined_tax_rate  <dbl> 9.44, 1.82, 8.52, 9.48, 8.98, 7.86, 6.35, 0.00, 7.0…
$ combined_rank      <dbl> 5, 46, 11, 3, 7, 16, 33, 47, 24, 19, 45, 37, 8, 25,…
glimpse(us_regions)
Rows: 50
Columns: 2
$ state_name <chr> "Maine", "New Hampshire", "Vermont", "Massachusetts", "Rhod…
$ region     <chr> "Northeast", "Northeast", "Northeast", "Northeast", "Northe…

Sales tax in swing states: if_else

Create new swing_state variable using if_else:

list_of_swing_states <- c("Arizona", "Georgia", "Michigan", "Nevada", 
                          "North Carolina", "Pennsylvania", "Wisconsin")

sales_taxes <- sales_taxes |>
  mutate(
    swing_state = if_else(state %in% list_of_swing_states,
                          "Swing",
                          "Non-swing")) |>
  relocate(swing_state)

sales_taxes
# A tibble: 51 × 8
   swing_state state  state_tax_rate state_tax_rank avg_local_tax_rate max_local
   <chr>       <chr>           <dbl>          <dbl>              <dbl>     <dbl>
 1 Non-swing   Alaba…           4                40               5.44     11   
 2 Non-swing   Alaska           0                46               1.82      7.85
 3 Swing       Arizo…           5.6              28               2.92      5.3 
 4 Non-swing   Arkan…           6.5               9               2.98      6.12
 5 Non-swing   Calif…           7.25              1               1.73      5.25
 6 Non-swing   Color…           2.9              45               4.96      8.3 
 7 Non-swing   Conne…           6.35             12               0         0   
 8 Non-swing   Delaw…           0                46               0         0   
 9 Non-swing   Flori…           6                17               1.02      2   
10 Swing       Georg…           4                40               3.44      5   
# ℹ 41 more rows
# ℹ 2 more variables: combined_tax_rate <dbl>, combined_rank <dbl>

Summarize to find the mean sales tax in each type of state:

sales_taxes |>
  group_by(swing_state) |>
  summarize(mean_state_tax = mean(state_tax_rate))
# A tibble: 2 × 2
  swing_state mean_state_tax
  <chr>                <dbl>
1 Non-swing             5.05
2 Swing                 5.46

Sales tax in coastal states: case_when

Create new coast variable using case_when:

pacific_coast <- c("Alaska", "Washington", "Oregon", "California", "Hawaii")

atlantic_coast <- c(
  "Connecticut", "Delaware", "Georgia", "Florida", "Maine", "Maryland", 
  "Massachusetts", "New Hampshire", "New Jersey", "New York", 
  "North Carolina", "Rhode Island", "South Carolina", "Virginia"
)

sales_taxes <- sales_taxes |>
  mutate(
    coast = case_when(
      state %in% pacific_coast ~ "Pacific",
      state %in% atlantic_coast ~ "Atlantic",
      .default = "Neither"
    )
  ) |>
  relocate(coast)

sales_taxes
# A tibble: 51 × 9
   coast    swing_state state   state_tax_rate state_tax_rank avg_local_tax_rate
   <chr>    <chr>       <chr>            <dbl>          <dbl>              <dbl>
 1 Neither  Non-swing   Alabama           4                40               5.44
 2 Pacific  Non-swing   Alaska            0                46               1.82
 3 Neither  Swing       Arizona           5.6              28               2.92
 4 Neither  Non-swing   Arkans…           6.5               9               2.98
 5 Pacific  Non-swing   Califo…           7.25              1               1.73
 6 Neither  Non-swing   Colora…           2.9              45               4.96
 7 Atlantic Non-swing   Connec…           6.35             12               0   
 8 Atlantic Non-swing   Delawa…           0                46               0   
 9 Atlantic Non-swing   Florida           6                17               1.02
10 Atlantic Swing       Georgia           4                40               3.44
# ℹ 41 more rows
# ℹ 3 more variables: max_local <dbl>, combined_tax_rate <dbl>,
#   combined_rank <dbl>

Summarize to find the mean sales tax in each type of state:

sales_taxes |>
  group_by(coast) |>
  summarize(mean_state_tax = mean(state_tax_rate))
# A tibble: 3 × 2
  coast    mean_state_tax
  <chr>             <dbl>
1 Atlantic           4.84
2 Neither            5.46
3 Pacific            3.55

Sales tax in US regions: joining

Join the sales tax data with region data and save the joined data frame as a new data frame, not overwriting either data frame that goes into the join.

sales_taxes_regions <- sales_taxes |>
  full_join(us_regions, 
            by = join_by(state == state_name)) |>
  relocate(region)

sales_taxes_regions
# A tibble: 51 × 10
   region    coast    swing_state state       state_tax_rate state_tax_rank
   <chr>     <chr>    <chr>       <chr>                <dbl>          <dbl>
 1 South     Neither  Non-swing   Alabama               4                40
 2 West      Pacific  Non-swing   Alaska                0                46
 3 West      Neither  Swing       Arizona               5.6              28
 4 South     Neither  Non-swing   Arkansas              6.5               9
 5 West      Pacific  Non-swing   California            7.25              1
 6 West      Neither  Non-swing   Colorado              2.9              45
 7 Northeast Atlantic Non-swing   Connecticut           6.35             12
 8 South     Atlantic Non-swing   Delaware              0                46
 9 South     Atlantic Non-swing   Florida               6                17
10 South     Atlantic Swing       Georgia               4                40
# ℹ 41 more rows
# ℹ 4 more variables: avg_local_tax_rate <dbl>, max_local <dbl>,
#   combined_tax_rate <dbl>, combined_rank <dbl>

Calculate the average sales tax of states in each region. What is surprising in the output?

sales_taxes_regions |>
  group_by(region) |>
  summarize(mean_state_tax = mean(state_tax_rate))
# A tibble: 5 × 2
  region    mean_state_tax
  <chr>              <dbl>
1 Midwest             5.69
2 Northeast           5.30
3 South               5.27
4 West                4.16
5 <NA>                6   

Identify the state with NA for region.

sales_taxes_regions |>
  filter(is.na(region)) |>
  select(state)
# A tibble: 1 × 1
  state               
  <chr>               
1 District of Columbia

Apply a fix for the NA in region, and calculate the mean sales taxes for regions again. Display the results in ascending order of mean sales tax.

sales_taxes_regions |>
  mutate(
    region = if_else(state == "District of Columbia", "Northeast", region)
  ) |>
  group_by(region) |>
  summarize(mean_state_tax = mean(state_tax_rate))
# A tibble: 4 × 2
  region    mean_state_tax
  <chr>              <dbl>
1 Midwest             5.69
2 Northeast           5.37
3 South               5.27
4 West                4.16

Render, commit, and push

  1. Render your Quarto document.

  2. Go to the Git pane and check the box next to each file listed, i.e., stage your changes. Commit your staged changes using a simple and informative message.

  3. Click on push (the green arrow) to push your changes to your application exercise repo on GitHub.

  4. Go to your repo on GitHub and confirm that you can see the updated files. Once your updated files are in your repo on GitHub, you’re good to go!