films <-read_csv("data/films.csv")films |>mutate(Films =if_else(is.na(Films), "(I can't recall)", Films)) |>mutate(Films =fct_lump_min(Films, min =6, other_level ="Other")) |>filter(Films !="Other") |>mutate(Films =fct_infreq(Films)) |>ggplot(aes(y = Films)) +geom_bar(fill ="skyblue") +theme_minimal() +labs(y ="",title ="What's the last film you saw in a theater?",subtitle ="From the sta199-s26 survey",caption ="(Films with fewer than 6 responses were omitted for readability.)")
“We did a statistical analysis of films to test two claims: first, that films that pass the Bechdel test — featuring women in stronger roles — see a lower return on investment, and second, that they see lower gross profits. We found no evidence to support either claim.”
ae-03-bechdel-dataviz
Let’s go:
Open your container;
Switch to your ae- project if you’re not already there;
If there are any un-committed changes in the Git tab (upper right), commit them;
The Git tab should now be completely empty;
Lastly, pull the new files I’ve distributed to you.
If that all went smoothly, open the new ae-03-bechdel-dataviz.qmd file.
Recap
Code cells (aka code chunks)
. . .
Cell labels are helpful for describing what the code is doing, for jumping between code cells in the editor, and for troubleshooting
message: false hides any messages emitted by the code in your rendered document
Describing distributions and relationships
Talking about one numerical variable
center: what is the “typical” value (mean, median, mode) the data are concentrating around?
spread: how concentrated are the data around a typical value?
shape: does the distribution have one peak, or many? is it symmetric or skewed?
Interaction between shape and center
If there is only one peak:
Histograms provide more detail…
Box plots hide multi-modality.
…but boxplots are nicer for side-by-side comparisons, especially with many groups