Lecture 0
Duke University
STA 199 Spring 2026
2026-01-07
Bethany Akinola
Federico Arboleda
Jannis Bolik
Arijit Dey
Cael Elmore
Oliver Gao
Natasha Harris
Dwija Kakkad
Abuzar Khudaverdiyeva
Hyunjin Lee
Liane Ma
Chelsea Nguyen
Max Niu
Tory Norton
Patrick Pham
Kenna Roberts
Katie Solarz
Sarah Wu
Edward Zhang
Lisa Zhang
Mary Knox
John Zito
Office hours begin Monday January 12.
First half:
Data science
Second half:
Statistical thinking
Quantifying our uncertainty about that knowledge.
Campaign manager: What is the probability that our candidate wins the election?
(A flurry of analysis takes place.)
Data scientist: Our best guess is 54%.
Campaign manager: How reliable is that estimate? How confident are we in that? What’s the margin of error?
Parallel Universe 1
Data scientist: It’s 54% give or take 3%.
Parallel Universe 2
Data scientist: It’s 54% give or take 20%.
It’s all about decision-making under uncertainty
The manager is going to make wildly different decisions about campaign strategy and spending depending on how uncertain the environment is.
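To make the contrast between the two universes concrete, here is a minimal sketch. It assumes that "give or take" means a simple plus-or-minus range in percentage points, and it asks whether that range includes 50% (even odds). The function names `interval` and `straddles_even_odds` are made up for illustration.

```python
def interval(estimate, margin):
    """Return the (low, high) range implied by 'estimate give or take margin'."""
    return (estimate - margin, estimate + margin)


def straddles_even_odds(estimate, margin, even_odds=50.0):
    """True if the range includes 50%, i.e. we cannot rule out a coin flip."""
    low, high = interval(estimate, margin)
    return low <= even_odds <= high


# Same best guess (54%), two very different levels of uncertainty.
for label, margin in [("Parallel Universe 1", 3.0), ("Parallel Universe 2", 20.0)]:
    low, high = interval(54.0, margin)
    print(f"{label}: {low:.0f}% to {high:.0f}%, "
          f"includes even odds: {straddles_even_odds(54.0, margin)}")
```

Under this reading, the first range stays above 50% while the second runs from well below it to well above it, which is why the same best guess can lead to different decisions.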

All linked from the course website:
| Category | Percentage |
|---|---|
| Lectures (attendance + participation) | 5% |
| Labs | 8% |
| HW | 12% |
| Project | 15% |
| Midterm 1 | 20% |
| Midterm 2 | 20% |
| Final | 20% |
See course syllabus for how the final letter grade will be determined.
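As a rough illustration of how the weights in the table combine, the sketch below computes a weighted overall score. The weights come from the table; the category scores are hypothetical, and the mapping from this number to a letter grade is left to the syllabus.

```python
# Weights (in percent) copied from the grading table above.
WEIGHTS = {
    "lectures": 5,
    "labs": 8,
    "hw": 12,
    "project": 15,
    "midterm1": 20,
    "midterm2": 20,
    "final": 20,
}


def overall_score(scores):
    """Weighted average of category scores, each on a 0-100 scale."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing categories: {sorted(missing)}")
    return sum(WEIGHTS[c] * scores[c] for c in WEIGHTS) / 100


# Hypothetical category scores for one student (not real data).
example = {
    "lectures": 100, "labs": 90, "hw": 85, "project": 88,
    "midterm1": 80, "midterm2": 75, "final": 82,
}
print(overall_score(example))
```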
Daily in lecture
Tracked for credit based on participation only, not correctness (the questions are often designed to make you think and may not have a single right answer!)
Lecture meetings will typically involve me babbling for 45 to 60 minutes, and then we work through a guided activity where you try out the latest material for yourself;
Not graded, but tracked for feedback on workflow
Hands-on practice with data analysis
A single exercise per lab, graded on being there, turning in something reasonable, and correctness
Completed in-person, in lab, in teams
Teams randomized each week until project teams assigned
Developed collaboratively, but turned in individually by the end of the lab session
8 throughout semester, two lowest scores dropped
No late work accepted
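The "two lowest scores dropped" rule can be sketched as below. This assumes the remaining lab scores are simply averaged, which the slides do not state; the helper name `average_after_drops` and the scores are hypothetical.

```python
def average_after_drops(scores, n_drop):
    """Average the scores that remain after dropping the n_drop lowest."""
    if n_drop < 0 or n_drop >= len(scores):
        raise ValueError("n_drop must be between 0 and len(scores) - 1")
    kept = sorted(scores)[n_drop:]  # ascending sort, so the lowest come first
    return sum(kept) / len(kept)


# Hypothetical scores for the 8 labs, including one missed lab (0).
lab_scores = [100, 90, 0, 85, 95, 70, 100, 80]

print(average_after_drops(lab_scores, 0))  # no drops
print(average_after_drops(lab_scores, 2))  # two lowest dropped
```

Dropping the lowest scores means one missed or rough lab does not pull the category average down.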
Hands-on practice with data analysis
Some questions for practice with instant feedback by AI
Some questions to be graded by humans for correctness
Can start in lab if time permits, but completed at home
Can consult with course team and peers, but completed and turned in individually by the end of the week
7 throughout semester, lowest score dropped
Up to 3 days late (-5% per day), no late work accepted after that
One-time late penalty waiver: Can be used on any homework assignment, no questions asked, must be requested from Dr. Knox before the deadline
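One possible reading of the late policy is sketched below. It assumes the penalty is 5 percentage points per whole day late, that nothing is accepted after 3 days, and that the waiver cancels the penalty only within that 3-day window. The slides do not spell out these details, and the function name `homework_score` is made up.

```python
def homework_score(raw_score, days_late, waiver_used=False):
    """Score after the late penalty, under the assumptions stated above."""
    if days_late < 0:
        raise ValueError("days_late cannot be negative")
    if days_late > 3:
        return 0.0  # assumed: no credit once the 3-day window has passed
    if waiver_used:
        return float(raw_score)  # assumed: waiver removes the penalty entirely
    return max(0.0, raw_score - 5.0 * days_late)


print(homework_score(90, 0))                    # on time
print(homework_score(90, 2))                    # two days late
print(homework_score(90, 2, waiver_used=True))  # two days late, waiver applied
print(homework_score(90, 4))                    # past the window
```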
Two midterm exams during the semester, each consisting of two parts:
(80%) In-class: 75-minute in-class exam. Closed book, one sheet of notes;
(20%) Take-home: Follows from the in-class exam and focuses on analyzing a dataset introduced in the take-home exam.
Final, in-class only: Closed book, one sheet of notes;
Notes for exams: Both sides of a single 8.5” x 11” sheet prepared by you and you alone;
No extensions or make-ups.
Caution
Exam dates cannot be changed and no make-up exams will be given. If you can’t take the exams on these dates, you should drop this class.
If you need testing accommodations
Make sure I get a letter, and make your appointments in the Testing Center now.
Dataset of your choice, method of your choice
Teamwork
Interim deadlines throughout semester
Final milestone: Presentation in lab and write-up
Must be in lab, in-person to present
Peer review between teams for content, peer evaluation within teams for contribution
Some lab sessions allocated to project progress
Caution
Project due date cannot be changed. You must complete the project to pass this class.
Randomized at first for weekly labs
Then, for the project and remaining labs, selected by you, or assigned by me if you don’t express a preference
Expectations and roles
For the project: Peer evaluation during teamwork and after completion
Ethel Merman

| Born | January 16, 1908 |
| Died | February 15, 1984 |
| Age | 76 |
| Claim to fame | JZ’s favorite singer |
Megan Pete

| Born | February 15, 1995 |
| Age | 30 |
| Claim to fame | Rapper |
봉준호

| Born | September 14, 1969 |
| Age | 56 |
| Claim to fame | Directed Parasite, Snowpiercer, etc. |
When the picture was taken, how old was the person?


# A tibble: 299 × 6
celeb1 celeb2 celeb3 celeb4 celeb5 celeb6
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 32 26 73 68 50 78
2 22 42 75 55 35 77
3 22 31 79 40 43 68
4 30 28 65 40 45 70
5 35 39 67 54 49 60
6 30 48 67 37 45 70
7 49 42 78 47 62 75
8 28 35 70 50 60 70
9 31 36 62 43 41 68
10 26 33 67 60 42 71
# ℹ 289 more rows


| Born | 2/10/1987 |
| Age in pic | 36 |
| Claim to fame | Classical pianist |


A secret she took to her grave:

| Born | 3/23/(1904 - 1908) |
| Died | 5/10/1977 |
| Age in pic | 38 - 42 |
| Claim to fame | Oscar-winning actor |


His actual birthday was not known at the time:

| Born | 2/7/1887 |
| Died | 2/12/1983 |
| Age in pic | 82 |
| Claim to fame | Composer |




| Born | 2/3/1963 |
| Age in pic | 48 |
| Claim to fame | UChicago economist, RBI governor |



| Born | 5/27/1911 |
| Died | 10/25/1993 |
| Age in pic | 38 - 39 |
| Claim to fame | Horror actor |



| Born | 10/21/1925 |
| Died | 7/16/2003 |
| Age in pic | 76 |
| Claim to fame | Queen of Salsa |

Domain knowledge and modeling assumptions: data do not speak for themselves. You need some subject-matter expertise about what you’re studying, as well as an interpretive lens;
Are you asking questions the data can actually answer?
Uncertainty has many sources, and in some cases, it may be simply irreducible, no matter how hard you try;
Data quality and data cleaning: Data are not gospel. There could be noise and mistakes. Then what?
Wisdom of crowds: aggregating many imperfect guesses can do better than any one individual guess.
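The wisdom-of-crowds idea can be checked on the first 10 rows of guesses printed earlier (the other 289 rows are not shown, so this is only a small sketch). It assumes that columns `celeb1`, `celeb3`, `celeb4`, and `celeb6` correspond, in order, to the reveal tables that list a single age (36, 82, 48, 76). That mapping is an assumption, not something the slides state.

```python
from statistics import mean

# First 10 rows of guesses, copied from the printed tibble.
GUESSES = {
    "celeb1": [32, 22, 22, 30, 35, 30, 49, 28, 31, 26],
    "celeb3": [73, 75, 79, 65, 67, 67, 78, 70, 62, 67],
    "celeb4": [68, 55, 40, 40, 54, 37, 47, 50, 43, 60],
    "celeb6": [78, 77, 68, 70, 60, 70, 75, 70, 68, 71],
}

# ASSUMPTION: column order matches the order of the reveal tables.
ASSUMED_TRUE_AGE = {"celeb1": 36, "celeb3": 82, "celeb4": 48, "celeb6": 76}


def crowd_vs_individual(guesses, truth):
    """Return (error of the averaged guess, average error of single guesses)."""
    crowd_error = abs(mean(guesses) - truth)
    typical_error = mean([abs(g - truth) for g in guesses])
    return crowd_error, typical_error


for name, guesses in GUESSES.items():
    crowd, typical = crowd_vs_individual(guesses, ASSUMED_TRUE_AGE[name])
    print(f"{name}: crowd error {crowd:.1f}, typical individual error {typical:.1f}")
```

The averaged guess can never have a larger error than the average individual guess, but a particularly good individual can still beat it. When every guess misses in the same direction, averaging cannot cancel the errors and the two numbers coincide.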