Based on Chapter 7 of ModernDive. Code for Quiz 11.
Section 7.2.4 in ModernDive with different sample sizes and repetitions
Load the tidyverse and the moderndive packages.
Modify the code for comparing different sample sizes from the virtual bowl.
Segment 1: sample size = 28
Take 1150 samples of size 28 from the bowl dataset. Assign the output to virtual_samples_28.
virtual_samples_28 <- bowl %>%
  rep_sample_n(size = 28, reps = 1150)
virtual_prop_red_28 <- virtual_samples_28 %>%
  group_by(replicate) %>%
  summarise(red = sum(color == "red")) %>%
  mutate(prop_red = red / 28)
ggplot(virtual_prop_red_28, aes(x = prop_red)) +
  geom_histogram(binwidth = 0.05, boundary = 0.4, color = "white") +
  labs(x = "Proportion of 28 balls that were red", title = "28")

Segment 2: sample size = 53
virtual_samples_53 <- bowl %>%
  rep_sample_n(size = 53, reps = 1150)
virtual_prop_red_53 <- virtual_samples_53 %>%
  group_by(replicate) %>%
  summarise(red = sum(color == "red")) %>%
  mutate(prop_red = red / 53)
ggplot(virtual_prop_red_53, aes(x = prop_red)) +
  geom_histogram(binwidth = 0.05, boundary = 0.4, color = "white") +
  labs(x = "Proportion of 53 balls that were red", title = "53")

Segment 3: sample size = 118
virtual_samples_118 <- bowl %>%
  rep_sample_n(size = 118, reps = 1150)
virtual_prop_red_118 <- virtual_samples_118 %>%
  group_by(replicate) %>%
  summarise(red = sum(color == "red")) %>%
  mutate(prop_red = red / 118)
ggplot(virtual_prop_red_118, aes(x = prop_red)) +
  geom_histogram(binwidth = 0.05, boundary = 0.4, color = "white") +
  labs(x = "Proportion of 118 balls that were red", title = "118")
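
The three segments differ only in the sample size, so the repeated steps could be collapsed into one helper. The following is a sketch rather than the quiz's required code: `simulate_prop_red`, `sample_sizes`, and `virtual_prop_red_all` are names introduced here for illustration, and it assumes `rep_sample_n()` is available from the installed moderndive version, as in the segments above.

```r
library(tidyverse)
library(moderndive)

# Hypothetical helper: draw `reps` samples of `size` balls from `bowl`
# and return the proportion of red balls in each sample.
simulate_prop_red <- function(size, reps = 1150) {
  bowl %>%
    rep_sample_n(size = size, reps = reps) %>%
    group_by(replicate) %>%
    summarise(red = sum(color == "red")) %>%
    mutate(prop_red = red / size, sample_size = size)
}

sample_sizes <- c(28, 53, 118)

# One data frame holding all three sets of sample proportions
virtual_prop_red_all <- bind_rows(lapply(sample_sizes, simulate_prop_red))

# Same histogram settings as above, one panel per sample size
ggplot(virtual_prop_red_all, aes(x = prop_red)) +
  geom_histogram(binwidth = 0.05, boundary = 0.4, color = "white") +
  facet_wrap(~ sample_size) +
  labs(x = "Proportion of balls that were red")
```

Putting the panels on a shared x-axis makes the narrowing of the histogram with larger samples easier to see than in three separate plots.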

Calculate the standard deviation of prop_red for each of your three sets of 1150 values.
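
The outputs below have the shape produced by `summarise(sd = sd(prop_red))`. A minimal sketch for one sample size follows. It rebuilds `virtual_prop_red_28` so that it runs on its own, and the seed is an arbitrary choice added here for reproducibility; because the sampling is random, the value will not match the quiz output exactly.

```r
library(tidyverse)
library(moderndive)

set.seed(11)  # arbitrary seed, not part of the original quiz code

virtual_prop_red_28 <- bowl %>%
  rep_sample_n(size = 28, reps = 1150) %>%
  group_by(replicate) %>%
  summarise(red = sum(color == "red")) %>%
  mutate(prop_red = red / 28)

# Standard deviation of the 1150 sample proportions
sd_28 <- virtual_prop_red_28 %>%
  summarise(sd = sd(prop_red))
sd_28
```

The same call applied to `virtual_prop_red_53` and `virtual_prop_red_118` would give the other two values.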
n = 28
# A tibble: 1 × 1
      sd
   <dbl>
1 0.0903

n = 53
# A tibble: 1 × 1
      sd
   <dbl>
1 0.0648

n = 118
# A tibble: 1 × 1
      sd
   <dbl>
1 0.0429
The distribution for the largest sample size, n = 118, has the smallest standard deviation (0.0429), so its sample proportions are spread most tightly around the estimated proportion of red balls.
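
As a rough check on this conclusion, the usual standard error formula for a sample proportion, sqrt(p(1 - p)/n), shrinks as n grows. The sketch below assumes the bowl holds 2400 balls of which 900 are red (p = 0.375). That figure is not stated in this section, so it should be verified with `mean(bowl$color == "red")`. A finite-population correction is included on the assumption that the samples are drawn without replacement.

```r
# Assumed population values (not stated in this section; check against bowl)
N <- 2400        # assumed number of balls in the bowl
p <- 900 / 2400  # assumed proportion of red balls

n <- c(28, 53, 118)

# Standard error of a sample proportion
se <- sqrt(p * (1 - p) / n)

# Finite-population correction for sampling without replacement
se_fpc <- se * sqrt((N - n) / (N - 1))

data.frame(n = n, se = round(se, 4), se_fpc = round(se_fpc, 4))
```

Under these assumptions the corrected values fall close to the simulated standard deviations above, and they decrease in the same order as the sample size increases.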