Complete the exercises, update the “author” and “date” fields in the header, knit it, and submit both the HTML and RMD files to Canvas. Due date: **Oct 13, 2021 at 11:59pm**.
[20 points]
The birthday problem asks for the probability that in a group of \(n\) people, at least 2 people will share the same birthday. This is easy to solve, and the solution is easily found online.
We can generalize this to a more difficult problem and solve it using a Monte Carlo approach: in \(n\) people, what is the probability that at least \(k\) people have the same birthday?
Write a function birthday(n,k,i) that returns a probability estimate given 3 arguments:
- n: for example, if n=50 is used, we are asking “in 50 people, what is the probability that…”
- k: for example, if k=4 is used, we are asking “…what is the probability that at least 4 people share the same birthday?”
- i: for example, if i=1000 is used, your function should run 1000 simulations
Notes:
Hints:
- To simulate birthdays in R, we recommend using the sample() function with the x, size, and replace arguments set appropriately. See the help page ?sample for details.
- To count how many times each value occurs, use the table() function. This will produce a named vector showing how many of each value there are. For example, running table(c(1,3,5,5,7,9,9,9)) will show you there is one 1, one 3, two 5s, one 7, and three 9s.
- Use a for loop to repeat the simulation i times. You will also need a variable outside your for loop to keep track of how many simulations have at least \(k\) people sharing a birthday.
- birthday(n=23, k=2), birthday(n=87, k=3), and birthday(n=188, k=4) should all be approximately \(50\%\).
- For speed, consider the Table function from the Rfast package, which is 4-5 times faster than the normal table() function.
# complete the function
# note i=1000 sets the default value of i to be 1000
birthday = function(n,k,i=1000){
# code goes here
}
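The first two hints can be sketched for a single simulated group. This is only one trial, not the full function: it assumes 365 equally likely birthdays (an assumption, since the assignment's notes are not reproduced here), and the names days and counts as well as the seed are illustrative choices.

```r
# One simulated group of 23 people.
# Assumption: 365 equally likely birthdays, drawn with replacement.
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
days = sample(x = 1:365, size = 23, replace = TRUE)

# table() counts how many people fall on each simulated day
counts = table(days)

# the largest count is the size of the biggest group sharing a birthday
largest_group = max(counts)

# TRUE if this one trial has at least k = 2 people sharing a birthday
largest_group >= 2
```

Repeating a trial like this many times and recording how often the final comparison is TRUE is the Monte Carlo idea the remaining hints describe.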
This class currently has 162 enrolled students. What is the approximate probability that at least \(4\) students share the same birthday?
ANSWER HERE
[15 points]
\(X\) is a random variable defined between 0 and 1 with the probability density function \(f(x)=2x\). Note this means the cumulative distribution function is \[F(x)=\int_0^x f(t)\,dt=x^2\] Write a function rx(n) to sample from this random variable, where n is the size of the sample to be drawn. Then, use your function to draw a sample of 500 and plot a histogram of the output.
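As a quick sanity check on the stated density and CDF, the integral can be evaluated numerically with base R's integrate(). This sketch only verifies \(F(x)=x^2\) at a few points; the names f, q, total, and F_num are illustrative and not part of the assignment.

```r
# pdf of X from the exercise: f(x) = 2x on (0, 1)
f = function(x) 2 * x

# a valid density integrates to 1 over its support
total = integrate(f, lower = 0, upper = 1)$value

# numerically integrate the pdf from 0 to each point and compare with x^2
q = c(0.25, 0.5, 0.9)
F_num = sapply(q, function(b) integrate(f, lower = 0, upper = b)$value)

all.equal(total, 1)
all.equal(F_num, q^2)
```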
# load ggplot2 for the plot below (harmless if it is already loaded)
library(ggplot2)
# defining pdf of X
pdf_x = Vectorize(function(x){
if(x>0 && x<1){2*x} else 0
})
# showing pdf on plot
ggplot() + geom_function(fun=pdf_x,n=10001) + theme_minimal() +
xlim(c(-1,2)) + ylim(-1,3) + labs(x='x',y='f(x)')
# complete the function
rx = function(n){
# code goes here
}
# uncomment the following line of code and check it looks correct
# hist(rx(500))
[15 points]
Of the six sequences below, only one was actually generated randomly from a fair coin. Use a combination of everything you know (common sense, Monte Carlo, hypothesis testing, etc.) to identify which is actually random and explain your reasoning.
flips1 = "HTHTHTHTHTHTHTHTHTHTHTHTHTHTHTHTHHTHTHTHTHTHTHTTHTHTHTHTHTHTHHTHTHTHTHTHTHTHTHTHTHTHTHTHTHHTTHTHTHTHTHTHTHTHTHTHTHTHTHHTHTHTHTHTHTHTHTHTHTHTHTTHTHTHTHTHTHTHTHTHTHTHTHTHHTHTHTHTHTHTHTHTHTHTHTHHTHTHTHTH"
flips2 = "HHHTHTTTHHTHHTHHHTTTTHTHTHHTTHTHHHTHHTHTTTHTHHHTHTTTHTHTHHTHTHTTHTHHTHTHTTTHTHHHTHTHTTHTHTHHTHTHTHHHTHTTTHTHHTHTHTHHTTTHTHHTHHTTTTHTHTHHHTHTTHTHHTHTHTTHTHHTHTHHHTHHHTHTTTHTTHTTTHTHHHTHTHTTHTHHTHHTHTTT"
flips3 = "HHTHTHTTTHTHHHTHHTTTHTHHTHTTTHTHTHHTHTHTTHTHHHHHHTTTHTHTHHTHTTTHTHHTHTHTTTHTHHHTTHTTTHTHTHHHHTHTTHHTTTTTHTHHHTHTHTTTTTHHHTHHTHHTHHHTTTTHTHTHHHTHHTTTTTHTHHHTHTHTHTTTHTHHHTHTHTHTTHTHHTHTHTHTTTTHTHHHTHTH"
flips4 = "HTHHHHHHHTHTTHHTTHHHTHTHTTTHHTHHHTHHTTHTTTTTTTTTHTHHTTTTTHTHTHTHHTTHTTHTTTTTHHHTHTTTHTHTHHHTHTTTTHTHTHHTTHTHTTHHTHTHHHHTHTTHHTTHTTHTTHTHHHHHHTTTTTTHHHTTHTHHHHTTTHTTHHHTTHTHHTTTHHTHHTTTHTHHTHHHTHHTTHHH"
flips5 = "HHHHHHHHHHHTTTTTTTTTTTHHHHHHHHHHHHTTTTTTTTTTTHHHHHHHHHHHHHTTTTTTTTTTHHHHHHHHHHTTTTTTTTHHHHHHHHTTTTTTTHHHHHHHHHTTTTTTTTTHHHHHHHHTTTHHHHHHHHHHHTTTTTTTTTTTHHHHHHHHHHHHTTTTTTTTTTTHHHHHHHHHHHHHTTTTTTTTTTHH"
flips6 = "TTHTTTHTTTTTTTHTHTHTHTTHTTHTHHTHHTTTHHTHTTTHTHHTHHHTHTTHHTHHTTHTHTTTTHTHTTTHHTTTTTTTTHTHHTTHTTTTTTHTHTHTHTTTHTTHHTTHTTTHHTTTHTTHTTTTHTTTTHHTTTHTHTHHHTTTTTTHTHHTTTTTTTTTTTTHHHTTTHHHTTTHTTTHTHTTHTTTTTHT"
# you can use the function below to split the above sequences into vectors of flips
split = function(str) strsplit(str, split="")[[1]]
split(flips1)
# [1] "H" "T" "H" "T" "H" "T" "H" "T" "H" "T" "H" "T" "H" "T" "H" "T" "H" "T" "H" "T" "H" "T" "H"
# [24] "T" "H" "T" "H" "T" "H" "T" "H" "T" "H" "H" "T" "H" "T" "H" "T" "H" "T" "H" "T" "H" "T" "H"
# [47] "T" "T" "H" "T" "H" "T" "H" "T" "H" "T" "H" "T" "H" "T" "H" "H" "T" "H" "T" "H" "T" "H" "T"
# [70] "H" "T" "H" "T" "H" "T" "H" "T" "H" "T" "H" "T" "H" "T" "H" "T" "H" "T" "H" "T" "H" "H" "T"
# [93] "T" "H" "T" "H" "T" "H" "T" "H" "T" "H" "T" "H" "T" "H" "T" "H" "T" "H" "T" "H" "T" "H" "T"
# [116] "H" "T" "H" "H" "T" "H" "T" "H" "T" "H" "T" "H" "T" "H" "T" "H" "T" "H" "T" "H" "T" "H" "T"
# [139] "H" "T" "H" "T" "T" "H" "T" "H" "T" "H" "T" "H" "T" "H" "T" "H" "T" "H" "T" "H" "T" "H" "T"
# [162] "H" "T" "H" "T" "H" "T" "H" "H" "T" "H" "T" "H" "T" "H" "T" "H" "T" "H" "T" "H" "T" "H" "T"
# [185] "H" "T" "H" "T" "H" "T" "H" "H" "T" "H" "T" "H" "T" "H" "T" "H"
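Once a sequence is split into a vector, simple summaries such as the longest run, the number of switches, and the proportion of heads are easy to compute. The sketch below uses base R's rle() on a short made-up string (demo is hypothetical, not one of the six sequences) and is only one possible starting point for the analysis.

```r
# same helper as above, repeated so this sketch is self-contained
split = function(str) strsplit(str, split="")[[1]]

# hypothetical short sequence, used only for illustration
demo = "HHTHTTTH"
flips = split(demo)

# rle() collapses the vector into runs of identical values
runs = rle(flips)

longest_run = max(runs$lengths)        # length of the longest streak
n_switches  = length(runs$lengths) - 1 # number of H<->T changes
prop_heads  = mean(flips == "H")       # fraction of heads
```

The same summaries computed on many sequences simulated from a fair coin give a reference distribution against which each of the six sequences can be compared.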
Response goes here: