
Instructions

Complete the exercises, update the “author” and “date” fields in the header, knit it, and submit both the HTML and RMD files to Canvas. Due date: **Oct 13, 2021 at 11:59pm**.


Exercise 1: Generalized birthday problem

[20 points]

The birthday problem asks for the probability that in a group of \(n\) people, at least 2 people will share the same birthday. This is easy to solve, and the solution is easily found online.
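
For reference, the classic case (\(k=2\)) has a closed form: the probability of at least one match is one minus the probability that all \(n\) birthdays are distinct. The sketch below assumes 365 equally likely birthdays and ignores leap years; the function name exact_birthday is illustrative and not part of the assignment.

```r
# Exact probability that at least 2 of n people share a birthday,
# assuming 365 equally likely days (an assumption; leap years ignored).
exact_birthday <- function(n, days = 365) {
  if (n > days) return(1)  # pigeonhole: a match is certain
  # P(all distinct) = (365/365) * (364/365) * ... * ((365 - n + 1)/365)
  1 - prod((days - seq_len(n) + 1) / days)
}

exact_birthday(23)  # about 0.507
```

No comparably simple formula exists for general \(k\), which is why a simulation is a natural fit for the generalized problem.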

We can generalize this to a more difficult problem and solve it using a Monte Carlo approach: in a group of \(n\) people, what is the probability that at least \(k\) people share the same birthday?

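Write a function birthday(n,k,i) that returns a probability estimate given 3 arguments: n, the number of people in the group; k, the minimum number of people who must share a birthday; and i, the number of simulation iterations to run.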

Notes:

Hints:

  1. There’s no need to use actual dates in the simulation process. Numbers can represent dates and are easier to generate and manipulate in R. In particular, we recommend using the sample() function with the x, size, and replace arguments set appropriately. See the help page ?sample for details.
  2. Given a vector of numbers, you can easily find duplicates by using the table() function. This will produce a named vector showing how many of each value there are. For example, running table(c(1,3,5,5,7,9,9,9)) will show you there is one 1, one 3, two 5s, one 7, and three 9s.
  3. In your function, you will need to use a for loop to repeat the simulation i times. You will also need a variable outside your for loop to keep track of how many simulations contain at least one birthday shared by \(\geq k\) people.
  4. If your function is running correctly, then birthday(n=23, k=2), birthday(n=87, k=3) and birthday(n=188, k=4) should all be approximately \(50\%\).
  5. If your function is very slow, consider using the Table function from the Rfast package, which is 4-5 times faster than the normal table() function.
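
Hints 1 and 2 can be combined into a single simulated trial, sketched below. This is only one possible building block, assuming 365 equally likely days coded as the integers 1 to 365; the names has_k_match and one_trial are illustrative and not part of the assignment.

```r
# One simulated trial: does any birthday occur at least k times among n people?
# Assumes 365 equally likely days; `has_k_match` and `one_trial` are
# illustrative names, not part of the assignment.
has_k_match <- function(bdays, k) {
  max(table(bdays)) >= k  # largest count of any single value
}

one_trial <- function(n, k, days = 365) {
  bdays <- sample(x = days, size = n, replace = TRUE)  # days coded 1..365
  has_k_match(bdays, k)
}

has_k_match(c(1, 3, 5, 5, 7, 9, 9, 9), k = 3)  # TRUE: 9 appears three times
has_k_match(c(1, 3, 5, 5, 7, 9, 9, 9), k = 4)  # FALSE: no value appears four times
```

Repeating such a trial i times and taking the fraction of TRUE results gives the Monte Carlo estimate described in hint 3.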
# complete the function
# note i=1000 sets the default value of i to be 1000
birthday = function(n,k,i=1000){
  # code goes here
}

This class currently has 162 enrolled students. What is the approximate probability that at least \(4\) students share the same birthday?

ANSWER HERE


Exercise 2: Simulate RV

[15 points]

\(X\) is a random variable supported on \([0,1]\) with probability density function \(f(x)=2x\). This means that for \(0\le x\le 1\) the cumulative distribution function is \[F(x)=\int_0^x f(t)\,dt=x^2\] Write a function rx(n) to sample from this random variable, where n is the size of the sample to be drawn. Then, use your function to draw a sample of 500 and plot a histogram of the output.
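
One common way to use a known CDF for sampling is inverse transform sampling: if \(U\) is uniform on \((0,1)\), then \(F^{-1}(U)\) has CDF \(F\). Here \(F(x)=x^2\) gives \(F^{-1}(u)=\sqrt{u}\). The sketch below is one possible approach under that assumption, not necessarily the intended one; the name rx_sketch is illustrative.

```r
# Inverse transform sampling for f(x) = 2x on [0, 1].
# F(x) = x^2, so F^{-1}(u) = sqrt(u) for u in [0, 1].
# `rx_sketch` is an illustrative name, not the assignment's required rx().
rx_sketch <- function(n) {
  u <- runif(n)  # uniform draws on (0, 1)
  sqrt(u)        # apply the inverse CDF
}

set.seed(1)
x <- rx_sketch(10000)
mean(x)  # should be near E[X] = 2/3
```

A quick sanity check is that the sample mean should be close to \(E[X]=\int_0^1 x\cdot 2x\,dx=2/3\).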

# defining pdf of X
pdf_x = Vectorize(function(x){
  if(x>0 & x<1){2*x} else 0
})

# showing pdf on plot (requires the ggplot2 package)
library(ggplot2)
ggplot() + geom_function(fun=pdf_x,n=10001) + theme_minimal() + 
  xlim(c(-1,2)) + ylim(-1,3) + labs(x='x',y='f(x)')

# complete the function
rx = function(n){
  # code goes here
}

# uncomment the following line of code and check it looks correct
# hist(rx(500))

Exercise 3: Testing coin flips

[15 points]

Of the six sequences below, only one was actually randomly generated from a fair coin. Use a combination of everything you know (common sense, Monte Carlo, hypothesis testing, etc.) to identify which one is actually random and explain your reasoning.

flips1 = "HTHTHTHTHTHTHTHTHTHTHTHTHTHTHTHTHHTHTHTHTHTHTHTTHTHTHTHTHTHTHHTHTHTHTHTHTHTHTHTHTHTHTHTHTHHTTHTHTHTHTHTHTHTHTHTHTHTHTHHTHTHTHTHTHTHTHTHTHTHTHTTHTHTHTHTHTHTHTHTHTHTHTHTHHTHTHTHTHTHTHTHTHTHTHTHHTHTHTHTH"

flips2 = "HHHTHTTTHHTHHTHHHTTTTHTHTHHTTHTHHHTHHTHTTTHTHHHTHTTTHTHTHHTHTHTTHTHHTHTHTTTHTHHHTHTHTTHTHTHHTHTHTHHHTHTTTHTHHTHTHTHHTTTHTHHTHHTTTTHTHTHHHTHTTHTHHTHTHTTHTHHTHTHHHTHHHTHTTTHTTHTTTHTHHHTHTHTTHTHHTHHTHTTT"

flips3 = "HHTHTHTTTHTHHHTHHTTTHTHHTHTTTHTHTHHTHTHTTHTHHHHHHTTTHTHTHHTHTTTHTHHTHTHTTTHTHHHTTHTTTHTHTHHHHTHTTHHTTTTTHTHHHTHTHTTTTTHHHTHHTHHTHHHTTTTHTHTHHHTHHTTTTTHTHHHTHTHTHTTTHTHHHTHTHTHTTHTHHTHTHTHTTTTHTHHHTHTH"

flips4 = "HTHHHHHHHTHTTHHTTHHHTHTHTTTHHTHHHTHHTTHTTTTTTTTTHTHHTTTTTHTHTHTHHTTHTTHTTTTTHHHTHTTTHTHTHHHTHTTTTHTHTHHTTHTHTTHHTHTHHHHTHTTHHTTHTTHTTHTHHHHHHTTTTTTHHHTTHTHHHHTTTHTTHHHTTHTHHTTTHHTHHTTTHTHHTHHHTHHTTHHH"

flips5 = "HHHHHHHHHHHTTTTTTTTTTTHHHHHHHHHHHHTTTTTTTTTTTHHHHHHHHHHHHHTTTTTTTTTTHHHHHHHHHHTTTTTTTTHHHHHHHHTTTTTTTHHHHHHHHHTTTTTTTTTHHHHHHHHTTTHHHHHHHHHHHTTTTTTTTTTTHHHHHHHHHHHHTTTTTTTTTTTHHHHHHHHHHHHHTTTTTTTTTTHH"

flips6 = "TTHTTTHTTTTTTTHTHTHTHTTHTTHTHHTHHTTTHHTHTTTHTHHTHHHTHTTHHTHHTTHTHTTTTHTHTTTHHTTTTTTTTHTHHTTHTTTTTTHTHTHTHTTTHTTHHTTHTTTHHTTTHTTHTTTTHTTTTHHTTTHTHTHHHTTTTTTHTHHTTTTTTTTTTTTHHHTTTHHHTTTHTTTHTHTTHTTTTTHT"

# you can use the function below to split the above sequences into vectors of flips
split = function(str) strsplit(str, split="")[[1]]
split(flips1)
 #   [1] "H" "T" "H" "T" "H" "T" "H" "T" "H" "T" "H" "T" "H" "T" "H" "T" "H" "T" "H" "T" "H" "T" "H"
 #  [24] "T" "H" "T" "H" "T" "H" "T" "H" "T" "H" "H" "T" "H" "T" "H" "T" "H" "T" "H" "T" "H" "T" "H"
 #  [47] "T" "T" "H" "T" "H" "T" "H" "T" "H" "T" "H" "T" "H" "T" "H" "H" "T" "H" "T" "H" "T" "H" "T"
 #  [70] "H" "T" "H" "T" "H" "T" "H" "T" "H" "T" "H" "T" "H" "T" "H" "T" "H" "T" "H" "T" "H" "H" "T"
 #  [93] "T" "H" "T" "H" "T" "H" "T" "H" "T" "H" "T" "H" "T" "H" "T" "H" "T" "H" "T" "H" "T" "H" "T"
 # [116] "H" "T" "H" "H" "T" "H" "T" "H" "T" "H" "T" "H" "T" "H" "T" "H" "T" "H" "T" "H" "T" "H" "T"
 # [139] "H" "T" "H" "T" "T" "H" "T" "H" "T" "H" "T" "H" "T" "H" "T" "H" "T" "H" "T" "H" "T" "H" "T"
 # [162] "H" "T" "H" "T" "H" "T" "H" "H" "T" "H" "T" "H" "T" "H" "T" "H" "T" "H" "T" "H" "T" "H" "T"
 # [185] "H" "T" "H" "T" "H" "T" "H" "H" "T" "H" "T" "H" "T" "H" "T" "H"
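
One way to bring Monte Carlo into this comparison is to summarise each sequence by a few statistics (number of runs, longest run, proportion of heads) and see where they fall relative to simulated fair-coin sequences. The sketch below is a starting point under that assumption, not a complete test; the names run_stats and sim_run_stats are illustrative, and 200 flips per simulated sequence matches the length of flips1 as printed above.

```r
# Summarise runs in a flip sequence and compare against simulated fair coins.
# `run_stats` and `sim_run_stats` are illustrative names, not part of the assignment.
run_stats <- function(flips) {
  r <- rle(flips)                     # consecutive runs of identical values
  c(n_runs = length(r$lengths),       # how many runs
    longest = max(r$lengths),         # length of the longest run
    prop_heads = mean(flips == "H"))  # fraction of heads
}

sim_run_stats <- function(reps = 1000, len = 200) {
  # each column holds the statistics of one simulated fair-coin sequence
  replicate(reps, run_stats(sample(c("H", "T"), len, replace = TRUE)))
}

run_stats(strsplit("HHTHTT", "")[[1]])  # n_runs = 4, longest = 2, prop_heads = 0.5
```

Comparing a sequence's statistics with quantiles of the simulated rows, for example quantile(sims["longest", ], c(0.025, 0.975)) where sims is the output of sim_run_stats(), indicates whether that sequence looks typical of a fair coin.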

Response goes here: