
Monte Carlo methods are a class of simulation methods that use random number generation to solve a wide range of problems, from prediction to estimation to hypothesis testing. They have been used since as early as the 1930s, and are now more powerful than ever thanks to modern computing power.

Origin of name

Stanisław Ulam was working on top-secret nuclear weapons projects at Los Alamos in the 1940s when he invented one of the first modern Monte Carlo methods. John von Neumann, one of the fathers of computing, helped him write the first computer program to run the algorithm.

Due to the secret nature of the research, they needed a codename for it. Their colleague Nicholas Metropolis suggested the name Monte Carlo after the Monte Carlo Casino in Monaco, due to the inherent randomness of the method and due to Ulam’s uncle frequently borrowing gambling money because he “just had to go to Monte Carlo”. 1

Despite being limited by the computational power available at the time, this method proved very powerful and is widely used today in many fields of research.

Example 0: estimating \(\pi\)

As an initial motivating example, suppose we don’t know any digits of \(\pi\), but we do know that it’s the ratio of the area of a circle to the square of its radius. How can we use random numbers to estimate \(\pi\)?

One thing we can do is generate uniformly distributed random points in the square below (with corners at \((\pm1,\pm1)\) and the unit circle inscribed in it):

If we then count how many of the points fall inside the circle, we can estimate the proportion of the square that is covered by the circle. Since the circle has area \(\pi\) (it has radius 1) and the square has area \(4\), this proportion estimates \(\pi/4\), so multiplying it by \(4\) gives an estimate of \(\pi\).

# choose N (simulation size)
N = 1000

# use this to count how many points are in circle
in.circ = 0

# repeat steps N times
for(i in 1:N){
    
    # for each point, generate 2 coordinates (x,y) randomly between -1 and 1
    point = runif(n=2, min=-1, max=1)
    
    # to be inside circle, must satisfy x^2 + y^2 < 1
    if(point[1]^2 + point[2]^2 < 1){
        
        # if inside, add to count
        in.circ = in.circ+1
    }
}

# to get proportion of square covered, take in.circ/N
prop = in.circ/N

# to get estimate of pi, take prop * 4
pi.mc = prop * 4
# what are our estimate and percentage error?
cat(sprintf("estimate: %.4f\n %% error: %.2f%%",pi.mc,abs(pi.mc-pi)/pi*100))
## estimate: 3.1160
##  % error: 0.81%

Some theory

It should make sense intuitively why this works, but it can also be shown mathematically to be correct.

Let \(X\) be the indicator RV of whether a point sampled uniformly at random in the square lies inside the circle. So, \(X=1\) with probability \(\dfrac\pi4\) and \(X=0\) with probability \(1-\dfrac\pi4\). Let \(X_i\) represent repeated i.i.d. samples of this RV.

Now, let’s think carefully about how we got our previous estimate. First, we sample \(N\) points uniformly at random inside the square; this corresponds to sampling \(X_1,X_2,...,X_N\). Then, we count the proportion of them that lie inside the circle; this corresponds to summing the \(X_i\) in our sample and dividing by \(N\). This proportion estimates the fraction of the square’s area covered by the circle, so to get our final estimate of \(\pi\) we just need to multiply by \(4\).

Thus, our estimator can be represented as \[\hat\pi:=4*\frac1N\sum_{i=1}^NX_i\]

We can easily derive the expectation and variance of this estimator: \[\begin{align} E(\hat\pi)&=E\left(\frac4N\sum_{i=1}^NX_i\right)\\ &=\frac4NE\left(\sum_{i=1}^NX_i\right)\\ &=\frac4N\sum_{i=1}^NE(X_i)\\ &=\frac4N\sum_{i=1}^N\left[(1)(\tfrac\pi4)+(0)(1-\tfrac\pi4)\right]\\ &=\frac4N\sum_{i=1}^N\frac\pi4\\ &=\frac4N\left(N*\frac\pi4\right)\\ &=\pi \end{align}\]

\[\begin{align} Var(\hat\pi)&=Var\left(\frac4N\sum_{i=1}^NX_i\right)\\ &=\left(\frac4N\right)^2Var\left(\sum_{i=1}^NX_i\right)\\ &=\frac{16}{N^2}\sum_{i=1}^NVar(X_i)\\ &=\frac{16}{N^2}\sum_{i=1}^N\left[(1-\tfrac\pi4)^2(\tfrac\pi4)+(0-\tfrac\pi4)^2(1-\tfrac\pi4)\right]\\ &=\frac{16}{N^2}\sum_{i=1}^N\frac\pi4\left(1-\frac\pi4\right)\\ &=\frac{16}{N^2}\left(N*\frac\pi4\left(1-\frac\pi4\right)\right)\\ &=\frac{\pi(4-\pi)}N \end{align}\]

Bonus exercise: derive the same formulae for \(E\) and \(Var\) by treating \(\sum X_i\) as a binomial RV and applying the binomial formulas for \(E\) and \(Var\).
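(A quick sketch of that route, if you want to check your work: the sum \(S=\sum_{i=1}^NX_i\) counts successes in \(N\) independent Bernoulli(\(\tfrac\pi4\)) trials, so \(S\sim\text{Binomial}(N,\tfrac\pi4)\), and the standard binomial formulas give \[\begin{align} E(\hat\pi)&=\frac4NE(S)=\frac4N\cdot N\frac\pi4=\pi\\ Var(\hat\pi)&=\frac{16}{N^2}Var(S)=\frac{16}{N^2}\cdot N\frac\pi4\left(1-\frac\pi4\right)=\frac{\pi(4-\pi)}N \end{align}\] matching the results above.)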

We can easily verify the truth of these formulae by simulating multiple “runs” of this experiment and looking at the distribution of estimates we obtain.

# choose M (number of times to repeat MC experiment)
M = 1000

# create vector to save results in
mc.est = rep(NA,M)

# for each experiment, do all the steps done before, get an estimate, and save it
for(j in 1:M){
    
    # these lines are copied exactly from above
    N = 1000
    in.circ = 0
    for(i in 1:N){
        point = runif(n=2, min=-1, max=1)
        if(point[1]^2 + point[2]^2 < 1){
            in.circ = in.circ+1
        }
    }
    prop = in.circ/N
    pi.mc = prop * 4
    
    # save result in vector
    mc.est[j] = pi.mc
}
# what do the estimates look like?
options(max.print=500)
mc.est
##   [1] 3.132 3.188 3.164 3.144 3.152 3.244 3.212 3.148 3.168 3.224 3.136 3.076
##  [13] 3.088 3.128 3.152 3.072 3.108 3.140 3.200 3.168 3.172 3.144 3.068 3.176
##  [25] 3.108 3.084 3.136 3.092 3.136 3.088 3.128 3.148 3.120 3.120 3.156 3.232
##  [37] 3.148 3.204 3.148 3.164 3.160 3.140 3.144 3.232 3.084 3.128 3.168 3.104
##  [49] 3.136 3.072 3.088 3.072 3.104 3.172 3.260 3.056 3.220 3.064 3.116 3.200
##  [61] 3.200 3.092 3.104 3.140 3.028 3.092 3.040 3.092 3.164 3.112 3.076 3.128
##  [73] 3.220 3.204 3.080 3.116 3.108 3.168 3.120 3.144 3.112 3.168 3.104 3.036
##  [85] 3.116 3.116 3.140 3.160 3.188 3.184 3.068 3.188 3.184 3.000 3.152 3.140
##  [97] 3.156 3.112 3.136 3.152 3.180 3.128 3.124 3.168 3.192 3.120 3.172 3.168
## [109] 3.116 3.228 3.148 3.232 3.100 3.140 3.160 3.188 3.172 3.116 3.204 3.112
## [121] 2.972 3.168 3.096 3.140 3.132 3.148 3.120 3.120 3.196 3.080 3.156 3.152
## [133] 3.152 3.028 3.172 3.128 3.088 3.208 3.164 3.096 3.176 3.120 3.156 3.140
## [145] 3.092 3.088 3.028 3.140 3.168 3.128 3.168 3.104 3.200 3.112 3.200 3.156
## [157] 3.192 3.180 3.084 3.128 3.140 3.112 3.112 3.152 3.228 3.188 3.084 3.124
## [169] 3.112 3.144 3.116 3.128 3.256 3.132 3.136 3.168 3.108 3.260 3.184 3.148
## [181] 3.152 3.128 3.108 3.156 3.204 2.976 3.064 3.040 3.148 3.072 3.144 3.092
## [193] 3.168 3.176 3.196 3.172 3.148 3.148 3.168 3.164 3.080 3.156 3.096 3.100
## [205] 3.136 3.180 3.180 3.048 3.144 3.232 3.232 3.120 3.076 3.180 3.152 3.176
## [217] 3.080 3.152 3.152 3.148 3.104 3.140 3.184 3.260 3.148 3.176 3.196 3.080
## [229] 3.176 3.084 3.156 3.156 3.092 3.076 3.132 3.124 3.188 3.216 3.084 3.184
## [241] 3.216 3.096 3.148 3.196 3.212 3.124 3.180 3.120 3.152 3.212 3.120 3.076
## [253] 3.060 3.116 3.128 3.180 3.040 3.200 3.140 3.124 3.116 3.096 3.168 3.160
## [265] 3.104 3.184 3.144 3.140 3.208 3.156 3.084 3.132 3.164 3.128 3.140 3.080
## [277] 3.224 3.028 3.060 3.124 3.088 3.168 2.976 3.044 3.096 3.284 3.240 3.088
## [289] 3.164 3.084 3.148 3.168 3.076 3.116 3.152 3.140 3.116 3.148 3.084 3.164
## [301] 3.236 3.100 3.124 3.116 3.148 3.144 3.172 3.180 3.080 3.184 3.104 3.000
## [313] 3.148 3.124 3.108 3.204 3.136 3.248 3.128 3.236 3.120 3.148 3.208 3.232
## [325] 3.196 3.192 3.172 3.204 3.084 3.148 3.148 3.132 3.080 3.076 3.184 3.196
## [337] 3.088 3.084 3.116 3.116 3.176 3.136 3.100 3.216 3.184 3.184 3.188 3.168
## [349] 3.204 3.156 3.120 3.044 3.264 3.184 3.060 3.180 3.120 3.128 3.140 3.128
## [361] 3.084 3.236 3.176 3.092 3.016 3.100 3.192 3.120 3.088 3.104 3.156 3.152
## [373] 3.100 3.108 3.108 3.112 3.176 3.128 3.096 3.176 3.152 3.176 3.172 3.104
## [385] 3.220 3.228 3.124 3.132 3.092 3.136 3.176 3.008 3.168 3.196 3.172 3.228
## [397] 3.132 3.132 3.080 3.104 3.136 3.224 3.080 3.112 3.108 3.120 3.076 3.224
## [409] 3.228 3.160 3.212 3.124 3.048 3.100 3.184 3.180 3.168 3.216 3.132 3.160
## [421] 3.132 3.064 3.136 3.140 3.156 3.100 3.076 3.188 3.112 3.108 3.160 3.204
## [433] 3.140 3.152 3.136 3.164 3.184 3.120 3.192 3.128 3.124 3.128 3.128 3.116
## [445] 3.076 3.160 3.136 3.136 3.188 3.196 3.168 3.204 3.088 3.132 3.212 3.140
## [457] 3.180 3.048 3.192 3.132 3.144 3.152 3.176 3.244 3.156 3.140 3.096 3.112
## [469] 3.140 3.124 3.096 3.132 3.152 3.144 3.148 3.144 3.100 3.216 3.080 3.232
## [481] 3.108 3.124 3.128 3.060 3.100 3.084 3.168 3.188 3.104 3.004 3.148 3.172
## [493] 3.176 3.144 3.128 3.236 3.088 3.116 3.088 3.116
##  [ reached getOption("max.print") -- omitted 500 entries ]
mean(mc.est)
## [1] 3.139716
var(mc.est)
## [1] 0.002657185
var.theory = pi*(4-pi)/N
var.theory
## [1] 0.002696766
# deviation of our mean and variance from theory:
cat(sprintf("%% deviation from E  : %.3f%% \n%% deviation from Var: %.3f%%",
            abs(mean(mc.est)-pi)/pi*100,abs(var(mc.est)-var.theory)/var.theory*100))
## % deviation from E  : 0.060% 
## % deviation from Var: 1.468%

Generalization

We can generalize this a little so you can see a bigger picture of what we’re doing (without going into too much detail).

Suppose we want to find the integral \[I=\int_Df(x)\,dx\] where the integral is taken over some arbitrary region \(D\). If we can write \[f(x)=g(x)\cdot p(x)\] for some function \(g\) and some probability density \(p\) on \(D\), then we have \[I=\int_Dg(x)p(x)\,dx=E\big(g(X)\big),\quad X\sim p\] Then, we just need to take a large sample of \(g(x_i)\), with the \(x_i\) drawn from \(p\), and average them to estimate the integral. This is all analogous to what we did in Example 0, where \(p\) was the uniform density on the square and \(g\) was \(4\) times the indicator of the circle.
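As a small illustration of this idea (not part of the original example, and with the target integral chosen arbitrarily), here is a minimal sketch in R that estimates \(I=\int_0^1e^{-x^2}\,dx\) by taking \(p\) to be the uniform density on \([0,1]\) and \(g(x)=e^{-x^2}\):

# minimal sketch: estimate I = integral of exp(-x^2) over [0,1]
# by writing f(x) = g(x)*p(x), with p the uniform density on [0,1] and g(x) = exp(-x^2)
N = 100000

# sample x_i from p (uniform on [0,1])
x = runif(N)

# average g(x_i) to estimate E(g(X)) = I
I.mc = mean(exp(-x^2))
I.mc

# compare with deterministic numerical integration
integrate(function(x) exp(-x^2), 0, 1)$value

Both numbers should land near \(0.7468\), with the MC estimate getting closer as \(N\) grows.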

Example 1: Buffon’s needle

Asked by Georges-Louis Leclerc, Comte de Buffon, in the 1700s:

Suppose we have a floor made of parallel strips of wood, each the same width, and we drop a needle onto the floor. What is the probability that the needle will lie across a line between two strips? 2

This probability is actually not that hard to derive analytically, but we can also very easily estimate it using MC!

Suppose we again take the same square from the previous example with endpoints at \((\pm1,\pm1)\) and divide it into 2 halves: the top half is one strip of wood, and the bottom half is another strip of wood.

library(ggplot2)
ggplot() + geom_rect(aes(xmin=-1,xmax=1,ymin=0,ymax=1),fill="grey30",color=NA) + 
  geom_rect(aes(xmin=-1,xmax=1,ymin=-1,ymax=0),fill="grey80",color=NA) +
  scale_x_continuous(expand=c(0,0)) + scale_y_continuous(expand=c(0,0))

We start by randomly sampling a point somewhere in this square to serve as the center of the needle, then we randomly pick an angle and extend the needle 0.5 units from the center in each direction (so the needle has length 1, the same as the strip width). Finally, we check if the needle crosses any of the lines \(y=0,\pm1\).

Let’s start by plotting our function to make sure it’s doing what we want.

# define needle dropping function
needle = function(){
  
  # randomly pick center of needle
  xy = runif(2,-1,1)
  
  # randomly pick angle
  angle = runif(1,0,pi)
  
  # calculate delta x and delta y of ends from center
  # we are defining the angle using the standard definition
  # (going counterclockwise from positive x axis)
  dx = cos(angle)*0.5
  dy = sin(angle)*0.5
  
  # calculate coordinates of end points
  # note in this way, our dy is always positive, so p1 is always higher than p2
  # this will be useful later for checking crossing
  p1 = setNames(xy + c(dx,dy), c("p1x","p1y"))
  p2 = setNames(xy - c(dx,dy), c("p2x","p2y"))
  
  # return endpoints
  return(c(p1,p2))
}

# plot a few needle drops

# create dummy variable for ldply
library(plyr)
ndl = function(i) needle()
needles = ldply(1:10,ndl)

ggplot() + geom_rect(aes(xmin=-1,xmax=1,ymin=0,ymax=1),fill="grey30",color=NA) + 
  geom_rect(aes(xmin=-1,xmax=1,ymin=-1,ymax=0),fill="grey80",color=NA) + 
  geom_segment(data=needles,mapping=aes(x=p1x,y=p1y,xend=p2x,yend=p2y), color="red",size=1.5)

Now, we run our function a large number of times and check how many cross the lines.

# set size
N = 1000000

# create vector to count crossings
mc.cross = 0

# run function
for(i in 1:N){
  
  # drop a needle
  points = needle()
  p1y = points['p1y']
  p2y = points['p2y']
  
  # check if p1 and p2 are on opposite sides of a line
  if( (p1y > -1 & p2y < -1) || (p1y > 0 & p2y < 0) || (p1y > 1 & p2y < 1) ){
    mc.cross = mc.cross+1
  }
}

Now, we have the solution to our problem.

mc.cross/N
## [1] 0.637215

We can compare with the correct answer and see how close we are.

true.cross = 2/pi
cat(sprintf(" true value: %.5f \n  our value: %.5f \n%% deviation: %.3f%%",
            true.cross,mc.cross/N,abs(true.cross-mc.cross/N)/true.cross*100))
##  true value: 0.63662 
##   our value: 0.63721 
## % deviation: 0.093%
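For reference, the classical result (which we are taking as known here) is that a needle of length \(L\) dropped on strips of width \(t\), with \(L\le t\), crosses a line with probability \[P(\text{cross})=\frac{2L}{\pi t}\] In our setup \(L=t=1\), which is where the true value of \(2/\pi\approx0.63662\) comes from.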

Exercise: write a function that takes 3 arguments: L for the length of the needle, T for the width of the strips, and N for the number of replicates; performs the above process; and returns the probability of a crossing.

Random v. pseudorandom

For MC, it’s important to have a good source of random numbers whose distribution is precisely known. This is a surprisingly difficult problem. There are ways of generating what are (as far as we can tell) almost perfectly random numbers, such as measuring atmospheric noise, measuring radioactive decay, or even filming a wall of lava lamps (as Cloudflare does). These sources are generally considered the closest we can get to truly random numbers.

Your computer (unless it’s attached to a Geiger counter or a wall of lava lamps) is only capable of producing pseudorandom numbers. These are made by a pseudorandom number generator: a deterministic algorithm (i.e. same output for same input/seed), such as the Mersenne Twister used by default in R. Even though these generators are deterministic, they pass tests of statistical randomness, which means that as far as we can tell there are no discernible patterns in the output, and they are usually close enough to random for computational purposes.
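One practical consequence (a small illustration, not from the original notes): because the generator is deterministic, fixing the seed reproduces the exact same “random” numbers, which is also handy for making MC results reproducible.

# setting the seed makes the pseudorandom stream reproducible
set.seed(42)
runif(3)

# same seed, same "random" numbers
set.seed(42)
runif(3)

# check which generator R is using (Mersenne Twister by default)
RNGkind()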




Example 2: sampling from arbitrary CDF

Onto more examples! Here’s a much more interesting one: given an arbitrary CDF (or PDF, which you can integrate to get the associated CDF), how can you draw samples from it?

It can be shown that for a continuous random variable, the CDF evaluated at the variable itself, \(F_X(X)\), follows the uniform distribution on \([0,1]\) (this is known as the probability integral transform). What this means is that for any such distribution, if you draw many samples from it and then plot a histogram of the percentile of each observation (i.e. what proportion of the population is less than your observation), it will always appear uniform.

Examples:

N = 100000
options(repr.plot.width=16)
par(mfrow=c(2,4))
hist(rnorm(N))        ; hist(rchisq(N,10))            ; hist(rexp(N,1))         ; hist(rbeta(N,.5,.5))
hist(pnorm(rnorm(N))) ; hist(pchisq(rchisq(N,10),10)) ; hist(pexp(rexp(N,1),1)) ; hist(pbeta(rbeta(N,.5,.5),.5,.5))

We can use this fact to generate observations from any arbitrary distribution.

Let \(F_X(x)\) be the CDF of some arbitrary variable \(X\), and let \(Y_i\) be uniformly distributed on \([0,1]\). Then, we have that \(F_X^{-1}(Y_i)\simeq X\), where \(F_X^{-1}\) is the (generalized) inverse of the CDF, i.e. the quantile function, and \(\simeq\) means having the same distribution.

In other words, randomly sampling from the uniform distribution and then applying the inverse of the given CDF function gives the desired target distribution.

We can first demonstrate this by sampling from a chi-square distribution with 12 degrees of freedom:

# reset plot width
options(repr.plot.width=7)

# generate N uniform observations between 0,1
x.unif = runif(10000)

# apply built in inverse CDF function for chi-square
x.chi.mc = qchisq(x.unif,12)

# plot results
hist(x.chi.mc)

# compare with built-in R method for random generation
hist(rchisq(10000,12))

# plot our sample quantiles against theoretical quantiles for comparison
plot(sort(x.chi.mc),qchisq(ppoints(10000,1),12))

As you can see, the above method produces essentially the same distribution as the built-in rchisq( ) generator.
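If you want a quick numerical sanity check (an extra step, not in the original notes), a one-sample Kolmogorov–Smirnov test comparing our inverse-CDF sample against the theoretical \(\chi^2_{12}\) CDF shouldn’t flag any discrepancy:

# KS test of our inverse-CDF sample against the chi-square(12) CDF;
# a non-small p-value means no detectable difference in distribution
ks.test(x.chi.mc, "pchisq", df=12)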

Example 3: hot hands

For the next example, let’s do something slightly more complicated.

A certain professional basketball player believes he has “hot hands” when shooting 3-point shots (i.e. if he makes a shot, he’s more likely to also make the next shot). His friend doesn’t believe him, so they make a wager and hire you, a statistician, to settle the bet.

As a sample, you observe the next morning as the player takes the same 3-point shot 200 times in a row (assume he is well rested, in good physical shape, and doesn’t get significantly more tired or lose focus over the course of the experiment, so his baseline ability doesn’t change during the experiment). You obtain the following results, where Y denotes a success and N denotes a miss:

YNNNNYYNNNYNNNYYYNNNNNYNNNNNNNNNYNNNNNYYNYYNNNYNNNNYNNYYYYNNYYNNNNNNNNNNNNNNNYYYNNNYYYYNNNNNYNYYNNNNYNNNNNNYNNNYNNYNNNNNYNYYYNNYYYNYNNNNYNNNNNNNYYNNYYNNNNNNYNNNYNNNNNNNNYNNNYNNNNNYYNNNNNNYYYYYYNYYNNYN

Note that the existence of a “hot hands” effect means the shots are not independent. Also note that there’s a third possibility: that the player is more likely to “choke” and miss the next shot if he scored the previous one (e.g. maybe scoring a shot makes him feel more nervous because he feels like he’s under pressure).

Attempt 1: run length

Since a hot-hands effect tends to increase the lengths of runs of Ys relative to independent shots, we can use the longest run length as a statistic for comparing independence against hot hands. (Note that if the player is a choker, he will tend to have shorter runs of Ys than under independence, so for simplicity you can ignore this case for now and just compare hot hands vs. independence.)

Now, how exactly do you compare these two situations and determine which is a better fit for the data?

One thing that’s worth noting is that if a sequence of repeated experiments is independent, then it shouldn’t matter what order the results are in. This should be fairly easy to understand and agree with.

Let’s assume that the throws are totally independent. Recall we also assume he doesn’t get tired so his baseline shot-making ability doesn’t change over the course of the experiment. Therefore, we should be able to (under these assumptions) arbitrarily reorder his shots without affecting any statistical properties of his shot sequence. So let’s do that!

We begin by parsing the throws into a vector of Y and N.

# the sequence of throws is broken up into 4 chunks for readability, then
# paste0 is used to merge them into a single string, then
# strsplit("YN...N",split="") is used to split the string at every "", so
# we get a vector of each character, and finally
# [[1]] is used to get the vector itself (strsplit actually outputs a list
# with the vector as the first element; [[1]] removes the list wrapper)
# 
# for more info about the strsplit function, see
# https://www.journaldev.com/43001/strsplit-function-in-r

throws = strsplit(
   paste0("YNNNNYYNNNYNNNYYYNNNNNYNNNNNNNNNYNNNNNYYNYYNNNYNNN",
          "NYNNYYYYNNYYNNNNNNNNNNNNNNNYYYNNNYYYYNNNNNYNYYNNNN",
          "YNNNNNNYNNNYNNYNNNNNYNYYYNNYYYNYNNNNYNNNNNNNYYNNYY",
          "NNNNNNYNNNYNNNNNNNNYNNNYNNNNNYYNNNNNNYYYYYYNYYNNYN"), split="")[[1]]

throws
##   [1] "Y" "N" "N" "N" "N" "Y" "Y" "N" "N" "N" "Y" "N" "N" "N" "Y" "Y" "Y" "N"
##  [19] "N" "N" "N" "N" "Y" "N" "N" "N" "N" "N" "N" "N" "N" "N" "Y" "N" "N" "N"
##  [37] "N" "N" "Y" "Y" "N" "Y" "Y" "N" "N" "N" "Y" "N" "N" "N" "N" "Y" "N" "N"
##  [55] "Y" "Y" "Y" "Y" "N" "N" "Y" "Y" "N" "N" "N" "N" "N" "N" "N" "N" "N" "N"
##  [73] "N" "N" "N" "N" "N" "Y" "Y" "Y" "N" "N" "N" "Y" "Y" "Y" "Y" "N" "N" "N"
##  [91] "N" "N" "Y" "N" "Y" "Y" "N" "N" "N" "N" "Y" "N" "N" "N" "N" "N" "N" "Y"
## [109] "N" "N" "N" "Y" "N" "N" "Y" "N" "N" "N" "N" "N" "Y" "N" "Y" "Y" "Y" "N"
## [127] "N" "Y" "Y" "Y" "N" "Y" "N" "N" "N" "N" "Y" "N" "N" "N" "N" "N" "N" "N"
## [145] "Y" "Y" "N" "N" "Y" "Y" "N" "N" "N" "N" "N" "N" "Y" "N" "N" "N" "Y" "N"
## [163] "N" "N" "N" "N" "N" "N" "N" "Y" "N" "N" "N" "Y" "N" "N" "N" "N" "N" "Y"
## [181] "Y" "N" "N" "N" "N" "N" "N" "Y" "Y" "Y" "Y" "Y" "Y" "N" "Y" "Y" "N" "N"
## [199] "Y" "N"

Next, we write a function to get the longest run of Ys in the throw sequence. Here we use a convenient function called rle( ), short for run length encoding, which turns our sequence of throws into a sequence of runs (e.g. YNNNNYYNNNY becomes something like “1 Y, 4 Ns, 2 Ys, 3 Ns, and 1 Y”). We can then simply take the longest of the Y runs.

longestRun = function(x,target = 'Y'){
    max(0,with(rle(x), lengths[values==target]))
}

longestRun(throws)
## [1] 6

Now, we randomly shuffle the sequence of throws many times and see what the longest Y runs look like for these shuffled sequences.

# set number of reps to use
N = 10000

# create vector to save results in
mc.runs = rep(NA,N)

# for each rep, randomize sequence and find longest run of Y
for(i in 1:N){
    mc.runs[i] = longestRun(sample(throws))
}
options(max.print=500)
mc.runs
##   [1] 5 5 4 3 4 4 3 4 3 4 4 4 4 3 3 4 4 4 4 5 5 3 3 4 3 5 4 3 3 5 4 3 8 5 6 4 4
##  [38] 4 4 4 3 6 4 5 4 4 6 5 4 4 6 3 3 7 4 4 3 3 4 6 3 3 5 4 4 4 7 4 4 3 4 4 4 4
##  [75] 4 5 5 3 3 4 4 4 4 6 5 5 5 4 5 4 4 4 6 3 4 7 4 3 4 4 3 4 4 4 5 5 4 4 4 6 5
## [112] 3 3 4 5 6 3 3 4 5 4 3 4 5 3 5 4 4 4 4 6 3 3 4 4 3 6 4 5 4 4 5 5 4 3 4 6 4
## [149] 3 5 5 4 5 3 3 3 4 4 4 4 3 4 5 3 5 7 4 4 5 5 6 5 4 4 5 4 5 3 4 6 5 5 4 4 3
## [186] 4 4 3 3 3 3 4 3 6 6 3 5 4 4 4 4 3 4 7 5 4 3 4 5 3 4 3 4 3 5 4 3 6 5 7 5 5
## [223] 4 4 4 4 4 3 3 4 4 5 6 4 4 3 4 4 3 6 3 5 4 4 6 5 3 5 3 5 3 3 3 4 5 4 5 8 4
## [260] 5 3 6 4 5 5 4 4 6 5 4 4 5 5 3 3 4 6 3 5 4 4 5 3 5 3 6 3 5 4 4 4 6 4 5 3 3
## [297] 3 5 4 4 4 5 3 5 5 4 4 5 5 4 4 3 4 3 4 4 6 6 4 5 3 3 4 5 6 5 4 5 5 7 4 4 3
## [334] 4 3 6 4 3 6 5 3 4 5 3 6 4 6 4 6 5 7 5 4 3 3 3 4 4 5 3 5 4 3 6 4 5 7 4 4 6
## [371] 5 4 4 3 5 4 5 5 5 5 4 5 5 4 4 3 6 4 5 4 4 4 3 5 3 4 3 4 3 5 5 3 4 4 4 6 4
## [408] 3 6 4 4 4 3 7 5 3 3 4 5 4 4 4 3 3 5 4 3 6 5 4 4 7 5 3 3 4 3 3 5 4 3 5 5 4
## [445] 5 5 3 4 4 3 5 5 4 5 5 4 3 3 8 6 4 3 5 5 4 6 4 4 3 6 3 3 3 5 5 5 3 4 9 4 5
## [482] 4 3 5 3 5 4 5 7 3 3 5 3 4 4 4 4 4 3 9
##  [ reached getOption("max.print") -- omitted 9500 entries ]
hist(mc.runs)

Compared to other shuffled sequences, our observed longest run of 6 doesn’t seem that unlikely. Therefore, this statistic seems inconclusive.
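If you want to put a rough number on “doesn’t seem that unlikely” (an extra check, not in the original notes), you can compute the proportion of shuffled sequences whose longest run is at least as long as the observed one, i.e. a Monte Carlo p-value for this statistic:

# proportion of shuffled (independence-assuming) sequences with a longest
# Y-run at least as long as the observed run of 6
mean(mc.runs >= longestRun(throws))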

Can we find an even better “statistic” to use?

Attempt 2: running odds ratio

Consider every pair of consecutive throws and make a table of the outcomes. For example, the first 8 throws in the sequence are YNNNNYYN. Breaking this into consecutive pairs, we have YN, NN, NN, NN, NY, YY, YN. This gives the table:

NN NY YN YY
 3  1  2  1

Suppose we do this for the entire sequence of 200 throws (note this gives you 199 pairs). If we divide the number of NY by the number of NN, we get an estimate of the odds of him making the next shot given that he missed his last shot.

Similarly, we can divide the number of YY by the number of YN to get an estimate of the odds of him making the next shot given that he scored his last shot.

Now, note that if the “hot hands” effect really exists in the data, then YY/YN should be larger than NY/NN in a large enough sample. We use this fact to define the following quantity:

\[R=\frac{(\text{# of YY})/(\text{# of YN})}{(\text{# of NY})/(\text{# of NN})}\]

The ratio \(R\) represents, in some sense, how much more likely the player is to make the next shot if he made the previous shot versus if he missed it. This is exactly what we’re trying to investigate!

If there is a “hot hands” effect, the numerator should be greater than the denominator and we should have \(R>1\). If the throws are independent and do not affect each other then in theory we should have \(R=1\). If the player is actually a choker (i.e. he is more likely to miss after a successful shot), then we should have \(R<1\). (Side note: this is basically an odds ratio).
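To make the definition concrete, here is \(R\) computed by hand for the 8-throw example above (3 NN, 1 NY, 2 YN, 1 YY): \[R=\frac{1/2}{1/3}=1.5\] i.e. in that tiny sample the odds of scoring after a make are 1.5 times the odds of scoring after a miss (though 7 pairs is of course far too little data to conclude anything).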

Now, we can use the same general method as the first attempt. If we assume his throws are independent and his shot probability doesn’t change significantly during the experiment, then we can randomly shuffle his throws and no properties should change. So let’s do that!

First, we write a function to split the sequence of throws into consecutive pairs and then tabulate them.

# install the runner package if necessary
if(!"runner" %in% rownames(installed.packages())) install.packages("runner")

# define function for tabulating consecutive pairs
tableOfPairs = function(vec){
  return(table(runner::runner(vec,k=2,f=paste,collapse="")[-1]))
}

# test function for correct output
tableOfPairs(strsplit("YNNNNYYN",split="")[[1]])
## 
## NN NY YN YY 
##  3  1  2  1
# run function on original sequence of throws
tableOfPairs(throws)
## 
##  NN  NY  YN  YY 
## 102  34  35  28

Next, we write a function that takes the above table as input and returns the ratio \(R\) as defined above.

ratioFromTable = function(tb){
  return(setNames((tb["YY"]/tb["YN"])/(tb["NY"]/tb["NN"]),"R"))
}

# run on our data
ratioFromTable(tableOfPairs(throws))
##   R 
## 2.4
# we can check this is correct by manually computing it
(28/35)/(34/102)
## [1] 2.4

Now we just need to shuffle the sequence and see what this ratio looks like for other sequences.

# set number of reps to use
N = 10000

# create another vector to save results in
mc.runs2 = rep(NA,N)

# for each rep, randomize sequence and find ratio R
for(i in 1:N){
    mc.runs2[i] = ratioFromTable(tableOfPairs(sample(throws)))
}
options(max.print=500)
round(mc.runs2,2)
##   [1] 0.64 1.24 1.65 0.64 1.43 0.81 1.16 1.76 0.77 0.93 1.20 0.90 1.33 0.72 1.24
##  [16] 0.64 1.24 0.84 1.43 0.52 0.84 0.90 0.90 0.93 1.38 1.88 1.12 1.70 1.76 0.84
##  [31] 1.24 0.81 1.16 1.24 1.43 1.04 1.04 2.31 0.81 0.57 0.81 0.59 0.75 0.67 1.29
##  [46] 0.81 1.70 1.53 0.81 0.81 1.20 0.57 1.29 1.95 0.72 0.75 1.38 1.38 1.24 0.72
##  [61] 0.90 0.97 1.01 0.93 0.77 0.61 1.04 1.53 0.57 1.01 1.24 1.12 0.81 1.43 0.72
##  [76] 1.12 0.57 1.08 1.12 0.72 1.38 0.86 1.59 0.69 1.65 1.16 1.43 0.64 1.01 0.93
##  [91] 1.43 1.12 0.64 0.84 0.61 1.43 0.52 0.35 0.93 1.24 1.88 0.67 0.67 1.20 0.84
## [106] 0.90 1.33 1.08 0.93 1.33 0.77 0.81 1.01 0.81 0.93 0.72 1.29 0.81 1.01 0.81
## [121] 0.67 0.52 1.24 0.84 0.75 1.53 1.12 0.59 0.64 0.69 1.04 0.90 1.01 0.64 0.90
## [136] 1.04 0.67 0.59 0.90 1.53 0.81 1.04 0.90 0.51 0.75 0.90 0.72 0.93 1.01 0.34
## [151] 0.81 0.86 0.52 0.81 1.20 1.08 1.24 0.81 1.01 0.90 0.72 1.16 0.93 1.38 0.52
## [166] 1.12 0.93 0.81 0.67 0.93 1.76 0.59 0.84 0.72 1.04 0.84 0.93 1.01 0.75 1.16
## [181] 0.46 0.72 0.67 0.51 1.24 1.01 1.12 1.53 0.93 0.57 0.67 0.72 1.29 1.16 1.38
## [196] 0.84 1.08 2.31 0.90 0.84 0.90 0.67 1.01 1.24 0.90 0.67 1.88 0.93 1.33 1.01
## [211] 0.90 1.24 1.24 1.01 0.75 1.53 1.59 0.59 0.72 1.53 1.29 0.75 1.01 1.16 0.81
## [226] 1.04 1.01 1.12 0.81 1.01 0.51 0.90 1.12 1.16 1.01 0.81 0.45 2.08 0.57 0.84
## [241] 0.67 0.93 0.90 1.88 1.16 0.39 1.70 1.04 0.35 0.81 0.84 1.16 0.84 0.67 0.46
## [256] 1.12 1.24 2.08 1.53 1.01 0.90 1.24 0.64 2.08 1.16 1.04 0.97 1.04 1.70 1.04
## [271] 0.97 0.72 1.04 1.08 0.72 0.90 0.90 1.12 1.59 0.90 0.75 1.33 0.46 1.20 1.01
## [286] 0.90 0.67 0.59 1.01 0.75 1.12 1.65 1.04 1.43 1.12 0.93 1.01 1.12 1.29 1.38
## [301] 1.12 2.08 1.04 0.67 1.16 1.12 1.24 1.29 1.16 1.08 0.54 0.35 0.93 1.33 1.12
## [316] 1.12 0.90 0.84 0.84 0.46 1.24 1.12 1.01 1.29 1.38 2.16 0.81 0.97 1.16 1.16
## [331] 0.90 0.84 1.16 0.86 1.53 0.90 1.01 0.84 1.24 0.90 1.12 1.29 1.59 1.43 1.16
## [346] 0.59 1.04 1.12 1.53 0.72 1.29 0.72 2.08 1.04 0.84 0.59 1.20 1.01 0.81 0.57
## [361] 1.12 0.90 1.01 1.33 0.61 1.43 0.75 0.75 0.72 1.43 1.16 0.81 1.24 1.12 0.81
## [376] 0.93 1.33 0.72 1.16 1.04 1.95 1.59 0.46 0.52 1.01 1.24 0.84 1.12 1.29 1.38
## [391] 0.86 0.57 0.75 0.75 1.04 1.33 0.93 1.04 0.64 1.12 1.12 0.93 0.93 0.64 0.90
## [406] 1.16 1.04 1.43 0.90 0.93 0.84 1.29 1.48 0.72 1.29 0.75 1.59 0.72 0.64 0.72
## [421] 1.12 0.90 1.16 0.59 0.75 1.24 0.59 1.16 1.01 1.12 1.76 1.88 2.25 1.29 1.48
## [436] 1.01 0.81 0.97 1.38 1.04 0.93 1.38 0.90 1.24 1.33 1.33 0.81 1.24 1.88 0.59
## [451] 1.01 1.12 1.12 0.84 0.93 1.04 1.01 1.01 0.81 0.64 0.75 0.90 1.76 0.45 0.93
## [466] 0.93 1.24 1.04 0.45 1.04 0.84 0.67 1.04 0.64 0.81 0.67 0.77 0.51 0.72 1.83
## [481] 1.16 0.72 1.16 0.84 1.01 1.24 0.59 0.72 1.12 0.93 1.24 1.76 0.77 1.01 1.04
## [496] 1.53 1.38 2.56 1.12 0.67
##  [ reached getOption("max.print") -- omitted 9500 entries ]
hist(mc.runs2)

Now we can see our original ratio of \(R=2.4\) seems extremely unlikely! In particular, most of the shuffled statistics are centered around 1 (which is what we expect, since we established \(R=1\) for independent sequences).

This method (which is a little more refined than the simpler run length method) appears to show that our original sequence isn’t well explained by the throws being independent. Since \(R=2.4\gg1\) and this result appears unlikely to happen under independence, we may conclude the player does actually have hot hands.
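As with the run-length statistic, one way to quantify “extremely unlikely” (an extra check, not part of the original analysis) is the proportion of shuffled sequences whose ratio is at least as large as the observed \(R=2.4\):

# empirical p-value: how often does a shuffled (independent) sequence
# produce a ratio at least as extreme as the observed R?
R.obs = ratioFromTable(tableOfPairs(throws))
mean(mc.runs2 >= R.obs)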







Appendix: Comment about how the throws were generated

As you may have guessed, the throws were intentionally generated to have a “hot hands” effect. The first throw was randomly chosen with a \(35\%\) chance of success (chosen based on a quick Google search of average 3-point shot rates). After that, the probability of success of the next shot was dependent on the success of the previous shot. If the previous attempt was a success, this was raised to \(45\%\); if it missed, this was lowered to \(25\%\).

n = 200

throws = rep(NA,n)

for(i in 1:n){
  if(i==1){
    throws[i] = sample(c("Y","N"),1,prob=c(0.35,0.65))
  } else {
    if(throws[i-1]=="Y"){
      throws[i] = sample(c("Y","N"),1,prob=c(0.45,0.55))
    } else{
      throws[i] = sample(c("Y","N"),1,prob=c(0.25,0.75))
    }
  }
}

The numbers were chosen this way so that the average proportion of 3-point shots made is still close-ish to \(35\%\) (it’s \(31.25\%\) to be exact), and so that the effect is large enough to be detectable, but not so strong that the run-length statistic would also be able to detect it (which might happen if the effect were strengthened by raising/lowering the two probabilities even more).
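The \(31.25\%\) figure is the long-run proportion of successes of the two-state chain defined above: with \(P(Y\mid Y)=0.45\) and \(P(Y\mid N)=0.25\), the stationary probability of a success is \[P(Y)=\frac{P(Y\mid N)}{P(Y\mid N)+P(N\mid Y)}=\frac{0.25}{0.25+0.55}=0.3125\]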

200 was used as the number of throws in the experiment as a balance between lowering the variance of the ratio \(R\) of the generated sequence and remaining a somewhat plausible number of consecutive shots for a professional basketball player to attempt.


  1. Metropolis, N. (1987). “The beginning of the Monte Carlo method”. Los Alamos Science: 125–130.

  2. https://en.wikipedia.org/wiki/Buffon%27s_needle_problem