5. Statistical modeling and Monte Carlo estimation

(1) Statistical Modeling

Statistical models : imitate & approximate the “data-generating process”


a. Four objectives of statistical models

1) Quantify Uncertainty

  • ex Q) The probability of xxxx is 57%. How sure are you about the number ‘57’?
  • ex A) A 99% confidence interval of (51, 63).

2) Inference

  • ex) We know the candidate’s support only among a 10% sample of the people. What can we infer about the total population?

3) Measure support for hypothesis

  • Polling example - the hypothesis is that the candidate is more popular with men than with women. 50% of women and 52% of men favored the candidate. Is this evidence strong enough to support the hypothesis? (A quick R sketch after this list illustrates one way to check.)

4) Prediction

  • A model that does not produce realistic predictions is useless, so this objective is very important.
  • This objective is shared with Machine Learning:
    whereas Machine Learning focuses mainly on 4) Prediction, statistical models strive to balance all four objectives!
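
As a quick illustration of objective 3), here is a minimal R sketch using base R’s prop.test. The poll percentages come from the example above, while the sample sizes (500 men and 500 women) are assumptions for illustration.

# hypothetical poll: 52% of 500 men and 50% of 500 women favor the candidate
men_yes   = round(0.52 * 500) # 260 (sample sizes assumed for illustration)
women_yes = round(0.50 * 500) # 250

# two-sample test of equal proportions: is support among men greater?
prop.test(x = c(men_yes, women_yes), n = c(500, 500), alternative = "greater")

# a large p-value would mean the data do not strongly support the hypothesis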

b. Modeling Process

  • 1) Understand the problem
  • 2) Plan & collect data
  • 3) Explore data (+ visualization)
  • 4) Postulate model
  • 5) Fit model
  • 6) Check model
  • 7) Iterate 4)~6)
  • 8) Use model

(2) Bayesian Modeling

a. Components of Bayesian models (review)

1) likelihood : \(P(y\mid \theta)\)

2) prior : \(P(\theta)\)

3) posterior : \(P(\theta\mid y) = \frac{P(\theta,y)}{P(y)}= \frac{P(\theta,y)}{\int P(\theta,y)d\theta}= \frac{P(y\mid \theta)P(\theta)}{\int P(y\mid \theta)P(\theta)d\theta}\)

Before fitting a model, we need to specify all of these components.

b. Posterior derivation

A model can have multiple levels (a hierarchical model). That is, the data follow a distribution governed by parameters, those parameters follow a prior distribution, the parameters of that prior follow yet another distribution, and so on.
Let’s take a look at this example.

\(X_i\mid \mu,\sigma^2 \sim N(\mu, \sigma^2)\)

\(\mu \mid \sigma^2 \sim N(m,\frac{\sigma^2}{w})\)

\(\sigma^2 \sim InvGamma(\alpha,\beta)\)

We can derive the posterior, up to proportionality, from the joint distribution:

\(\begin{align*} P(X_1,\dots,X_n,\mu,\sigma^2)&=P(X_1,\dots,X_n\mid \mu,\sigma^2)\,P(\mu\mid \sigma^2)\,P(\sigma^2)\\ &=\prod_{i=1}^{n}\left[N(X_i\mid \mu,\sigma^2)\right]\times N\!\left(\mu\mid m,\frac{\sigma^2}{w}\right)\times \text{InvGamma}(\sigma^2\mid \alpha,\beta) \end{align*}\)

\(P(\mu,\sigma^2\mid X_1,\dots,X_n) \propto P(X_1,\dots,X_n,\mu,\sigma^2)\)
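
To make the hierarchy concrete, here is a minimal R sketch that simulates one draw from this model, sampling top-down. The hyperparameter values (m_hyp, w, alpha, beta) and the sample size n are assumptions for illustration; an inverse-gamma draw is obtained as the reciprocal of a gamma draw.

set.seed(1)

m_hyp = 0; w = 1; alpha = 3; beta = 2 # assumed hyperparameters (illustration only)
n = 5                                 # assumed sample size

sig2 = 1 / rgamma(1, shape=alpha, rate=beta) # sigma^2 ~ InvGamma(alpha, beta)
mu   = rnorm(1, mean=m_hyp, sd=sqrt(sig2/w)) # mu | sigma^2 ~ N(m, sigma^2/w)
x    = rnorm(n, mean=mu, sd=sqrt(sig2))      # X_i | mu, sigma^2 ~ N(mu, sigma^2)
x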

(3) Monte Carlo Estimation

a. Monte Carlo Estimation

Monte Carlo estimation refers to simulating hypothetical draws from a probability distribution in order to calculate important quantities of that distribution, such as the mean and variance. It uses a large number of simulated draws to approximate a value that would otherwise be calculated by an integral.
The following expressions make this concrete.
Suppose \(\theta \sim p(\theta)\) and \(h(\theta)\) is a function of \(\theta\). To calculate the mean of this function, \(E[h(\theta)]\), we have to take an integral like the one below.

\(\int h(\theta)p(\theta)\,d\theta\)

But instead of evaluating the integral, we can approximate it by drawing a large sample \(\theta_1^*,\dots,\theta_m^*\) from \(p(\theta)\), as below. By the Law of Large Numbers, the average of these draws converges in probability to the true mean.

\(E[h(\theta)] = \int h(\theta)p(\theta)d\theta \approx \frac {1}{m} \sum_{i=1}^{m}h(\theta_i^*)\)
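
For example, here is a minimal R sketch of this approximation. The choices \(h(\theta)=\theta^2\) and \(\theta \sim N(0,1)\) are assumptions for illustration; in that case the true value is \(E[\theta^2]=1\).

set.seed(42)
m = 1e5
theta = rnorm(m) # theta_1*, ..., theta_m* drawn from p(theta) = N(0,1)
mean(theta^2)    # MC approximation of E[h(theta)] with h(theta) = theta^2, close to 1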

b. Monte Carlo error and marginalization

[ error ]

By the Central Limit Theorem, the sample mean \(\bar{\theta^*}\) approximately follows the distribution below.

\(\bar{\theta^*} \sim N(E(\theta), \frac{Var(\theta)}{m})\)

And the sample variance can be calculated like this.

\(\widehat{Var(\theta)} = \frac{1}{m}\sum_{i=1}^{m}(\theta_i^*-\bar{\theta^*})^2\)

Dividing the sample variance by \(m\) and taking the square root gives the standard error of the Monte Carlo estimate.

\(\sqrt{\frac{\widehat{Var(\theta)}}{m}}\)
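
A minimal R sketch of how this standard error shrinks at the \(1/\sqrt{m}\) rate, again using the assumed \(N(0,1)\) example:

set.seed(42)
for (m in c(1e2, 1e4, 1e6)) {
  theta = rnorm(m)
  se = sd(theta) / sqrt(m) # Monte Carlo standard error of the sample mean
  cat("m =", m, " estimate =", mean(theta), " se =", se, "\n")
}
# each 100x increase in m shrinks the standard error by a factor of 10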

[ marginalization ]

Let’s say \(y\) follows a Binomial distribution with \(n=10\) and parameter \(\phi\),
and \(\phi\) follows a Beta distribution with \(\alpha=2, \beta=2\).

\(y\mid \phi \sim Bin(10,\phi)\)

\(\phi \sim Beta(2,2)\)

The steps of the simulation are as follows (see the sketch after this list).

  • Step 1) sample \(\phi_i^*\) from a Beta(2,2) distribution
  • Step 2) given \(\phi_i^*\), draw \(y_i \sim Bin(10,\phi_i^*)\)
  • Then you get a sample pair \((\phi_i^*, y_i)\).
    One major advantage of MC simulation is that you can easily obtain the marginal distribution of \(y\). It might be hard to marginalize \(\phi\) out by integration, but with the sample pairs you simply discard the \(\phi_i^*\)’s and use the \(y_i\)’s.
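
Here is a minimal R sketch of this marginalization. For comparison, the analytic marginal of \(y\) is Beta-Binomial(10, 2, 2); its pmf is written out by hand below rather than taken from a package.

set.seed(7)
m = 1e5
phi = rbeta(m, shape1=2, shape2=2) # Step 1: phi_i* ~ Beta(2,2)
y = rbinom(m, size=10, prob=phi)   # Step 2: y_i | phi_i* ~ Bin(10, phi_i*)

table(y) / m # approximate marginal pmf of y: discard phi, tabulate y

# analytic Beta-Binomial(10, 2, 2) pmf for comparison
yy = 0:10
choose(10, yy) * beta(yy + 2, 10 - yy + 2) / beta(2, 2)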

(4) Monte Carlo Estimation with R

a. Monte Carlo Simulation (1)

set.seed(32)

################################
### montecarlo simulation 1 ####
################################

m = 100 # with sample size 100
a = 2 # shape parameter (for gamma distribution)
b = 1/3 # rate parameter

theta = rgamma(n=m,shape=a,rate=b) # draw m samples from the gamma distribution

hist(theta,freq=FALSE)
curve(dgamma(x,shape=a,rate=b),col='blue',add=TRUE)

sum(theta)/m # MC approximation to mean
mean(theta)
a/b # true value of expected value

var(theta) # MC approximation to variance
a/b^2 # true value of the variance


################################
### montecarlo simulation 2 ####
################################

m = 1000 # with sample size 1000
a = 2 # shape parameter (for gamma distribution)
b = 1/3 # rate parameter

theta = rgamma(n=m,shape=a,rate=b) # draw m samples from the gamma distribution

hist(theta,freq=FALSE)
curve(dgamma(x,shape=a,rate=b),col='blue',add=TRUE)

sum(theta)/m # MC approximation to mean
mean(theta)
a/b # true value of expected value

var(theta) # MC approximation to variance
a/b^2 # true value of the variance

###############
### Example ###
###############

# what is the probability that theta is less than 5?

## estimation: approximate P(theta < 5) by the proportion of draws below 5
ind = theta < 5.0
mean(ind)

## true value
pgamma(5.0, shape=a, rate=b)



b. Monte Carlo Simulation (2)

# use the CLT to assess how accurate the MC estimates are

# larger m, smaller variance
set.seed(32)
m = 10000
a = 2
b = 1/3

theta = rgamma(n=m, shape=a, rate=b)
se = sd(theta) / sqrt(m) # Monte Carlo standard error of the mean

# approximate 95% confidence interval: mean(theta) +/- 2*se
mean(theta) - 2*se
mean(theta) + 2*se

# result : We are reasonably confident that the true mean of the gamma distribution is between 5.91 and 6.08 (the true value is a/b = 6)

#################
### example 1 ###
#################

ind = theta<5
mean(ind) # 0.5004
pgamma(5.0,shape=a,rate=b)

se = sd(ind) / sqrt(m)
2*se

# result : We are reasonably confident that the probability estimate (0.5004) is within about 0.01 of the true value


#################
### example 2 ###
#################

# 1) Simulate phi_i from Beta(2,2)
# 2) Simulate y_i from Binom(10, phi_i)

m = 1e5

y = numeric(m)
phi = numeric(m)

## TOO SLOW: element-by-element loop (commented out) ##
# for (i in 1:m){
#   phi[i] = rbeta(1, shape1=2.0, shape2=2.0)
#   y[i] = rbinom(1, size=10, prob=phi[i])
# }

### FASTER ###
# (vectorized: draw all m pairs at once)
phi = rbeta(m,shape1=2.0, shape2=2.0)
y = rbinom(m,size=10, prob=phi)

table(y) # marginal distribution of y (phi has been discarded)