STAT 431 — Applied Bayesian Analysis — Course Notes
Topics in Model Comparison and Assessment
Bayes Factors
Recall: Bayesians don’t use frequentist p-values.
What can we use instead?
Earlier, we used posterior probabilities of one-sided hypotheses
(some of which happened to equal p-values in some
noninformative cases).
But we would like a more general approach ...
Simple-vs-Simple Case
Consider data $y$ which may follow one of two different models, $M_0$ and $M_1$.
Assume each of these models fully specifies a distribution for $y$, and the distributions have densities
\[ p(y \mid M_0) \quad \text{and} \quad p(y \mid M_1) \]
Then
\[ H_0: M_0 \text{ is true} \qquad H_1: M_1 \text{ is true} \]
are two simple hypotheses.
Let the models have prior probabilities
\[ P(M_0) \ (= P(H_0)) \qquad P(M_1) \ (= P(H_1)) \]
We assume
\[ P(M_0) > 0 \quad \text{and} \quad P(M_1) > 0, \]
but it is not necessary that they sum to 1.
The prior odds in favor of $M_1$ are
\[ \frac{P(M_1)}{P(M_0)} \]
By Bayes’ rule,
\[ P(M_0 \mid y) \propto P(M_0)\, p(y \mid M_0) \qquad P(M_1 \mid y) \propto P(M_1)\, p(y \mid M_1) \]
with the same normalizing constant ($p(y)$) in both cases.
The posterior odds in favor of $M_1$ are
\[ \frac{P(M_1 \mid y)}{P(M_0 \mid y)} \;=\; \frac{P(M_1)\, p(y \mid M_1)}{P(M_0)\, p(y \mid M_0)} \;=\; \text{prior odds} \times \frac{p(y \mid M_1)}{p(y \mid M_0)} \]
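Numerically, this update is one multiplication followed by an odds-to-probability conversion. A minimal Python sketch (the numbers here are made up for illustration, not taken from the notes):

```python
# Posterior odds = prior odds x likelihood ratio; illustrative numbers only.
prior_m1, prior_m0 = 0.5, 0.5        # prior model probabilities P(M1), P(M0)
lik_m1, lik_m0 = 0.02, 0.01          # p(y | M1), p(y | M0) at the observed y

prior_odds = prior_m1 / prior_m0
posterior_odds = prior_odds * (lik_m1 / lik_m0)

# Convert odds back to a probability: odds / (1 + odds).
posterior_prob_m1 = posterior_odds / (1 + posterior_odds)
print(posterior_odds, posterior_prob_m1)   # 2.0 0.666...
```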
The Bayes factor in favor of $M_1$ versus $M_0$ is
\[ BF_{1,0} \;=\; \frac{\text{posterior odds}}{\text{prior odds}} \;=\; \frac{p(y \mid M_1)}{p(y \mid M_0)} \]
Interpretation: $BF_{1,0}$ is the factor by which the “odds” of $M_1$ (relative to $M_0$) change due to the data.
So, for example,
- $BF_{1,0} \approx 1$ means that the data do not distinguish between the models very well
- $BF_{1,0} \gg 1$ means that the data strongly support $M_1$ over $M_0$
Notice: In this simple-vs-simple case, the Bayes factor $BF_{1,0}$
- equals the likelihood ratio
\[ \frac{L(M_1\,;\, y)}{L(M_0\,;\, y)} \]
- does not depend on the prior — it is the same for any valid values of $P(M_0)$ and $P(M_1)$
Example: Does Waldo ride the bus?
$M_1$ = does ride, $M_0$ = doesn’t ride
\[ y = \begin{cases} 1 & \text{if lives in an apartment} \\ 0 & \text{if not} \end{cases} \]
Based on the class survey (our best guess),
\[ p(y \mid M_1) = \begin{cases} 0.91, & y = 1 \\ 0.09, & y = 0 \end{cases} \qquad p(y \mid M_0) = \begin{cases} 0.81, & y = 1 \\ 0.19, & y = 0 \end{cases} \]
If Waldo lives in an apartment ($y = 1$),
\[ BF_{1,0} = \frac{0.91}{0.81} \approx 1.12 \]
and, if not ($y = 0$),
\[ BF_{1,0} = \frac{0.09}{0.19} \approx 0.47 \]
So living in an apartment increases the odds of riding the bus, while not living in an apartment decreases them.
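As a quick check, the two Bayes factors are just ratios of the survey-based densities given above:

```python
# Densities from the class-survey slide; keys are y = 1 (apartment) / y = 0 (not).
p_y_given_m1 = {1: 0.91, 0: 0.09}   # M1: does ride the bus
p_y_given_m0 = {1: 0.81, 0: 0.19}   # M0: doesn't ride

for y in (1, 0):
    bf = p_y_given_m1[y] / p_y_given_m0[y]
    print(f"y = {y}: BF_1,0 = {bf:.2f}")
# y = 1: BF_1,0 = 1.12
# y = 0: BF_1,0 = 0.47
```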
An Interpretation Scale

$BF_{1,0}$      data evidence for $M_1$ ($H_1$) vs. $M_0$ ($H_0$)
1 to 3.2        Barely worth mentioning
3.2 to 10       Substantial
10 to 100       Strong
> 100           Decisive
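If you want this scale as a reusable label, here is one possible encoding (cutoffs taken from the table above; a $BF_{1,0} < 1$ is handled by interpreting its reciprocal as evidence for $M_0$):

```python
def interpret_bf(bf):
    """Map a Bayes factor BF_1,0 to the qualitative scale above."""
    if bf < 1:
        # Evidence runs the other way; interpret the reciprocal as evidence for M0.
        return "favors M0: " + interpret_bf(1.0 / bf)
    if bf <= 3.2:
        return "barely worth mentioning"
    if bf <= 10:
        return "substantial"
    if bf <= 100:
        return "strong"
    return "decisive"

print(interpret_bf(1.12))   # barely worth mentioning
print(interpret_bf(0.47))   # favors M0: barely worth mentioning
```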
More General Case
Consider modeling data $y$.
Suppose models $M_0$ and $M_1$ have (unknown) parameters:
\[ \theta_0 \text{ for } M_0 \qquad \theta_1 \text{ for } M_1 \]
We will assume the models are “disjoint”: they don’t share any distributions for $y$.
Suppose the models have (conditional) priors
\[ p(\theta_0 \mid M_0) \qquad p(\theta_1 \mid M_1) \]
Then
\[ p(y \mid M_0) \;=\; \int p(y, \theta_0 \mid M_0)\, d\theta_0 \;=\; \int \underbrace{p(\theta_0 \mid M_0)}_{\text{prior}}\; \underbrace{p(y \mid \theta_0, M_0)}_{M_0 \text{ model}}\; d\theta_0 \]
and similarly
\[ p(y \mid M_1) \;=\; \int p(\theta_1 \mid M_1)\, p(y \mid \theta_1, M_1)\, d\theta_1 \]
These are the marginal likelihoods of $M_0$ and $M_1$ (under their respective priors).
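Since the marginal likelihood is the average of the likelihood over the prior, a naive Monte Carlo estimate draws $\theta$ from the prior and averages $p(y \mid \theta)$. A sketch for a hypothetical Beta-Binomial model (this toy setup is an assumption for illustration; conjugacy gives an exact answer to compare against):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(431)

# Hypothetical model: y | theta ~ Binomial(n, theta), theta ~ Beta(a, b).
n, y = 20, 14
a, b = 2.0, 2.0

# Naive Monte Carlo: p(y | M) = E_prior[ p(y | theta) ].
theta = rng.beta(a, b, size=100_000)
ml_mc = stats.binom.pmf(y, n, theta).mean()

# Exact marginal likelihood, available here because the model is conjugate.
ml_exact = stats.betabinom.pmf(y, n, a, b)

print(ml_mc, ml_exact)   # should agree to a few decimal places
```

In realistic models this prior-sampling estimator has high variance (few prior draws land where the likelihood is large), which is one reason Bayes factors are hard to compute in general.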
The Bayes factor in favor of $M_1$ versus $M_0$ is
\[ BF_{1,0} \;=\; \frac{p(y \mid M_1)}{p(y \mid M_0)} \]
Notes:
- Unlike in the simple-vs-simple case, this Bayes factor does depend on the priors — it is not purely a measure of the evidence in the data.
- Both priors must be proper — otherwise, the Bayes factor would depend on an arbitrary scaling. [See p. 34 in section 2.3.3 of Marin, J.-M. & Robert, C. P. (2007). Bayesian Core: A Practical Approach to Computational Bayesian Statistics, Springer, New York, NY.]
Unfortunately, Bayes factors are generally difficult to compute,
requiring specialized methods.
But they can be easily computed for certain types of
hypothesis tests ...
For Hypothesis Testing
Consider a model with data densities $p(y \mid \theta)$ and prior $p(\theta)$.
Consider testing
\[ H_0: \theta \in \Theta_0 \qquad H_1: \theta \in \Theta_1 \]
where $\Theta_0 \cap \Theta_1 = \emptyset$, and both have positive prior probability.
Regard this as a test of two data model/prior combinations:
\[ M_0:\ p(y \mid \theta),\ \theta \in \Theta_0, \text{ with prior } p(\theta) \text{ restricted to } \Theta_0 \]
\[ M_1:\ p(y \mid \theta),\ \theta \in \Theta_1, \text{ with prior } p(\theta) \text{ restricted to } \Theta_1 \]
Here “restricted” means renormalized: the prior under $M_1$ is $p(\theta)/P(H_1)$ on $\Theta_1$, and similarly for $M_0$.
Proposition
In this case, the Bayes factor in favor of $M_1$ versus $M_0$ is
\[ BF_{1,0} \;=\; \frac{P(H_1 \mid y) \,/\, P(H_0 \mid y)}{P(H_1) \,/\, P(H_0)} \quad \left( = \frac{\text{posterior odds}}{\text{prior odds}} \right) \]
We call this the Bayes factor in favor of $H_1$ (versus $H_0$).
Proof.
\[ p(y \mid M_1) \;=\; \int_{\Theta_1} \underbrace{\frac{p(\theta)}{P(H_1)}}_{p(\theta) \text{ restricted to } \Theta_1} \; \underbrace{p(y \mid \theta)}_{M_1 \text{ on } \Theta_1} \; d\theta \;=\; \frac{1}{P(H_1)} \int_{\Theta_1} p(\theta)\, p(y \mid \theta)\, d\theta \]
\[ =\; \frac{p(y)}{P(H_1)} \int_{\Theta_1} p(\theta \mid y)\, d\theta \;=\; p(y)\, \frac{P(H_1 \mid y)}{P(H_1)} \]
(the third equality uses $p(\theta)\, p(y \mid \theta) = p(y)\, p(\theta \mid y)$), and similarly
\[ p(y \mid M_0) \;=\; p(y)\, \frac{P(H_0 \mid y)}{P(H_0)} \]
so the result follows by taking the ratio.
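To see the proposition in action, here is a sketch for a one-sided test in a hypothetical conjugate Beta-Binomial model (the model and numbers are assumptions for illustration, not from the notes). With $\theta \sim \mathrm{Beta}(a, b)$ and $y \mid \theta \sim \mathrm{Binomial}(n, \theta)$, test $H_0: \theta \le 0.5$ versus $H_1: \theta > 0.5$:

```python
from scipy import stats

# Hypothetical setup: theta ~ Beta(2, 2) prior; y = 14 successes in n = 20 trials.
a, b = 2.0, 2.0
n, y = 20, 14

# Conjugacy: theta | y ~ Beta(a + y, b + n - y).
prior = stats.beta(a, b)
posterior = stats.beta(a + y, b + n - y)

# Prior and posterior probabilities of H0: theta <= 0.5 and H1: theta > 0.5.
prior_h0, prior_h1 = prior.cdf(0.5), prior.sf(0.5)
post_h0, post_h1 = posterior.cdf(0.5), posterior.sf(0.5)

# The proposition: BF_1,0 = (posterior odds) / (prior odds).
bf_10 = (post_h1 / post_h0) / (prior_h1 / prior_h0)
print(bf_10)
```

Because the Beta(2, 2) prior is symmetric about 0.5, the prior odds here equal 1, so the Bayes factor coincides with the posterior odds.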