CS 475/675 Machine Learning: Homework 6
Graphical Models and Inference
1 Introduction
In this assignment you will be exploring a technique for natural language understanding
known as Topic Modeling. You will be implementing a Variational Inference method that
uses the Mean Field approximation for learning the parameters of the probabilistic LDA
graphical model.
2 Data
We will be using State of the Union (SotU) speeches given by US presidents. SotU
addresses have been delivered annually by the president since 1790. The purpose of these
speeches is to summarize the affairs of the US federal government. Typically, the SotU
addresses given by a newly inaugurated president strike a different tone. Your task in
this assignment is to uncover topics important to US federal affairs over the years.
The corpus contains $D = 232$ documents, comprising years 1790-2020 (there were two
SotU addresses in 1961, by Eisenhower and Kennedy). Each document $d$ has a length
$N_d$ on the order of $10^3$ words. The overall corpus vocabulary contains approximately $10^4$
words. As in homework 4, we have pre-processed the data for you using standard NLP
techniques, including tokenization, converting all words to lowercase, and removing
stopwords, punctuation, and numbers.
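For intuition only, a minimal sketch of this kind of preprocessing might look as follows; the regex tokenizer and the tiny stopword list are illustrative stand-ins, not the exact pipeline we used:

```python
import re

# Illustrative stopword list; a real pipeline would use a standard NLP stopword set.
STOPWORDS = {"the", "of", "and", "to", "a", "in", "that", "is", "for", "it"}

def preprocess(text):
    """Lowercase, tokenize, and drop stopwords, punctuation, and numbers."""
    tokens = re.findall(r"[a-z]+", text.lower())  # keeps only alphabetic tokens
    return [t for t in tokens if t not in STOPWORDS]

print(preprocess("The State of the Union is strong in 2020."))
# ['state', 'union', 'strong']
```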
3 Topic Modeling
Natural language is complex; modeling and extracting information from it requires breaking
language down layer by layer. In linguistics, semantics is the study of the meaning of words
and sentences. At the document level, one of the most useful techniques for understanding
text is analyzing its topics. In a topic model, a topic is simply a collection of words, and an
observed document is a mixture of topics. The key idea of the topic model is that the
semantic meaning of a document is influenced by these latent topics.
3.1 Latent Dirichlet Allocation
Latent Dirichlet Allocation (LDA) is a popular topic modeling technique for uncovering
latent topics. LDA is a generative probabilistic model of a corpus (a collection of
text). In the LDA model, documents are represented as random mixtures over latent
topics, and topics are characterized as distributions over words, as sketched below.
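To make the generative story concrete, here is a minimal NumPy sketch of the process just described; the corpus dimensions and hyperparameter values below are arbitrary illustrations, not the assignment's settings.

```python
import numpy as np

rng = np.random.default_rng(0)
K, V, D, N_d = 5, 1000, 10, 100     # topics, vocab size, docs, words per doc (illustrative)
alpha, beta = 0.1, 0.01             # Dirichlet concentration parameters (illustrative)

# Word sparsity: each topic k is a distribution phi_k over the vocabulary.
phi = rng.dirichlet(np.full(V, beta), size=K)        # shape (K, V)

docs = []
for d in range(D):
    # Document sparsity: each document d is a mixture theta_d over topics.
    theta = rng.dirichlet(np.full(K, alpha))         # shape (K,)
    words = []
    for n in range(N_d):
        z = rng.choice(K, p=theta)                   # draw a topic assignment z_dn
        w = rng.choice(V, p=phi[z])                  # draw a word w_dn from topic z
        words.append(w)
    docs.append(words)
```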
The Dirichlet distribution is a member of the exponential family of distributions and is the
conjugate prior of the multinomial distribution. This means that when computing the
MLE or MAP estimate, our expressions simplify to sums of sufficient statistics. Additionally,
we have two sparsity requirements for the topic model: (1) document sparsity
and (2) word sparsity. Each document $d$ should contain only a small number of topics,
and each topic $k$ should be determined by a small subset of words in the total vocabulary.
Dirichlet distributions encode these sparsity requirements. The Dirichlet distribution is
parameterized by a vector of concentration parameters $\alpha$. As $\alpha \to 0$, the multinomials
drawn from the Dirichlet tend to be sparser, as illustrated in the sketch below.
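As a quick sanity check of this behavior, the following snippet (parameter values arbitrary) draws from Dirichlets with small and large concentration parameters:

```python
import numpy as np

rng = np.random.default_rng(1)
K = 10

# Small alpha: mass concentrates on a few components (sparse multinomial).
sparse = rng.dirichlet(np.full(K, 0.01))
# Large alpha: mass spreads nearly uniformly across components.
dense = rng.dirichlet(np.full(K, 10.0))

print(np.round(sparse, 3))  # most entries near 0, one or two near 1
print(np.round(dense, 3))   # entries all near 1/K = 0.1
```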
3.2 Inference
In order to use LDA, we must learn the latent variables $Z$, $\Theta$, and $\Phi$ using inference. To
do this, we must compute the posterior distribution of the latent variables $Z$, $\Theta$, and $\Phi$
given the model parameters $\alpha$ and $\beta$ and the observed data $W$:
$$P(Z, \Theta, \Phi \mid W, \alpha, \beta) = \frac{P(W, Z, \Theta, \Phi \mid \alpha, \beta)}{P(W \mid \alpha, \beta)} \qquad (2)$$
Unfortunately, this posterior is intractable to compute in general. To normalize the
distribution, we must marginalize over the latent variables. Specifically, examining the
form of $P(W \mid \alpha, \beta)$, we have:
$$P(W \mid \alpha, \beta) = \int_{\Phi} \int_{\Theta} \sum_{Z} P(W, Z, \Theta, \Phi \mid \alpha, \beta) \, d\Theta \, d\Phi \qquad (3)$$
$$= \int_{\Phi} P(\Phi \mid \beta) \int_{\Theta} P(\Theta \mid \alpha) \sum_{Z} P(Z \mid \Theta) \, P(W \mid Z, \Phi) \, d\Theta \, d\Phi \qquad (4)$$
Computing the posterior exactly is intractable because of the coupling between $\Theta$ and $\Phi$ in
the summation over latent topic assignments: for a document of $N_d$ words and $K$ topics, the
sum over $Z$ ranges over $K^{N_d}$ possible assignments, so it cannot be evaluated term by term.
In other words, we cannot isolate and maximize any single term, since all the latent variables
are coupled together.
Since we cannot compute this posterior exactly, we turn to approximate inference methods. Specifically, in this assignment you will be using variational inference to approximate
the posterior distribution.
3.3 Mean Field Variational Inference
In Variational Inference, we design a family of tractable distributions $Q$, whose
parameters can be tuned to approximate the true posterior. To obtain $Q$, we will
use the Mean Field approximation, which replaces the complex posterior
$P(Z, \Theta, \Phi \mid W, \alpha, \beta)$ with a fully factorized distribution.
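To make "fully factorized" concrete, a common mean-field family for LDA looks as follows; the variational parameter names $\lambda_k$, $\gamma_d$, and $\pi_{dn}$ are illustrative conventions, not notation fixed by this assignment:

$$q(Z, \Theta, \Phi) = \prod_{k=1}^{K} q(\phi_k \mid \lambda_k) \prod_{d=1}^{D} \left[ q(\theta_d \mid \gamma_d) \prod_{n=1}^{N_d} q(z_{dn} \mid \pi_{dn}) \right]$$

where each $q(\phi_k)$ and $q(\theta_d)$ is a Dirichlet and each $q(z_{dn})$ is a categorical distribution. Because every latent variable receives its own independent factor with its own free parameter, the coupling that makes Equation (4) intractable does not arise inside $q$.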