The first thing to understand about a topic model is that all words are in all topics, just as all topics are theoretically in all documents. In other words, some words are much more likely to be associated with particular topics than with others, but every word retains at least a minuscule probability of appearing in any topic. This is the flexibility of a topic model: it is not deterministic but probabilistic. You might never have used the word “abode” when talking about “driving”…but you might. Topic models are built on the idea that linguistic systems are predictable, but not fixed.
The same goes for the relationship between topics and documents. Every topic has some chance of appearing in every document, even if that probability is, in most cases, tiny. The topic could be there; the model simply considers it incredibly unlikely.
The second important point about a topic model is that words are not apportioned to a single topic. As you’ll see, the word “man” might be associated with “work” and “money,” or with “horses,” or with “God,” all distinct topics in our model. This is another affordance of topic modeling: it captures the polysemous nature of language. Words can mean different things or be used in very different contexts, and topic models are good at capturing those differences.
To inspect the model, we first extract two tables of probabilities: the probability of each topic in each document, and the probability of each word in each topic.
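As a minimal sketch of how this extraction might look, the example below fits a topic model with scikit-learn’s `LatentDirichletAllocation` on a tiny hypothetical corpus (the documents, topic count, and variable names here are illustrative stand-ins, not the corpus or tooling used elsewhere in this lesson). The assertions at the end confirm the two claims above: every topic has some probability in every document, and every word has some probability in every topic.

```python
# A sketch of extracting the two probability tables from a fitted topic model,
# using scikit-learn. The corpus and topic count are hypothetical stand-ins.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "the man worked for money at the bank",
    "the man rode his horse across the open field",
    "the man prayed to god in the quiet church",
]

# Turn the documents into a document-term count matrix.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)

# Fit the model; fit_transform returns the documents-by-topics probabilities.
lda = LatentDirichletAllocation(n_components=3, random_state=0)
doc_topic = lda.fit_transform(X)

# Normalize the topic-word weights into topics-by-words probabilities.
topic_word = lda.components_ / lda.components_.sum(axis=1, keepdims=True)

# Every topic has *some* (often tiny) probability in every document...
assert (doc_topic > 0).all()
# ...and every word has *some* probability in every topic.
assert (topic_word > 0).all()
```

Note that `doc_topic` rows and `topic_word` rows each sum to 1, since each is a probability distribution: a document over topics, or a topic over the vocabulary.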
Now we can move to inspecting different aspects of the model.