What does "latent" mean in LSI?

Latent Dirichlet Allocation (LDA): What is meant by "generating a word from a multinomial distribution conditioned on the topic"?

  • In the topic-model literature, I often see statements like "generating a word from a multinomial distribution conditioned on the topic" or "generating a topic from a multinomial distribution conditioned on the document". What does it mean to generate a single variable from a multinomial distribution? To my understanding, you can only generate a vector of variables from a multinomial distribution. If you have only a single variable, the multinomial distribution degenerates into the Bernoulli distribution.

  • Answer:

    A categorical distribution defines the probability of each outcome in a set of discrete outcomes. For example, if we have four possible words {w1, w2, w3, w4}, then a categorical distribution tells you the probability of occurrence of each word: p(w1) ≡ Pr{W = w1}, where W is a random word. The multinomial distribution corresponds to repeated draws from a categorical distribution.

    "To my understanding, you can only generate a vector of variables from a multinomial distribution." You are right, but in this case the authors are thinking of the degenerate case where you draw a single trial from a categorical distribution.

    "If you have only a single variable, the multinomial distribution degenerates into the Bernoulli distribution." You may be confusing things; let me clarify:

      • two outcomes, single trial => Bernoulli
      • two outcomes, multiple trials => Binomial (counts of outcomes)
      • multiple outcomes, single trial => Categorical
      • multiple outcomes, multiple trials => Multinomial (counts of outcomes)
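    As a quick illustration of this taxonomy, here is a minimal numpy sketch (the probabilities are made up, and numpy is assumed to be available) showing that a single multinomial trial is just a categorical draw, while many trials give the familiar vector of counts:

        import numpy as np

        rng = np.random.default_rng(0)

        # made-up probabilities for the four possible words {w1, w2, w3, w4}
        p = [0.1, 0.2, 0.3, 0.4]

        # a single multinomial trial is just one categorical draw:
        # the result is a one-hot count vector, e.g. [0, 0, 1, 0] means w3 came up
        single_draw = rng.multinomial(1, p)

        # many trials give a vector of counts of each outcome (here, over 100 draws)
        counts = rng.multinomial(100, p)

        print(single_draw, counts)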

Ivan Savov at Quora


Other answers

I have a detailed blog post describing my experiments with LDA and my understanding of it: http://saravananthirumuruganathan.wordpress.com/2012/01/10/detecting-mixtures-of-genres-in-movie-dialogues/

In the case of maximum likelihood (http://en.wikipedia.org/wiki/Maximum_likelihood), given n data points we assume the underlying distribution that generated the data is a Gaussian and fit the data to the best Gaussian (http://en.wikipedia.org/wiki/Gaussian_function) possible. LDA makes a similar assumption: there is a hidden structure to the data, and that hidden structure is a multinomial whose parameter θ comes from a Dirichlet prior (http://en.wikipedia.org/wiki/Dirichlet_distribution).

Let us say I want to generate a random document; I don't care whether it is meaningful or not. I first fix the number of words (N) I want to generate in that document; I can, for instance, draw that number from a Poisson distribution. Once I have N, I generate that many words from the corpus vocabulary. Each word is generated as follows.

First, draw a θ from a Dirichlet (http://en.wikipedia.org/wiki/Dirichlet_distribution); the Dirichlet is a distribution over the simplex. Think of α as the parameter that decides the shape of the Dirichlet, similar to how the mean and variance decide the shape of the Gaussian bell curve. In a 3-D space, for some choice of α, most of the probability mass may sit near (1,0,0), (0,1,0) and (0,0,1); for some other choice of α, all points on the 3-D simplex might get the same probability. This represents what kind of topic mixtures I generally expect. (If my initial guess is that each document has only one topic, I will mostly choose an α that concentrates probability near corners like (1,0,0). This is just a prior, which could be wrong, and in this way it is not strictly analogous to maximum likelihood.)

So I have an α, and I draw a sample from the Dirichlet. What I actually get is a vector that sums to 1; I call it θ. Remember that I am trying to generate a random document and I have not generated a single word yet! The θ I have is my guess at the topic vector; I obtained it by sampling a k-dimensional vector from the Dirichlet (k = 3 in the example above). Because θ is a topic vector that can be re-imagined as a probability distribution, and because any draw is guaranteed to lie on the simplex, I can use θ as the weights of a loaded k-faced die. And I throw this die! Let's say it shows 5 (a number between 1 and k). I will now say that the word I am about to generate belongs to Topic 5.

I still have not generated a word! To generate a word that belongs to a topic, I need a |V|-faced die, where |V| is the size of the corpus vocabulary. How do I get such a huge die? In a similar way as for the topic vector: I sample again from a Dirichlet, but a different one, over a |V|-dimensional simplex, with parameter β. Any draw from this Dirichlet gives a |V|-faced die. Each topic needs its own |V|-faced die, so I end up drawing k such |V|-faced dice. For Topic 5, I throw the 5th |V|-faced die. Let us say it shows 42; I then go to the vocabulary and pick the 42nd word!

I do this whole process N times (N was the number of words to be generated for the random document). The crux of this discussion is this: for every document, the dice (i.e. the samples from Dirichlet(α) and Dirichlet(β)) are generated only once. It is just that, to generate each word, the dice are rolled multiple times: once to get a topic, and once to get a word given that topic.
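To make the sequence of draws concrete, the whole generative story above can be sketched in a few lines of numpy. This is only an illustrative sketch: the topic count, vocabulary, hyperparameters, and variable names are made up, and the matrix drawn from Dirichlet(β) is what the LDA literature usually calls the per-topic word distributions.

    import numpy as np

    rng = np.random.default_rng(0)

    # toy setup: all sizes and hyperparameters here are made up for illustration
    K = 3                                   # number of topics
    vocab = ["film", "score", "goal", "match", "vote", "poll"]
    V = len(vocab)                          # vocabulary size |V|
    alpha = np.full(K, 0.5)                 # Dirichlet prior over topic proportions
    beta = np.full(V, 0.1)                  # Dirichlet prior over per-topic word distributions

    # one |V|-faced die per topic: k draws from Dirichlet(beta)
    topic_word_dice = rng.dirichlet(beta, size=K)   # shape (K, V)

    # generate one random document
    N = rng.poisson(8)                      # number of words in the document
    theta = rng.dirichlet(alpha)            # the document's k-faced topic die

    words = []
    for _ in range(N):
        z = rng.choice(K, p=theta)                   # roll the topic die: pick a topic
        w = rng.choice(V, p=topic_word_dice[z])      # roll that topic's word die: pick a word
        words.append(vocab[w])

    print(words)

Note that the dice (theta and topic_word_dice) are drawn once per document, while the rolls inside the loop happen once per word, matching the crux of the explanation above.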

Kripa Chettiar
