spark-dev mailing list archives

From Debasish Das <debasish.da...@gmail.com>
Subject Re: PLSA
Date Fri, 04 Jul 2014 15:48:20 GMT
Thanks for the pointer...

It looks like you are using the EM algorithm for the factorization, which
looks similar to multiplicative update rules.
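
For readers who have not seen the multiplicative update rules mentioned here, the following is a minimal sketch of the classic Lee-Seung updates for non-negative factorization under the generalized KL divergence. It is not the code from the PLSA proposal and not MLlib code; the function names (`matmul`, `kl_divergence`, `kl_nmf_step`, `factorize`) and the small `eps` guard are assumptions made for illustration. The PLSA EM updates are commonly described as closely related to these rules, up to normalization of the factors.

```python
import math
import random


def matmul(W, H):
    """Plain-Python product of W (n x r) and H (r x m)."""
    return [[sum(W[i][k] * H[k][j] for k in range(len(H)))
             for j in range(len(H[0]))] for i in range(len(W))]


def kl_divergence(V, WH, eps=1e-12):
    """Generalized KL divergence D(V || WH), summed over all entries."""
    total = 0.0
    for v_row, a_row in zip(V, WH):
        for v, a in zip(v_row, a_row):
            if v > 0:
                total += v * math.log(v / (a + eps))
            total += a - v
    return total


def ratios(V, WH, eps):
    """Element-wise V / (WH), the quantity both updates are built from."""
    return [[v / (a + eps) for v, a in zip(v_row, a_row)]
            for v_row, a_row in zip(V, WH)]


def kl_nmf_step(V, W, H, eps=1e-12):
    """One round of multiplicative updates: W first, then H."""
    n, m, r = len(V), len(V[0]), len(H)
    ratio = ratios(V, matmul(W, H), eps)
    W = [[W[i][k] * sum(H[k][j] * ratio[i][j] for j in range(m))
          / (sum(H[k]) + eps)
          for k in range(r)] for i in range(n)]
    # Recompute the ratio with the new W before touching H.
    ratio = ratios(V, matmul(W, H), eps)
    H = [[H[k][j] * sum(W[i][k] * ratio[i][j] for i in range(n))
          / (sum(W[i][k] for i in range(n)) + eps)
          for j in range(m)] for k in range(r)]
    return W, H


def factorize(V, rank, iterations=50, seed=0):
    """Random positive initialization followed by repeated update rounds."""
    rng = random.Random(seed)
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in V]
    H = [[rng.random() + 0.1 for _ in V[0]] for _ in range(rank)]
    for _ in range(iterations):
        W, H = kl_nmf_step(V, W, H)
    return W, H
```

Because every update multiplies by a non-negative ratio, the factors stay non-negative without an explicit projection step, which is the property that makes these rules resemble the EM re-estimation formulas.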

Do you think you could scale the problem further by using MLlib's ALS with
implicit feedback?

We can handle L1, L2, equality and positivity constraints in ALS now... As
long as you can find the gradient and Hessian of the KL divergence loss,
you can use them in place of the Gram matrix that is used in ALS right now.
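
To make the gradient and Hessian concrete, here is a short derivation for one factor row, written as a sketch under assumptions: the symbols are not from the thread. Take a single row $w$ of one factor, hold the other factor fixed with columns $h_j$, and let $v_j$ be the observed counts in that row.

```latex
% Generalized KL loss for one row w, with the other factor fixed
f(w) = \sum_j \left[ v_j \log\frac{v_j}{w^\top h_j} - v_j + w^\top h_j \right]

% Gradient and Hessian with respect to w
\nabla f(w)   = \sum_j \left( 1 - \frac{v_j}{w^\top h_j} \right) h_j
\nabla^2 f(w) = \sum_j \frac{v_j}{(w^\top h_j)^2} \, h_j h_j^\top

% Least-squares loss, for comparison
g(w) = \tfrac{1}{2} \sum_j \left( v_j - w^\top h_j \right)^2
\nabla g(w)   = \Big( \sum_j h_j h_j^\top \Big) w - \sum_j v_j h_j
\nabla^2 g(w) = \sum_j h_j h_j^\top
```

In the least-squares case the Hessian is exactly the Gram matrix and does not depend on $w$. Under the KL loss it becomes a Gram matrix reweighted by $v_j / (w^\top h_j)^2$, so it would presumably have to be rebuilt per row and per iteration rather than computed once.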

If you look at the topic modeling work in Solr (Carrot is the package), they
use ALS to generate the topics... that algorithm looks like a simplified
version of what you are attempting here...

Maybe the EM algorithm for topic modeling is more efficient than ALS, but
from looking at it I don't see how... I see a lot of broadcasts... while in
implicit feedback you need one broadcast of the Gram matrix...
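
The point about a single Gram matrix broadcast can be illustrated with a minimal sketch, assuming the usual confidence-weighted formulation of implicit-feedback ALS, where unobserved items have confidence 1. This is plain Python, not MLlib code, and the names `gram`, `user_normal_matrix` and `dense_normal_matrix` are hypothetical. The shared matrix Y^T Y is computed once, and each user only adds a correction over the items that user actually touched.

```python
def gram(Y):
    """Y^T Y for item factors Y (items x rank). Shared by every user."""
    r = len(Y[0])
    return [[sum(y[a] * y[b] for y in Y) for b in range(r)]
            for a in range(r)]


def user_normal_matrix(YtY, Y, confidences, lam):
    """Normal-equation matrix for one user, reusing the shared Gram matrix.

    confidences maps item index -> confidence c for observed items only;
    every other item is assumed to have confidence 1.
    """
    r = len(YtY)
    A = [row[:] for row in YtY]
    for i, c in confidences.items():
        y = Y[i]
        for a in range(r):
            for b in range(r):
                # Sparse correction: (c - 1) * y y^T for observed items.
                A[a][b] += (c - 1.0) * y[a] * y[b]
    for a in range(r):
        A[a][a] += lam  # L2 regularization on the diagonal
    return A


def dense_normal_matrix(Y, confidences, lam):
    """Reference version that sums over every item, observed or not."""
    r = len(Y[0])
    A = [[0.0] * r for _ in range(r)]
    for i, y in enumerate(Y):
        c = confidences.get(i, 1.0)
        for a in range(r):
            for b in range(r):
                A[a][b] += c * y[a] * y[b]
    for a in range(r):
        A[a][a] += lam
    return A
```

Both functions build the same matrix, but the first one touches only the observed items per user, which is why a single shared Gram matrix is enough in this formulation.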

On Fri, Jul 4, 2014 at 4:27 AM, Denis Turdakov <turdakov@ispras.ru> wrote:

> Hi, Deb.
>
> I don't quite understand the question. PLSA is an instance of a matrix
> factorization problem.
>
> If you are asking about the inference algorithm, we use the EM algorithm.
> A description of this approach can be found, for example, here:
> http://www.machinelearning.ru/wiki/images/1/1f/Voron14aist.pdf
>
>
> Best, Denis.
