spark-issues mailing list archives

From "Joseph K. Bradley (JIRA)" <>
Subject [jira] [Commented] (SPARK-1405) parallel Latent Dirichlet Allocation (LDA) atop of spark in MLlib
Date Tue, 03 Feb 2015 20:29:34 GMT


Joseph K. Bradley commented on SPARK-1405:

Thanks everyone for all of your contributions, help and feedback!  The initial LDA implementation
has been merged, but there are many improvements which remain to be done.  I've put a list
of JIRAs here []

[~yuhao yang] +1 for online LDA.  (I made a JIRA for it.)

> parallel Latent Dirichlet Allocation (LDA) atop of spark in MLlib
> -----------------------------------------------------------------
>                 Key: SPARK-1405
>                 URL:
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>            Reporter: Xusen Yin
>            Assignee: Joseph K. Bradley
>            Priority: Critical
>              Labels: features
>             Fix For: 1.3.0
>         Attachments: performance_comparison.png
>   Original Estimate: 336h
>  Remaining Estimate: 336h
> Latent Dirichlet Allocation (a.k.a. LDA) is a topic model which extracts topics from a
> text corpus. Unlike the current machine learning algorithms in MLlib, which use
> optimization algorithms such as gradient descent, LDA uses expectation algorithms such as Gibbs
> sampling.
> In this PR, I prepare an LDA implementation based on Gibbs sampling, with a wholeTextFiles
> API (already solved), a word segmentation step (imported from Lucene), and a Gibbs sampling core.
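
For readers unfamiliar with the "Gibbs sampling core" mentioned in the description, here is a rough single-machine sketch of collapsed Gibbs sampling for LDA. It is not the code from the PR or from MLlib. The function name `lda_gibbs`, its parameters, and the default hyperparameters are all assumptions made for illustration, and documents are assumed to be already tokenized into integer word ids.

```python
import random


def lda_gibbs(docs, num_topics, vocab_size, alpha=0.1, beta=0.01, iters=50, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative sketch, not MLlib's code).

    docs: list of documents, each a list of integer word ids in [0, vocab_size).
    Returns (doc_topic, topic_word) count matrices.
    """
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]                # tokens of doc d in topic k
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # tokens of word w in topic k
    topic_total = [0] * num_topics                              # tokens in topic k
    assign = []                                                 # topic of each token

    # Start from a uniformly random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(zs)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1

                # Unnormalized conditional probability of each topic for this token.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]

                # Resample the topic and put the token back into the counts.
                k = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    return doc_topic, topic_word
```

Each token's topic is resampled conditioned on all other assignments, so the only state is the three count tables. A parallel version would have to decide how those counts are shared or synchronized across workers, which this sketch does not attempt.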

This message was sent by Atlassian JIRA
