spark-issues mailing list archives

From "Liquan Pei (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-2510) word2vec: Distributed Representation of Words
Date Fri, 01 Aug 2014 15:56:39 GMT

     [ https://issues.apache.org/jira/browse/SPARK-2510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liquan Pei updated SPARK-2510:
------------------------------

    Description: We would like to add a parallel implementation of word2vec to MLlib. word2vec
finds distributed representations of words by training on large data sets. We will focus
on the skip-gram model and hierarchical softmax in our initial implementation. (was: We would
like to add a parallel implementation of word2vec to MLlib. word2vec finds distributed representations
of words by training on large data sets. The Spark programming model fits nicely with
word2vec, as the training algorithm of word2vec is embarrassingly parallel. We will focus on
the skip-gram model and negative sampling in our initial implementation.)

> word2vec: Distributed Representation of Words
> ---------------------------------------------
>
>                 Key: SPARK-2510
>                 URL: https://issues.apache.org/jira/browse/SPARK-2510
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>            Reporter: Liquan Pei
>            Assignee: Liquan Pei
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> We would like to add a parallel implementation of word2vec to MLlib. word2vec finds distributed
> representations of words by training on large data sets. We will focus on the skip-gram model
> and hierarchical softmax in our initial implementation.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
