spark-issues mailing list archives

From "Derrick Burns (JIRA)"
Subject [jira] [Created] (SPARK-5405) Spark clusterer should support high dimensional data
Date Mon, 26 Jan 2015 06:29:34 GMT
Derrick Burns created SPARK-5405:

             Summary: Spark clusterer should support high dimensional data
                 Key: SPARK-5405
             Project: Spark
          Issue Type: New Feature
          Components: MLlib
    Affects Versions: 1.2.0
            Reporter: Derrick Burns

The MLlib clusterer works well for low-dimensional (<200) data. However, its running time
grows linearly with the number of dimensions, so for practical purposes it is not very useful
for high-dimensional data.

Depending on the data type, one can embed high-dimensional data into a lower-dimensional
space in a distance-preserving way. The Spark clusterer should support such embeddings.
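
As an illustration of what such an embedding might look like, here is a minimal sketch of a
Gaussian random projection, one well-known technique that approximately preserves Euclidean
distances. The issue does not name a specific method, so the choice of technique is an
assumption, and the helper names (make_projection, project, euclidean) and the sizes (1000
input dimensions, 100 output dimensions) are hypothetical. It is plain Python rather than
MLlib code.

```python
import math
import random


def make_projection(in_dim, out_dim, seed=0):
    """Build an out_dim x in_dim matrix of N(0, 1/out_dim) entries (hypothetical helper)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(out_dim)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(in_dim)]
            for _ in range(out_dim)]


def project(matrix, point):
    """Map a high-dimensional point to the lower-dimensional space."""
    return [sum(w * x for w, x in zip(row, point)) for row in matrix]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Two arbitrary 1000-dimensional points (synthetic data, for illustration only).
rng = random.Random(1)
a = [rng.random() for _ in range(1000)]
b = [rng.random() for _ in range(1000)]

# Embed both points into 100 dimensions with the same projection matrix.
R = make_projection(1000, 100)
low_a = project(R, a)
low_b = project(R, b)

# Ratio of projected distance to original distance; values near 1 mean the
# distance was approximately preserved.
ratio = euclidean(low_a, low_b) / euclidean(a, b)
print(len(low_a), ratio)
```

The clusterer would then run on the projected points, so the per-distance cost depends on the
output dimension rather than the original one. Whether this embedding is appropriate depends
on the data type and distance function, as the description above notes.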
