spark-issues mailing list archives

From "Xiangrui Meng (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-1357) [MLLIB] Annotate developer and experimental API's
Date Wed, 09 Apr 2014 17:16:15 GMT

    [ https://issues.apache.org/jira/browse/SPARK-1357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13964405#comment-13964405 ]

Xiangrui Meng commented on SPARK-1357:
--------------------------------------

Hi Sean, 

Actually, you came in just in time. This was only the first pass, and we are still accepting
API visibility/annotation patches during the QA period. MLlib is still a beta component of
Spark, so "1.0" doesn't mean it is stable. We also still accept additions to MLlib (for JIRAs
submitted before April 1), as Patrick announced on the dev mailing list.

(I do want to mark all of MLlib experimental to reserve the right to change it in the future,
but we need to find a balance point here.)

I agree that switching the id type from Int to Long in ALS would be more future-proof. The extra
storage requirement is 8 bytes per rating (4 more bytes each for the user id and the product id).
Inside ALS, we also re-partition the ratings, which needs extra storage. We need to consider
whether we want to switch to Long completely or provide an option to use Long ids. Could you
submit a patch, either marking ALS experimental or allowing Long ids?
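
To make the 8-byte figure concrete, here is a rough sketch in plain Python (not Spark code). It
assumes a rating is a (user id, product id, Double value) triple and counts only the raw
payload, ignoring JVM object overhead and the re-partitioned copies. The names INT_ID_RATING,
LONG_ID_RATING, and extra_bytes are hypothetical, chosen only for this illustration.

```python
import struct

# Raw payload of one rating: (user id, product id, rating value).
# "<" disables padding; "i" is a 4-byte int, "q" an 8-byte long, "d" an 8-byte double.
INT_ID_RATING = struct.calcsize("<iid")   # Int ids
LONG_ID_RATING = struct.calcsize("<qqd")  # Long ids


def extra_bytes(num_ratings):
    """Additional raw bytes needed when both ids widen from Int to Long."""
    return num_ratings * (LONG_ID_RATING - INT_ID_RATING)


print(INT_ID_RATING, LONG_ID_RATING, extra_bytes(1))  # prints "16 24 8"
```

Under these assumptions the raw payload grows from 16 to 24 bytes per rating, so the cost scales
linearly with the number of ratings.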

I don't think a String type is necessary because we can always create a map between String ids
and Long ids. A String id usually costs more than a Long id. For the same reason, classification
uses Double for labels.
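
As an illustration of such a map, here is a minimal sketch in plain Python rather than Spark.
The helper build_id_map and the sample ids are hypothetical, and Python integers stand in for
Long ids. A distributed version would need a different strategy for assigning ids.

```python
def build_id_map(string_ids):
    """Assign each distinct string id a dense integer id, in first-seen order."""
    id_map = {}
    for s in string_ids:
        if s not in id_map:
            id_map[s] = len(id_map)
    return id_map


# Hypothetical ratings keyed by String ids: (user, product, rating).
raw_ratings = [
    ("alice", "book-17", 4.0),
    ("bob", "book-17", 3.5),
    ("alice", "book-42", 5.0),
]

user_ids = build_id_map(u for u, _, _ in raw_ratings)
product_ids = build_id_map(p for _, p, _ in raw_ratings)

# Ratings re-keyed by integer ids, in the shape ALS would consume.
encoded = [(user_ids[u], product_ids[p], r) for u, p, r in raw_ratings]

# Reverse map, to translate results back to the original String ids.
user_names = {v: k for k, v in user_ids.items()}
```

The String ids are stored once in the map, while every rating carries only the compact integer
ids.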

Please submit a patch for any APIs that you are not comfortable calling "stable", or that I
marked "experimental/developer" but you think should be marked otherwise. It would be great to
keep the discussion going. Thanks!

Best,
Xiangrui

> [MLLIB] Annotate developer and experimental API's
> -------------------------------------------------
>
>                 Key: SPARK-1357
>                 URL: https://issues.apache.org/jira/browse/SPARK-1357
>             Project: Spark
>          Issue Type: Sub-task
>          Components: MLlib
>    Affects Versions: 1.0.0
>            Reporter: Patrick Wendell
>            Assignee: Xiangrui Meng
>             Fix For: 1.0.0
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)
