spark-issues mailing list archives

From "Aaron Davidson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-1438) Update RDD.sample() API to make seed parameter optional
Date Tue, 08 Apr 2014 17:01:39 GMT

    [ https://issues.apache.org/jira/browse/SPARK-1438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13963173#comment-13963173 ]

Aaron Davidson commented on SPARK-1438:
---------------------------------------

PartitionwiseSampledRDD already takes the seed as an optional argument, using System.nanoTime
as the default value. This seems reasonable: Math.random() does essentially the same thing the
first time it is called, because it lazily creates a java.util.Random whose default seed is
derived from System.nanoTime. System.nanoTime also usually has high enough resolution that seed
collisions are unlikely.

Scala, and probably Python, can use default arguments; Java will need an overloaded method.
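A minimal Java sketch of the overload approach, assuming a plain in-memory list rather than an RDD. The class SampleSketch and its sample methods are hypothetical illustrations, not Spark's actual API. The three-argument form takes an explicit seed; the two-argument overload fills in System.nanoTime(), the same default described above for PartitionwiseSampledRDD. The sampling itself is simple Bernoulli sampling without replacement (each element kept independently with probability fraction), chosen only to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch (not Spark's API): Bernoulli sampling without replacement
// over an in-memory list, with an explicit-seed method and a no-seed overload.
class SampleSketch {

    // Explicit seed: the same data, fraction, and seed always give the same sample.
    public static <T> List<T> sample(List<T> data, double fraction, long seed) {
        if (fraction < 0.0 || fraction > 1.0) {
            throw new IllegalArgumentException("fraction must be in [0, 1]");
        }
        Random rng = new Random(seed);
        List<T> out = new ArrayList<>();
        for (T item : data) {
            // Keep each element independently with probability `fraction`.
            if (rng.nextDouble() < fraction) {
                out.add(item);
            }
        }
        return out;
    }

    // Java has no default arguments, so the "optional" seed is an overload
    // that supplies System.nanoTime(), matching the default discussed above.
    public static <T> List<T> sample(List<T> data, double fraction) {
        return sample(data, fraction, System.nanoTime());
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            data.add(i);
        }
        // Same seed, same sample.
        System.out.println(sample(data, 0.1, 42L).equals(sample(data, 0.1, 42L))); // true
        // No seed: typically a different sample on each run.
        System.out.println(sample(data, 0.1));
    }
}
```

Passing the same seed reproduces the same sample, which matters for debugging and tests; omitting it trades reproducibility for convenience. In Scala the same effect would come from a single method with a default parameter value instead of a second overload.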

> Update RDD.sample() API to make seed parameter optional
> -------------------------------------------------------
>
>                 Key: SPARK-1438
>                 URL: https://issues.apache.org/jira/browse/SPARK-1438
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>            Reporter: Matei Zaharia
>            Priority: Blocker
>              Labels: Starter
>             Fix For: 1.0.0
>
>
> When a seed is not given, it should pick one based on Math.random().
> This needs to be done in Java and Python as well.



--
This message was sent by Atlassian JIRA
(v6.2#6252)