spark-issues mailing list archives

From "Apache Spark (JIRA)"
Subject [jira] [Commented] (SPARK-4417) New API: sample RDD to fixed number of items
Date Wed, 17 Dec 2014 18:24:13 GMT


Apache Spark commented on SPARK-4417:

User 'ilganeli' has created a pull request for this issue:

> New API: sample RDD to fixed number of items
> --------------------------------------------
>                 Key: SPARK-4417
>                 URL:
>             Project: Spark
>          Issue Type: New Feature
>          Components: PySpark, Spark Core
>            Reporter: Davies Liu
> Sometimes we just want a fixed number of items randomly selected from an RDD; for example,
> before sorting an RDD we need to gather a fixed number of keys from each partition.
> To do this today we need two passes over the RDD: first get the total count, then calculate
> the right ratio for sampling. In fact, we could do this in one pass.
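A minimal Python sketch of one well-known way to draw a fixed number of items in a single pass. It uses reservoir sampling (Algorithm R) inside each partition, then merges the per-partition reservoirs, weighting each by its item count, so no separate counting pass is needed. This illustrates the general technique only. It is not the API or implementation proposed in the pull request. The function names `reservoir_sample`, `merge_samples` and `sample_fixed` are hypothetical, and partitions are modeled as plain Python iterables rather than a real RDD.

```python
import random
from functools import reduce
from typing import Iterable, List, Optional, Tuple, TypeVar

T = TypeVar("T")
Sample = Tuple[List[T], int]  # (sampled items, number of items seen)


def reservoir_sample(items: Iterable[T], k: int, rng: random.Random) -> Sample:
    """Algorithm R: one pass over `items`; returns up to k uniformly chosen items and the count."""
    reservoir: List[T] = []
    n = 0
    for item in items:
        n += 1
        if len(reservoir) < k:
            reservoir.append(item)
        else:
            # Keep the new item with probability k / n, evicting a random slot.
            j = rng.randrange(n)
            if j < k:
                reservoir[j] = item
    return reservoir, n


def merge_samples(a: Sample, b: Sample, k: int, rng: random.Random) -> Sample:
    """Combine two uniform samples into a uniform sample of up to k items from their union."""
    (sa, na), (sb, nb) = a, b
    take = min(k, na + nb)
    # Decide how many items come from `a` by simulating `take` draws without
    # replacement from populations of size na and nb (a hypergeometric split).
    ra, rb, from_a = na, nb, 0
    for _ in range(take):
        if rng.randrange(ra + rb) < ra:
            from_a += 1
            ra -= 1
        else:
            rb -= 1
    # A uniform subset of a uniform sample is still uniform over the partition.
    merged = rng.sample(sa, from_a) + rng.sample(sb, take - from_a)
    return merged, na + nb


def sample_fixed(partitions: Iterable[Iterable[T]], k: int, seed: Optional[int] = None) -> List[T]:
    """Hypothetical helper: fixed-size uniform sample over all partitions in one pass each."""
    rng = random.Random(seed)  # shared RNG is fine locally; a cluster would seed per partition
    per_partition = [reservoir_sample(p, k, rng) for p in partitions]
    sample, _ = reduce(lambda x, y: merge_samples(x, y, k, rng), per_partition, ([], 0))
    return sample


if __name__ == "__main__":
    parts = [range(0, 50), range(50, 53), [], range(53, 200)]
    print(sample_fixed(parts, 10, seed=42))
```

Each partition's `reservoir_sample` result doubles as the "fixed number of keys from each partition" mentioned above, and the merge step turns those into one global sample without knowing the total count in advance. In PySpark the per-partition step could, for instance, be wired through `rdd.mapPartitions` followed by `collect()` on the driver, with each partition using its own seed; that wiring is an assumption and is not shown here.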

This message was sent by Atlassian JIRA

