spark-dev mailing list archives

From Andrew Ash <and...@andrewash.com>
Subject Re: RFC: Supporting the Scala drop Method for Spark RDDs
Date Mon, 21 Jul 2014 15:27:10 GMT
Personally I'd find the method useful -- I've often had a .csv file with a
header row that I want to drop, so I filter it out, which touches all
partitions anyway.  I don't have any comments on the implementation quite
yet, though.
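
For illustration, here is a minimal sketch of the header-dropping workaround
described above, next to a per-partition alternative. The object and method
names are hypothetical; only first, filter, and mapPartitionsWithIndex are
existing RDD methods. The second approach assumes the header is the first
line of partition 0, which may not hold when several files are read at once.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object DropHeaderSketch {
  // Approach 1: remove every line equal to the header.
  // The filter is evaluated on all partitions, and it would also remove
  // any data line that happens to be identical to the header.
  def dropHeaderByFilter(lines: RDD[String]): RDD[String] = {
    val header = lines.first() // action: fetches the first element
    lines.filter(_ != header)
  }

  // Approach 2: drop the first line of partition 0 only.
  // Assumption: the header row is the first element of partition 0.
  def dropHeaderByPartition(lines: RDD[String]): RDD[String] =
    lines.mapPartitionsWithIndex { (idx, it) =>
      if (idx == 0) it.drop(1) else it
    }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("drop-header-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Small in-memory stand-in for a .csv file with a header row.
      val lines = sc.parallelize(Seq("id,name", "1,a", "2,b"), 2)
      println(dropHeaderByFilter(lines).collect().mkString("; "))
      println(dropHeaderByPartition(lines).collect().mkString("; "))
    } finally {
      sc.stop()
    }
  }
}
```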


On Mon, Jul 21, 2014 at 8:24 AM, Erik Erlandson <eje@redhat.com> wrote:

> A few weeks ago I submitted a PR for supporting rdd.drop(n), under
> SPARK-2315:
> https://issues.apache.org/jira/browse/SPARK-2315
>
> Supporting the drop method would make some operations convenient; however,
> it forces computation of >= 1 partition of the parent RDD, and so it would
> behave like a "partial action" that returns an RDD as the result.
>
> I wrote up a discussion of these trade-offs here:
>
> http://erikerlandson.github.io/blog/2014/07/20/some-implications-of-supporting-the-scala-drop-method-for-spark-rdds/
>
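
To make the "partial action" point in the quoted message concrete, here is a
rough sketch in plain Scala, without Spark. It models partitions as lazily
computed iterators, which is an assumption made for illustration and is not
the implementation in the SPARK-2315 PR. Deciding how many elements to drop
from each partition requires counting the leading partitions, which forces
them to be computed up front.

```scala
object DropSketch {
  // Hypothetical model: each partition is a thunk that computes its elements.
  type Partition[T] = () => Iterator[T]

  def drop[T](partitions: Seq[Partition[T]], n: Int): Seq[Partition[T]] = {
    // Number of elements to discard from each partition.
    val toDrop = Array.fill(partitions.length)(0)
    var remaining = n
    var i = 0
    // Eager step: walk leading partitions until n elements are accounted for.
    while (i < partitions.length && remaining > 0) {
      val size = partitions(i)().size // forces computation of partition i
      val d = math.min(size, remaining)
      toDrop(i) = d
      remaining -= d
      i += 1
    }
    // Lazy step: the result is again a sequence of uncomputed partitions.
    partitions.zipWithIndex.map { case (p, idx) =>
      () => p().drop(toDrop(idx))
    }
  }
}
```

In this model, a call such as DropSketch.drop(parts, 1) evaluates only the
first non-empty prefix of partitions eagerly, and later partitions stay lazy.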
