spark-issues mailing list archives

From "Apache Spark (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-3369) Java mapPartitions Iterator->Iterable is inconsistent with Scala's Iterator->Iterator
Date Mon, 21 Dec 2015 10:47:46 GMT

    [ https://issues.apache.org/jira/browse/SPARK-3369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15066310#comment-15066310 ]

Apache Spark commented on SPARK-3369:
-------------------------------------

User 'srowen' has created a pull request for this issue:
https://github.com/apache/spark/pull/10413

> Java mapPartitions Iterator->Iterable is inconsistent with Scala's Iterator->Iterator
> -------------------------------------------------------------------------------------
>
>                 Key: SPARK-3369
>                 URL: https://issues.apache.org/jira/browse/SPARK-3369
>             Project: Spark
>          Issue Type: Improvement
>          Components: Java API
>    Affects Versions: 1.0.2, 1.2.1
>            Reporter: Sean Owen
>            Assignee: Sean Owen
>              Labels: breaking_change, releasenotes
>         Attachments: FlatMapIterator.patch
>
>
> {{mapPartitions}} in the Scala RDD API takes a function that transforms an {{Iterator}} to an {{Iterator}}: http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.RDD
> In the Java RDD API, the equivalent is a {{FlatMapFunction}}, which operates on an {{Iterator}} but is required to return an {{Iterable}}, which is a stronger condition and appears inconsistent. It's a problematic inconsistency, though, because this seems to require copying all of the input into memory in order to create an object that can be iterated many times, since the input does not afford this itself.
> Similarly for other {{mapPartitions*}} methods and other {{*FlatMapFunction}}s in Java.
> (Is there a reason for this difference that I'm overlooking?)
> If I'm right that this was an inadvertent inconsistency, then the big issue here is that of course this is part of a public API. Workarounds I can think of:
> Promise that Spark will only call {{iterator()}} once, so implementors can use a hacky {{IteratorIterable}} that returns the same {{Iterator}}.
> Or, make a series of methods accepting a {{FlatMapFunction2}}, etc. with the desired signature, and deprecate existing ones.
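
The first workaround quoted above could look roughly like the sketch below. It uses only java.util types rather than Spark's actual FlatMapFunction interface, so it illustrates the shape of the problem and not Spark code. IteratorIterable is the hypothetical helper named in the issue; OneShotIterableSketch, lengthsBuffered, and lengthsLazy are illustrative names that do not appear in the issue or in Spark.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper (not part of Spark): an Iterable that hands back the
// same Iterator on every call, so it can be traversed only once.
final class IteratorIterable<T> implements Iterable<T> {
    private final Iterator<T> source;

    IteratorIterable(Iterator<T> source) {
        this.source = source;
    }

    @Override
    public Iterator<T> iterator() {
        // Always the same instance; a second traversal sees it exhausted.
        return source;
    }
}

final class OneShotIterableSketch {
    // What a true multi-pass Iterable requires: buffer the whole partition.
    static Iterable<Integer> lengthsBuffered(Iterator<String> partition) {
        List<Integer> out = new ArrayList<>();
        while (partition.hasNext()) {
            out.add(partition.next().length());
        }
        return out;
    }

    // The workaround: map lazily and wrap the result in the one-shot Iterable.
    // This is only safe if the caller invokes iterator() a single time.
    static Iterable<Integer> lengthsLazy(final Iterator<String> partition) {
        Iterator<Integer> mapped = new Iterator<Integer>() {
            @Override
            public boolean hasNext() {
                return partition.hasNext();
            }

            @Override
            public Integer next() {
                return partition.next().length();
            }
        };
        return new IteratorIterable<>(mapped);
    }

    public static void main(String[] args) {
        List<String> partition = Arrays.asList("a", "bb", "ccc");
        for (int n : lengthsLazy(partition.iterator())) {
            System.out.println(n); // prints 1, then 2, then 3
        }
    }
}
```

lengthsBuffered holds every output element in memory before returning, which is the copying cost the issue describes. lengthsLazy avoids that cost, but it breaks the usual expectation that an Iterable can be iterated repeatedly, which is why the issue calls it hacky.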



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
