spark-dev mailing list archives

From Sean Owen <so...@cloudera.com>
Subject Re: groupByKey() and keys with many values
Date Mon, 07 Sep 2015 08:31:17 GMT
That's how it's intended to work; if it's a problem, you probably need
to redesign your computation so that it doesn't use groupByKey. Usually
you can do so.

On Mon, Sep 7, 2015 at 9:02 AM, kaklakariada <christoph.pirkl@gmail.com> wrote:
> Hi,
>
> I already posted this question on the users mailing list
> (http://apache-spark-user-list.1001560.n3.nabble.com/Using-groupByKey-with-many-values-per-key-td24538.html)
> but did not get a reply. Maybe this is the correct forum to ask.
>
> My problem is that groupByKey().mapToPair() loads all values for a key into
> memory, which fails when the values don't fit into memory. This was not an
> issue with Hadoop MapReduce, because the Iterable passed to the reducer
> reads values from disk.
>
> In Spark, the Iterable passed to mapToPair() is backed by a CompactBuffer
> containing all values.
>
> Is it possible to change this behavior without modifying Spark, or is there
> a plan to change this?
>
> Thank you very much for your help!
> Christoph.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org

