spark-user mailing list archives

From Stephen Boesch <java...@gmail.com>
Subject Re: [Spark 2.x Core] .collect() size limit
Date Sat, 28 Apr 2018 16:52:33 GMT
While it is certainly possible to use virtual memory, I have seen warnings
in a number of places that collect() results must fit in memory. I'm not
sure whether that applies to *all* Spark calculations, but at the very
least each of the specific collect()s that are performed would need to be
verified.
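
FWIW there is also a hard cap on the driver side: spark.driver.maxResultSize
(1g by default in 2.x) fails the job if the total serialized results shipped
to the driver exceed it. A minimal sketch of the two usual options, assuming
Spark 2.x (the app name and input path below are just placeholders):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("collect-size-demo")
      // raise the cap on total serialized result bytes sent to the driver
      .config("spark.driver.maxResultSize", "4g")
      .getOrCreate()

    // placeholder input path, just for illustration
    val rdd = spark.sparkContext.textFile("hdfs:///some/large/input")

    // collect() materializes every partition as one array in driver heap:
    // val everything = rdd.collect()

    // toLocalIterator() fetches one partition at a time, so the driver
    // only needs enough memory for the largest single partition
    rdd.toLocalIterator.foreach(println)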

And maybe *all* collects do require sufficient memory - would you like to
check the source code to see whether disk-backed collects actually happen
in some cases?
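
From memory, the 2.x implementation of RDD.collect() (in
core/src/main/scala/org/apache/spark/rdd/RDD.scala) is roughly:

    // paraphrased from memory - check RDD.scala to confirm
    def collect(): Array[T] = withScope {
      val results = sc.runJob(this, (iter: Iterator[T]) => iter.toArray)
      Array.concat(results: _*)
    }

i.e. every partition is turned into an in-memory array and concatenated on
the driver - I don't see a disk-backed path there.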

2018-04-28 9:48 GMT-07:00 Deepak Goel <deicool@gmail.com>:

> There is such a thing as *virtual memory*
>
> On Sat, 28 Apr 2018, 21:19 Stephen Boesch, <javadba@gmail.com> wrote:
>
>> Do you have a machine with terabytes of RAM? AFAIK collect() requires
>> RAM - so that would be your limiting factor.
>>
>> 2018-04-28 8:41 GMT-07:00 klrmowse <klrmowse@gmail.com>:
>>
>>> I am currently trying to find a workaround for the Spark application I am
>>> working on so that it does not have to use .collect()
>>>
>>> but, for now, it is going to have to use .collect()
>>>
>>> what is the size limit (memory for the driver) of an RDD that .collect()
>>> can work with?
>>>
>>> I've been scouring Google - S.O., blogs, etc. - and everyone is
>>> cautioning about .collect(), but no one specifies how huge is huge...
>>> are we talking about a few gigabytes? terabytes?? petabytes???
>>>
>>>
>>>
>>> thank you
>>>
>>>
>>>
>>
