spark-user mailing list archives

From Harit Vishwakarma <harit.vishwaka...@gmail.com>
Subject Re: Spark APIs memory usage?
Date Sat, 18 Jul 2015 08:57:23 GMT
Even if I remove the numpy calls (no matrices loaded), the same exception
still occurs.
Can anyone tell me what createDataFrame does internally? Are there any
alternatives to it?
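For context on the question above: when no schema is passed, PySpark's createDataFrame has to infer one by examining the data before the rows are converted and serialized for the JVM, and that inference plus conversion is where memory pressure can appear. Below is a rough pure-Python sketch of what type inference over sampled rows looks like; this is an illustration of the idea only, not Spark's actual implementation (the function name `infer_schema` and the positional field names are made up here):

```python
# Illustrative sketch of schema inference as done when createDataFrame is
# called without a schema: examine a sample of rows and map Python types
# to SQL type names. NOT Spark's actual code; for intuition only.
def infer_schema(rows, sample_size=100):
    type_names = {bool: "boolean", int: "bigint", float: "double", str: "string"}
    schema = {}
    for row in rows[:sample_size]:           # only a sample is examined
        for i, value in enumerate(row):
            name = "_%d" % (i + 1)           # positional field names
            inferred = type_names.get(type(value), "string")
            if schema.get(name, inferred) != inferred:
                raise TypeError("conflicting types for field %s" % name)
            schema[name] = inferred
    return schema

rows = [(1, 0.5, "a"), (2, 1.5, "b")]
print(infer_schema(rows))   # {'_1': 'bigint', '_2': 'double', '_3': 'string'}
```

One way to sidestep the inference pass entirely is to hand createDataFrame an explicit StructType schema (e.g. `sqlCtx.createDataFrame(rdd2, schema)`), so Spark never has to scan the data to guess types.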

On Fri, Jul 17, 2015 at 6:43 PM, Akhil Das <akhil@sigmoidanalytics.com>
wrote:

> I suspect it's numpy filling up memory.
>
> Thanks
> Best Regards
>
> On Fri, Jul 17, 2015 at 5:46 PM, Harit Vishwakarma <
> harit.vishwakarma@gmail.com> wrote:
>
>> 1. load 3 matrices of size ~ 10000 X 10000 using numpy.
>> 2. rdd2 = rdd1.values().flatMap( fun )  # rdd1 has roughly 10^7 tuples
>> 3. df = sqlCtx.createDataFrame(rdd2)
>> 4. df.save() # in parquet format
>>
>> It throws an exception in the createDataFrame() call. I don't know what
>> exactly it creates: everything in memory? Or can I make it persist to disk
>> while it is being created?
>>
>> Thanks
>>
>>
>> On Fri, Jul 17, 2015 at 5:16 PM, Akhil Das <akhil@sigmoidanalytics.com>
>> wrote:
>>
>>> Can you paste the code? How much memory does your system have and how
>>> big is your dataset? Did you try df.persist(StorageLevel.MEMORY_AND_DISK)?
>>>
>>> Thanks
>>> Best Regards
>>>
>>> On Fri, Jul 17, 2015 at 5:14 PM, Harit Vishwakarma <
>>> harit.vishwakarma@gmail.com> wrote:
>>>
>>>> Thanks,
>>>> Code is running on a single machine.
>>>> And it still doesn't answer my question.
>>>>
>>>> On Fri, Jul 17, 2015 at 4:52 PM, ayan guha <guha.ayan@gmail.com> wrote:
>>>>
>>>>> You can bump up the number of partitions while creating the RDD you
>>>>> are using for the df
>>>>> On 17 Jul 2015 21:03, "Harit Vishwakarma" <harit.vishwakarma@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I used the createDataFrame API of SqlContext in Python and am getting
>>>>>> an OutOfMemoryException. I am wondering if it is creating the whole
>>>>>> DataFrame in memory?
>>>>>> I did not find any documentation describing the memory usage of Spark
>>>>>> APIs.
>>>>>> The documentation given is nice, but a little more detail (especially
>>>>>> on memory usage, data distribution, etc.) would really help.
>>>>>>
>>>>>> --
>>>>>> Regards
>>>>>> Harit Vishwakarma
>>>>>>
>>>>>>
>>>>
>>>>
>>>> --
>>>> Regards
>>>> Harit Vishwakarma
>>>>
>>>>
>>>
>>
>>
>> --
>> Regards
>> Harit Vishwakarma
>>
>>
>
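On the "persist while it is being created" question from earlier in the thread: the general pattern is to consume rows lazily and flush them in fixed-size batches, so the full dataset is never held in memory at once. Spark's own DataFrame save similarly writes partition by partition on the executors. Here is a plain-Python sketch of that batching idea (the helper `write_in_batches` is hypothetical, not a Spark API):

```python
# Hypothetical sketch: stream rows to disk in fixed-size batches so the
# full dataset is never materialized in memory. Plain Python, not Spark.
import csv
import os
import tempfile

def write_in_batches(rows, path, batch_size=1000):
    """Consume an iterable of rows lazily, flushing every batch_size rows."""
    written = 0
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        batch = []
        for row in rows:
            batch.append(row)
            if len(batch) >= batch_size:   # flush a full batch, free memory
                writer.writerows(batch)
                written += len(batch)
                batch = []
        if batch:                          # flush the final partial batch
            writer.writerows(batch)
            written += len(batch)
    return written

rows = ((i, i * 0.5) for i in range(5000))         # lazy generator, not a list
path = os.path.join(tempfile.mkdtemp(), "out.csv")
print(write_in_batches(rows, path))                # 5000
```

The same bounded-memory principle is what the repartitioning and `df.persist(StorageLevel.MEMORY_AND_DISK)` suggestions above aim at: smaller per-partition working sets that can spill to disk instead of one large in-memory structure.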


-- 
Regards
Harit Vishwakarma
