spark-user mailing list archives

From Jay <jayadeep.jayara...@gmail.com>
Subject Re: Dataframe from 1.5G json (non JSONL)
Date Wed, 06 Jun 2018 14:28:24 GMT
I might have missed it, but can you tell whether the OOM is happening in the
driver or in an executor? Also, it would be good if you could post the actual
exception.

On Tue 5 Jun, 2018, 1:55 PM Nicolas Paris, <niparisco@gmail.com> wrote:

> IMO your JSON cannot be read in parallel at all, so Spark only lets you
> play with memory again.
>
> I'd say at some step it has to fit in both one executor and the driver.
> I'd try something like 20GB for both the driver and the executors, with a
> dynamic number of executors so that you can then repartition that fat JSON.
>
>
>
>
> 2018-06-05 22:40 GMT+02:00 raksja <shanmugkraja@gmail.com>:
>
>> Yes, I would say that's the first thing I tried. The thing is, even though
>> I provide more executors and more memory to each, this process gets an OOM
>> in only one task, which is stuck and unfinished.
>>
>> I don't think it's splitting the load across the other tasks.
>>
>> I had 11 blocks for that file stored in HDFS and I got 11 partitions in
>> my DataFrame. When I did show(1), it spun up 11 tasks: 10 passed quickly and
>> 1 got stuck and hit the OOM.
>>
>> Also, I repartitioned to 1000 and that didn't help either.
>>
>
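
One way around the "cannot be read in parallel" problem discussed above is to convert the file to JSONL once, outside Spark, so that each record sits on its own line and the file becomes splittable. Below is a minimal sketch in plain Python, assuming the 1.5G file is a single top-level JSON array (the thread does not say what its layout is). The names iter_array_items and json_array_to_jsonl are hypothetical helpers, not part of Spark or any library. The parser is deliberately lenient about commas and re-parses an element that straddles a chunk boundary, so treat it as an illustration rather than a tuned tool.

```python
import io
import json
from typing import Iterator, TextIO

# Characters that may legally follow an array element in valid JSON.
DELIMS = " \t\r\n,]"


def iter_array_items(fp: TextIO, chunk_size: int = 1 << 20) -> Iterator[object]:
    """Yield the elements of one top-level JSON array, reading fp in chunks.

    Assumption: the input is a single JSON array. Only the current chunk
    plus any partially read element is held in memory.
    """
    decoder = json.JSONDecoder()
    buf, pos = "", 0
    started = False
    eof = False
    while True:
        # Skip whitespace and element separators (lenient: repeated commas pass).
        while pos < len(buf) and buf[pos] in " \t\r\n,":
            pos += 1
        if pos < len(buf):
            ch = buf[pos]
            if not started:
                if ch != "[":
                    raise ValueError("expected a top-level JSON array")
                started = True
                pos += 1
                continue
            if ch == "]":
                return
            try:
                item, end = decoder.raw_decode(buf, pos)
            except json.JSONDecodeError:
                # Probably an element cut off by the chunk boundary.
                if eof:
                    raise
            else:
                # Accept only when a delimiter follows, so a number such as
                # "1e5" split across two chunks is not emitted as "1".
                if end < len(buf) and buf[end] in DELIMS:
                    yield item
                    pos = end
                    continue
                if eof:
                    raise ValueError("malformed or unterminated JSON array")
        elif eof:
            raise ValueError("unexpected end of input")
        # Need more input: drop the consumed prefix and append the next chunk.
        chunk = fp.read(chunk_size)
        if chunk:
            buf = buf[pos:] + chunk
            pos = 0
        else:
            eof = True


def json_array_to_jsonl(src: TextIO, dst: TextIO, chunk_size: int = 1 << 20) -> int:
    """Write each array element as one JSON line; return the record count."""
    count = 0
    for item in iter_array_items(src, chunk_size):
        dst.write(json.dumps(item) + "\n")
        count += 1
    return count
```

With two open text files (say a hypothetical big.json for reading and big.jsonl for writing), json_array_to_jsonl(src, dst) produces output that Spark's default line-delimited JSON reader can split across tasks.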
