spark-user mailing list archives

From Nicolas Paris <nipari...@gmail.com>
Subject Re: Dataframe from 1.5G json (non JSONL)
Date Tue, 05 Jun 2018 20:55:49 GMT
IMO your JSON cannot be read in parallel at all, so Spark only leaves you
the option of playing with memory again.

I'd say that at some step it has to fit in both one executor and the driver.
I'd try something like 20GB for both the driver and the executors, with
dynamic executor allocation, in order to then repartition that fat JSON.
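
A rough sketch of that setup, assuming the file is one single JSON document (hence the multiLine reader option) and that the job is launched through spark-submit. The helper names, the application name and the partition count of 1000 are made up for illustration; the 20g figure is only the starting guess from above.

```python
def submit_args(memory="20g"):
    """Build spark-submit flags: same memory for driver and executors,
    plus dynamic allocation. Driver memory has to be given at submit time,
    because in client mode the driver JVM is already running when the
    application code starts."""
    return [
        "--driver-memory", memory,
        "--executor-memory", memory,
        "--conf", "spark.dynamicAllocation.enabled=true",
        # On YARN, dynamic allocation in Spark 2.x also needs the
        # external shuffle service.
        "--conf", "spark.shuffle.service.enabled=true",
    ]


def load_fat_json(path, num_partitions=1000):
    """Read the whole JSON document in a single task, then spread the
    resulting rows over many partitions for the downstream work."""
    # Imported here so the helper above can be used without pyspark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("fat-json").getOrCreate()
    # multiLine makes Spark parse each file as one JSON document;
    # such a file is not splittable, so a single task reads all of it.
    df = spark.read.option("multiLine", True).json(path)
    return df.repartition(num_partitions)


if __name__ == "__main__":
    # Print the command line to use (job.py is a placeholder script name).
    print(" ".join(["spark-submit"] + submit_args() + ["job.py"]))
```

The repartition only helps the steps that come after the read; the initial parse still has to fit in one executor.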




2018-06-05 22:40 GMT+02:00 raksja <shanmugkraja@gmail.com>:

> Yes, I would say that's the first thing I tried. The thing is, even though I
> provide more executors and more memory to each, this process gets an OOM in
> only one task, which is stuck and never finishes.
>
> I don't think it's splitting the load across the other tasks.
>
> I had 11 blocks for that file stored in HDFS and got 11 partitions in my
> DataFrame. When I did show(1), it spun up 11 tasks: 10 passed quickly and 1
> got stuck and hit an OOM.
>
> I also repartitioned to 1000 and that didn't help either.
