spark-user mailing list archives

From Nicolas Paris <>
Subject Re: Dataframe from 1.5G json (non JSONL)
Date Tue, 05 Jun 2018 20:55:49 GMT
IMO your JSON cannot be read in parallel at all, so Spark only lets you
play with memory.

I'd say at some step it has to fit in both a single executor and the driver.
I'd try something like 20 GB for both the driver and the executors, and use a
dynamic number of executors, in order to then repartition that fat JSON.
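As a rough sketch of the suggestion above (the app name and file path are
hypothetical, and the 20 GB figure is just the starting point proposed here,
not a tuned value), the submit-time configuration might look like:

```shell
# Give both driver and executors enough heap to hold the whole document,
# since a single non-JSONL file is parsed by one task.
# Dynamic allocation lets the executor count grow for the repartition step.
spark-submit \
  --driver-memory 20g \
  --executor-memory 20g \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.shuffle.service.enabled=true \
  my_fat_json_job.py   # hypothetical application
```

Inside the job, a non-JSONL document has to be read with the `multiLine`
option (e.g. `spark.read.option("multiLine", true).json(path)`), which is
exactly why it cannot be split across tasks; only after that initial read can
a `repartition` spread the data out.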

2018-06-05 22:40 GMT+02:00 raksja <>:

> Yes, I would say that's the first thing I tried. The thing is, even though I
> provide more executors and more memory for each, this process gets an OOM in
> only one task, which stays stuck and unfinished.
> I don't think it's splitting the load across the other tasks.
> I had 11 blocks for that file stored in HDFS, and I got 11 partitions in my
> dataframe; when I did show(1), it spun up 11 tasks, 10 passed quickly, and 1
> got stuck and OOMed.
> I also repartitioned to 1000, and that didn't help either.
