spark-user mailing list archives

From nihed mbarek <nihe...@gmail.com>
Subject Re: Reading TB of JSON file
Date Thu, 18 Jun 2020 13:19:15 GMT
Hi,

What is the size of one JSON document?

There is also the scan of your JSON to infer the schema; that overhead can
be huge.
Two solutions:
define a schema and pass it directly during the load, or ask Spark to
analyse only a small part of the JSON file (I don't remember the exact way
to do it). See the sketch below.
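A minimal sketch of both approaches. The session, the path, and the schema
fields here are hypothetical placeholders; the sampling approach assumes
the samplingRatio option of Spark's JSON data source, which controls what
fraction of the input rows is used for schema inference:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.types._

    val spark = SparkSession.builder().appName("read-large-json").getOrCreate()
    val path = "hdfs:///data/events.json"  // hypothetical input path

    // Option 1: supply the schema up front so Spark skips the inference scan.
    val schema = StructType(Seq(
      StructField("id", LongType),
      StructField("name", StringType),
      StructField("ts", TimestampType)
    ))
    val dfWithSchema = spark.read.schema(schema).json(path)

    // Option 2: infer the schema from a 10% sample of the input instead of
    // scanning the whole file.
    val dfSampled = spark.read.option("samplingRatio", "0.1").json(path)

With a user-provided schema, Spark avoids the extra full pass over the data
entirely, which is the safer choice when the input is tens of gigabytes.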

Regards,


On Thu, Jun 18, 2020 at 3:12 PM Chetan Khatri <chetan.opensource@gmail.com>
wrote:

> Hi Spark Users,
>
> I have a 50 GB JSON file that I would like to read and persist to HDFS so
> it can be used by the next transformation. I am trying to read it as
> spark.read.json(path), but this gives an out-of-memory error on the driver.
> Obviously, I can't afford to have 50 GB in driver memory. In general, what
> is the best practice for reading a large JSON file like 50 GB?
>
> Thanks
>


-- 

M'BAREK Med Nihed,
Fedora Ambassador, TUNISIA, Northern Africa
http://www.nihed.com

<http://tn.linkedin.com/in/nihed>
