spark-user mailing list archives

From Jörn Franke <>
Subject Re: Reading TB of JSON file
Date Thu, 18 Jun 2020 13:16:36 GMT
Depends on the data types you use.

Do you have it in JSON Lines format? Then the amount of memory plays much less of a role.

Otherwise, if it is one large object or array, I would not recommend it.
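A minimal sketch of why the distinction matters, in plain Python with hypothetical sample data (not Spark code): JSON Lines can be consumed one record at a time, while a single array must be parsed as one document.

```python
import io
import json

# Hypothetical sample data: the same records encoded both ways.
records = [{"id": i, "value": i * 2} for i in range(3)]
jsonlines_text = "\n".join(json.dumps(r) for r in records)
array_text = json.dumps(records)

# JSON Lines: every line is a complete JSON document, so a reader can
# process one record at a time with bounded memory, and Spark can split
# the file across executors on line boundaries.
streamed = [json.loads(line) for line in io.StringIO(jsonlines_text)]

# One large array: the whole document must be parsed as a single object,
# so the entire file has to fit in memory on one parser.
parsed = json.loads(array_text)

assert streamed == parsed == records
```

In Spark terms: `spark.read.json` expects JSON Lines by default and parallelizes well; reading a file that is one big object or array requires the `multiLine` option, which forces each file to be read as a single record.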

> On 18 Jun 2020, at 15:12, Chetan Khatri <> wrote:
> Hi Spark Users,
> I have a 50 GB JSON file which I would like to read and persist to HDFS so it can be taken
> into the next transformation. I am trying to read as but this is giving an
> out-of-memory error on the driver. Obviously, I can't afford having 50 GB of driver memory.
> In general, what is the best practice to read a large JSON file like 50 GB?
> Thanks
