spark-user mailing list archives

From Chetan Khatri <chetan.opensou...@gmail.com>
Subject Reading TB of JSON file
Date Thu, 18 Jun 2020 13:11:45 GMT
Hi Spark Users,

I have a 50 GB JSON file that I would like to read and persist to HDFS so it
can be used in the next transformation. I am reading it with
spark.read.json(path), but this fails with an out-of-memory error on the
driver. Obviously, I can't afford 50 GB of driver memory. In general, what
is the best practice for reading a large JSON file of around 50 GB?

Thanks
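
For reference, a minimal sketch of one approach that is often tried for a read like the one described above. It is not a confirmed fix for this error. It assumes the file is line-delimited JSON (one object per line, which is what spark.read.json expects by default) and that the schema is known in advance, so Spark does not have to scan the data to infer it. The function name, the schema string, and the paths in the usage note are hypothetical.

```python
def read_json_with_schema(spark, src_path, dst_path, schema_ddl):
    """Read line-delimited JSON with a caller-supplied schema and persist it as Parquet.

    `spark` is an existing SparkSession. `schema_ddl` is a DDL-formatted
    schema string such as "id LONG, payload STRING" (hypothetical columns).
    """
    # Supplying the schema up front means Spark skips the schema-inference
    # pass over the input.
    df = spark.read.schema(schema_ddl).json(src_path)

    # The write is carried out by the executors; no rows are collected
    # to the driver here.
    df.write.mode("overwrite").parquet(dst_path)
    return df
```

Usage would look roughly like read_json_with_schema(spark, "hdfs:///data/input.json", "hdfs:///data/output", "id LONG, payload STRING"), with both paths and the column list as placeholders. If the file is instead a single multi-line JSON document, this sketch does not apply as written, because such a file cannot be split across tasks.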
