spark-user mailing list archives

From Stephan Wehner <step...@buckmaster.ca>
Subject Re: Reading TB of JSON file
Date Thu, 18 Jun 2020 16:54:56 GMT
It's an interesting problem. What is the structure of the file? One big
array? One hash with many key-value pairs?

Stephan
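
The layout asked about above matters for Spark. By default, spark.read.json
expects one JSON object per line (JSON Lines), while a single document that
spans many lines needs the multiLine option. The sketch below is one rough
way to check which case applies by inspecting only the start of the file. The
names classify_json_head and sniff_json_layout and the 64 KiB probe size are
hypothetical, chosen for illustration, and the result is a heuristic guess,
not a validation of the whole file.

```python
import json


def classify_json_head(head: str) -> str:
    """Guess the top-level JSON layout from the first characters of a file.

    Returns one of: "array", "json_lines", "object", "empty", "unknown".
    """
    stripped = head.lstrip()
    if not stripped:
        return "empty"
    if stripped[0] == "[":
        # One big top-level array.
        return "array"
    if stripped[0] == "{":
        # If the first line is a complete JSON value on its own, treat the
        # file as JSON Lines; otherwise assume one object spanning many lines
        # (or a single line longer than the probe window).
        first_line = stripped.split("\n", 1)[0].strip()
        try:
            json.loads(first_line)
            return "json_lines"
        except json.JSONDecodeError:
            return "object"
    return "unknown"


def sniff_json_layout(path: str, probe_chars: int = 65536) -> str:
    """Read only the first probe_chars characters of the file and classify."""
    with open(path, "r", encoding="utf-8") as fh:
        return classify_json_head(fh.read(probe_chars))
```

A result of "json_lines" suggests the default line-oriented reader applies.
"array" or "object" suggests the file is a single JSON document, which Spark
reads with multiLine enabled and cannot split across tasks the way it splits
line-delimited input.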

On Thu, Jun 18, 2020 at 6:12 AM Chetan Khatri <chetan.opensource@gmail.com>
wrote:

> Hi Spark Users,
>
> I have a 50 GB JSON file that I would like to read and persist to HDFS so
> it can feed the next transformation. I am reading it with
> spark.read.json(path), but this gives an out-of-memory error on the driver.
> Obviously, I can't afford 50 GB of driver memory. In general, what is the
> best practice for reading a large JSON file of around 50 GB?
>
> Thanks
>


-- 
Stephan Wehner, Ph.D.
The Buckmaster Institute, Inc.
2150 Adanac Street
Vancouver BC V5L 2E7
Canada
Cell (604) 767-7415
Fax (888) 808-4655
