spark-user mailing list archives

From Chetan Khatri <chetan.opensou...@gmail.com>
Subject Re: Reading TB of JSON file
Date Fri, 19 Jun 2020 12:41:56 GMT
Thanks. Do you mean in a for loop? Could you please share some pseudocode in Spark?

On Fri, Jun 19, 2020 at 8:39 AM Jörn Franke <jornfranke@gmail.com> wrote:

> Make every JSON object a line and then read it as JSON Lines, not as multiline.
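A minimal sketch of the suggestion above, assuming the input is a sequence of top-level JSON objects separated only by whitespace (possibly pretty-printed over several lines). The function names, paths, and chunk size are hypothetical and not from the thread. The first function streams the file on a single machine with a bounded buffer and rewrites it with one object per line; the second reads that output with Spark's default line-delimited JSON reader, which can split the file across executors.

```python
import json


def to_json_lines(src_path, dst_path, chunk_size=1 << 20):
    """Rewrite concatenated JSON objects as one object per line.

    Assumes every top-level value is a JSON object. Only a bounded
    buffer is held in memory, never the whole file.
    Returns the number of objects written.
    """
    decoder = json.JSONDecoder()
    buf = ""
    count = 0
    with open(src_path, "r", encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        eof = False
        while not eof:
            chunk = src.read(chunk_size)
            eof = chunk == ""
            buf += chunk
            pos = 0
            while True:
                # Skip whitespace between objects.
                while pos < len(buf) and buf[pos].isspace():
                    pos += 1
                if pos >= len(buf):
                    break
                try:
                    obj, end = decoder.raw_decode(buf, pos)
                except json.JSONDecodeError:
                    if eof:
                        raise  # truncated or malformed input
                    break  # object is incomplete; read another chunk
                # json.dumps emits no raw newlines, so each object fits on one line.
                dst.write(json.dumps(obj) + "\n")
                count += 1
                pos = end
            # Keep only the unparsed tail for the next round.
            buf = buf[pos:]
    return count


def load_json_lines(spark, jsonl_path, out_path):
    """Read line-delimited JSON with an existing SparkSession and persist it.

    Both paths are placeholders, e.g. locations on HDFS.
    """
    # multiLine defaults to false, so each line is parsed as one record.
    df = spark.read.json(jsonl_path)
    df.write.mode("overwrite").parquet(out_path)
    return df
```

The conversion step is single-threaded, so for a file of this size it may be worth running it once where the file is produced. Passing an explicit schema to the reader would also skip Spark's schema inference pass over the data.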
>
> On 19.06.2020 at 14:37, Chetan Khatri <chetan.opensource@gmail.com> wrote:
>
> 
> All transactions are in JSON; it is not a single array.
>
> On Thu, Jun 18, 2020 at 12:55 PM Stephan Wehner <stephan@buckmaster.ca>
> wrote:
>
>> It's an interesting problem. What is the structure of the file? One big
>> array? One hash with many key-value pairs?
>>
>> Stephan
>>
>> On Thu, Jun 18, 2020 at 6:12 AM Chetan Khatri <
>> chetan.opensource@gmail.com> wrote:
>>
>>> Hi Spark Users,
>>>
>>> I have a 50 GB JSON file that I would like to read and persist to HDFS so
>>> that it can feed the next transformation. I am trying to read it with
>>> spark.read.json(path), but this gives an out-of-memory error on the driver.
>>> Obviously, I can't afford 50 GB of driver memory. In general, what
>>> is the best practice for reading a large JSON file of around 50 GB?
>>>
>>> Thanks
>>>
>>
>>
>> --
>> Stephan Wehner, Ph.D.
>> The Buckmaster Institute, Inc.
>> 2150 Adanac Street
>> Vancouver BC V5L 2E7
>> Canada
>> Cell (604) 767-7415
>> Fax (888) 808-4655
>>
>> Sign up for our free email course
>> http://buckmaster.ca/small_business_website_mistakes.html
>>
>> http://www.buckmaster.ca
>> http://answer4img.com
>> http://loggingit.com
>> http://clocklist.com
>> http://stephansmap.org
>> http://benchology.com
>> http://www.trafficlife.com
>> http://stephan.sugarmotor.org (Personal Blog)
>> @stephanwehner (Personal Account)
>> VA7WSK (Personal call sign)
>>
>
