spark-user mailing list archives

From Abel Coronado Iruegas <acoronadoirue...@gmail.com>
Subject Re: SQL FIlter of tweets (json) running on Disk
Date Fri, 04 Jul 2014 14:56:57 GMT
Ok, I found these slides by Yin Huai (
http://spark-summit.org/wp-content/uploads/2014/07/Easy-json-Data-Manipulation-Yin-Huai.pdf
)

To read a JSON file the code seems pretty simple:

sqlContext.jsonFile("data.json")  <---- Is this already available in the
master branch?
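For what it's worth, here is a minimal sketch of how the one-liner from the slides would fit into a full job, assuming the SQLContext API on the master branch at the time (jsonFile landed for the 1.1 release). The file path, table name, and the "lang"/"text" fields are hypothetical placeholders, not anything from the slides:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object TweetFilter {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("TweetFilter"))
    val sqlContext = new SQLContext(sc)

    // jsonFile scans the JSON documents and infers a schema automatically.
    val tweets = sqlContext.jsonFile("tweets.json")
    tweets.registerAsTable("tweets")

    // SQL filter over the inferred schema; "lang" and "text" are
    // assumed field names for illustration only.
    val filtered = sqlContext.sql("SELECT text FROM tweets WHERE lang = 'es'")
    filtered.collect().foreach(println)
  }
}
```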

But the question about using a combination of resources (memory
processing and disk processing) still remains.
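In case it helps frame the question: the usual way that memory-plus-disk combination is expressed in Spark is through a storage level on the cached dataset, plus a cap on the executor heap. A hedged sketch, assuming a hypothetical input path and the StorageLevel API of Spark 1.0:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object DiskSpillExample {
  def main(args: Array[String]): Unit = {
    // Capping the heap (e.g. 20 GB of the machine's 40 GB) can also be
    // set via spark-defaults or spark-submit --executor-memory 20g.
    val conf = new SparkConf()
      .setAppName("DiskSpillExample")
      .set("spark.executor.memory", "20g")
    val sc = new SparkContext(conf)

    // MEMORY_AND_DISK keeps as many partitions in RAM as fit and spills
    // the rest to local disk instead of failing with OutOfMemory.
    val tweets = sc.textFile("tweets.json") // hypothetical path
    tweets.persist(StorageLevel.MEMORY_AND_DISK)

    println(tweets.count())
  }
}
```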

Thanks !!



On Fri, Jul 4, 2014 at 9:49 AM, Abel Coronado Iruegas <
acoronadoiruegas@gmail.com> wrote:

> Hi everybody
>
> Can someone tell me if it is possible to read and filter a 60 GB file of
> tweets (JSON docs) in a standalone Spark deployment running on a single
> machine with 40 GB RAM and 8 cores?
>
> I mean, is it possible to configure Spark to work with a fixed amount of
> memory (20 GB) and do the rest of the processing on disk, avoiding
> OutOfMemory exceptions?
>
> Regards
>
> Abel
>
