spark-user mailing list archives

From Dean Wampler <deanwamp...@gmail.com>
Subject Re: Spark distributed SQL: JSON Data set on all worker node
Date Sun, 03 May 2015 14:20:50 GMT
Note that each JSON object has to be on a single line in the files.
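Dean's point can be illustrated with plain Python (no Spark needed). The format Spark's JSON reader expects is often called "JSON Lines": one complete object per line, rather than a single pretty-printed object spanning multiple lines. The sample records here are made up for illustration.

```python
import json

# One complete JSON object per line -- the layout Spark's JSON reader expects.
lines = [
    '{"name": "Ann", "age": 34}',
    '{"name": "Bob", "age": 19}',
]

# Each line parses independently, which is what lets Spark split the input
# across workers without needing to see the whole document at once.
records = [json.loads(line) for line in lines]
print(records[0]["name"])  # -> Ann
```

A pretty-printed JSON file, where one object spans several lines, would fail to parse this way, which is why the single-line layout matters.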

Dean Wampler, Ph.D.
Author: Programming Scala, 2nd Edition
<http://shop.oreilly.com/product/0636920033073.do> (O'Reilly)
Typesafe <http://typesafe.com>
@deanwampler <http://twitter.com/deanwampler>
http://polyglotprogramming.com

On Sun, May 3, 2015 at 4:14 AM, ayan guha <guha.ayan@gmail.com> wrote:

> Yes, it is possible. You need to use the jsonFile method on SQLContext,
> which gives you a DataFrame, and then register it as a table. Should be
> 3 lines of code, thanks to Spark.
>
> You may want to watch a few YouTube videos, especially on unifying pipelines.
> On 3 May 2015 19:02, "Jai" <jai4love@gmail.com> wrote:
>
>> Hi,
>>
>> I am a noob to Spark and related technologies.
>>
>> I have JSON stored at the same location on all worker nodes of the Spark
>> cluster. I am looking to load this JSON data set on those nodes and run
>> SQL queries against it, like distributed SQL.
>>
>> Is this possible to achieve?
>>
>> Right now, the master submits the task to one node only.
>>
>> Thanks and regards
>> Mrityunjay
>>
>
