spark-user mailing list archives

From Cesar Flores <ces...@gmail.com>
Subject Re: SchemaRDD: SQL Queries vs Language Integrated Queries
Date Wed, 11 Mar 2015 14:05:16 GMT
Hi:

Thanks for both answers. One final question: *isn't registerTempTable an
extra step that the SQL queries need, and one that may decrease performance
compared with the language integrated method calls?* The thing is that I
am planning to use them in the current version of the ML Pipeline
transformer classes for feature extraction, and if I need to save the
input and maybe the output SchemaRDD of the transform function in every
transformer, this may not be very efficient.


Thanks

On Tue, Mar 10, 2015 at 8:20 PM, Tobias Pfeiffer <tgp@preferred.jp> wrote:

> Hi,
>
> On Tue, Mar 10, 2015 at 2:13 PM, Cesar Flores <cesar7@gmail.com> wrote:
>
>> I am new to the SchemaRDD class, and I am trying to decide between SQL
>> queries and Language Integrated Queries (
>> https://spark.apache.org/docs/1.2.0/api/scala/index.html#org.apache.spark.sql.SchemaRDD
>> ).
>>
>> Can someone tell me what is the main difference between the two
>> approaches, besides using different syntax? Are they interchangeable? Which
>> one has better performance?
>>
>
> One difference is that the language integrated queries are method calls on
> the SchemaRDD you want to work on, which requires you have access to the
> object at hand. The SQL queries are passed to a method of the SQLContext
> and you have to call registerTempTable() on the SchemaRDD you want to use
> beforehand, which can basically happen at an arbitrary location of your
> program. (I don't know if I could express what I wanted to say.) That may
> have an influence on how you design your program and how the different
> parts work together.
>
> Tobias
>



-- 
Cesar Flores
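
Editorial illustration of the two query styles discussed in this thread: a minimal sketch, assuming the Spark 1.2 Scala API that the quoted link points to. The `Person` case class, the `people` table name, the sample rows, and the `QueryStyles` object are hypothetical and exist only for this example. As far as I understand, registerTempTable() only records the SchemaRDD's logical plan under a name in the SQLContext catalog and does not copy or materialize data, so both styles should end up in the same query planner; treat that as an assumption to verify against your Spark version.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{SQLContext, SchemaRDD}

// Hypothetical record type, used only for this illustration.
case class Person(name: String, age: Int)

object QueryStyles {

  // SQL style: the SchemaRDD must be registered under a name first,
  // then the query is passed as a string to the SQLContext.
  def viaSql(sqlContext: SQLContext, people: SchemaRDD): SchemaRDD = {
    people.registerTempTable("people")
    sqlContext.sql("SELECT name FROM people WHERE age >= 13 AND age <= 19")
  }

  // Language integrated style: method calls on the SchemaRDD itself.
  // The import brings in the implicit conversions that turn Scala
  // symbols such as 'age into column expressions.
  def viaDsl(sqlContext: SQLContext, people: SchemaRDD): SchemaRDD = {
    import sqlContext._
    people.where('age >= 13).where('age <= 19).select('name)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("query-styles").setMaster("local[2]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // Convert an RDD of case classes into a SchemaRDD.
    val people = sqlContext.createSchemaRDD(
      sc.parallelize(Seq(Person("Ann", 15), Person("Bob", 32))))

    viaSql(sqlContext, people).collect().foreach(println)
    viaDsl(sqlContext, people).collect().foreach(println)

    sc.stop()
  }
}
```

Both functions describe the same filter and projection. The practical difference is the one Tobias points out: viaDsl needs the SchemaRDD object in hand, while the SQL string only needs the table name, which can be registered anywhere earlier in the program.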
