spark-user mailing list archives

From Cesar Flores <>
Subject Re: SchemaRDD: SQL Queries vs Language Integrated Queries
Date Wed, 11 Mar 2015 14:05:16 GMT

Thanks for both answers. One final question: *is registerTempTable not an
extra step that the SQL queries need, and one that may decrease performance
compared with the language integrated method calls?* The thing is that I am
planning to use them in the current version of the ML Pipeline transformer
classes for feature extraction, and if I need to save the input and maybe
the output SchemaRDD of the transform function in every transformer, this
may not be very efficient.
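
For reference, here is a minimal sketch of the two styles being compared,
assuming the Spark 1.2-era SchemaRDD API in Scala. The Person case class,
the "people" table name, the sample rows, and the local master setting are
made up for illustration and are not from this thread. As far as I
understand, registerTempTable only records the SchemaRDD's logical plan
under a name in the SQLContext's catalog and does not compute or copy any
data, so comparing the plans printed at the end is one way to check whether
the two styles differ for a given query.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{SQLContext, SchemaRDD}

// Hypothetical record type, defined at top level so Spark can infer its schema.
case class Person(name: String, age: Int)

object SqlVsLanguageIntegrated {
  def main(args: Array[String]): Unit = {
    // Local master and app name are placeholders for this sketch.
    val conf = new SparkConf().setAppName("sql-vs-dsl").setMaster("local[2]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    // Brings the Symbol-based query DSL ('age, 'name) into scope.
    import sqlContext._

    // Made-up sample data, converted explicitly to a SchemaRDD.
    val people: SchemaRDD =
      sqlContext.createSchemaRDD(sc.parallelize(Seq(Person("Ann", 15), Person("Bob", 30))))

    // SQL style: the SchemaRDD must first be registered under a table name.
    people.registerTempTable("people")
    val viaSql = sqlContext.sql("SELECT name FROM people WHERE age >= 18")

    // Language integrated style: method calls on the SchemaRDD object itself.
    val viaDsl = people.where('age >= 18).select('name)

    // Both are lazy SchemaRDDs; collect() triggers the actual computation.
    println(viaSql.collect().map(_.getString(0)).mkString(", "))
    println(viaDsl.collect().map(_.getString(0)).mkString(", "))

    // Developer API: print the plans built for each query so they can be compared.
    println(viaSql.queryExecution)
    println(viaDsl.queryExecution)

    sc.stop()
  }
}
```

Inside a transformer that already holds the input SchemaRDD, the language
integrated form avoids having to pick a table name at all, which matches
the design point made in the quoted reply below.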


On Tue, Mar 10, 2015 at 8:20 PM, Tobias Pfeiffer <> wrote:

> Hi,
> On Tue, Mar 10, 2015 at 2:13 PM, Cesar Flores <> wrote:
>> I am new to the SchemaRDD class, and I am trying to decide between using SQL
>> queries or Language Integrated Queries (
>> ).
>> Can someone tell me what is the main difference between the two
>> approaches, besides using different syntax? Are they interchangeable? Which
>> one has better performance?
> One difference is that the language integrated queries are method calls on
> the SchemaRDD you want to work on, which requires you have access to the
> object at hand. The SQL queries are passed to a method of the SQLContext
> and you have to call registerTempTable() on the SchemaRDD you want to use
> beforehand, which can basically happen at an arbitrary location of your
> program. (I don't know if I could express what I wanted to say.) That may
> have an influence on how you design your program and how the different
> parts work together.
> Tobias

Cesar Flores
