spark-user mailing list archives

From Michael Armbrust <mich...@databricks.com>
Subject Re: Is SparkSQL optimizer aware of the needed data after the query?
Date Mon, 02 Mar 2015 17:30:30 GMT
-dev +user

No, lambda functions and other code are black boxes to Spark SQL.  If you
want those kinds of optimizations, you need to express the required columns
in either SQL or the DataFrame DSL (coming in 1.3).

On Mon, Mar 2, 2015 at 1:55 AM, Wail <w.alkowaileet@cces-kacst-mit.org>
wrote:

> Dears,
>
> I'm just curious about the complexity of the query optimizer. Can the
> optimizer evaluate what happens after the SQL query? Maybe it's a naive
> question, but here is an example to show the case:
>
> From the Spark SQL example:
> val teenagers = sqlContext.sql(
>   "SELECT * FROM people WHERE age >= 13 AND age <= 19")
>
> if(condition)
> {
>     teenagers.map(t => "Name: " + t(0)).collect().foreach(println)
> }
> else
> {
>     teenagers.map(t => "Age: " + t(1)).collect().foreach(println)
> }
>
> For instance: is the optimizer aware that I need only one column, and does
> it push the projection down so that only that column is fetched?
>
> Thanks!
>
> --
> View this message in context:
> http://apache-spark-developers-list.1001551.n3.nabble.com/Is-SparkSQL-optimizer-aware-of-the-needed-data-after-the-query-tp10835.html
> Sent from the Apache Spark Developers List mailing list archive at
> Nabble.com.
>
