spark-user mailing list archives

From Takeshi Yamamuro <linguin....@gmail.com>
Subject Re: Possible to push sub-queries down into the DataSource impl?
Date Thu, 28 Jul 2016 07:07:21 GMT
Hi,

Have you seen this ticket?
https://issues.apache.org/jira/browse/SPARK-12449

// maropu

On Thu, Jul 28, 2016 at 2:13 AM, Timothy Potter <thelabdude@gmail.com>
wrote:

> I'm not looking for a one-off solution for a specific query that can
> be solved on the client side as you suggest, but rather a generic
> solution that can be implemented within the DataSource impl itself
> when it knows a sub-query can be pushed down into the engine. In other
> words, I'd like to intercept the query planning process to be able to
> push-down computation into the engine when it makes sense.
>
> On Wed, Jul 27, 2016 at 8:04 AM, Marco Colombo
> <ing.marco.colombo@gmail.com> wrote:
> > Why don't you create a filtered DataFrame, register it as a temporary table,
> > and then use it in your query? You can also cache it if multiple queries on
> > the same inner query are requested.
> >
> >
> > On Wednesday, July 27, 2016, Timothy Potter <thelabdude@gmail.com> wrote:
> >>
> >> Take this simple join:
> >>
> >> SELECT m.title as title, solr.aggCount as aggCount FROM movies m INNER
> >> JOIN (SELECT movie_id, COUNT(*) as aggCount FROM ratings WHERE rating
> >> >= 4 GROUP BY movie_id ORDER BY aggCount desc LIMIT 10) as solr ON
> >> solr.movie_id = m.movie_id ORDER BY aggCount DESC
> >>
> >> I would like the ability to push the inner sub-query aliased as "solr"
> >> down into the data source engine, in this case Solr as it will
> >> greatly reduce the amount of data that has to be transferred from
> >> Solr into Spark. I would imagine this issue comes up frequently if the
> >> underlying engine is a JDBC data source as well ...
> >>
> >> Is this possible? Of course, my example is a bit cherry-picked so
> >> determining if a sub-query can be pushed down into the data source
> >> engine is probably not a trivial task, but I'm wondering if Spark has
> >> the hooks to allow me to try ;-)
> >>
> >> Cheers,
> >> Tim
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe e-mail: user-unsubscribe@spark.apache.org
> >>
> >
> >
> > --
> > Ing. Marco Colombo
>


-- 
---
Takeshi Yamamuro
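
For the JDBC case Tim mentions, a manual version of the push-down is possible because Spark's JDBC data source accepts a parenthesized sub-query with an alias as its `dbtable` option. The sketch below only builds that option string for the inner "solr" query from the thread. It is the one-off, client-side workaround, not the generic planner hook Tim is asking for, and the helper names `pushdown_dbtable` and `jdbc_options` and the JDBC URL are hypothetical.

```python
def pushdown_dbtable(subquery: str, alias: str) -> str:
    """Wrap a SQL sub-query so it can be used as the JDBC `dbtable` option.

    Whitespace is collapsed naively, so this sketch assumes the query
    contains no string literals with significant spacing.
    """
    cleaned = " ".join(subquery.split()).rstrip(";")
    return f"({cleaned}) AS {alias}"


# The inner sub-query from the original message, aliased as "solr".
TOP_RATED = """
    SELECT movie_id, COUNT(*) AS aggCount
    FROM ratings
    WHERE rating >= 4
    GROUP BY movie_id
    ORDER BY aggCount DESC
    LIMIT 10
"""


def jdbc_options(url: str) -> dict:
    """Build reader options so the remote engine evaluates the sub-query."""
    return {"url": url, "dbtable": pushdown_dbtable(TOP_RATED, "solr")}


# Usage with a live SparkSession (assumed, not executed here):
#   solr = spark.read.format("jdbc").options(**jdbc_options(url)).load()
#   solr.createOrReplaceTempView("solr")   # Spark 2.0+
#   spark.sql("SELECT m.title, solr.aggCount FROM movies m "
#             "JOIN solr ON solr.movie_id = m.movie_id "
#             "ORDER BY solr.aggCount DESC")
```

With this approach only the ten aggregated rows cross the wire, but the decision about what to push down is made by hand for each query rather than by the data source during planning.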
