phoenix-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PHOENIX-4666) Add a subquery cache that persists beyond the life of a query
Date Tue, 17 Apr 2018 21:54:00 GMT

    [ https://issues.apache.org/jira/browse/PHOENIX-4666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16441554#comment-16441554 ]

ASF GitHub Bot commented on PHOENIX-4666:
-----------------------------------------

GitHub user ortutay opened a pull request:

    https://github.com/apache/phoenix/pull/298

    PHOENIX-4666 Persistent subquery cache for hash joins

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ortutay/phoenix PHOENIX-4666-subquery-cache

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/phoenix/pull/298.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #298
    
----
commit 77460c37697b2a112cf6ed345356a16da08dd51c
Author: Marcell Ortutay <mortutay@...>
Date:   2018-03-29T19:59:03Z

    PHOENIX-4666 Persistent subquery cache for hash joins

commit d1fc310e3d0df772c0aeb1673d2b64d01f495d27
Author: Marcell Ortutay <mortutay@...>
Date:   2018-04-17T20:51:00Z

    PHOENIX-4666 Add tests for TenantCacheTest

----


> Add a subquery cache that persists beyond the life of a query
> -------------------------------------------------------------
>
>                 Key: PHOENIX-4666
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4666
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Marcell Ortutay
>            Assignee: Marcell Ortutay
>            Priority: Major
>
> The user list thread for additional context is here: [https://lists.apache.org/thread.html/e62a6f5d79bdf7cd238ea79aed8886816d21224d12b0f1fe9b6bb075@%3Cuser.phoenix.apache.org%3E]
> ----
> A Phoenix query may contain expensive subqueries, and those subqueries may
> be reused across many different queries. While whole-result caching is
> possible at the application level, the application cannot cache subresults.
> This causes poor performance for queries in which the subquery is the most
> expensive part, and the application is powerless to do anything about it at
> the query level. It would be good if Phoenix provided a way to cache
> subquery results, as this could provide a significant performance gain.
> An illustrative example:
>     SELECT * FROM table1
>     JOIN (SELECT id_1 FROM large_table WHERE x = 10) expensive_result
>     ON table1.id_1 = expensive_result.id_1 AND table1.id_1 = {id}
> In this case, the subquery "expensive_result" is expensive to compute, but
> its result does not change between queries. The rest of the query does
> change because of the {id} parameter. This means the application cannot
> cache the full result, but it would be good if there were a way to cache
> expensive_result.
> Note that there is currently a coprocessor-based "server cache", but the
> data in this "cache" is not persisted across queries. It is deleted after a
> TTL expires (30 seconds by default) or when the query completes.
> This issue is fairly high priority for us at 23andMe, and we'd be happy to
> provide a patch with some guidance from Phoenix maintainers. We are
> currently putting together a design document for a solution, and we'll post
> it to this Jira ticket for review in a few days.
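
The caching idea described in the issue can be illustrated with a small
sketch. This is not the code in the pull request and not Phoenix's API; the
class name SubqueryCache, the helper functions, the in-memory tables, and the
choice of a TTL for invalidation are all assumptions made for illustration.
The point it shows is that the subquery result is keyed by the subquery text
and outlives any single outer query, so only the cheap {id}-dependent part is
recomputed each time.

```python
import time
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical stand-ins for the tables in the issue's example query.
LARGE_TABLE = [{"id_1": i, "x": 10 if i % 2 == 0 else 5} for i in range(10)]
TABLE1 = [{"id_1": i, "name": f"row{i}"} for i in range(10)]

SUBQUERY = "SELECT id_1 FROM large_table WHERE x = 10"

# Counts how often the expensive subquery is actually evaluated.
calls = {"count": 0}


class SubqueryCache:
    """Hypothetical cache whose entries persist across queries until a TTL expires."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        # Maps subquery text to (time stored, cached result).
        self._entries: Dict[str, Tuple[float, Set[int]]] = {}

    def get_or_compute(self, subquery_sql: str,
                       compute: Callable[[], Set[int]]) -> Set[int]:
        now = self._clock()
        entry = self._entries.get(subquery_sql)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # Still fresh: reuse the earlier result.
        result = compute()
        self._entries[subquery_sql] = (now, result)
        return result


def run_expensive_subquery() -> Set[int]:
    """Simulates evaluating the subquery against large_table."""
    calls["count"] += 1
    return {row["id_1"] for row in LARGE_TABLE if row["x"] == 10}


def run_query(cache: SubqueryCache, row_id: int) -> List[dict]:
    """Simulates the outer join; only the row_id filter changes per call."""
    ids = cache.get_or_compute(SUBQUERY, run_expensive_subquery)
    return [r for r in TABLE1 if r["id_1"] == row_id and r["id_1"] in ids]


if __name__ == "__main__":
    cache = SubqueryCache(ttl_seconds=60)
    for row_id in (2, 3, 4):
        print(row_id, run_query(cache, row_id))
    print("subquery evaluations:", calls["count"])
```

A real implementation would also need to decide when cached results become
stale as large_table changes and how much memory the cache may use; the
sketch sidesteps both by using a fixed TTL and an unbounded dictionary.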



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
