phoenix-dev mailing list archives

From "Maryann Xue (JIRA)" <>
Subject [jira] [Commented] (PHOENIX-4666) Add a subquery cache that persists beyond the life of a query
Date Fri, 13 Apr 2018 20:45:00 GMT


Maryann Xue commented on PHOENIX-4666:

Thank you for explaining 2! And now it's clear to me. It is actually an optimization implemented
in PHOENIX-852. It does not apply to all join queries, so I guess in the test case you
have there this optimization is triggered. So I'm thinking two options here:
 1) Call {{HashCacheClient#evaluateKeyExpression()}} to get the key ranges if the cache is already
available on the server side, in which case {{CachedSubqueryResultIterator}} would still be
needed but we would not add the cache a second time. We can have a client-side cache for such key-range
values as well. And if this is the first client to build the cache, we
get these values from calling {{addHashCache()}} and then cache them on the client side.
 2) A more radical but easier approach is to disable this "child-parent (FK-PK) join optimization"
when using persistent cache. This makes some practical sense: if we can make a big performance
gain from avoiding rebuilding the hash cache, it could indicate that the cache itself might
be of some considerable size, and thus the key ranges generated from a relatively large amount
of values might not be that useful to narrow down the scan anyway.
 For now, I actually prefer the second approach, since we can focus on the main part of this
issue and move forward faster.
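
To make the two options concrete, here is a rough sketch, not actual Phoenix code. The class {{KeyRangeCacheSketch}}, its methods, and the use of strings as stand-ins for key-range values are all hypothetical; the supplier stands in for whatever would produce the key ranges on the first build (for example the {{addHashCache()}} path mentioned above).

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Hypothetical illustration only; none of these names exist in Phoenix. */
final class KeyRangeCacheSketch {

    // Client-side map from a cache identifier to the key-range values
    // obtained when the hash cache was first built (strings are placeholders).
    private final Map<String, List<String>> keyRangesByCacheId = new ConcurrentHashMap<>();

    /**
     * Option 1: the first caller for a given cache id runs the supplier
     * (assumed to build the cache and return the key ranges); later callers
     * reuse the stored values without building again.
     */
    List<String> keyRangesFor(String cacheId, Supplier<List<String>> buildAndExtract) {
        return keyRangesByCacheId.computeIfAbsent(cacheId, id -> buildAndExtract.get());
    }

    /**
     * Option 2: apply the child-parent (FK-PK) join optimization only when it
     * is applicable and no persistent cache is in use.
     */
    static boolean useKeyRangeOptimization(boolean optimizationApplicable, boolean persistentCache) {
        return optimizationApplicable && !persistentCache;
    }
}
```

With option 2 the first method is not needed at all, which is part of why it is the simpler route.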

For 5: Just for simplicity. Feels like we can have fewer getters and "get" calls here.

> Add a subquery cache that persists beyond the life of a query
> -------------------------------------------------------------
>                 Key: PHOENIX-4666
>                 URL:
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Marcell Ortutay
>            Assignee: Marcell Ortutay
>            Priority: Major
> The user list thread for additional context is here: []
> ----
> A Phoenix query may contain expensive subqueries, and moreover those expensive subqueries
may be used across multiple different queries. While whole result caching is possible at the
application level, it is not possible to cache subresults in the application. This can cause
bad performance for queries in which the subquery is the most expensive part of the query,
and the application is powerless to do anything at the query level. It would be good if Phoenix
provided a way to cache subquery results, as it would provide a significant performance gain.
> An illustrative example:
>     SELECT * FROM table1 JOIN (SELECT id_1 FROM large_table WHERE x = 10) expensive_result
ON table1.id_1 = expensive_result.id_2 AND table1.id_1 = \{id}
> In this case, the subquery "expensive_result" is expensive to compute, but it doesn't
change between queries. The rest of the query does because of the \{id} parameter. This means
the application can't cache it, but it would be good if there was a way to cache expensive_result.
> Note that there is currently a coprocessor based "server cache", but the data in this
"cache" is not persisted across queries. It is deleted after a TTL expires (30sec by default),
or when the query completes.
> This issue is fairly high priority for us at 23andMe and we'd be happy to provide
a patch with some guidance from Phoenix maintainers. We are currently putting together a design
document for a solution, and we'll post it to this Jira ticket for review in a few days.

This message was sent by Atlassian JIRA
