phoenix-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Chen Feng (Jira)" <j...@apache.org>
Subject [jira] [Updated] (PHOENIX-5491) Improve performance of InListExpression.hashCode
Date Thu, 26 Sep 2019 01:54:00 GMT

     [ https://issues.apache.org/jira/browse/PHOENIX-5491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Chen Feng updated PHOENIX-5491:
-------------------------------
    Description: 
WhereOptimizer runs very slow in parsing sql like "where A in (a1, a2, ..., a_N) and B = X"
when N is very large. In our environment, it runs > 90s for N = 140000.

This is because for the same instance of InListExpression, InListExpression.hashCode() is
calculate every time, where InListExpression.values is traversed.

In previous sql, InListExpression.hashCode() will be called N times, and InListExpression.values
has N elements. Therefore the total complexity is N^2.

Saving the hashCode of InListExpression can reduce the complexity to N. The test shows the
large sql can be finished within 5 seconds.

  was:
In WhereOptimizer.pushKeyExpressionsToScan(), has a line of code: "extractNodes.addAll(nodesToExtract)"

When executing sqls like "select * from ... where A in (a1, a2, ..., a_n) and B = X", saying
A in N (N > 100,000) elements, previous code execution will slow (> 90s in our environment).

This is because in such case, extractNodes is a HashSet, nodesToExtract is a List with N
InListExpression (the N InListExpressions are the same instance), each InListExpression.values
has N elements as well.

HashSet.addAll(list<InListExpression>) will call N times of InListExpression.hashCode().
Each time, InListExpression.hashCode() will calculate hashCode for every value. Therefore,
the time complexity will be N^2.

A simple way to solve it is to remember of the hashCode of InListExpression and returns it
directly if calculated once. The query will finish in 5 seconds.


> Improve performance of InListExpression.hashCode
> ------------------------------------------------
>
>                 Key: PHOENIX-5491
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-5491
>             Project: Phoenix
>          Issue Type: Improvement
>    Affects Versions: 5.0.0, 4.14.3
>            Reporter: Chen Feng
>            Assignee: Chen Feng
>            Priority: Critical
>         Attachments: PHOENIX-5491-v2.patch, PHOENIX-5491.patch
>
>
> WhereOptimizer runs very slow in parsing sql like "where A in (a1, a2, ..., a_N) and
B = X" when N is very large. In our environment, it runs > 90s for N = 140000.
> This is because for the same instance of InListExpression, InListExpression.hashCode()
is calculate every time, where InListExpression.values is traversed.
> In previous sql, InListExpression.hashCode() will be called N times, and InListExpression.values
has N elements. Therefore the total complexity is N^2.
> Saving the hashCode of InListExpression can reduce the complexity to N. The test shows
the large sql can be finished within 5 seconds.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Mime
View raw message