hive-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jesus Camacho Rodriguez (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HIVE-11652) Avoid expensive call to removeAll in DefaultGraphWalker
Date Wed, 21 Dec 2016 09:58:58 GMT

    [ https://issues.apache.org/jira/browse/HIVE-11652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15766626#comment-15766626
] 

Jesus Camacho Rodriguez edited comment on HIVE-11652 at 12/21/16 9:58 AM:
--------------------------------------------------------------------------

[~dhiraj.kumar], thanks for the feedback.

FWIW, when we worked on this patch, we focused on the _removeAll_ call because it was causing
slowness for many queries. We run tests on multiple workloads w/o the patch, and we did not
observe performance regression wrt previous version. However, it is indeed possible that our
queries were not covering all possible cases.

The performance of the algorithm could be improved by storing the position of the last added
child to the stack. However, this will impact the memory footprint too, as we will need to
store an integer per node in the stack. I think that is acceptable, but we just should be
aware that there is a trade-off here.

I would suggest you open a new JIRA case pointing to this one, describe the problem, and provide
some query where this issue is noticeable so we can reproduce the issue in a different environment.
If you have a clear idea on the solution, you could contribute a patch. Otherwise, I will
try to look into the issue asap.

--

Btw, wrt 3., have you set _hive.driver.parallel.compilation_ to _true_? That should at least
prevent the locking issue.


was (Author: jcamachorodriguez):
[~dhiraj.kumar], thanks for the feedback.

FWIW, when we worked on this patch, we focused on the _removeAll_ call because it was causing
slowness for many queries. We run tests on multiple workloads w/o the patch, and we did not
observe performance regression wrt previous version. However, it is indeed possible that our
queries were not covering all possible cases.

The performance of the algorithm could be improved by storing the position of the last added
child to the stack. However, this will impact the memory footprint too, as we will need to
store an integer per AST node in the stack. I think that is acceptable, but we just should
be aware that there is a trade-off here.

I would suggest you open a new JIRA case pointing to this one, describe the problem, and provide
some query where this issue is noticeable so we can reproduce the issue in a different environment.
If you have a clear idea on the solution, you could contribute a patch. Otherwise, I will
try to look into the issue asap.

--

Btw, wrt 3., have you set _hive.driver.parallel.compilation_ to _true_? That should at least
prevent the locking issue.

> Avoid expensive call to removeAll in DefaultGraphWalker
> -------------------------------------------------------
>
>                 Key: HIVE-11652
>                 URL: https://issues.apache.org/jira/browse/HIVE-11652
>             Project: Hive
>          Issue Type: Bug
>          Components: Logical Optimizer, Physical Optimizer
>    Affects Versions: 1.3.0, 2.0.0
>            Reporter: Jesus Camacho Rodriguez
>            Assignee: Jesus Camacho Rodriguez
>             Fix For: 1.3.0, 2.0.0
>
>         Attachments: HIVE-11652.01.patch, HIVE-11652.02.patch, HIVE-11652.patch
>
>
> When the plan is too large, the removeAll call in DefaultGraphWalker (line 140) will
take very long as it will have to go through the list looking for each of the nodes. We try
to get rid of this call by rewriting the logic in the walker.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message