hive-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jesus Camacho Rodriguez (JIRA)" <>
Subject [jira] [Commented] (HIVE-11531) Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise
Date Sat, 14 Nov 2015 09:10:10 GMT


Jesus Camacho Rodriguez commented on HIVE-11531:

[~huizane], thanks for working on this!

Agreed with the comments left by [~sershe]. In addition:

- This block in CalcitePlanner:
      Integer offset = qbp.getDestToLimit().get(qbp.getClauseNames().iterator().next())==null?
      Integer limit = qbp.getDestToLimit().get(qbp.getClauseNames().iterator().next())==null?
No need to create the iterator twice for offset (and for limit) and call next. Could you cache
the value to avoid unnecessary overhead and improve readability?
- A couple of additional style notes. There is typo in {{getOffetExpr}} method in HiveSortLimit.
We could rename old {{limit}} variable to {{fetch}} (for instance, in line 2275 of CalcitePlanner)
to avoid confusion. Please, pay attention with code spacing (e.g. {{}else{}}).
- On a side note, maybe we should support the more verbose syntax too (as part of this JIRA
or a follow-up). In addition to the abbreviated syntax:
LIMIT (skip,)? n
that is mostly used by MySQL, we could support:
which is supported e.g. by MySQL and PostgreSQL.

[~sershe], Calcite should handle offset properly: support for offset was already there. The
only code that needs to be extended is in HIVE-11684; I will update the patch accordingly
so we do not ignore offset.

> Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise
> -----------------------------------------------------------------------------
>                 Key: HIVE-11531
>                 URL:
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Sergey Shelukhin
>            Assignee: Hui Zheng
>         Attachments: HIVE-11531.WIP.1.patch, HIVE-11531.WIP.2.patch, HIVE-11531.patch
> For any UIs that involve pagination, it is useful to issue queries in the form SELECT
... LIMIT X,Y where X,Y are coordinates inside the result to be paginated (which can be extremely
large by itself). At present, ROW_NUMBER can be used to achieve this effect, but optimizations
for LIMIT such as TopN in ReduceSink do not apply to ROW_NUMBER. We can add first class support
for "skip" to existing limit, or improve ROW_NUMBER for better performance

This message was sent by Atlassian JIRA

View raw message