kylin-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <>
Subject [jira] [Commented] (KYLIN-3961) Optimize TopN measure merge function to reduce TopNCounter errors
Date Wed, 17 Apr 2019 14:40:00 GMT


ASF GitHub Bot commented on KYLIN-3961:

zhaojintaozhao commented on pull request #612: KYLIN-3961 Optimize TopNCounter's merge function
to reduce TopNCounter's error size.
   Queries on a TopN measure sometimes return results with a large error.
   I optimized TopNCounter's merge function to reduce that error when using a TopN measure.
   The optimization works well in my Kylin system and reduces TopNCounter's error size.
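
For readers unfamiliar with how merging bounded counters can inflate results, here is a minimal sketch. It is not Kylin's TopNCounter code and not the change in this pull request. It assumes a simplified, hypothetical `Summary` type that keeps at most `capacity` keys, and a merge rule in which a key missing from a full summary is credited with that summary's smallest retained count.

```python
from dataclasses import dataclass


@dataclass
class Summary:
    """Hypothetical bounded summary: at most `capacity` (key, estimated sum) pairs."""
    capacity: int
    counts: dict

    def min_count(self) -> float:
        # Assumption: a summary that is not full never dropped a key,
        # so a key absent from it contributes nothing.
        if len(self.counts) < self.capacity:
            return 0.0
        return min(self.counts.values())


def merge(a: Summary, b: Summary) -> Summary:
    """Merge two summaries, padding absent keys with the other side's minimum."""
    pad_a, pad_b = a.min_count(), b.min_count()
    merged = {
        key: a.counts.get(key, pad_a) + b.counts.get(key, pad_b)
        for key in a.counts.keys() | b.counts.keys()
    }
    # Keep the `capacity` largest entries; ties are broken by key for determinism.
    top = sorted(merged.items(), key=lambda kv: (-kv[1], kv[0]))[: a.capacity]
    return Summary(a.capacity, dict(top))


# Two full per-segment summaries with disjoint keys (made-up numbers).
seg1 = Summary(2, {"s1": 100.0, "s2": 90.0})
seg2 = Summary(2, {"s3": 95.0, "s4": 80.0})
merged = merge(seg1, seg2)
print(merged.counts)  # {'s3': 185.0, 's1': 180.0}
```

If "s3" never occurred in the first segment, its true total is 95.0, yet the merged estimate is 185.0. Under this assumed merge rule, every merge can add padding of this kind to a key's estimate.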

> Optimize TopN measure merge function to reduce TopNCounter errors
> ---------------------------------------------------------------------
>                 Key: KYLIN-3961
>                 URL:
>             Project: Kylin
>          Issue Type: Improvement
>          Components: Measure - TopN
>    Affects Versions: v2.5.2
>         Environment: Huawei FusionInsight
>            Reporter: zhao jintao
>            Assignee: zhao jintao
>            Priority: Major
>              Labels: easyfix
>   Original Estimate: 168h
>  Remaining Estimate: 168h
> Hi Team:
> I use a "Top-N" measure for queries such as "select sum(AAA) from BBB group by CCC,DDD".
> It performs much better than a cube without "Top-N".
> In my system, Kylin takes just 0.2s to answer the query on a cube with the "Top-N" measure;
> without the "Top-N" measure it can take 10s.
> But I find that the Top-N measure can be optimized to reduce errors.
> I used the Kylin demo to test "TopN".
> I built two cubes on "KYLIN_SALES". The first cube has three dimensions, "SELLER_ID", "BUYER_ID"
> and "PART_DT", and one measure, "SUM(PRICE)". The second cube has one dimension, "PART_DT",
> and two measures, "SUM(PRICE)" and "TOPN(10)". For "TOPN(10)", the "ORDER|SUM by Column"
> is "PRICE", the "Group by Column" is "SELLER_ID" and "BUYER_ID", and the "Return Type"
> is "Top 10". Then I built the cubes from "2012-01-01" to "2014-01-01".
> I ran the same SQL against both cubes and found a large discrepancy between their results.
> The top 5 "SUM PRICE" values from the first cube, without "TopN", are "167.7269", "99.9908", "99.9888", "99.9865", "99.978".
> The top 5 "SUM PRICE" values from the second cube, with "TopN", are "179.27699...", "167.6320...", "167.3050...", "167.2069...", "166.7429...".
> Has anyone else met the same problem?
> Best regards.
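
The report does not show the exact SQL used for the comparison. A minimal sketch of an exact baseline of that kind follows, assuming the query groups by "SELLER_ID" and "BUYER_ID", orders by the summed "PRICE", and takes the top rows. It runs against an in-memory SQLite table rather than Kylin. The column types and the rows are made up for illustration, and `exact_top` is a hypothetical helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE KYLIN_SALES ("
    "PART_DT TEXT, SELLER_ID INTEGER, BUYER_ID INTEGER, PRICE REAL)"
)
# Made-up rows; the last one falls outside the build range used in the report.
conn.executemany(
    "INSERT INTO KYLIN_SALES VALUES (?, ?, ?, ?)",
    [
        ("2012-01-05", 1, 10, 60.0),
        ("2013-03-01", 1, 10, 40.5),
        ("2012-06-01", 2, 11, 99.0),
        ("2013-07-09", 3, 12, 20.0),
        ("2014-02-01", 2, 11, 500.0),
    ],
)


def exact_top(connection, start, end, limit):
    """Return the exact top `limit` (seller, buyer, total) rows in [start, end)."""
    query = """
        SELECT SELLER_ID, BUYER_ID, SUM(PRICE) AS TOTAL
        FROM KYLIN_SALES
        WHERE PART_DT >= ? AND PART_DT < ?
        GROUP BY SELLER_ID, BUYER_ID
        ORDER BY TOTAL DESC
        LIMIT ?
    """
    return connection.execute(query, (start, end, limit)).fetchall()


top5 = exact_top(conn, "2012-01-01", "2014-01-01", 5)
```

Running the same statement against a cube with the TopN measure and comparing the totals rank by rank gives the kind of comparison described in the report.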

This message was sent by Atlassian JIRA
