phoenix-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Anoop Sam John (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (PHOENIX-1203) Uable to work for count (distinct col) queries via phoenix table with secondary indexes
Date Sun, 31 Aug 2014 19:06:20 GMT

     [ https://issues.apache.org/jira/browse/PHOENIX-1203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Anoop Sam John updated PHOENIX-1203:
------------------------------------
    Affects Version/s:     (was: 4.1)
                           (was: 4.0.0)
                       3.0.0

> Uable to work for count (distinct col) queries via phoenix table with secondary indexes
> ---------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-1203
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1203
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 3.0.0
>         Environment: hadoop-2.2.0   hbase: Version 0.98.3-hadoop2
>            Reporter: Sun Fulin
>            Assignee: Anoop Sam John
>              Labels: distinct, secondaryIndex, test
>         Attachments: PHOENIX-1203.patch
>
>
> I build the latest 4.1 rc0 from here:  https://github.com/apache/phoenix/releases
> And examine the count (distinct col) query within the new environment. However, the problem
still exists with index queries as the following while the correct distinct query result is
expected to be 536 for my project:
> 0: jdbc:phoenix:zookeeper1> select count (distinct t.imsi) from ranapsignal t where
t.pkttime>=1404964800000 and t.pkttime<=1404965699999 and t.sac=32351 and t.nasmsgtype=0
and t.ranapmsgtype=0 and t.ranapsubmsgtype=0 ;
> +-------------+
> | COUNT(IMSI) |
> +-------------+
> | 2322        |
> +-------------+
> 1 row selected (70.572 seconds)
> As James suggests, I conduct the query adding group by t.imsi with /without secondary
indexes. And the result seems to be fine as they both got the correct 536 distinct groups.

> Here are some considerations:
> 1. count (distinct col) query over index table did not work as expectation.
> 2. only distinct query over index table works fine.
> 3. If the phoenix version got some wrong configuration, correct me.
> Thanks and Best Regards,
> Sun
> -------------------------------
> Hi Sun,
> Thanks for the detailed description. Yes, your syntax is correct, and
> it's definitely true that the count distinct query should return the
> same result with and without the index. Would you mind trying this on
> our latest 3.1 RC2 and/or 4.1 RC0 and if the problem still occurs to
> file a JIRA?
>  
> One thing that make make it easier for your testing: do you know about
> our NO_INDEX hint which forces the query *not* to use an index, like
> this:
>  
> select /*+ NO_INDEX */ ...
>  
> Another question too. What about this query with and with/out the index:
>  
> select count(*) from ranapsignal t
> where t.pkttime>=1404964800000 and t.pkttime<=1404965699999
> and t.sac=32351 and t.nasmsgtype=0 and t.ranapmsgtype=0
> and t.ranapsubmsgtype=0
> group by t.imsi;
>  
> Thanks,
> James
>  
> On Thu, Aug 21, 2014 at 10:38 PM, sunfl@certusnet.com.cn
> <sunfl@certusnet.com.cn> wrote:
> >
> >
> > Hi James,
> >
> >   Recently I got trouble while trying to conduct some query performance test
> > in my phoenix tables with secondary indexes.
> >   I created a table called RANAPSIGNAL for my projects in phoenix via
> > sqlline client and load data into the table. Then I create
> >   an index on the specific column PKTTIME for the table RANAPSIGNAL while
> > including other more columns for adjusting my
> >   index query, like the following DDL:
> >          create index if not exists pkt_idx on RANAPSIGNAL (PKTTIME) include
> > (SAC,NASMSGTYPE, RANAPMSGTYPE, RANAPSUBMSGTYPE );
> >    The index creation worked successfully without any errors. So, when I am
> > trying to conduct such query as:
> >            select count (distinct t.imsi) from ranapsignal t where
> > t.pkttime>=1404964800000 and t.pkttime<=1404965699999
> >            and t.sac=32351 and t.nasmsgtype=0 and t.ranapmsgtype=0 and
> > t.ranapsubmsgtype=0 ;
> >    Without secondary indexes, the final result got 536 distinct imsi, wihch
> > is the right distinct count results. However, after I  create the above
> >   secondary index PKT_IDX and reconducting the above count (distinct imsi)
> > query, I got 2322 imsi rows which obviously are not the expected
> >   distinct counts results. I used the explain grammar to observe the scan of
> > the above select query and found that it definitely scaned over
> >   the index table PKT_IDX. I then tried to conduct the following query with
> > no count function:
> >             select distinct t.imsi from ranapsignal t where
> > t.pkttime>=1404964800000 and t.pkttime<=1404965699999
> >            and t.sac=32351 and t.nasmsgtype=0 and t.ranapmsgtype=0 and
> > t.ranapsubmsgtype=0 ;
> >    And the result is right 536 distinct imsi over scanning the index table.
> > By the way, imsi is one of the primary key when creating the table
> > RANAPSIGNAL.
> >    Here are several considerations for my trouble and practice:
> >    1. Did you guys ever practice such count (distinct) queries over phoenix
> > table via secondary index?
> >    2. I am not sure whether this problem was due to the index table, but my
> > practice may assume that conclusion.
> >    3. Corrects my if I am wrong with my previous operations (index creation,
> > DDL grammar, etc..)
> >    4. Any available hints or reply are best appreciated.
> >
> >   Thanks and Best Regards
> >   Sun
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message