phoenix-dev mailing list archives

From "rajeshbabu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PHOENIX-17) Support to make use of partial covered indexes in scan
Date Fri, 31 Jan 2014 06:34:08 GMT

    [ https://issues.apache.org/jira/browse/PHOENIX-17?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887513#comment-13887513 ]

rajeshbabu commented on PHOENIX-17:
-----------------------------------

bq. Unless the index is highly selective or region local (i.e. data and index data are colocated) it is hard to get good performance out of HBase.
Yes, Lars. We do see good performance with colocation.

bq. The index hits would need to be translated into a set of GETs, which is fairly slow (or maybe a SKIP SCAN with the GETs as way points).
Yes. Even if we use region-level coprocessors to fetch seek points from the index table, I think
we still need to hit all regions of the main table, which defeats the purpose of global indexing.
Let's take an example.
1) Create an employee table with id, name and addr as columns, and create an index on the name
column. Assume <a,c> are the split keys of the table.
2) Insert some data:
+------------+------------+------------+
| ID         | NAME       | ADDR       |
+------------+------------+------------+
| a          | foo        | bang       |
| b          | foo        | chen       |
| c          | foo        | mum        |
| d          | foo        | hyd        |
+------------+------------+------------+
3) The index data will then be as follows. The name foo appears for four employees.
+------------+------------+
| NAME       | :ID        |
+------------+------------+
| foo        | a          |
| foo        | b          |
| foo        | c          |
| foo        | d          |
+------------+------------+
4) When we scan with the condition name='foo', the coprocessor hooks of the first region can
fetch a, b as seek points, and similarly the hooks of the second region can fetch c, d.
So we end up hitting all the main table regions.
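
The example above can be sketched in a few lines of Python. This is only an illustrative model, not Phoenix or HBase code: the names SPLIT_KEYS, DATA_ROWS, region_of and seek_points_by_region are hypothetical, and region 0 stands for the (empty) key range below the first split key.

```python
from bisect import bisect_right
from collections import defaultdict

# Assumed split keys of the main table, as in step 1.
SPLIT_KEYS = ["a", "c"]

# Main table rows from step 2: ID -> (NAME, ADDR).
DATA_ROWS = {
    "a": ("foo", "bang"),
    "b": ("foo", "chen"),
    "c": ("foo", "mum"),
    "d": ("foo", "hyd"),
}

# Index rows from step 3: (NAME, ID), sorted by indexed value then row key.
INDEX_ROWS = sorted((name, row_id) for row_id, (name, _addr) in DATA_ROWS.items())


def region_of(row_key):
    """Return the number of the region whose key range contains row_key."""
    return bisect_right(SPLIT_KEYS, row_key)


def seek_points_by_region(name):
    """Group the row keys found in the index by the main table region holding them."""
    groups = defaultdict(list)
    for indexed_name, row_id in INDEX_ROWS:
        if indexed_name == name:
            groups[region_of(row_id)].append(row_id)
    return dict(groups)


if __name__ == "__main__":
    hits = seek_points_by_region("foo")
    populated = {region_of(key) for key in DATA_ROWS}
    print(hits)
    print("every populated region is hit:", set(hits) == populated)
```

In this model a single non-selective index value fans out to every region that holds data, which is the cost described in step 4.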

With this approach, the deadlock issue discussed for puts may also arise.

bq. Highly selective indexes, obviously, include PK and unique indexes, which could be useful even when the query isn't covered.
In most cases users prefer PK and unique indexes, which need not be fully covered.

> Support to make use of partial covered indexes in scan
> ------------------------------------------------------
>
>                 Key: PHOENIX-17
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-17
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: rajeshbabu
>             Fix For: 3.0.0
>
>
> Normally when we want to use secondary indexes we create an index on one or very few
> columns of interest in query conditions. The index may not contain all the columns to retrieve.
> Currently Phoenix supports fully covered indexes only (where all or most of the columns
> must be in the index in many cases). When we run a query we choose to scan the user table
> or the index table based on whether all projected columns are in the index.
> This approach may have some disadvantages, mainly in the case of wider tables.
> 1) If we just stored all the columns in the index, it would be like creating another copy
> of the entire table, which would take up far too much space and would be very inefficient
> for wider tables.
> 2) Sometimes a user creates an index on a few columns, observes that the index is not
> being used, and then needs to add all the projected columns to the index (either as part of
> the index or as included columns). This exposes design decisions to users, even though we
> aim to simplify the user's experience by providing SQL on top of a NoSQL DB.
> 3) As of now, if an index table contains all the projected columns in the query, we simply
> scan the index table only. This can also become a full table scan when the query has no
> condition, or has a condition on a non-primary-key column of the index.
> Sometimes this might perform worse than a full scan of the normal table.
> This JIRA is to support making use of partially covered indexes to avoid full table scans.
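
The covered-index rule in the description above (scan the index table only when every projected column is in the index) can be sketched as follows. This is a simplified illustration in Python, not Phoenix's optimizer; the function name choose_scan_target is hypothetical.

```python
def choose_scan_target(projected_columns, index_columns):
    """Pick the index table only if it covers every projected column."""
    if set(projected_columns) <= set(index_columns):
        return "index"
    return "data"


if __name__ == "__main__":
    # An index on NAME also carries the row key ID, as in the employee example.
    index_columns = ["NAME", "ID"]
    print(choose_scan_target(["ID", "NAME"], index_columns))
    print(choose_scan_target(["ID", "NAME", "ADDR"], index_columns))
```

Under this rule a query that also projects ADDR falls back to the data table, which is the case a partially covered index would address.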



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
