hbase-user mailing list archives

From Krzysztof Gałęcki <krzysztof.gale...@gmail.com>
Subject RE: Pigi project
Date Wed, 01 Oct 2008 20:53:00 GMT
Hi

1. This is not some great advantage. But suppose you want to index, for
example, users (described by firstname, lastname, age) and you would like
to execute queries based on all combinations of those fields: then you have
about 2^3 indexes (without ordering). Because of paging, each index can
need as many as 3 tables (we will describe this in the technical
presentation). So without ordering, you have 8*3 = 24 additional tables for
1 data table. I would rather have 1 data table and 3 index tables. It is
just clearer to me, but if you like, you can have another table (or 3
tables) for each index.

2. At this stage we don't. It is an interesting feature, but I'm not sure
whether it is possible to ensure transactions.
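
The table count in point 1 can be checked with a few lines of Python. This
is only an illustrative sketch: the constant and function names are made up
for the example, and the empty combination (no filter at all) is counted as
one index, which is how the total reaches 2^3 = 8.

```python
from itertools import combinations

FIELDS = ("firstname", "lastname", "age")
TABLES_PER_INDEX = 3  # upper bound quoted in point 1 ("even 3 tables")


def field_combinations(fields):
    """Return every subset of fields, including the empty one."""
    return [combo
            for size in range(len(fields) + 1)
            for combo in combinations(fields, size)]


combos = field_combinations(FIELDS)

# One set of tables per index versus one shared set for all indexes.
separate_tables = len(combos) * TABLES_PER_INDEX
shared_tables = TABLES_PER_INDEX

print(len(combos), separate_tables, shared_tables)  # prints: 8 24 3
```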

Regards

Chriss

-----Original Message-----
From: Ding, Hui [mailto:hui.ding@sap.com] 
Sent: Wednesday, October 01, 2008 7:18 PM
To: hbase-user@hadoop.apache.org
Subject: RE: Pigi project

This sounds really interesting. A few more questions if I may:

1. What do you see as the advantage of having one index table that
contains all, rather than having separate index tables?
2. Do you ensure that updates to the main table and the index table are
done in one transaction?

-----Original Message-----
From: cure@g.pl [mailto:cure@g.pl] 
Sent: Wednesday, October 01, 2008 1:48 AM
To: hbase-user@hadoop.apache.org
Subject: Re: Pigi project

stack wrote:
> Hey Antoni & Krzysztof:
>
> Couple of things:
>
> + How does it work?  The indices in particular? (I suppose I'm
> interested in seeing the technical presentation).
> + Why the name Pigi?
> + What features do you need in hbase to support Pigi?
> + What Jim said regards the list (unless you wanted just two of us to
> see it first?).
> + Multivalue fields?  Is that cells in hbase-speak?
> + Distributed object cache?  How?  Sounds great.
>

   Hi


We will prepare a short technical presentation, but for now I'll
try to answer your questions:

    1) How does it work?

    The idea is based on the fact that row identifiers in an hbase table
    are sorted lexicographically.
    For every 1:n relation Pigi maintains an additional table (the index
    table). Every row added to the child table causes a row to be inserted
    into each index defined for that child object. The index table contains
    the child object identifiers in sorted order.

    This order is produced by specially prepared row identifiers in the
    index table. Each one contains:

        index name
        parent object id
        optional index parameters (for example: color of the car)
        optional ordering parameters (if we want to order results)
        child object id

    Because the index name is part of that id, many indexes can share one
    index table (so in fact there is no need to create another table for
    every index).

    Pigi helps to create and maintain these kinds of indexes. Otherwise the
    user has to do it manually (probably individually for each 1:n
    relation).
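
The key layout above can be sketched in Python with a sorted in-memory list
standing in for the hbase table. Everything named here is an assumption made
for illustration, not Pigi's actual code: the NUL separator, the function
and class names, and the index names "idx_plain" and "idx_param". The sketch
also assumes that no key part contains the separator.

```python
import bisect

SEP = "\x00"  # assumed separator: sorts before every printable character


def index_row_id(index_name, parent_id, child_id, params=(), order_by=()):
    """Join the five key parts in the order listed above."""
    return SEP.join([index_name, parent_id, *params, *order_by, child_id])


class SortedIndexTable:
    """In-memory stand-in for a table that keeps its row ids sorted."""

    def __init__(self):
        self._row_ids = []

    def put(self, row_id):
        pos = bisect.bisect_left(self._row_ids, row_id)
        if pos == len(self._row_ids) or self._row_ids[pos] != row_id:
            self._row_ids.insert(pos, row_id)

    def scan(self, index_name, parent_id, params=()):
        """Return child ids of rows that start with the given key parts."""
        prefix = SEP.join([index_name, parent_id, *params]) + SEP
        start = bisect.bisect_left(self._row_ids, prefix)
        children = []
        for row_id in self._row_ids[start:]:
            if not row_id.startswith(prefix):
                break  # rows are sorted, so no later row can match
            children.append(row_id.split(SEP)[-1])
        return children


# Two indexes share the same table; the index name keeps them apart.
table = SortedIndexTable()
table.put(index_row_id("idx_plain", "p1", "c2"))
table.put(index_row_id("idx_plain", "p1", "c1"))
table.put(index_row_id("idx_plain", "p2", "c3"))
table.put(index_row_id("idx_param", "p1", "c1", params=("red",)))
table.put(index_row_id("idx_param", "p1", "c2", params=("blue",)))

print(table.scan("idx_plain", "p1"))            # ['c1', 'c2']
print(table.scan("idx_param", "p1", ("red",)))  # ['c1']
```

Each query becomes one scan over a contiguous key range, which matches the
later remark that only scanners and simple gets are needed.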




    Indexes: our framework creates an additional table and puts all the
    data it needs there. Indexing is realised by preparing a complex rowId.
    For example, we have these objects:

        - UserVO with fields: id, name, surname
        - CarVO with fields: id, userId, color

    Each user can have many cars, and one car has only one owner.

    We want to execute these queries:

        - find all cars by userId
        - find all cars by userId and color

    The framework maintains 2 indexes:

        - cars by userId, where the rowId in the index table contains the
          userId data
        - cars by userId and color, where the rowId in the index table
          contains the userId and color data
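
As a rough sketch of the CarVO example, the two indexes and the two queries
might look like this in Python. The index names, the CAR_INDEXES mapping,
the separator and both helper functions are hypothetical, and a plain set of
row ids stands in for the index table.

```python
from dataclasses import dataclass

SEP = "\x00"  # assumed separator between key parts


@dataclass(frozen=True)
class CarVO:
    id: str
    userId: str
    color: str


# Hypothetical index definitions: index name -> extra CarVO fields in the key.
CAR_INDEXES = {
    "cars_by_user": (),
    "cars_by_user_color": ("color",),
}


def index_rows(car):
    """Row ids to write to the index table when this car is stored."""
    rows = []
    for name, fields in CAR_INDEXES.items():
        extra = [getattr(car, field) for field in fields]
        rows.append(SEP.join([name, car.userId, *extra, car.id]))
    return rows


def find_cars(index_table, index_name, *key_parts):
    """Prefix match over the sorted row ids; returns the car ids."""
    prefix = SEP.join([index_name, *key_parts]) + SEP
    return [row.split(SEP)[-1]
            for row in sorted(index_table)
            if row.startswith(prefix)]


cars = [CarVO("car1", "u1", "red"),
        CarVO("car2", "u1", "blue"),
        CarVO("car3", "u2", "red")]
index_table = {row for car in cars for row in index_rows(car)}

print(find_cars(index_table, "cars_by_user", "u1"))               # ['car1', 'car2']
print(find_cars(index_table, "cars_by_user_color", "u1", "red"))  # ['car1']
```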

    Indexes are ordered lexicographically, so for a descending index the
    rowId will be "reversed".
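
The message does not say how the rowId is "reversed". One common way to get
descending order out of an ascending lexicographic sort, shown here purely
as an assumption, is to store the complement of a fixed-width ordering
value. WIDTH, MAX_VALUE and both function names are invented for this
sketch, and it only covers non-negative integers that fit in WIDTH digits.

```python
WIDTH = 10                 # assumed fixed width of the ordering field
MAX_VALUE = 10**WIDTH - 1


def ascending_part(value):
    """Zero-padded so that string order equals numeric order."""
    return str(value).zfill(WIDTH)


def descending_part(value):
    """Complemented so that ascending string order is descending numeric order."""
    return str(MAX_VALUE - value).zfill(WIDTH)


values = [42, 7, 1300]
asc = sorted(values, key=ascending_part)
desc = sorted(values, key=descending_part)

print(asc)   # [7, 42, 1300]
print(desc)  # [1300, 42, 7]
```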

    When we want to change the color of a car, we only have to notify the
    framework about the change in the CarVO object. The framework will
    update all indexes of this object.
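
A minimal sketch of such an update, assuming the framework can see both the
old and the new state of the object: index rows that no longer apply are
deleted and the new ones are added. The helper names, the dict
representation of a car and the index names are illustrative only.

```python
SEP = "\x00"  # assumed separator between key parts


def car_index_rows(car):
    """Row ids of a car (given as a dict) in the two indexes described above."""
    return {
        SEP.join(["cars_by_user", car["userId"], car["id"]]),
        SEP.join(["cars_by_user_color", car["userId"], car["color"], car["id"]]),
    }


def apply_change(index_table, old_car, new_car):
    """Drop stale index rows, add new ones, and report what changed."""
    old_rows = car_index_rows(old_car)
    new_rows = car_index_rows(new_car)
    removed = old_rows - new_rows
    added = new_rows - old_rows
    index_table.difference_update(removed)
    index_table.update(added)
    return removed, added


old_car = {"id": "car1", "userId": "u1", "color": "red"}
new_car = dict(old_car, color="green")

index_table = set(car_index_rows(old_car))
removed, added = apply_change(index_table, old_car, new_car)

# Only the color index changes; the cars-by-userId row is left alone.
print(len(removed), len(added), len(index_table))  # prints: 1 1 2
```

In this sketch the two steps are not atomic, in line with the earlier
statement that transactions are not ensured at this stage.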

    2) Why the name Pigi?

    There is no specific reason..... :-)

    3) What features do you need in hbase to support Pigi?

    Only the java API. We use only scanners and simple gets; we don't use
    filters.

    4) Multivalue fields?  Is that cells in hbase-speak?

    5) Distributed object cache?  How?  Sounds great.

    In the future we will need to write a distributed cache (something like
    TreeCache) or use some existing solution. We need it to reduce reads
    from hbase, as with hibernate and any cache.


   Antony


