hbase-user mailing list archives

From Anoop Sam John <anoo...@huawei.com>
Subject RE: Best technique for doing lookup with Secondary Index
Date Fri, 26 Oct 2012 04:06:01 GMT
Hi Anil,
I am a bit confused after seeing your reply.
Are you using bulk loading? Did you create your own mapper? Do you call HTable#put() from the mappers?

I think there was some confusion in the other thread as well. I was referring to the HFileOutputReducer.
There is also a TableOutputFormat. TableOutputFormat tries to put to the HTable, so the
write to the WAL applies there.


[HFileOutputReducer]: As we discussed in the other thread, with bulk loading the approach
is that the MR job creates KVs and writes them to files, and each file is written as an HFile. Yes, this
contains all the meta information, the trailer, etc. Finally, the HBase cluster needs to be contacted
only to load the HFile(s) into the cluster, under the corresponding regions. This is
the fastest way to bulk load huge amounts of data.

    
-Anoop-
________________________________________
From: anil gupta [anilgupta84@gmail.com]
Sent: Friday, October 26, 2012 3:40 AM
To: user@hbase.apache.org
Subject: Re: Best technique for doing lookup with Secondary Index

Anoop: In the prePut hook, do you call HTable#put()?
Anil: Yes, I call HTable#put() in prePut. Is there a better way of doing it?

Anoop: Why use network calls from the server side here then?
Anil: I thought this was a cleaner approach since I am using BulkLoader. I
decided not to run two jobs since I am generating a UniqueIdentifier at
runtime in the BulkLoader.

Anoop: Can it not be handled from the client alone?
Anil: I cannot handle it from the client since I am using BulkLoader. Is it a
good idea to create an HTable instance on "B" and do the put in my mapper? I might
try this idea.

Anoop: You can have a look at the Lily project.
Anil: It's a little late for us to evaluate Lily now, and at present we don't
need a complex secondary index since our data is immutable.

Ram: What is rowkey B here?
Anil: Suppose I am storing customer events in table A. I have two
requirements for querying the data:
1. Query customer events on the basis of customer_Id and event_ID.
2. Query customer events on the basis of event_timestamp and customer_ID.

70% of the querying is done by query#1, so I will use
<customer_Id><event_ID> as the row key of Table A.
Now, in order to support fast results for query#2, I need to create a
secondary index on A. I store that secondary index in B; the rowkey of B is
<event_timestamp><customer_ID>. Every row stores the corresponding rowkey
of A.
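
[Editor's sketch] The key layout and two-step lookup described above can be illustrated without a cluster. The following is a minimal in-memory sketch, not HBase code: two sorted maps stand in for tables A and B, and the class name, method names, the "|" delimiter, and the sample values are all assumptions made for illustration. It assumes ASCII keys so that string ordering matches the byte ordering of HBase row keys, and at most one event per (event_timestamp, customer_ID) pair.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of primary table A and index table B.
// TreeMap keeps keys sorted, as HBase keeps row keys sorted.
class SecondaryIndexSketch {
    static final TreeMap<String, String> tableA = new TreeMap<>();
    static final TreeMap<String, String> tableB = new TreeMap<>();

    // Row key of A: <customer_Id><event_ID> ("|" delimiter is an assumption).
    static String rowKeyA(String customerId, String eventId) {
        return customerId + "|" + eventId;
    }

    // Row key of B: <event_timestamp><customer_ID>.
    static String rowKeyB(String eventTimestamp, String customerId) {
        return eventTimestamp + "|" + customerId;
    }

    // Write the event to A and its index entry (pointing back at A) to B.
    static void put(String customerId, String eventId, String ts, String payload) {
        String keyA = rowKeyA(customerId, eventId);
        tableA.put(keyA, payload);
        tableB.put(rowKeyB(ts, customerId), keyA);
    }

    // Step 1: scan B from startRow = prefix while keys still match the prefix.
    // Step 2: fetch the collected A row keys in one batch from A.
    static List<String> lookup(String tsPrefix) {
        List<String> keysOfA = new ArrayList<>();
        for (Map.Entry<String, String> e : tableB.tailMap(tsPrefix, true).entrySet()) {
            if (!e.getKey().startsWith(tsPrefix)) {
                break; // sorted order: no later key can match the prefix
            }
            keysOfA.add(e.getValue());
        }
        List<String> results = new ArrayList<>();
        for (String keyA : keysOfA) {
            String value = tableA.get(keyA);
            if (value != null) {
                results.add(value);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        put("cust01", "evt001", "20121024", "login");
        put("cust02", "evt002", "20121024", "purchase");
        put("cust01", "evt003", "20121025", "logout");
        System.out.println(lookup("20121024"));
    }
}
```

In a real deployment, step 1 would be a Scan on B with a start row and prefix filter, and step 2 a batch get on A. The sketch only shows why the sorted key <event_timestamp><customer_ID> makes the prefix scan cheap.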

Ram: How is the startRow determined for every query?
Anil: It's determined by very simple application logic.

Thanks,
Anil Gupta

On Wed, Oct 24, 2012 at 10:16 PM, Ramkrishna.S.Vasudevan <
ramkrishna.vasudevan@huawei.com> wrote:

> Just out of curiosity,
> > The secondary index is stored in table "B" as rowkey B -->
> > family:<rowkey A>
> what is rowkey B here?
> > 1. Scan the secondary table by using prefix filter and startRow.
> How is the startRow determined for every query ?
>
> Regards
> Ram
>
> > -----Original Message-----
> > From: Anoop Sam John [mailto:anoopsj@huawei.com]
> > Sent: Thursday, October 25, 2012 10:15 AM
> > To: user@hbase.apache.org
> > Subject: RE: Best technique for doing lookup with Secondary Index
> >
> > >I build the secondary table "B" using a prePut RegionObserver.
> >
> > Anil,
> >        In the prePut hook, do you call HTable#put()? Why use network calls
> > from the server side here then? Can it not be handled from the client alone? You
> > can have a look at the Lily project. These are my thoughts after seeing your idea on put
> > and scan.
> >
> > -Anoop-
> > ________________________________________
> > From: anil gupta [anilgupta84@gmail.com]
> > Sent: Thursday, October 25, 2012 3:10 AM
> > To: user@hbase.apache.org
> > Subject: Best technique for doing lookup with Secondary Index
> >
> > Hi All,
> >
> > I am using HBase 0.92.1. I have created a secondary index on table "A".
> > Table A stores immutable data. I build the secondary table "B" using a
> > prePut RegionObserver.
> >
> > The secondary index is stored in table "B" as rowkey B -->
> > family:<rowkey A>. "<rowkey A>" is the column qualifier. Every row in B will
> > have only one column, and the name of that column is the rowkey of A. So the
> > value is blank. As per my understanding, accessing the column qualifier is
> > faster than accessing the value. Please correct me if I am wrong.
> >
> >
> > HBase Querying approach:
> > 1. Scan the secondary table by using prefix filter and startRow.
> > 2. Do a batch get on primary table by using HTable.get(List<Get>)
> > method.
> >
> > The above approach for retrieval works fine, but I was wondering if
> > there is a better approach. I was planning to try doing the retrieval using
> > coprocessors.
> > Has anyone tried using coprocessors? I would appreciate it if others could
> > share their experience with secondary indexes for HBase queries.
> >
> > --
> > Thanks & Regards,
> > Anil Gupta
>
>


--
Thanks & Regards,
Anil Gupta