hbase-user mailing list archives

From Amandeep Khurana <ama...@gmail.com>
Subject Re: HBase Index: indexed table or lucene index
Date Mon, 23 Nov 2009 06:57:14 GMT
I'll try explaining again.

These are two separate things.

1. HBase column indexing / secondary indexes.
Let's say you have queries like "Give me all rows where columnA="xyz"". You
can keep another HBase table (which is essentially your secondary index)
whose row keys are the values of columnA from the original table.
One of the rows in the secondary index table will be for xyz, and it will
store the list of row IDs from the original table that have that value. This
gives you an easy way of determining which rows in the original table have
columnA=xyz.
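
To make the secondary index idea concrete, here is a minimal sketch in Python. It is only an illustration: plain dicts stand in for the two HBase tables, and the names main_table, build_secondary_index and rows_where are made up for this example, not part of any HBase API.

```python
from collections import defaultdict

# Assumption: a plain dict stands in for the original HBase table.
# Keys are row IDs, values are the cells of that row.
main_table = {
    "row1": {"columnA": "xyz", "columnB": "1"},
    "row2": {"columnA": "abc", "columnB": "2"},
    "row3": {"columnA": "xyz", "columnB": "3"},
}

def build_secondary_index(table, column):
    """Return an index 'table' keyed by the values of `column`.

    Each index row holds the list of row IDs in the original table
    that carry that value.
    """
    index = defaultdict(list)
    for row_id, cells in sorted(table.items()):
        if column in cells:
            index[cells[column]].append(row_id)
    return dict(index)

def rows_where(table, index, value):
    """Answer 'all rows where column = value' with one index lookup."""
    return [table[row_id] for row_id in index.get(value, [])]

index_on_a = build_secondary_index(main_table, "columnA")
matching_ids = index_on_a.get("xyz", [])
matching_rows = rows_where(main_table, index_on_a, "xyz")
```

The point of the layout is that the query becomes a single get on the index table followed by gets on the listed row IDs, instead of a scan over the whole original table. In a real deployment you would also have to update the index table whenever columnA changes in the original table.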

2. Indexes for free text.
This is for answering questions like "Give me all documents where the word
"Amandeep" occurs". You can build an inverted index using tools like Lemur
or Lucene and query that index to find the list of documents.

Now, you can store the free-text index in HBase if you want: have a row for
each word in the index, and let the cells in that row hold the list of
documents that contain that word.
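
A rough sketch of that row-per-word layout, again in Python with a dict standing in for the HBase table. This is not how Lucene or Lemur build or store their indexes; the tokenizer, the sample documents and the function names are all assumptions made up for illustration.

```python
import re
from collections import defaultdict

# Assumption: a tiny in-memory document collection, keyed by document ID.
documents = {
    "doc1": "Amandeep wrote about HBase indexes",
    "doc2": "Lucene builds an inverted index",
    "doc3": "amandeep also mentioned Lemur",
}

def tokenize(text):
    """Very naive tokenizer: lowercase, keep runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_index(docs):
    """One 'row' per word; each 'cell' maps a document ID to a term count."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word][doc_id] = index[word].get(doc_id, 0) + 1
    return dict(index)

def search(index, word):
    """Return the IDs of the documents that contain `word`, sorted."""
    return sorted(index.get(word.lower(), {}))

inverted = build_inverted_index(documents)
hits = search(inverted, "Amandeep")
```

Here the word is the row key and each document ID plays the role of a column qualifier, so answering "which documents contain this word" is a single row lookup.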

I hope this makes it clearer. Or did I make it too basic and not answer your
question at all?


On Sun, Nov 22, 2009 at 10:46 PM, <sallonchina@hotmail.com> wrote:

> Thanks, Amandeep. But I'm a little confused: as far as I know, the Lucene
> index built by HBase MapReduce (
> http://hadoop.apache.org/hbase/docs/r0.20.2/api/org/apache/hadoop/hbase/mapreduce/BuildTableIndex.html
> )
> is of key-value type, where the key is the column name. If I store these
> indexes in HBase, how do I import them? Still as column name: value? That
> looks like the same form as the data in the original HTable. Otherwise, if I
> store them in HDFS, how do I use the index to improve the search? So far it
> is not clear to me that this mechanism can help, so what do you think of it?
> --------------------------------------------------
> From: "Amandeep Khurana" <amansk@gmail.com>
> Sent: Monday, November 23, 2009 2:31 PM
> To: <hbase-user@hadoop.apache.org>
> Subject: Re: HBase Index: indexed table or lucene index
>> So you are essentially trying to build a search feature over text. Index
>> using Lucene or Lemur and store the index in HBase if you want. That's one
>> way of doing it.
>> Secondary indexes in HBase are not what you want. You need to index
>> documents/text.
>> On Sun, Nov 22, 2009 at 10:27 PM, <sallonchina@hotmail.com> wrote:
>>> Hi, Amandeep. My applications store each text page and its features as one
>>> row in an HTable. Given a query, they have to scan all rows in the table
>>> and calculate a score for each row based on its features. Tests show the
>>> response is not fast enough for a real-time application. So I am thinking
>>> of building some index, or using another mechanism such as a cache, to
>>> improve query performance. Any suggestions?
>>> Thanks.
>>> --------------------------------------------------
>>> From: "Amandeep Khurana" <amansk@gmail.com>
>>> Sent: Monday, November 23, 2009 2:18 PM
>>> To: <hbase-user@hadoop.apache.org>
>>> Subject: Re: HBase Index: indexed table or lucene index
>>>> What kind of querying do you want to do? What do you mean by query
>>>> performance?
>>>> HBase has secondary indexes (IndexedTable). However, it's recommended
>>>> that you build your own secondary index instead of using the one provided
>>>> by HBase.
>>>> Lucene is a different framework altogether. Lucene indexes are for
>>>> unstructured text processing (afaik). How did you end up linking the two?
>>>> -Amandeep
>>>> 2009/11/22 <sallonchina@hotmail.com>
>>>>> Hi, everyone. I am focusing on improving data query performance in HBase
>>>>> and found that there are secondary indexes and a Lucene index built by
>>>>> MapReduce. I am not clear whether the two kinds of index are the same.
>>>>> If not, which is more helpful for data queries?
>>>>> Thanks.
>>>>> Best Wishes!
>>>>> _____________________________________________________________
>>>>> 刘祥龙  Liu Xianglong
