hbase-user mailing list archives

From TuX RaceR <tuxrace...@gmail.com>
Subject Re: ported lucandra: lucene index on HBase
Date Mon, 19 Apr 2010 08:06:12 GMT
Hi Thomas,

Thanks for sharing your code for lucehbase.
The schema you used seems to be the same as the one used in lucandra:

*Document IDs are currently random and autogenerated.

*Term keys and Document Keys are encoded as follows (using a random 
binary delimiter)

      Term Key                     col name         value
      "index_name/field/term" => { documentId , position vector }

      Document Key
      "index_name/documentId" => { fieldName , value }

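To make the layout above concrete, here is a minimal Python sketch of the two key encodings. The delimiter byte, the helper names (`term_key`, `document_key`, `split_key`) and the in-memory dicts are assumptions for illustration only; they are not taken from lucandra or lucehbase.

```python
# Assumption: a single byte that never occurs in valid UTF-8 stands in for
# the "random binary delimiter" mentioned in the schema description.
DELIM = b"\xff"


def term_key(index_name, field, term):
    """Row key for a term: index_name/field/term."""
    return DELIM.join(p.encode("utf-8") for p in (index_name, field, term))


def document_key(index_name, doc_id):
    """Row key for a document: index_name/documentId."""
    return DELIM.join(p.encode("utf-8") for p in (index_name, doc_id))


def split_key(key):
    """Split a row key back into its text components."""
    return [p.decode("utf-8") for p in key.split(DELIM)]


# Toy in-memory stand-ins for the two row families.
# Term row: one column per document ID, value = position vector.
term_rows = {
    term_key("idx", "title", "hbase"): {"doc42": [0, 7], "doc99": [3]},
}
# Document row: one column per field name, value = stored field value.
doc_rows = {
    document_key("idx", "doc42"): {"title": "hbase and lucene on hbase"},
}
```

With this layout, every document containing a term adds one more column to that term's row.
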
I have two questions:
1) For a given term key, the number of columns can potentially become very 
large. Have you tried another schema where the document ID is put in the 
key, i.e.:

      Term Key                           col name   value
      "index_name/field/term/docid" => { info , position vector }

That way you get trivial paging when a lot of documents contain the term.

2) Once you have the list of docids, fetching the document details (i.e. the 
pairs { fieldName , value }) will trigger a lot of random-access queries 
to HBase. (In step 1, by contrast, you open a scanner with the alternative 
schema "index_name/field/term/docid", or you just get one row with the 
schema "index_name/field/term".) I am wondering how you can get fast 
answers that way. If you have few fields, would it be a good idea to also 
store the values in the index (only the alternative schema 
"index_name/field/term/docid" allows this)?
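
The alternative schema from both questions can be sketched in Python as follows. This is only a toy model under stated assumptions: `SortedTable` is a hypothetical in-memory stand-in for a table whose rows are sorted by key (it is not the HBase client API), the delimiter byte is an assumption, and the `title` column illustrates storing a field value next to the position vector.

```python
import bisect

DELIM = b"\xff"  # assumed delimiter byte; never occurs in valid UTF-8


def make_key(*parts):
    """Join key components, e.g. index_name/field/term/docid."""
    return DELIM.join(p.encode("utf-8") for p in parts)


class SortedTable:
    """Hypothetical stand-in for a table that keeps rows sorted by key."""

    def __init__(self):
        self._keys = []   # sorted list of row keys
        self._rows = {}   # row key -> {column name: value}

    def put(self, key, columns):
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = columns

    def scan(self, prefix, start_after=None, limit=10):
        """Return up to `limit` rows whose key starts with `prefix`,
        resuming after the key `start_after` if one is given."""
        if start_after is None:
            i = bisect.bisect_left(self._keys, prefix)
        else:
            i = bisect.bisect_right(self._keys, start_after)
        page = []
        while i < len(self._keys) and len(page) < limit:
            key = self._keys[i]
            if not key.startswith(prefix):
                break
            page.append((key, self._rows[key]))
            i += 1
        return page


table = SortedTable()
for n in range(5):
    doc_id = "doc%03d" % n
    # One row per (term, docid); a stored field value rides along.
    table.put(make_key("idx", "body", "hbase", doc_id),
              {"positions": [n], "title": "Title %d" % n})
# A different term sharing the same leading characters must not match.
table.put(make_key("idx", "body", "hbases", "doc000"),
          {"positions": [0], "title": "other"})

# The trailing delimiter keeps the scan from matching the term "hbases".
prefix = make_key("idx", "body", "hbase") + DELIM
page1 = table.scan(prefix, limit=2)
page2 = table.scan(prefix, start_after=page1[-1][0], limit=2)
```

Each page is one sequential scan that resumes from the last key seen, and because the stored values sit in the same rows, no per-document lookup is needed to display results. The cost is that the values are duplicated for every term a document contains.
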


Thomas Koch wrote:
> Hi,
> Lucandra stores a lucene index on cassandra:
> http://blog.sematext.com/2010/02/09/lucandra-a-cassandra-based-lucene-backend
> As the author of lucandra writes: "I’m sure something similar could be built 
> on hbase."
> So here it is:
> http://github.com/thkoch2001/lucehbase
> This is only a first prototype which has not been tested on anything real yet. 
> But if you're interested, please join me to get it production ready!
> I propose to keep this thread on hbase-user and java-dev only.
> Would it make sense to aim this project to become an hbase contrib? Or a 
> lucene contrib?
> Best regards,
> Thomas Koch, http://www.koch.ro
