hbase-user mailing list archives

From "tim robertson" <timrobertson...@gmail.com>
Subject Lucene from HBase - raw values in Lucene index or not?
Date Tue, 16 Dec 2008 21:17:05 GMT
Hi All,

I have HBase running now and am successfully building Lucene indexes
on Hadoop. Next I will get Katta running to distribute the indexes.

I have around 15 indexed search fields, and I want to return all 15
to the user in the result set. My result sets will be up to millions
of records...

Should I:

  a) Store the values in the Lucene index, which will make searching
slower but return the results immediately in pages without hitting
HBase

or

  b) Not store the data in the index, but page over the Lucene index
and do millions of "get by ROWKEY" lookups on HBase
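
To make the two options concrete, here is a minimal sketch. It uses plain in-memory Python structures as stand-ins, so none of the names below (`HBASE_TABLE`, `INDEX_WITH_STORED`, `INDEX_KEYS_ONLY`, `results_from_index`, `results_from_store`) are real Lucene or HBase APIs, and the field names are made up. It only illustrates where the returned values come from in each case.

```python
from typing import Dict, List

# Hypothetical stand-in for the HBase table: row key -> field values.
HBASE_TABLE: Dict[str, Dict[str, str]] = {
    f"row{i}": {"name": f"species{i}", "country": "DK"} for i in range(10)
}

# Option (a): every index hit carries the stored field values itself.
INDEX_WITH_STORED: List[Dict[str, str]] = [
    dict(rowkey=key, **fields) for key, fields in HBASE_TABLE.items()
]

# Option (b): every index hit carries only the row key.
INDEX_KEYS_ONLY: List[str] = list(HBASE_TABLE)


def results_from_index(page: int, size: int) -> List[Dict[str, str]]:
    """Option (a): serve a page directly from the stored index fields."""
    start = page * size
    return INDEX_WITH_STORED[start:start + size]


def results_from_store(page: int, size: int) -> List[Dict[str, str]]:
    """Option (b): read row keys from the index, then do one lookup per key."""
    start = page * size
    keys = INDEX_KEYS_ONLY[start:start + size]
    return [dict(rowkey=key, **HBASE_TABLE[key]) for key in keys]
```

Both functions return the same records; the difference is that (a) makes the index larger while (b) adds one store lookup per returned record.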

Obviously this does not happen synchronously while the user waits,
but I look forward to hearing whether people have handled similar
scenarios and what worked out nicely...

Lucene performance degrades at large page numbers (e.g. the 100th page
of 1000 results), right?
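
The reasoning behind that concern can be sketched as follows. This is a simplified model and not Lucene's actual collector code: it assumes that a ranked top-N search must keep the best `page * size` hits in order to return the last `size` of them, so the work grows with the page number. The function name `fetch_page` and the `(score, doc_id)` tuples are illustrative only.

```python
import heapq
from typing import List, Tuple

Hit = Tuple[int, str]  # (score, doc_id); illustrative shape only


def fetch_page(scored_hits: List[Hit], page: int, size: int) -> Tuple[List[Hit], int]:
    """Return the hits on a 1-based page and how many hits had to be ranked."""
    needed = page * size
    # Keep the best `needed` hits, highest score first.
    top = heapq.nlargest(needed, scored_hits)
    # Only the tail of that ranked list is the requested page.
    return top[needed - size:], len(top)
```

Under this model, page 5 at 10 results per page ranks 50 hits to return 10, and the 100th page at 1000 results per page would rank 100,000 hits to return 1000.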

Thanks for any insights,

Tim
