lucene-java-user mailing list archives

From "shai deljo" <>
Subject Using Lucene - Design Question
Date Tue, 20 Feb 2007 10:51:13 GMT
I have no experience with Lucene, and I'm trying to collect some
information in order to determine which solution is best for me.
I need to index ~50M documents (starting with 10M). Each document is
~2-5 KB, and I'll index a couple of fields per document. I expect
~20 queries per second, with ~4 terms per query. As for the update
rate, I'm not sure what the best (or even a feasible) strategy is
performance-wise, i.e. incremental indexing vs. pushing a full index.
As far as the product is concerned, most data can be updated daily,
but the head (let's say 20%) needs hourly updates (or at least on the
order of hours).
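
To put these numbers in perspective, here is a back-of-envelope calculation. It is only a sketch: the class and method names are made up for illustration, and KB is taken as 1,000 bytes. At 50M documents of 2-5 KB the raw text is roughly 100-250 GB before indexing; 20 queries/second leaves about 50 ms per query if queries are served one at a time; and refreshing 20% of 10M documents every hour means sustaining roughly 556 document updates per second (about 2,800/s at 50M).

```java
/**
 * Back-of-envelope sizing for the stated requirements (hypothetical helper).
 * Inputs are the approximate figures from the message; KB = 1,000 bytes.
 */
final class CapacityEstimate {
    private CapacityEstimate() {}

    /** Raw (pre-indexing) corpus size in bytes. */
    static long rawCorpusBytes(long docs, long bytesPerDoc) {
        return docs * bytesPerDoc;
    }

    /** Average time budget per query in ms, if queries are served one at a time. */
    static double serialBudgetMillis(double queriesPerSecond) {
        return 1000.0 / queriesPerSecond;
    }

    /** Sustained update rate (docs/second) needed to refresh a fraction of the corpus each period. */
    static double refreshRatePerSecond(long docs, double fraction, long periodSeconds) {
        return docs * fraction / periodSeconds;
    }
}
```

For example, `refreshRatePerSecond(10_000_000L, 0.2, 3600)` gives about 555.6. Whether that rate is comfortable for incremental indexing depends on hardware and analysis settings, which these numbers alone do not settle.
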
I also need to be able to override the scoring/ranking and inject my
own logic. Of course, my main concern is response time, especially
since I do additional computation on the hits before returning the
results.
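
One common way to inject custom ranking without touching the search engine's scoring internals is to fetch a somewhat larger candidate set (say the top 100) and re-rank it in application code. The sketch below assumes a hypothetical `Hit` type holding a document id and the engine's score, and a caller-supplied boost function standing in for the custom logic. It is not Lucene's API, and the multiplicative combination of scores is an assumption.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.ToDoubleFunction;

/** Hypothetical search hit: a document id plus the engine's relevance score. */
record Hit(int docId, double score) {}

final class Reranker {
    private Reranker() {}

    /**
     * Re-scores each candidate as baseScore * boost(hit) and returns the best
     * topN, highest combined score first. Any scoring function could replace
     * the multiplication used here.
     */
    static List<Hit> rerank(List<Hit> candidates, ToDoubleFunction<Hit> boost, int topN) {
        List<Hit> rescored = new ArrayList<>(candidates.size());
        for (Hit h : candidates) {
            rescored.add(new Hit(h.docId(), h.score() * boost.applyAsDouble(h)));
        }
        rescored.sort(Comparator.comparingDouble(Hit::score).reversed());
        // Copy so callers do not hold a view into the working list.
        return new ArrayList<>(rescored.subList(0, Math.min(topN, rescored.size())));
    }
}
```

Because the extra work is proportional to the candidate set size, keeping that set small (tens to hundreds of hits) bounds the added latency per query.
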

BTW, for the additional ranking/computation I will need to retrieve
values that are keyed by a term-field pair, i.e. I can't know the key
until I have both the result and the query in hand. I figured I would
use Oracle Berkeley DB Java Edition in order to keep as many of the
lookups as possible in memory. Any advice on this as well?
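
Since the key is only known once both the query term and the field are available, a natural shape is a composite (field, term) key with a small in-memory LRU cache in front of the backing store. In this minimal sketch, `TermFieldKey`, `TermFieldCache` and the `loader` function are hypothetical; the backing store (Berkeley DB JE or anything else) is abstracted as a plain `Function` rather than any real Berkeley DB call.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical composite key: a term as it appears in a given field. */
record TermFieldKey(String field, String term) {}

/** Small LRU cache in front of a slower store; not thread-safe as written. */
final class TermFieldCache<V> {
    private final Function<TermFieldKey, V> loader; // stands in for the backing-store lookup
    private final Map<TermFieldKey, V> cache;
    private int misses = 0;

    TermFieldCache(int capacity, Function<TermFieldKey, V> loader) {
        this.loader = loader;
        // accessOrder=true keeps entries in least-recently-used-first order,
        // so removeEldestEntry evicts the LRU entry once capacity is exceeded.
        this.cache = new LinkedHashMap<TermFieldKey, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<TermFieldKey, V> eldest) {
                return size() > capacity;
            }
        };
    }

    V get(String field, String term) {
        TermFieldKey key = new TermFieldKey(field, term);
        V value = cache.get(key); // a hit also marks the entry as recently used
        if (value == null) {      // assumes the loader never returns null
            misses++;
            value = loader.apply(key);
            cache.put(key, value);
        }
        return value;
    }

    int misses() {
        return misses;
    }
}
```

With ~20 queries/second from multiple threads, this cache would need external synchronization (or a concurrent cache) in practice.
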

For these requirements, do I need to worry about partitioning the
index? If I do partition it, is there a solution that merges the
results back, or do I need to do it on my own? (Does Solr do it for
me, and if it does, can I override the scoring there?)
As far as serving multiple users goes, will a simple rsync of the
index between multiple nodes running the same index work (I am not
that sensitive to data integrity), or do I need to look at something like
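
With rsync-style replication, the main risk is a node opening an index that is only partially copied. One simple convention (an assumption here, not something Lucene or rsync enforces) is to copy each new index into a fresh directory whose name sorts chronologically, drop a marker file in it once the copy finishes, and have each node switch to the newest directory that has the marker. A minimal sketch of the node-side check; the `COMPLETE` marker name and the `snap-...` naming scheme are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.Optional;
import java.util.stream.Stream;

final class SnapshotPicker {
    /** Hypothetical marker file written by the copy job after rsync finishes. */
    static final String MARKER = "COMPLETE";

    private SnapshotPicker() {}

    /**
     * Returns the newest fully copied snapshot directory under root, if any.
     * Directory names are assumed to sort chronologically (e.g. "snap-20070220T1000").
     */
    static Optional<Path> latestComplete(Path root) throws IOException {
        try (Stream<Path> dirs = Files.list(root)) {
            return dirs
                .filter(Files::isDirectory)
                .filter(d -> Files.exists(d.resolve(MARKER)))
                .max(Comparator.comparing((Path d) -> d.getFileName().toString()));
        }
    }
}
```

A node would open a new searcher on the returned directory, start routing queries to it, and only then close the old one, so in-flight queries never see a half-copied index.
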

In short, I am looking for the simplest solution.

Thanks in advance.

