lucene-dev mailing list archives

From Erik Hatcher <>
Subject Re: About search books?
Date Fri, 15 Apr 2005 13:33:22 GMT
On Apr 15, 2005, at 7:33 AM, wrote:
> Hello everybody,
> I am trying to put together a search engine specific for books.

Interesting!  Could you tell us more about what you're building?

> Is there anybody that can give me some advice?
> in particular, i have some questions:
> - besides hardware requirements, do you guys think Lucene will perform well
> running searches over an index of about 10K TextDocuments? (around 800 pages
> each).

It'll perform well.  Or your money back!  :)

>  What if the search is a PhraseSearch with a slop = 10 ?

Phrase queries are not an issue performance-wise.  The slop computation 
is straightforward and fast, so no worries there.
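
To make "slop" concrete, here is a rough sketch of a proximity check over one document's tokens. It is a simplified illustration only, not Lucene's actual sloppy-phrase algorithm: it handles two terms, in order, and treats slop as the maximum number of words allowed between them. The class and method names (SlopSketch, matchesInOrder) are made up for the example.

```java
// Hypothetical, simplified illustration of slop; not Lucene's implementation.
class SlopSketch {

    // Returns true if `second` occurs after `first` with at most `slop`
    // other tokens between them. Only the in-order case is handled.
    static boolean matchesInOrder(String[] tokens, String first, String second, int slop) {
        for (int i = 0; i < tokens.length; i++) {
            if (!tokens[i].equals(first)) {
                continue;
            }
            // Furthest position at which `second` may still count as a match.
            int limit = Math.min(tokens.length - 1, i + 1 + slop);
            for (int j = i + 1; j <= limit; j++) {
                if (tokens[j].equals(second)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] page = "the quick brown fox jumps".split(" ");
        System.out.println(matchesInOrder(page, "quick", "fox", 0));
        System.out.println(matchesInOrder(page, "quick", "fox", 1));
    }
}
```

With slop = 0 the two terms must be adjacent; with slop = 10 up to ten words may sit between them. The check only looks at positions within a single document, which is why the choice of what goes into one Document matters for phrase queries.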

> - If performance will be poor, I thought about indexing each single page of a
> document... but I need to "group" the results so I can show "distinct" books,
> not pages. At this point I guess it will be like handling 10K * 800 docs..right?
> 8 million docs?

Ok, so you're going to index one page per Document.... this won't 
handle phrase searches that span across pages though.

For, I index a document per book section, so spanning 
pages is not a problem.  Phrase queries won't span across sections 
though, nor would I want them to.

But yes, Lucene will handle 8 million docs.
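
As for showing distinct books rather than pages, one way to do the grouping is after the search, on the application side. The following is a minimal sketch under assumptions: each page-level Document is assumed to carry a stored book identifier and page number, and the hits are assumed to have already been copied out of the search results into plain objects. PageHit, BookGrouping and groupByBook are hypothetical names, not Lucene classes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: collapse page-level hits into one entry per book.
class BookGrouping {

    // One search hit for a single page. bookId and page are assumed to be
    // stored fields on each page-level Document.
    static final class PageHit {
        final String bookId;
        final int page;
        final float score;

        PageHit(String bookId, int page, float score) {
            this.bookId = bookId;
            this.page = page;
            this.score = score;
        }
    }

    // Groups hits by book, keeping books in the order they first appear.
    // If the input is sorted by descending score, the books come out
    // ranked by their best-scoring page.
    static Map<String, List<PageHit>> groupByBook(List<PageHit> hits) {
        Map<String, List<PageHit>> byBook = new LinkedHashMap<>();
        for (PageHit hit : hits) {
            byBook.computeIfAbsent(hit.bookId, k -> new ArrayList<>()).add(hit);
        }
        return byBook;
    }

    public static void main(String[] args) {
        List<PageHit> hits = new ArrayList<>();
        hits.add(new PageHit("book-A", 12, 0.9f));
        hits.add(new PageHit("book-B", 340, 0.8f));
        hits.add(new PageHit("book-A", 13, 0.7f));

        for (Map.Entry<String, List<PageHit>> e : groupByBook(hits).entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue().size() + " matching page(s)");
        }
    }
}
```

Keeping the per-page hits inside each group also leaves the page numbers available for page-level highlighting. Whether walking a large hit list this way is fast enough is something to measure on real data.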

> - The second option will make my life easier to highlight the results on
> a page basis.

But will prevent phrase queries from spanning pages.

> - If I decide to split the index across several servers and use the
> RemoteSearcher, will it perform well? Or is there too much RMI overhead?

I don't have experience with searching over RMI in a production 
environment.  You'll need to run your own tests for this.  I would recommend 
keeping the index on a single server though, until it gets too large to 

