lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Otis Gospodnetic <>
Subject Re: ParallelMultiSearcher and idf
Date Tue, 04 Aug 2009 14:20:23 GMT
Hi Christian,

You didn't mention Solr, so I'm not sure if you are aware of it.  Maybe Solr meets your needs?

Sematext is hiring --
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR

----- Original Message ----
> From: Christian Reuschling <>
> To:
> Sent: Tuesday, August 4, 2009 5:50:16 AM
> Subject: ParallelMultiSearcher and idf
> Hello,
> when searching over multiple indices, we create one IndexReader for each index,
> and wrap them into a MultiReader, that we use for IndexSearcher creation.
> This is fine for searching multiple indices on one machine, but in the case the
> indices are distributed over the (intra)net, this scenario has several lacks:
> - searching/scoring/sorting is 100% on the client machine, so you need all the
>   ram and cpu power at every client.
> - all the data necessary for scoring must go over the net - so the traffic
>   should be significantly higher
> - thus, there is a lack of overall performance
> Nevertheless, creating a MultiReader and making a searcher out of it has one
> advantage (at least can be an advantage depending on the scenario): The
> document freqiencies of a term will be summed up, and thus it is 100%
> transparent for scoring whether the indices are splittet or not.
> I'm wondering whether there is the possibility to get the advantages of both
> scenarios, e.g. by first summing up the query terms-related document
> frequencies, and sending them together with the query to every (remote)searcher
> of ParallelMultiSearcher, for scoring.
> Maybe this is exactly what ParallelMultiSearcher does, and I haven't seen it?
> Thanks for clarification!
> Chris

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message