lucene-solr-user mailing list archives

From "Jaeger, Jay - DOT" <Jay.Jae...@dot.wi.gov>
Subject RE: About solr distributed search
Date Thu, 29 Sep 2011 13:07:05 GMT
I am no expert, but here is my take and our situation.

Firstly, are you asking what the minimum number of documents is before it makes *any* sense
at all to use a distributed search, or are you asking what the maximum number of documents
is before a distributed search is essentially required?  The answers would be different. 
I get the feeling you are asking the second question, so I'll proceed under that assumption.

I expect that in part the answer is "it depends".  I expect that it is mostly a function of
the size of the index (and the interaction between that and memory and search performance),
which depends on both the number of documents and how much is stored for the documents.  It
also would depend upon your update load.

If the documents are small and/or the amount of data you store per document is small, then
a single machine will probably be fine until the number of documents and/or updates gets
truly enormous.

But if your documents (that is, the amount stored per document) are very large, then at some
point the index files get so large that performance on a single machine isn't adequate.
Alternatively, if your update load is very large, you might need to spread that load across
multiple servers to handle it without crippling your ability to respond to queries.
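
For reference, a distributed query in Solr is an ordinary request that also carries a
"shards" parameter listing the cores to search. The sketch below only builds such a URL. The
host names are hypothetical placeholders and the helper name is made up for illustration; it
is not taken from any particular setup.

```python
from urllib.parse import urlencode


def distributed_query_url(base_url, shards, query):
    """Build a Solr select URL that fans the query out to the given shards.

    base_url and shards are assumptions for illustration (hypothetical hosts).
    """
    params = {
        "q": query,
        # Solr expects a comma-separated list of host:port/path entries.
        "shards": ",".join(shards),
    }
    # Keep ':', '/' and ',' readable instead of percent-encoding them.
    return f"{base_url}/select?{urlencode(params, safe=':/,')}"


url = distributed_query_url(
    "http://shard1.example.com:8983/solr",
    ["shard1.example.com:8983/solr", "shard2.example.com:8983/solr"],
    "title:solr",
)
print(url)
```

The server that receives the request queries each listed shard and merges the results, so
the client still talks to a single endpoint.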

As for a specific instance, we have a single index of 7 million documents (going on 28
million), with maybe 512 bytes of data stored for each document and maybe 26 or so indexed
fields (we have a *lot* of copyField operations in order to index the data the way we want
it, yet preserve the original data to return), and we did not need to use distributed search.
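
As a rough check on those figures, the stored data alone can be estimated by multiplying the
document count by the stored bytes per document. This is only a back-of-the-envelope sketch:
it assumes exactly 512 bytes per document and ignores the size of the indexed fields, which
the numbers above do not tell us.

```python
def stored_bytes(num_docs: int, bytes_per_doc: int) -> int:
    """Estimate the total stored field data, ignoring index structures."""
    return num_docs * bytes_per_doc


def to_gib(num_bytes: int) -> float:
    """Convert a byte count to gibibytes (2**30 bytes)."""
    return num_bytes / 2**30


# Document counts from the message; 512 bytes per document is approximate.
for docs in (7_000_000, 28_000_000):
    total = stored_bytes(docs, 512)
    print(f"{docs:>10,} docs x 512 B = {to_gib(total):.2f} GiB stored")
```

That works out to roughly 3.3 GiB of stored data at 7 million documents and roughly 13.4 GiB
at 28 million, before counting the indexed fields.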

JRJ

-----Original Message-----
From: Pengkai Qin [mailto:qin19890204@163.com] 
Sent: Thursday, September 29, 2011 5:15 AM
To: solr-user@lucene.apache.org; dev@lucene.apache.org
Subject: About solr distributed search

Hi all,

I am researching Solr distributed search, and I have read that distributed search becomes
reasonable once an index holds more than one million documents.
Does anyone have test results (such as query times) comparing a single index with distributed
search on more than one million documents? I need the results urgently.
Thanks in advance!

Best Regards,
Pengkai


