lucene-solr-user mailing list archives

From Jonathan Rochkind <rochk...@jhu.edu>
Subject RE: how to do offline adding/updating index
Date Wed, 11 May 2011 14:26:43 GMT
Theoretically, a commit alone should have negligible effect on the slave, because of the same
aspect of Solr architecture that makes too-frequent commits problematic: an existing Searcher
continues to serve requests off the old version of the index until the new commit (plus all
its warming) is complete, at which point the newly warmed Searcher switches into action.
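
To make that hand-off concrete, here is a toy sketch in Python. It is not Solr code: the
`Core` and `Searcher` classes and their methods are invented for illustration, and "warming"
just runs the given queries once. The point is only the ordering: the old searcher stays
active until the new one has finished warming.

```python
class Searcher:
    """Read-only view of one index version (toy stand-in, not Solr's class)."""

    def __init__(self, version, docs):
        self.version = version
        self.docs = dict(docs)  # snapshot: later adds do not change this view
        self.warmed = False

    def search(self, term):
        return sorted(doc_id for doc_id, text in self.docs.items() if term in text)

    def warm(self, warming_queries):
        # Run each warming query once, then mark the searcher ready.
        for query in warming_queries:
            self.search(query)
        self.warmed = True


class Core:
    """Hypothetical index core: adds are invisible until a commit swaps searchers."""

    def __init__(self):
        self.pending = {}
        self.active = Searcher(0, {})
        self.active.warmed = True

    def add(self, doc_id, text):
        self.pending[doc_id] = text  # not searchable yet

    def commit(self, warming_queries=()):
        merged = {**self.active.docs, **self.pending}
        new_searcher = Searcher(self.active.version + 1, merged)
        new_searcher.warm(warming_queries)  # old searcher is still active here
        self.active = new_searcher          # swap only after warming completes
        self.pending = {}

    def search(self, term):
        return self.active.search(term)


core = Core()
core.add(1, "solr commit")
print(core.search("solr"))  # [] : the add is not visible before the commit
core.commit(warming_queries=["solr"])
print(core.search("solr"))  # [1]
```

Note that during `commit` both the old and the new searcher exist at once, which is where the
extra RAM and CPU cost comes from.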


That holds so long as there's enough RAM available for both operations, and enough CPU
available that the committing and warming of the new stuff doesn't starve things out. (This
is where the 'too frequent commit' problem comes in: you get so many overlapping commits
that you run out of RAM and/or CPU.)
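
As a rough back-of-the-envelope check on the overlap, assuming commits arrive at a fixed
interval and every new searcher takes the same time to warm (both simplifications of mine,
not measurements), the peak number of searchers warming at once is the warm time divided by
the commit interval, rounded up. The function name and the numbers below are made up for
illustration.

```python
import math


def overlapping_warmups(warm_seconds, commit_interval_seconds):
    """Peak number of searchers warming simultaneously under a steady commit rate."""
    if warm_seconds <= 0 or commit_interval_seconds <= 0:
        raise ValueError("both durations must be positive")
    return math.ceil(warm_seconds / commit_interval_seconds)


print(overlapping_warmups(30, 3600))  # 1 : hourly commits, 30 s warm-up
print(overlapping_warmups(30, 5))     # 6 : a commit every 5 s, 30 s warm-up
```

Each of those warming searchers holds its own caches, so the second case is the one that
eats RAM and CPU. Solr's `maxWarmingSearchers` setting in solrconfig.xml exists to cap this
kind of pile-up.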

However, this same 'theoretical' logic could be used to argue that you should be able to commit
directly to the 'slave' without any replication at all, with no performance implications, which
doesn't seem to match actually observed results. So maybe it should be taken with a grain
of salt, and investigated empirically. For that matter, it has seemed to me that even in the
master-slave setup that I use, while the commit is going on there is SOME performance implication,
although I haven't benchmarked it well; that's just my impression. But it hasn't been a disastrous
one, and it lasts a relatively short timespan in the replication scenario.

Running master and slave on the very same server (one with a whole bunch of cores and plenty
of RAM), there haven't seemed to me to be any performance implications on searching the slave
while 'add'ing to the master (in a completely separate java container). I only see them when
actually doing the replication pull (and its inherent commit to the slave).
________________________________________
From: kenf_nc [ken.foster@realestate.com]
Sent: Wednesday, May 11, 2011 9:46 AM
To: solr-user@lucene.apache.org
Subject: Re: how to do offline adding/updating index

My understanding is that the Master has done all the indexing, that
replication is a series of file copies to a temp directory, then a move and
commit. The slave only gets hit with the effects of a commit, so whatever
warming queries are in place, and the caches get reset. Doing too many
commits too often is a problem in any situation with Solr and I wouldn't
recommend it here. However, the original question implied commits would
occur approximately once an hour, which is easily within the capabilities of
the system. Fine tuning of warming queries should minimize any performance
impact. Any effects should also be roughly constant; they should
not be wildly affected by the size of the update or the number of documents.
Warming query results may be slightly different with new documents, but on
the other hand, your new documents are now in cache ready for fast search,
so a reasonable trade off.
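
The copy, move, commit sequence described above can be sketched in a few lines of Python.
This is only an illustration of the pattern, not how Solr's replication is implemented: the
`pull_index` function, the directory arguments, and the `commit` callback are all
hypothetical, and it assumes the index directory is a flat set of files.

```python
import os
import shutil
import tempfile


def pull_index(master_dir, slave_dir, commit=None):
    """Copy index files to a temp directory, move them into place, then commit."""
    parent = os.path.dirname(os.path.abspath(slave_dir))
    os.makedirs(parent, exist_ok=True)
    # Temp directory sits next to the live index so the later move is a cheap rename.
    tmp_dir = tempfile.mkdtemp(prefix="index.tmp.", dir=parent)

    # Step 1: file copies. The live index is untouched while this runs.
    for name in os.listdir(master_dir):
        shutil.copy2(os.path.join(master_dir, name), os.path.join(tmp_dir, name))

    # Step 2: move the completed files into the live index directory.
    os.makedirs(slave_dir, exist_ok=True)
    moved = sorted(os.listdir(tmp_dir))
    for name in moved:
        os.replace(os.path.join(tmp_dir, name), os.path.join(slave_dir, name))
    os.rmdir(tmp_dir)

    # Step 3: commit, which is the only point where searching is affected.
    if commit is not None:
        commit()
    return moved
```

Passing a `commit` callable keeps the expensive part (opening and warming a new searcher)
separate from the file transfer, which matches the observation that only the commit is felt
on the slave.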

--
View this message in context: http://lucene.472066.n3.nabble.com/how-to-do-offline-adding-updating-index-tp2923035p2927336.html
Sent from the Solr - User mailing list archive at Nabble.com.
