lucene-solr-user mailing list archives

From "Branham, Jeremy [HR]" <Jeremy.D.Bran...@sprint.com>
Subject RE: SOLR reindexing
Date Mon, 04 Mar 2013 16:04:17 GMT
Thanks, Chris.
I considered this approach but wasn't sure about resource consumption.

We've run into a couple of issues where overlapping full index rebuild/swap/replicate cycles
have left the slaves looking for an index that doesn't exist. This should resolve that issue.



Jeremy D. Branham
Performance Technologist II
Sprint University Performance Support
Fort Worth, TX | Office: +1 (972) 405-2970 | Mobile: +1 (817) 791-1627
http://JeremyBranham.Wordpress.com
http://www.linkedin.com/in/jeremybranham


-----Original Message-----
From: Chris Hostetter [mailto:hossman_lucene@fucit.org]
Sent: Friday, March 01, 2013 6:22 PM
To: solr-user@lucene.apache.org
Subject: Re: SOLR reindexing


: For full reindexes (DIH full-import), I use build cores, then swap them with
: the live cores.  I don't do this for performance reasons, I do it because I
: want to continue making incremental updates to the live cores while the
: rebuild is underway.  The rebuild takes four hours.

that's kind of a special case though -- the OP mentioned that the entire reason he does full
rebuilds is because he has no way of incrementally tracking changes to his source data, so
he's clearly not going to be making incremental updates to a "live" core.

in the simpler case of "rebuild the full index every N hours, never do incremental updates"
a simple master/slave setup is probably the easiest approach
-- with a single "snappull" command being triggered once you know the full index is built.
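
For illustration only, here is a minimal Python sketch of triggering that pull over HTTP. It assumes the slave uses Solr's HTTP ReplicationHandler, where the counterpart of the script-based "snappull" is command=fetchindex. The host names and the helper names (fetchindex_url, trigger_pull) are made up for the example, not taken from this thread.

```python
from urllib.parse import urlencode
import urllib.request


def fetchindex_url(slave_base_url, master_url=None):
    """Build the ReplicationHandler URL that asks a slave to pull the index.

    slave_base_url is the slave core's base URL (hypothetical here).
    master_url is optional and overrides the master configured on the slave.
    """
    params = {"command": "fetchindex"}
    if master_url:
        params["masterUrl"] = master_url
    return slave_base_url.rstrip("/") + "/replication?" + urlencode(params)


def trigger_pull(slave_base_url, master_url=None):
    """Send the request; call this only after the master's rebuild has committed."""
    with urllib.request.urlopen(fetchindex_url(slave_base_url, master_url)) as resp:
        return resp.status


if __name__ == "__main__":
    # Prints the URL without contacting any server.
    print(fetchindex_url("http://slave1:8983/solr"))
```

Triggering the pull explicitly, rather than letting the slave poll, is what keeps it from fetching a half-built index.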

If you only have one machine to work with, then another simple approach is to just use a single
solr core, and rebuild on top of your existing data every N hours.  You can use a "timestamp"
field to keep track of when documents were added and do a deleteByQuery on the timestamp field
at the end of the "rebuild" to remove any old documents (ie: things no longer in your source
data)

as long as you don't commit until the end of your "rebuild" you don't need to worry about
inconsistent data, and you should wind up using fewer resources than the core swapping approach.
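
A rough sketch of that single-core cycle in Python, under some assumptions: the field is literally named "timestamp" and is filled in at index time (for example with default="NOW" in the schema), the update handler lives at the URL below, and autoCommit is not enabled. The URL and the helper names are hypothetical.

```python
from datetime import datetime, timezone
from xml.sax.saxutils import escape
import urllib.request

# Assumed update handler location; adjust for your core.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def solr_date(dt):
    """Format a datetime as a Solr UTC date, truncated to whole seconds."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def stale_docs_delete_xml(rebuild_started, field="timestamp"):
    """Delete message for every document indexed before this rebuild began."""
    query = "%s:{* TO %s}" % (field, solr_date(rebuild_started))
    return "<delete><query>%s</query></delete>" % escape(query)


def post_xml(body, url=SOLR_UPDATE_URL):
    """POST one XML update message to Solr and return the HTTP status."""
    req = urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


def finish_rebuild(rebuild_started):
    """Run after all documents are re-added: purge leftovers, then commit once."""
    post_xml(stale_docs_delete_xml(rebuild_started))
    post_xml("<commit/>")
```

Record rebuild_started (for example datetime.now(timezone.utc)) before the first document is sent. Truncating it to whole seconds rounds down, so nothing added during the current rebuild matches the delete query.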

-Hoss



