lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Replication happening before replicateAfter event
Date Sat, 01 Dec 2012 20:13:03 GMT
First comment: You probably don't need to optimize. Despite its name, it
rarely makes a difference and has several downsides; in particular, it makes
replication copy the entire index rather than just the changed segments.
Optimizing purges leftover data from deleted docs, but that happens anyway
as segments are merged.
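To see why an optimize forces a full copy, here is a toy model. It assumes, as a simplification of the real file comparison, that a slave downloads only the index files the master has and the slave lacks. The function name, file names and sizes (in MB) are made up for illustration.

```python
# Toy model (a simplification, not Solr's code): a slave copies only the
# index files that exist on the master but not on the slave.
# File names and sizes (MB) are hypothetical.

def mb_to_fetch(master: dict[str, int], slave: dict[str, int]) -> int:
    """Total size of the files present on the master but missing on the slave."""
    return sum(size for name, size in master.items() if name not in slave)


# The slave already holds two segments.
slave = {"_0.cfs": 900, "_1.cfs": 400}

# A commit on the master adds one small new segment; old segments are untouched.
after_commit = {"_0.cfs": 900, "_1.cfs": 400, "_2.cfs": 50}

# An optimize merges all 1350 MB into one brand-new segment file.
after_optimize = {"_3.cfs": 1350}

if __name__ == "__main__":
    print("after commit:", mb_to_fetch(after_commit, slave), "MB")      # 50 MB
    print("after optimize:", mb_to_fetch(after_optimize, slave), "MB")  # 1350 MB
```

In this model the commit costs 50 MB of transfer, while the optimize costs the whole index, because none of the merged segment's files already exist on the slave.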

But I don't think your problem is really a problem. I think you're confusing
special events and polling. When you set "replicateAfter" to "startup" and
"optimize", you're really telling the slaves to update when either of those
events fires, _in addition to_ any replication that happens due to polling.
So when you optimize, a couple of things happen:
1> all unclosed segments are closed.
2> segments are merged.

If the poll happens between 1 and 2, you'll get an index replication. Then
you'll get another after the optimize.

The same goes for autocommits. An autocommit closes the open segments, and
as soon as a poll sees that, the new segments are pulled down.

The intent is for polling to pull down all the changes it can every time;
that's just the way it's designed.
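The interplay between commits and polling described above can be sketched as a small simulation. This is a deliberately simplified model of the behavior described in this reply, not Solr's replication code: every commit (auto, explicit, or the merge step of an optimize) produces a new commit point, and each poll pulls whatever is newer than what the slave has. The `Master`/`Slave` classes, the `generation` counter and the event names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Master:
    generation: int = 1  # hypothetical counter standing in for the commit point

    def commit(self) -> None:
        # Any commit (explicit, auto, or an optimize's merge) makes a new
        # commit point that a polling slave can see.
        self.generation += 1


@dataclass
class Slave:
    generation: int = 1
    pulled: list[int] = field(default_factory=list)

    def poll(self, master: Master) -> None:
        # A poll pulls whatever is newer, regardless of which event made it.
        if master.generation != self.generation:
            self.generation = master.generation
            self.pulled.append(master.generation)


def simulate(events: list[str]) -> list[int]:
    """Replay events in order and return the generations the slave pulled."""
    master, slave = Master(), Slave()
    for event in events:
        if event in ("autocommit", "commit", "merge"):
            master.commit()
        elif event == "poll":
            slave.poll(master)
        else:
            raise ValueError(f"unknown event: {event}")
    return slave.pulled


if __name__ == "__main__":
    events = [
        "autocommit", "poll",  # during bulk indexing, each autocommit gets pulled
        "autocommit", "poll",
        "commit", "poll",      # the final commit (or optimize step 1) gets pulled
        "merge", "poll",       # optimize step 2 (the merge) gets pulled again
    ]
    print(simulate(events))  # [2, 3, 4, 5]
```

Four pulls happen even though only the last one reflects the optimized index. Two commits between polls collapse into a single pull, which is why the slave in the question "gradually" catches up as it polls.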

So you have a couple of choices:
1> Use the HTTP API to disable replication, then enable it when you want.
2> Turn off autocommit and don't commit at all during indexing until the
very end. No commit == no replication.
3> But even if you do <2>, you still might get a replication after the commit
and another after the optimize. If you insist on optimizing, you're probably
stuck with <1>. But I'd really think twice about the optimize bit.
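For option <1>, here is a minimal Python sketch that builds the ReplicationHandler requests. It assumes the handler is mapped to `/replication` and uses a placeholder base URL (`slave_host`, port 8983); the helper names are hypothetical. It toggles polling on the slave with `command=disablepoll` and `command=enablepoll`, and optionally forces an immediate pull with `command=fetchindex`. On the master, `command=disablereplication` and `command=enablereplication` are the equivalent switch; check the docs for your Solr version before relying on either.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Placeholder base URL (hypothetical); point it at your own slave core.
# Assumes the ReplicationHandler is mapped to /replication.
SLAVE = "http://slave_host:8983/solr"


def replication_url(base: str, command: str) -> str:
    """Build a ReplicationHandler URL such as .../replication?command=disablepoll."""
    return f"{base}/replication?{urlencode({'command': command})}"


def send(url: str) -> bytes:
    """Issue the GET request and return the raw response body."""
    with urlopen(url, timeout=30) as resp:
        return resp.read()


def bulk_load_plan() -> tuple[list[str], list[str]]:
    """URLs to call before and after a bulk load on the master."""
    before = [replication_url(SLAVE, "disablepoll")]  # slave stops polling
    after = [
        replication_url(SLAVE, "enablepoll"),   # resume normal polling
        replication_url(SLAVE, "fetchindex"),   # optional: pull right away
    ]
    return before, after


if __name__ == "__main__":
    before, after = bulk_load_plan()
    for url in before:
        print("before:", url)  # swap print for send(url) to actually call Solr
    # ... add docs, commit and (if you must) optimize on the master here ...
    for url in after:
        print("after:", url)
```

The sketch only prints the URLs so it can run without a Solr instance; calling `send()` on each one performs the real requests.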

Best
Erick


On Fri, Nov 30, 2012 at 7:25 AM, Duncan Irvine <duncan.w.irvine@gmail.com> wrote:

> Hi All,
>   I'm a bit new to the whole Solr world and am having a slight problem with
> replication.  I'm attempting to configure a master/slave scenario with bulk
> updates happening periodically. I'd like to insert a large batch of docs to
> the master, then invoke an optimize and have it only then replicate to the
> slave.
>
> At present I can create the master index, which seems to go to plan.
> Watching the updateHandler, I see records being added, indexed and
> auto-committed every so often. If I query the master while I am inserting,
> even after auto-commits have happened, I see 0 records. Then, when I commit
> at the end, they all appear at once. This is as I'd expect.
>
> What doesn't seem to be working right is that I've configured replication
> to "replicateAfter" "startup" and "optimize" with a pollInterval of 60s;
> however, the slave is replicating and serving the "uncommitted" data
> (although presumably post-auto-commit).
>
> According to my master, I have:
>
> Version: 0
> Gen: 1
> Size: 1.53GB
> replicateAfter: optimize, startup
>
> And, at present, my slave says:
> Master:
>   Version: 0
>   Gen: 1
>   Size: 1.53GB
> Slave:
>   Version: 1354275651817
>   Gen: 52
>   Size: 1.39GB
>
> Which is a bit odd.
> If I query the slave, I get results and as the slave polls I gradually get
> more and more.
>
> Obviously, I can disable polling and enable it programmatically once I'm
> ready, but I was hoping to avoid that.
>
> Does anyone have any thoughts?
>
> Cheers,
>   Duncan.
>
