lucene-dev mailing list archives

From Gregory Chanan <gcha...@cloudera.com>
Subject Re: Managed Schema and SolrCloud
Date Thu, 05 Jun 2014 02:37:01 GMT
Thanks for the reply, Steve.

I filed SOLR-6137.

Greg


On Wed, Jun 4, 2014 at 4:08 PM, Steve Rowe <sarowe@gmail.com> wrote:

> Hi Greg,
>
> Your understanding is correct, and I agree that this limits managed schema
> functionality.
>
> Under SolrCloud, all Solr nodes participating in a collection bound to a
> configset with a managed schema keep a watch on the corresponding schema ZK
> node.  In my testing (on my laptop), when the managed schema is written to
> ZK, the other nodes are notified very quickly (single-digit milliseconds)
> and immediately download and start parsing the schema.  Incoming requests
> are bound to a snapshot of the live schema at the time they arrive, so
> there is a window of time between initial posting to ZK and swapping out
> the schema after parsing.  Different loads on, and/or different network
> latency between ZK and each participating node can result in varying
> latencies before all nodes are in sync.
>
> For Schema API users, delaying a couple of seconds after adding fields
> before using them should work around this problem.  While not ideal, I think
> schema field additions are rare enough in the Solr collection lifecycle
> that this is not a huge problem.
>
> For schemaless users, the picture is worse, as you noted.  Immediate
> distribution of documents triggering schema field addition could easily
> prove problematic.  Maybe we need a schema update blocking mode, where
> after the ZK schema node watch is triggered, all new request processing is
> halted until the schema is finished downloading/parsing/swapping out?  Can
> you make an issue, Greg?  (Such a mode should help Schema API users too.)
>
> Thanks,
> Steve
>
> On Jun 3, 2014, at 8:06 PM, Gregory Chanan <gchanan@cloudera.com> wrote:
>
> > I'm trying to determine if the Managed Schema functionality works with
> > SolrCloud, and AFAICT the integration seems pretty limited.
> >
> > The issues I'm running into are all variants of one problem: schema
> > changes are not pushed to all shards/replicas synchronously.  So, for
> > example, I can make the following two requests:
> > 1) add a field to the collection on server1 using the Schema API
> > 2) add a document with the new field; the document is routed to a core
> >    on server2
> >
> > Then, there appears to be a race between when the document is processed
> > by the core on server2 and when the core on server2, via the
> > ZkIndexSchemaReader, gets the new schema.  If the document is processed
> > first, I get a 400 error because the field doesn't exist.  This is easily
> > reproducible by adding a sleep to the ZkIndexSchemaReader's processing.
> >
> > I hit a similar issue with Schemaless: the distributed request handler
> > sends out the document updates, but there is no guarantee that the other
> > shards/replicas see the schema changes made by the update.chain.
> >
> > Is my understanding correct?  Is this expected?
> >
> > Thanks,
> > Greg
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
>
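
For Schema API clients, one variation on the fixed delay suggested in the
thread is to poll until the new field is visible before sending documents
that use it. The following is only a minimal Python sketch of that idea, not
anything Solr ships. The name wait_for_field and the field_visible callable
are hypothetical; the caller would have to supply a check that asks every
node hosting a replica whether it has picked up the new schema, since
checking a single node does not show that the others are in sync.

```python
import time
from typing import Callable


def wait_for_field(
    field_visible: Callable[[], bool],
    timeout_s: float = 5.0,
    interval_s: float = 0.1,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll field_visible() until it returns True or timeout_s elapses.

    field_visible is a hypothetical, caller-supplied check; it should
    return True only once every relevant node reports the new field.
    Returns True if the field became visible, False on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        if field_visible():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

A client would call wait_for_field(...) after adding the field and only send
documents once it returns True. This narrows the race for Schema API users
but does nothing for the schemaless case, where the document that triggers
the schema change is distributed immediately.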
