lucene-dev mailing list archives

From "Gregory Chanan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-6137) Managed Schema / Schemaless and SolrCloud concurrency issues
Date Thu, 05 Jun 2014 02:37:01 GMT

    [ https://issues.apache.org/jira/browse/SOLR-6137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018425#comment-14018425 ]

Gregory Chanan commented on SOLR-6137:
--------------------------------------

The Schema API blocking mode is an interesting idea; I'd want to think more about that.

In some sense, the schemaless issue seems easier to solve than the Schema API issue.  If we
run all (or more) of the update chain on the forwarded nodes, instead of just skipping to the
distributed update handler, every core could apply the schema changes itself, so we would be
guaranteed to have the correct schema on each core.  We'd need to be smarter about updating
the schema in ZK (as I noted above, concurrent schema changes may currently fail), but that
doesn't seem impossible.
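One way to be "smarter" about concurrent schema updates in ZK is optimistic concurrency: read
the schema together with its version, apply the change, and write it back only if the version
has not moved, re-reading and retrying on conflict.  ZooKeeper's conditional setData (which
takes an expected version) supports this kind of check.  The sketch below is a minimal
illustration under that assumption; VersionedNode and update_with_retry are hypothetical
in-memory stand-ins, not Solr or ZooKeeper APIs.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class Versioned:
    """Immutable snapshot of a node's contents and version (hypothetical stand-in for a ZK node)."""
    data: tuple
    version: int


class VersionedNode:
    """Hypothetical in-memory node with ZooKeeper-like conditional writes."""

    def __init__(self, initial):
        self._lock = threading.Lock()
        self._cur = Versioned(tuple(initial), 0)

    def read(self):
        with self._lock:
            return self._cur

    def write_if_version(self, data, expected_version):
        """Store data only if nobody has written since expected_version was read."""
        with self._lock:
            if self._cur.version != expected_version:
                return False
            self._cur = Versioned(tuple(data), expected_version + 1)
            return True


def update_with_retry(node, change, max_attempts=100):
    """Read-modify-write loop: on a version conflict, re-read and re-apply the change."""
    for attempt in range(1, max_attempts + 1):
        snap = node.read()
        if node.write_if_version(change(snap.data), snap.version):
            return attempt
    raise RuntimeError(f"schema update failed after {max_attempts} attempts")


def add_fields_concurrently(n):
    """Start n writers that each add one field to the same schema node at the same time."""
    schema = VersionedNode(["id"])
    threads = [
        threading.Thread(target=update_with_retry, args=(schema, lambda d, f=f"f{i}": d + (f,)))
        for i in range(n)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return schema.read()


result = add_fields_concurrently(8)
print(result.version, len(result.data))  # 8 9
```

Because a writer only loses when another writer has just succeeded, no field addition is
silently overwritten, and no writer can spin forever: each failed attempt means progress
was made elsewhere.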

The Schema API issue does seem more difficult.  A blocking mode could work in theory, though
one complication is that you need to wait for all the cores that use the config, not just all
the cores of the collection.  Alternatively, perhaps we should just add checks that only one
collection uses a given managed schema config at a time; that may make the logic easier, and it
seems very unlikely that users actually want the same schema for multiple collections (I did
that myself the first time before realizing why it didn't make any sense).
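The "wait for all the cores that use the config" step can be sketched as a condition variable
over per-core schema versions.  This is a minimal illustration under stated assumptions:
SchemaVersionTracker, report, wait_for_config, and the core and config names are all
hypothetical, and a real implementation would learn core versions from the cluster state
rather than from in-process calls.

```python
import threading


class SchemaVersionTracker:
    """Hypothetical registry: which schema version each core has loaded, and which config it uses."""

    def __init__(self):
        self._cond = threading.Condition()
        self._cores = {}  # core name -> (config name, loaded schema version)

    def report(self, core, config, version):
        """Called by a core once it has finished swapping in a schema version."""
        with self._cond:
            self._cores[core] = (config, version)
            self._cond.notify_all()

    def _lagging(self, config, target):
        # Every core that uses the config counts, regardless of which collection it serves.
        return sorted(c for c, (cfg, v) in self._cores.items() if cfg == config and v < target)

    def wait_for_config(self, config, target, timeout):
        """Block until every core using `config` has loaded at least `target`.

        Returns the cores still lagging when the wait ends (empty on success)."""
        with self._cond:
            self._cond.wait_for(lambda: not self._lagging(config, target), timeout=timeout)
            return self._lagging(config, target)


tracker = SchemaVersionTracker()
tracker.report("coll1_shard1_replica1", "managed-conf", 1)
tracker.report("coll2_shard1_replica1", "managed-conf", 1)  # a second collection sharing the config
tracker.report("coll3_shard1_replica1", "other-conf", 1)
# Simulate both managed-conf cores picking up schema version 2 a little later.
threading.Timer(0.05, tracker.report, ("coll1_shard1_replica1", "managed-conf", 2)).start()
threading.Timer(0.10, tracker.report, ("coll2_shard1_replica1", "managed-conf", 2)).start()
lagging = tracker.wait_for_config("managed-conf", 2, timeout=5.0)
print(lagging)  # []
```

Keying the wait on the config name rather than the collection is what covers the shared-config
case; if each managed schema config were restricted to a single collection, the two sets of
cores would simply coincide.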

As Steve noted above, the schemaless functionality could also use a blocking mode instead of
the approach I just described.

> Managed Schema / Schemaless and SolrCloud concurrency issues
> ------------------------------------------------------------
>
>                 Key: SOLR-6137
>                 URL: https://issues.apache.org/jira/browse/SOLR-6137
>             Project: Solr
>          Issue Type: Bug
>          Components: Schema and Analysis, SolrCloud
>            Reporter: Gregory Chanan
>
> This is a follow up to a message on the mailing list, linked here: http://mail-archives.apache.org/mod_mbox/lucene-dev/201406.mbox/%3CCAKfebOOcMeVEb010SsdcH8nta%3DyonMK5R7dSFOsbJ_tnre0O7w%40mail.gmail.com%3E
> The Managed Schema integration with SolrCloud seems pretty limited.
> The issues I'm running into are variants of one problem: schema changes are not pushed to all shards/replicas synchronously.  So, for example, I can make the following two requests:
> 1) add a field to the collection on server1 using the Schema API
> 2) add a document with the new field; the document is routed to a core on server2
> Then there appears to be a race between the core on server2 processing the document and that same core getting the new schema via the ZkIndexSchemaReader.  If the document is processed first, I get a 400 error because the field doesn't exist.  This is easily reproducible by adding a sleep to the ZkIndexSchemaReader's processing.
> I hit a similar issue with Schemaless: the distributed request handler sends out the document updates, but there is no guarantee that the other shards/replicas see the schema changes made by the update.chain.
> Another issue I noticed today: making multiple Schema API calls concurrently can block; that is, one may get through while the other loops infinitely.
> So, for reference, the issues include:
> 1) Schema API changes return success before all cores are updated; subsequent calls attempting to use the new schema may fail
> 2) Schemaless changes may fail on replicas/other shards for the same reason
> 3) Concurrent Schema API changes may block
> From Steve Rowe on the mailing list:
> {quote}
> For Schema API users, delaying a couple of seconds after adding fields before using them should work around this problem.  While not ideal, I think schema field additions are rare enough in the Solr collection lifecycle that this is not a huge problem.
> For schemaless users, the picture is worse, as you noted.  Immediate distribution of documents triggering schema field addition could easily prove problematic.  Maybe we need a schema update blocking mode, where after the ZK schema node watch is triggered, all new request processing is halted until the schema is finished downloading/parsing/swapping out?  (Such a mode should help Schema API users too.)
> {quote}
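The proposed schema update blocking mode can be sketched as a gate: once the schema node watch
fires, new requests wait until the new schema has been swapped in, and then all of them see it.
This is a minimal sketch under assumptions: SchemaGate, on_schema_watch_fired, and
finish_schema_update are hypothetical names, not Solr APIs; requests already in flight are
allowed to finish on the old schema; and a real version would need a timeout so that a failed
download cannot stall request processing forever.

```python
import threading
from contextlib import contextmanager


class SchemaGate:
    """Hypothetical blocking mode: new requests wait while a schema swap is pending."""

    def __init__(self, schema):
        self._cond = threading.Condition()
        self._schema = schema
        self._updating = False

    def on_schema_watch_fired(self):
        """Hypothetical hook for when the ZK schema node watch triggers: close the gate."""
        with self._cond:
            self._updating = True

    def finish_schema_update(self, new_schema):
        """Swap in the downloaded and parsed schema, then release waiting requests."""
        with self._cond:
            self._schema = new_schema
            self._updating = False
            self._cond.notify_all()

    @contextmanager
    def request(self):
        """Each request waits out any pending swap, then works against one consistent schema."""
        with self._cond:
            self._cond.wait_for(lambda: not self._updating)
            schema = self._schema
        yield schema


gate = SchemaGate({"id"})
seen = []


def handle_add(doc_fields):
    # Records whether every field in the document exists in the schema the request saw.
    with gate.request() as schema:
        seen.append(set(doc_fields) <= schema)


gate.on_schema_watch_fired()  # watch fired: a new "title" field is on its way
t = threading.Thread(target=handle_add, args=({"id", "title"},))
t.start()
threading.Timer(0.05, gate.finish_schema_update, ({"id", "title"},)).start()
t.join()
print(seen)  # [True]
```

Note that the gate only protects requests that arrive after the watch fires; a document that
reaches server2 before the watch notification does would still race with the schema change.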



--
This message was sent by Atlassian JIRA
(v6.2#6252)


