lucene-dev mailing list archives

From Tomás Fernández Löbbe (JIRA) <j...@apache.org>
Subject [jira] [Commented] (SOLR-11739) Solr can accept duplicated async IDs
Date Fri, 08 Dec 2017 20:10:00 GMT

    [ https://issues.apache.org/jira/browse/SOLR-11739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16284147#comment-16284147
] 

Tomás Fernández Löbbe commented on SOLR-11739:
----------------------------------------------

I thought about three options:
1. Fix the actual race condition, so duplicate async IDs are never let through.
2. Fix the Overseer so that, before running each task, it checks whether one with the same
ID was completed before.
3. Let the Overseer re-run the tasks (leave it as it is now). Maybe just add logging, or a
way to show the error (failed tasks).

#3 can be dangerous, since the task could be something like a DELETEREPLICA. If the duplicate
ID was caused by some broken retry logic on the client side, Solr could be deleting many replicas
with what the client thought was a single command. 

#2 may be OK; the problem I see with it is that it gives the user inconsistent behavior
(sometimes the duplicate IDs are rejected, and sometimes not). Also, this would make
the Overseer silently drop tasks (yes, we can add some sort of failure in the logs, but we
can’t assume anyone is going to notice).
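
To make the concern with #2 concrete, here is a minimal sketch of a "check before running"
step. It is not the Overseer's real task loop: the class, its fields, and the method names
are hypothetical, and an in-memory set stands in for the completed map. The point is that
the duplicate has already been accepted by the time it is dropped, so the caller never sees
a rejection.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration; not the Overseer's real task loop.
public class DedupingTaskRunner {
    // Stand-in for the completed map keyed by async ID.
    private final Set<String> completedIds = new HashSet<>();
    // IDs of tasks that were dropped; in practice this would only be a log line.
    private final List<String> skippedIds = new ArrayList<>();

    // Runs the task unless one with the same async ID already completed.
    // Returns true if the task ran, false if it was dropped.
    public boolean runIfNotCompleted(String asyncId, Runnable task) {
        if (completedIds.contains(asyncId)) {
            skippedIds.add(asyncId);
            return false;
        }
        task.run();
        completedIds.add(asyncId);
        return true;
    }

    public List<String> skippedIds() {
        return skippedIds;
    }
}
```

In this sketch the second submission with the same ID is never executed, but the only
record of that is the skipped list, which mirrors the "silently drop" problem described above.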

#1 is the correct fix from the functional standpoint; however, I can’t think of a way to
really fix the race condition without adding an extra write to ZooKeeper, which we’d have
to do for every collection request with an asyncID. And this is only to cover a client-misuse
edge case.
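
A minimal sketch of the idea behind #1, not Solr's actual code: the duplicate check and the
reservation of the ID happen as one atomic step, so two concurrent requests with the same
async ID cannot both pass. A concurrent in-memory set stands in for ZooKeeper here; with
ZooKeeper, the analogous step would presumably be creating a node for the ID and treating
"node already exists" as a rejection, which is the extra write mentioned above. The class
and method names are hypothetical.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration; not a Solr class.
public class AsyncIdClaims {
    // Stand-in for a ZooKeeper path holding one node per claimed async ID.
    private final Set<String> claimed = ConcurrentHashMap.newKeySet();

    // Racy check-then-act: two threads can both see "absent" before either adds,
    // so both may return true for the same ID.
    public boolean claimRacy(String asyncId) {
        if (claimed.contains(asyncId)) {
            return false;
        }
        claimed.add(asyncId);
        return true;
    }

    // Atomic claim: add() returns true for exactly one caller per ID.
    public boolean claimAtomic(String asyncId) {
        return claimed.add(asyncId);
    }
}
```

With claimAtomic, the first request for a given ID wins and every later one is rejected up
front, regardless of how fast the duplicates arrive.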

I think (and I discussed this offline with [~anshumg], he thinks this too) #1 is the way to
go. I’ll put up a patch.

> Solr can accept duplicated async IDs
> ------------------------------------
>
>                 Key: SOLR-11739
>                 URL: https://issues.apache.org/jira/browse/SOLR-11739
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Tomás Fernández Löbbe
>            Priority: Minor
>         Attachments: SOLR-11739.patch
>
>
> Solr is supposed to reject duplicated async IDs; however, if the repeated IDs are sent
> fast enough, a race condition in Solr will let the repeated IDs through. The duplicated task
> is run and then silently fails to report as completed because the same async ID is already
> in the completed map.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

