lucene-dev mailing list archives

From "Scott Blum (JIRA)" <>
Subject [jira] [Commented] (SOLR-6760) New optimized DistributedQueue implementation for overseer
Date Wed, 12 Aug 2015 16:46:46 GMT


Scott Blum commented on SOLR-6760:

[~noble.paul] I feel like the API and implementation of DistributedQueue represent a pretty
clean, cohesive, and general-purpose design.  This is evidenced by the fact that most of the
existing places where we were using DQ "just work".

DistributedQueueExt, on the other hand, represents what I feel is kind of crap that was glommed
on specifically to support the collection task queue.  It has methods like containsTaskWithRequestId()
that are highly specific to the collection task queue; the strange QueueEvent and response-prefix
machinery, whose purpose I don't even understand; getTailId(), which peeks at the end of the
queue with unclear semantics (is it good enough to answer with the tail of the in-memory queue,
or does the caller expect a synchronous read-through into ZK?); and a remove method that doesn't
operate on the head of the queue.  Beyond the unclear semantics, the implementations of some
of these necessarily break the clean model DQ uses and are in some cases FAR less efficient --
containsTaskWithRequestId, for example, not only has to fetch the entire child list from ZK,
it then has to read the data of every one of those nodes.
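To make the cost concrete, here is an illustrative sketch (not the actual Solr code; the interface and payload format are assumptions for illustration) of why a method shaped like containsTaskWithRequestId() is so expensive: answering one membership question requires listing every child znode AND fetching every child's payload to inspect the request id inside.

```java
import java.util.List;

// Illustrative sketch of the O(n)-reads pattern described above.
// The Zk interface and the "requestId=<id>" payload convention are
// hypothetical stand-ins, not the real ZooKeeper client or Solr format.
class TaskScanSketch {
    interface Zk {
        List<String> getChildren(String dir);
        byte[] getData(String path);
    }

    // One child listing, plus one data read PER child -- every call.
    static boolean containsTaskWithRequestId(Zk zk, String dir, String requestId) {
        for (String child : zk.getChildren(dir)) {
            String payload = new String(zk.getData(dir + "/" + child));
            if (payload.contains("requestId=" + requestId)) return true;
        }
        return false;
    }
}
```

Contrast this with the plain DQ operations, which touch only the head node; a scan like this defeats any head-caching the queue implementation does.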

Suffice it to say I don't think anything in there is good enough to promote into the general-purpose
DQ.  Maybe the core issue is that the collection work queue is fundamentally looking
for something more, like a distributed task queue.  I think someone should go back, analyze
the true needs there, and figure out whether there's something better we can do.

> New optimized DistributedQueue implementation for overseer
> ----------------------------------------------------------
>                 Key: SOLR-6760
>                 URL:
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Noble Paul
>            Assignee: Noble Paul
>         Attachments: SOLR-6760.patch, deadlock.patch
> Currently the DQ works as follows:
> * read all items in the directory
> * sort them all
> * take the head, return it, and discard everything else
> * rinse and repeat
> This works well when we have only a handful of items in the queue. If the number of items
> in the queue is much larger (in the tens of thousands), this is counterproductive.
> As the overseer queue is a multiple-producer + single-consumer queue, we can read the items
> in bulk, and before processing each item just do a zk.exists(itemname); if all is well, we
> don't need to do the fetch-all + sort thing again.
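The scheme described in the issue can be sketched as follows. This is a minimal illustration, not the SOLR-6760 patch itself: MockZk is an in-memory stand-in for a ZooKeeper queue directory, and CachedQueueConsumer shows the one-fetch-plus-sort batch with a cheap exists() check per item.

```java
import java.util.*;

// In-memory stand-in for a ZooKeeper queue directory: znode name -> payload.
class MockZk {
    private final NavigableMap<String, byte[]> nodes = new TreeMap<>();

    void create(String name, byte[] data) { nodes.put(name, data); }
    void delete(String name) { nodes.remove(name); }
    boolean exists(String name) { return nodes.containsKey(name); }
    List<String> getChildren() { return new ArrayList<>(nodes.keySet()); }
    byte[] getData(String name) { return nodes.get(name); }
}

// Single consumer: list + sort once per batch, then drain the cached
// names, validating each with exists() instead of re-listing every time.
class CachedQueueConsumer {
    private final MockZk zk;
    private final Deque<String> cache = new ArrayDeque<>();
    int fullFetches = 0; // how many list-all + sort passes we performed

    CachedQueueConsumer(MockZk zk) { this.zk = zk; }

    /** Return and remove the head item's data, or null if the queue is empty. */
    byte[] poll() {
        while (true) {
            if (cache.isEmpty()) {
                List<String> children = zk.getChildren();
                if (children.isEmpty()) return null;
                Collections.sort(children); // the expensive pass, done once per batch
                cache.addAll(children);
                fullFetches++;
            }
            String head = cache.pollFirst();
            // Cheap validation: the node may have been removed since we listed.
            if (zk.exists(head)) {
                byte[] data = zk.getData(head);
                zk.delete(head);
                return data;
            }
            // Stale cache entry; skip it without re-listing the whole directory.
        }
    }
}
```

Note that this only works because there is a single consumer and producers append with sequential znode names, so items created after the batch was listed sort after everything already cached and will be picked up on the next fetch.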
