hadoop-common-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Daryn Sharp (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-10278) Refactor to make CallQueue pluggable
Date Tue, 18 Feb 2014 18:24:24 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-10278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904363#comment-13904363

Daryn Sharp commented on HADOOP-10278:

Up front, my requirement is I don't want a feature that will be rarely used to add additional
lock contention during normal operation.  I'm -1 to more locking.

bq.  LinkedBlockingQueue separates its lock into only two locks, put side and get side, so
that congesion in each side causes context switches. Moreover the separation of the lock is
ineffective when there are almost always empty in the queue, and collision between put and
get causes context switches.

I'm not sure I understand.  Context switches are inevitable if the queue is empty.  Under
normal conditions the put/take lock reduces contention.

bq. An issue arises when production > consumption: 
I thought about that case with the original analysis but I didn't want the post to be too
big.  If the NN is under heavy load, the callq is full, and the user uses a new queue is of
a much smaller size - I'd argue that's a case of user error.  The draining thread will compete
with new calls but should eventually complete.

bq. The atomic reference is not needed for the queue and volatile is enough if you only set
and get. Volatile variables may have much chance to be optimized by VM because volatile is
within the language specification. 

AtomicRef is just a wrapper around a volatile (by nature can't be optimized) with CAS methods.
 I like the fact AtomicRef provides an obvious hint that the value is going to be changed,
plus we should be using CAS to swap the queue ref to prevent races.

So how about something like this rough idea:
# CallQueueManager (or whatever it's called) maintains a putRef and takeRef, initially set
to the same queue
# for a swap, use CAS to update the putRef with the new callq so readers begin filling it
- note it's not yet being serviced by handlers
# now wait for handlers to drain all the calls in takeRef's old callq (1) - periodically check
# finally update takeRef to point to the new callq

(1)There's a race where readers may drop in 1 more call after the callq swapped.  Using isEmpty()
is imperfect due to the race.  I dislike time-based solutions, but if handlers are polling
for a few seconds, that should be plenty of time to get straggler calls before the handlers
switch to the new/current queue.

> Refactor to make CallQueue pluggable
> ------------------------------------
>                 Key: HADOOP-10278
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10278
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: ipc
>            Reporter: Chris Li
>         Attachments: HADOOP-10278-atomicref-adapter.patch, HADOOP-10278-atomicref-adapter.patch,
HADOOP-10278-atomicref-rwlock.patch, HADOOP-10278-atomicref.patch, HADOOP-10278-atomicref.patch,
HADOOP-10278-atomicref.patch, HADOOP-10278-atomicref.patch, HADOOP-10278.patch, HADOOP-10278.patch
> * Refactor CallQueue into an interface, base, and default implementation that matches
today's behavior
> * Make the call queue impl configurable, keyed on port so that we minimize coupling

This message was sent by Atlassian JIRA

View raw message