hadoop-mapreduce-dev mailing list archives

From Anu Engineer <aengin...@hortonworks.com>
Subject Re: [VOTE] Merge HDFS-12943 branch to trunk - Consistent Reads from Standby
Date Thu, 06 Dec 2018 19:08:19 GMT
Hi Daryn,

I have just started reading the patch. Hence my apologies if my question has a response somewhere
hidden in the patch.

Are you concerned that FSEditLock is taken in GlobalStateIdContext on the server side, and worried
that a malicious or stupid client could cause this lock to be held for a long time?

How do retriable exceptions help? Wouldn’t the system eventually hold the lock similarly?

I am asking to understand this better so that I get a better sense when I am reading the code.
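
To make the two options under discussion concrete, here is a rough toy model of them. It is only a sketch and not the Hadoop IPC code: `Call`, `RetryLaterException`, `handleByRequeue`, and `handleByRetriable` are hypothetical names, and the state id comparison is an assumption about what "the condition" means. It shows a handler that reinserts an unsatisfied call into the queue next to a handler that rejects it with a retriable error.

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class RequeueVsRetry {
    /** Hypothetical read call that can be served only once the server reaches requiredStateId. */
    static final class Call {
        final long requiredStateId;
        Call(long requiredStateId) { this.requiredStateId = requiredStateId; }
    }

    /** Hypothetical stand-in for a retriable error sent back to the client. */
    static final class RetryLaterException extends Exception {
        RetryLaterException(String msg) { super(msg); }
    }

    /**
     * Option A: the handler puts an unsatisfied call back on the queue.
     * Returns how many handler iterations were spent, capped at budget
     * so that this demo terminates.
     */
    static int handleByRequeue(Queue<Call> queue, long serverStateId, int budget) {
        int spins = 0;
        while (!queue.isEmpty() && spins < budget) {
            Call call = queue.poll();
            spins++;
            if (call.requiredStateId > serverStateId) {
                queue.add(call); // the call stays on the server and is polled again
            }
        }
        return spins;
    }

    /**
     * Option B: the handler rejects an unsatisfied call immediately.
     * The call leaves the server and the client decides when to retry.
     */
    static void handleByRetriable(Call call, long serverStateId) throws RetryLaterException {
        if (call.requiredStateId > serverStateId) {
            throw new RetryLaterException(
                "server at " + serverStateId + ", call needs " + call.requiredStateId);
        }
    }

    public static void main(String[] args) {
        long serverStateId = 10L;
        // A call whose required state id the server never reaches.
        Call unsatisfiable = new Call(Long.MAX_VALUE);

        Queue<Call> queue = new ArrayDeque<>();
        queue.add(unsatisfiable);
        int spins = handleByRequeue(queue, serverStateId, 1000);
        System.out.println("requeue: handler iterations = " + spins
            + ", calls still queued = " + queue.size());

        try {
            handleByRetriable(unsatisfiable, serverStateId);
            System.out.println("retriable: call served");
        } catch (RetryLaterException e) {
            System.out.println("retriable: rejected after one check (" + e.getMessage() + ")");
        }
    }
}
```

In this model the requeue variant uses up its whole iteration budget on one call that can never be satisfied, while the retriable variant spends a single check and frees the handler. The model leaves out any lock taken while the state id is read, which is the part my question above is about.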


On 12/6/18, 10:38 AM, "Daryn Sharp" <daryn@oath.com.INVALID> wrote:

    -1 pending additional info.  After a cursory scan, I have serious concerns
    regarding the design.  This seems like a feature that should have been
    purely implemented in hdfs w/o touching the common IPC layer.
    The biggest issue is the alignment context.  Its purpose appears to be for
    allowing handlers to reinsert calls back into the call queue.  That's
    completely unacceptable.  A buggy or malicious client can easily cause
    livelock in the IPC layer with handlers only looping on calls that never
    satisfy the condition.  Why is this not implemented via RetriableExceptions?
    On Thu, Dec 6, 2018 at 1:24 AM Yongjun Zhang <yzhang@cloudera.com.invalid> wrote:
    > Great work guys.
    > I wonder if we can elaborate on the impact of not having #2 fixed, and why
    > #2 is not needed for the feature to be complete?
    > 2. Need to fix automatic failover with ZKFC. Currently it does not know
    > about ObserverNodes and tries to convert them to SBNs.
    > Thanks.
    > --Yongjun
    > On Wed, Dec 5, 2018 at 5:27 PM Konstantin Shvachko <shv.hadoop@gmail.com>
    > wrote:
    > > Hi Hadoop developers,
    > >
    > > I would like to propose to merge to trunk the feature branch HDFS-12943 for
    > > Consistent Reads from Standby Node. The feature is intended to scale read
    > > RPC workloads. On large clusters reads comprise 95% of all RPCs to the
    > > NameNode. We should be able to accommodate higher overall RPC workloads (up
    > > to 4x by some estimates) by adding multiple ObserverNodes.
    > >
    > > The main functionality has been implemented; see sub-tasks of HDFS-12943.
    > > We followed up with the test plan. Testing was done on two independent
    > > clusters (see HDFS-14058 and HDFS-14059) with security enabled.
    > > We ran standard HDFS commands, MR jobs, admin commands including manual
    > > failover.
    > > We know of one cluster running this feature in production.
    > >
    > > There are a few outstanding issues:
    > > 1. Need to provide proper documentation - a user guide for the new feature
    > > 2. Need to fix automatic failover with ZKFC. Currently it does not know
    > > about ObserverNodes and tries to convert them to SBNs.
    > > 3. Scale testing and performance fine-tuning
    > > 4. As testing progresses, we continue fixing non-critical bugs like
    > > HDFS-14116.
    > >
    > > I attached a unified patch to the umbrella jira for the review and Jenkins
    > > build.
    > > Please vote on this thread. The vote will run for 7 days until Wed Dec 12.
    > >
    > > Thanks,
    > > --Konstantin
    > >
