hadoop-common-issues mailing list archives

From "Arun Suresh (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-11620) Add Support for Load Balancing across a group of KMS servers for HA
Date Sun, 22 Feb 2015 02:14:11 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-11620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14331990#comment-14331990 ]

Arun Suresh commented on HADOOP-11620:

[~lmccay], you make very valid comments. Let me try to address them:

bq. .. why is it that we often put the burden of the loadbalancing on the clients in Hadoop
rather than put the servers behind a loadbalancing process or VIP ?
I believe that having the clients manage the load balancing makes the system easier to deploy,
test and manage, since there is one less component (the VIP) to configure and handle. Also,
given that encrypted data cannot be accessed without a KMS, I feel that basic high availability
should be part of the core library and deployable without the need for external load balancers
/ VIPs. Administrators can then decide to use a custom LB or the base client, depending on their
specific deployment environment.

bq. .. makes it difficult to provide elastic provisioning of server instances since all the
clients config would need to be made aware of the changes.
I agree that my current implementation and, yes, even the core RM and NN clients do suffer
from this. But I feel this can be fixed, considering that the ZooKeeper
Curator library is already used in hadoop-common, and Curator comes with a robust library
for [Dynamic Service Scaling|http://curator.apache.org/curator-x-discovery/].
I was in fact planning on filing a follow-up JIRA for KMS (but yes, it looks like it might have
more general applicability).

The reason I did not want to introduce dynamic scaling in this patch is that I was more interested
in high availability than in improved read-scalability: I need at least 1 KMS up,
or else encrypted data becomes unreadable. For current deployment scenarios, therefore,
I was not expecting more than 2 or 3 KMS instances to participate in the load-balancing group. Also, the
simple round-robin load balancing would allow the caches in all participating KMSs to be warmed
over a period of time. Like I mentioned, I do plan to work on the dynamic scaling once I hit
the read-scalability wall.
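
To make the round-robin idea above concrete, here is a minimal sketch of client-side round-robin with failover. It is only an illustration under stated assumptions, not the code in the attached patches: the class name RoundRobinFailover, the Op interface and the invoke method are hypothetical names, and treating any IOException as "this instance is down" is a simplification.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: rotates calls across a fixed group of providers
// (e.g. KMS endpoints) and fails over to the next one on IOException.
class RoundRobinFailover<P> {

    /** An operation against a single provider; may fail with IOException. */
    interface Op<P, T> {
        T call(P provider) throws IOException;
    }

    private final List<P> providers;
    private final AtomicInteger next = new AtomicInteger(0);

    RoundRobinFailover(List<P> providers) {
        if (providers.isEmpty()) {
            throw new IllegalArgumentException("need at least one provider");
        }
        // Defensive copy: the group is static for the lifetime of the client.
        this.providers = new ArrayList<>(providers);
    }

    <T> T invoke(Op<P, T> op) throws IOException {
        // Each call starts at the next provider, so load (and cache warming)
        // is spread across the whole group over time.
        int start = Math.floorMod(next.getAndIncrement(), providers.size());
        IOException last = null;
        for (int i = 0; i < providers.size(); i++) {
            P provider = providers.get((start + i) % providers.size());
            try {
                return op.call(provider);
            } catch (IOException e) {
                last = e; // assume this instance is down; try the next one
            }
        }
        // Only reached when every provider in the group has failed.
        throw new IOException("all " + providers.size() + " providers failed", last);
    }
}
```

With a group of three providers, successive invoke calls start at the first, second, third and then the first provider again. A call only fails when every member of the group throws, which matches the "at least 1 KMS up" requirement. The group is fixed at construction time, so this sketch has the same elastic-provisioning limitation discussed above.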

> Add Support for Load Balancing across a group of KMS servers for HA
> -------------------------------------------------------------------
>                 Key: HADOOP-11620
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11620
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: kms
>    Affects Versions: 2.6.0
>            Reporter: Arun Suresh
>            Assignee: Arun Suresh
>         Attachments: HADOOP-11620.1.patch, HADOOP-11620.2.patch
> This patch needs to add support for :
> * specification of multiple hostnames in the kms key provider uri
> * KMS client to load balance requests across the hosts specified in the kms keyprovider

This message was sent by Atlassian JIRA
