From mapreduce-dev-return-19050-apmail-hadoop-mapreduce-dev-archive=hadoop.apache.org@hadoop.apache.org Fri Nov 3 06:29:58 2017
From: Lei Xu
Date: Thu, 2 Nov 2017 23:29:27 -0700
Subject: Re: Reply: [DISCUSSION] Merging HDFS-7240 Object Store (Ozone) to trunk
To: Jitendra Pandey
Cc: larry mccay, Anu Engineer, Steve Loughran, Yang Weiwei,
  hdfs-dev@hadoop.apache.org, yarn-dev@hadoop.apache.org,
  mapreduce-dev@hadoop.apache.org, common-dev@hadoop.apache.org

Hey Weiwei and Jitendra,

Thanks a lot for this large effort to bring us Ozone.

* In the current state of the Ozone implementation, what are the major
benefits of using today's Ozone over HDFS? Given that it is missing features
like HDFS-12680 and HDFS-12697, is disabled by default, and that the Hadoop
3.0 release is closing, should we wait for a later merge when Ozone is more
mature? Or, more generally, why should this merge to a release branch happen
now, when Ozone is not yet usable by users? Staying on a feature branch
still seems like the right place to me.

* For existing HDFS users, could you address the semantic gaps between
Ozone / Ozone File System and HDFS? It would be great to illustrate the
expected use cases for Ozone given its different architecture and design
decisions, such as no append, no atomic rename, etc.

* A follow-up question: is it possible to run any of today's Hadoop
applications (MR, Spark, Impala, Presto, etc.) on Ozone directly, or against
OzoneFileSystem? I think a performance / scalability gain or extended
functionality should be a prerequisite for the merge. Additionally, I
believe such tests would reveal the potential caveats, if any.

* Ozone's architecture shows great potential to address NN scalability.
However, it looks like an XXL effort to me, considering that 1) the
community has had multiple unfinished attempts to simply separate namespace
and block management within the same NN process, and 2) many existing
features like snapshot, append, erasure coding, etc., are not
straightforward to implement in today's Ozone design. Could you share your
opinions on this matter?

* How stable is the Ozone client? Should we mark it as unstable for now?
Also, given the significant difference between OzoneClient and HdfsClient,
should it move to a separate package or even a separate project? I second
Konstantin's suggestion to separate Ozone from HDFS.

* Please add sections to the end-user and system-admin oriented documents
for deploying and operating SCM, KSM, and also the chunk servers on
DataNodes. Additionally, the introduction in "OzoneGettingStarted.md" still
builds Ozone from feature branch HDFS-7240.

Best regards,

On Mon, Oct 23, 2017 at 11:10 AM, Jitendra Pandey wrote:
> I have filed https://issues.apache.org/jira/browse/HDFS-12697 to ensure ozone stays disabled in a secure environment.
> Since ozone is disabled by default and will not come with security on, it will not expose any new attack surface in a Hadoop deployment.
> Ozone security effort will need a detailed design and discussion on a community jira. Hopefully, that effort will start soon after the merge.
>
> Thanks
> jitendra
>
> On 10/20/17, 2:40 PM, "larry mccay" wrote:
>
> All -
>
> I broke this list of questions out into a separate DISCUSS thread where we
> can iterate over how a security audit process at merge time might look and
> whether it is even something that we want to take on.
>
> I will try and continue discussion on that thread and drive that to some
> conclusion before bringing it into any particular merge discussion.
>
> thanks,
>
> --larry
>
> On Fri, Oct 20, 2017 at 12:37 PM, larry mccay wrote:
>
> > I previously sent this same email from my work email and it doesn't seem
> > to have gone through - resending from apache account (apologizing up
> > front for the length)....
> >
> > For such sizable merges in Hadoop, I would like to start doing security
> > audits in order to have an initial idea of the attack surface, the
> > protections available for known threats, what sort of configuration is
> > being used to launch processes, etc.
> >
> > I dug into the architecture documents while in the middle of this list -
> > nice docs!
> > I do intend to try and make a generic checklist like this for such
> > security audits in the future, so a lot of this is from that, but I
> > tried to direct specific questions from those docs as well.
> >
> > 1. UIs
> >
> > I see there are at least two UIs - Storage Container Manager and Key
> > Space Manager. There are a number of typical vulnerabilities that we
> > find in UIs.
> >
> > 1.1. What sort of validation is being done on any accepted user input?
> > (pointers to code would be appreciated)
> > 1.2. What explicit protections have been built in for (pointers to code
> > would be appreciated):
> > 1.2.1. cross site scripting
> > 1.2.2. cross site request forgery
> > 1.2.3. click jacking (X-Frame-Options)
> > 1.3. What sort of authentication is required for access to the UIs?
> > 1.4. What authorization is available for determining who can access what
> > capabilities of the UIs for either viewing, modifying data or affecting
> > object stores and related processes?
> > 1.5. Are the UIs built with proxying in mind by leveraging X-Forwarded
> > headers?
> > 1.6. Is there any input that will ultimately be persisted in
> > configuration for executing shell commands or processes?
> > 1.7. Do the UIs support the trusted proxy pattern with doas
> > impersonation?
> > 1.8. Is there TLS/SSL support?
> >
> > 2. REST APIs
> >
> > 2.1. Do the REST APIs support the trusted proxy pattern with doas
> > impersonation capabilities?
> > 2.2. What explicit protections have been built in for:
> > 2.2.1. cross site scripting (XSS)
> > 2.2.2. cross site request forgery (CSRF)
> > 2.2.3. XML External Entity (XXE)
> > 2.3. What is being used for authentication - the Hadoop Auth module?
> > 2.4. Are there separate processes for the HTTP resources (UIs and REST
> > endpoints) or are they part of existing HDFS processes?
> > 2.5. Is there TLS/SSL support?
> > 2.6. Are there new CLI commands and/or clients for accessing the REST
> > APIs?
> > 2.7. The Bucket Level API allows for setting ACLs on a bucket - what
> > authorization is required here - is there a restrictive ACL set on
> > creation?
> > 2.8. The Bucket Level API allows for deleting a bucket - I assume this
> > is dependent on ACL-based access control?
> > 2.9. The Bucket Level API to list a bucket returns up to 1000 keys - is
> > there paging available?
> > 2.10. The Storage Level APIs indicate "Signed with User Authorization" -
> > what does this refer to exactly?
> > 2.11. The Object Level APIs indicate that there is no ACL support and
> > only bucket owners can read and write - but there are ACL APIs at the
> > Bucket Level - are they meaningless for now?
> > 2.12. How does a REST client know which Ozone Handler to connect to, or
> > am I missing some well-known NN-type endpoint in the architecture doc
> > somewhere?
> >
> > 3. Encryption
> >
> > 3.1. Is there any support for encryption of persisted data?
> > 3.2. If so, are KMS and the hadoop key command used for key management?
> >
> > 4. Configuration
> >
> > 4.1. Are there any passwords or secrets being added to configuration?
> > 4.2. If so, are they accessed via Configuration.getPassword() to allow
> > for provisioning in credential providers?
> > 4.3. Are there any settings that are used to launch docker containers or
> > shell out any commands, etc?
> >
> > 5. HA
> >
> > 5.1. Are there provisions for HA?
> > 5.2. Are we leveraging the existing HA capabilities in HDFS?
> > 5.3. Is Storage Container Manager a SPOF?
> > 5.4. I see HA listed in future work in the architecture doc - is this
> > still an open issue?
> >
> > On Fri, Oct 20, 2017 at 11:19 AM, Anu Engineer wrote:
> >
> >> Hi Steve,
> >>
> >> In addition to everything Weiwei mentioned (chapter 3 of the user
> >> guide), if you really want to drill down to the REST protocol you might
> >> want to apply this patch and build ozone.
> >>
> >> https://issues.apache.org/jira/browse/HDFS-12690
> >>
> >> This will generate an Open API (https://www.openapis.org,
> >> http://swagger.io) based specification which can be accessed from the
> >> KSM UI or just as a json file.
> >> Unfortunately, this patch is still at the code review stage, so you
> >> will have to apply the patch and build it yourself.
> >>
> >> Thanks
> >> Anu
> >>
> >>
> >> On 10/20/17, 6:09 AM, "Yang Weiwei" wrote:
> >>
> >> Hi Steve
> >>
> >> The code is available in the HDFS-7240 feature branch, public git repo
> >> here.
> >>
> >> I am not sure if there is a "public" API for object stores, but the
> >> design doc (ozone_user_v0.pdf) uses the most common syntax so I believe
> >> it should be compliant. You can find the REST API doc
> >> (/hadoop/blob/HDFS-7240/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/OzoneRest.md)
> >> (with some example usages), and the command-line API
> >> (hdfs-project/hadoop-hdfs/src/site/markdown/OzoneCommandShell.md).
> >>
> >> Looking forward to your feedback!
> >>
> >> --Weiwei
> >>
> >> ________________________________
> >> From: Steve Loughran
> >> Sent: 20 October 2017 11:49
> >> To: Yang Weiwei
> >> Cc: hdfs-dev@hadoop.apache.org; mapreduce-dev@hadoop.apache.org;
> >> yarn-dev@hadoop.apache.org; common-dev@hadoop.apache.org
> >> Subject: Re: [DISCUSSION] Merging HDFS-7240 Object Store (Ozone) to trunk
> >>
> >> Wow, big piece of work
> >>
> >> 1. Where is a PR/branch on github with rendered docs for us to look at?
> >> 2. Have you made any public API changes related to object stores?
> >> That's probably something I'll have opinions on more than
> >> implementation details.
> >>
> >> thanks
> >>
> >> > On 19 Oct 2017, at 02:54, Yang Weiwei wrote:
> >> >
> >> > Hello everyone,
> >> >
> >> > I would like to start this thread to discuss merging Ozone
> >> (HDFS-7240) to trunk. This feature implements an object store which can
> >> co-exist with HDFS. Ozone is disabled by default. We have tested Ozone
> >> with cluster sizes varying from 1 to 100 data nodes.
> >> >
> >> > The merge payload includes the following:
> >> >
> >> > 1. All services, management scripts
> >> > 2. Object store APIs, exposed via both REST and RPC
> >> > 3. Master service UIs, command line interfaces
> >> > 4. Pluggable pipeline integration
> >> > 5. Ozone File System (Hadoop-compatible file system
> >> implementation, passes all FileSystem contract tests)
> >> > 6. Corona - a load generator for Ozone.
> >> > 7. Essential documentation added to the Hadoop site.
> >> > 8. Version-specific Ozone documentation, accessible via the
> >> service UI.
> >> > 9. Docker support for ozone, which enables faster development
> >> cycles.
> >> >
> >> > To build Ozone and run ozone using docker, please follow
> >> instructions in this wiki page.
> >> https://cwiki.apache.org/confluence/display/HADOOP/Dev+cluster+with+docker
> >>
> >> >
> >> > We have built a passionate and diverse community to drive this
> >> feature development. As a team, we have achieved significant progress
> >> in the past 3 years since the first JIRA for HDFS-7240 was opened in
> >> Oct 2014. So far, we have resolved almost 400 JIRAs by 20+
> >> contributors/committers from different countries and affiliations. We
> >> also want to thank the large number of community members who were
> >> supportive of our efforts and contributed ideas and participated in the
> >> design of ozone.
> >> >
> >> > Please share your thoughts, thanks!
> >> >
> >> > -- Weiwei Yang
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: common-dev-unsubscribe@hadoop.apache.org
> >> For additional commands, e-mail: common-dev-help@hadoop.apache.org
> >>
>

--
Lei (Eddy) Xu
Software Engineer, Cloudera

---------------------------------------------------------------------
To unsubscribe, e-mail: mapreduce-dev-unsubscribe@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-help@hadoop.apache.org
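A side note on a point repeated throughout the thread - that Ozone ships
disabled by default on the HDFS-7240 branch: enabling it was a single switch
in ozone-site.xml. A minimal sketch follows; the property name
`ozone.enabled` reflects the branch-era getting-started docs and should be
verified against OzoneGettingStarted.md in your checkout.

```xml
<?xml version="1.0"?>
<!-- ozone-site.xml: Ozone services (SCM, KSM) are off by default; this
     flag turns them on. Property name assumed from the HDFS-7240-era
     docs; verify against OzoneGettingStarted.md for your branch. -->
<configuration>
  <property>
    <name>ozone.enabled</name>
    <value>true</value>
  </property>
</configuration>
```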