spark-dev mailing list archives

From Michael Segel <msegel_had...@hotmail.com>
Subject Re: Any documentation on Spark's security model beyond YARN?
Date Fri, 01 Apr 2016 12:23:11 GMT
Guys, 

Getting a bit off topic.  

Saying Security and HBase in the same sentence is a bit of a joke until HBase rejiggers its
co-processors. Although Andrew’s fix could be enough to keep CSOs and their minions
happy.

The larger picture is that security has to stop being a ‘second thought’. Once you start
getting into restricted and highly restricted data, you will have issues, and anything you
can do to stop leakage, or the potential of leakage, would be great.

Getting back to Spark specifically, you have components like the Thrift service, which can
persist RDDs, and I don’t see any restrictions on access.

Does this mean integration with Ranger or Sentry? Does it mean rolling a separate solution?

And if you’re going to look at Thrift, do you want to look at other potential areas as well?


Please note: this may all be for nothing. It may be that just having the discussion and coming
to a conclusion about the potential risks and how to mitigate them is enough.

Thx

-Mike

> On Mar 31, 2016, at 6:32 AM, Steve Loughran <stevel@hortonworks.com> wrote:
> 
>> 
>> On 30 Mar 2016, at 21:02, Sean Busbey <busbey@cloudera.com> wrote:
>> 
>> On Wed, Mar 30, 2016 at 4:33 AM, Steve Loughran <stevel@hortonworks.com> wrote:
>>> 
>>>> On 29 Mar 2016, at 22:19, Michael Segel <msegel_hadoop@hotmail.com> wrote:
>>>> 
>>>> Hi,
>>>> 
>>>> So yeah, I know that Spark jobs running on a Hadoop cluster will inherit its security from the underlying YARN job.
>>>> However… that’s not really saying much when you think about some use cases.
>>>> 
>>>> Like using the thrift service …
>>>> 
>>>> I’m wondering what else is new and what people have been thinking about how to enhance Spark’s security.
>>>> 
>>> 
>>> Been thinking a bit.
>>> 
>>> One thing to look at is renewal of HBase and Hive tokens on long-lived services, alongside HDFS.
>>> 
>>> 
>> 
>> I've been looking at this as well. The current work-around I'm using
>> is to use keytab logins on the executors, which is less than
>> desirable.
> 
> 
> OK, let's work together on this ... the current Spark renewal code assumes it's only for HDFS (indeed, that the filesystem is HDFS and therefore the # of tokens > 0); there's no fundamental reason why the code in YarnSparkHadoopUtils can't run in the AM too.
> 
>> 
>> Since the HBase project maintains Spark integration points, it'd be
>> great if there were just a hook for services to provide "here's how to
>> renew" to a common renewal service.
>> 
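[Editor's note: a minimal sketch of the kind of "here's how to renew" hook Sean describes. Nothing like this interface exists in Spark or Hadoop today; the `TokenProvider`/`TokenRenewer` names and the returned-delay convention are invented for illustration. Each service (HDFS, HBase, Hive, ...) would register a provider, and one common renewal service would drive the schedule.]

```java
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical: what a service (HDFS, HBase, Hive, ...) would plug in.
interface TokenProvider {
    String serviceName();
    // Obtain or renew this service's delegation tokens;
    // return the delay in milliseconds until the next renewal.
    long obtainTokens();
}

// Hypothetical: one common renewal service that owns the schedule,
// so no service has to roll its own renewal thread.
class TokenRenewer {
    private final ScheduledExecutorService pool = Executors.newScheduledThreadPool(1);
    private final List<TokenProvider> providers;

    TokenRenewer(List<TokenProvider> providers) {
        this.providers = providers;
    }

    void start() {
        providers.forEach(this::schedule);
    }

    private void schedule(TokenProvider p) {
        long delay = p.obtainTokens();   // renew now, learn the next interval
        pool.schedule(() -> schedule(p), delay, TimeUnit.MILLISECONDS);  // re-arm
    }

    void stop() {
        pool.shutdownNow();
    }
}
```

The point of the sketch: the renewal *loop* lives in one place, and a service's only obligation is "renew, then say when to call me again" — which is roughly the contract a generic hook would need.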
> 
> 1. Wittenauer is doing some work on a tool for doing this; I'm pushing for it to be a fairly generic API. Even if Spark has to use reflection to get at it, at least it would be consistent across services. See https://issues.apache.org/jira/browse/HADOOP-12563
> 
> 2. The topic of HTTPS-based acquisition/use of HDFS tokens has arisen elsewhere; it's needed for long-haul job submission when you don't have a keytab to hand. This could be useful as it'd avoid actually needing hbase-*.jar on the classpath at submit time.
> 
> 
>> 
>> 
>> -- 
>> busbey
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
>> For additional commands, e-mail: dev-help@spark.apache.org
>> 
>> 
> 
> 
