spark-user mailing list archives

From Simon Elliston Ball <si...@simonellistonball.com>
Subject Re: HDFS not supported by databricks cloud :-(
Date Tue, 16 Jun 2015 20:06:01 GMT
You could consider using Zeppelin and Spark on YARN as an alternative. http://zeppelin.incubator.apache.org/

Simon

> On 16 Jun 2015, at 17:58, Sanjay Subramanian <sanjaysubramanian@yahoo.com.INVALID> wrote:
> 
> hey guys
> 
> After day one at Spark Summit SFO, I realized sadly that (indeed) HDFS is not supported by Databricks Cloud.
> My bottleneck is transferring ~1 TB of snapshot HDFS data (250+ external Hive tables) to S3 :-(
> 
> I want to use Databricks Cloud, but to me this is a disabler right from the start.
> The hard road for me will be (as I believe EVERYTHING is possible. The impossible just takes longer):
> - transfer all HDFS to S3
> - our org does not permit AWS server-side encryption, so I have to figure out if AWS KMS encrypted S3 files can be read by Hive/Impala/Spark
> - modify all table locations in metadata to S3
> - modify all scripts to point and write to S3 instead of   
> 
> Any ideas / thoughts will be helpful.
> 
> Till I can get the above figured out, I am going ahead and working hard to make spark-sql the main workhorse for creating datasets (right now it's Hive and Impala)
> 
> 
> thanks
> regards
> 
> sanjay
>  
> 
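
For the "modify all table locations in metadata to S3" step in the list above, one possible approach is to generate the Hive DDL from the existing HDFS locations rather than editing each table by hand. The sketch below is only an illustration under stated assumptions: the table names, the namenode address, the bucket name `example-bucket`, and the `s3a` scheme are all hypothetical and not from the original message. It only builds `ALTER TABLE ... SET LOCATION` statements as strings; it does not copy any data or talk to the metastore, and partitioned tables would likely need their partition locations updated as well.

```python
from urllib.parse import urlparse


def to_s3_location(hdfs_uri, bucket, scheme="s3a"):
    """Map an HDFS URI to the same path under an S3 bucket.

    The bucket name and URI scheme are assumptions supplied by the caller.
    """
    parsed = urlparse(hdfs_uri)
    if parsed.scheme != "hdfs":
        raise ValueError(f"not an HDFS URI: {hdfs_uri}")
    # Drop the namenode host:port and keep only the path component.
    return f"{scheme}://{bucket}{parsed.path}"


def alter_location_ddl(table, hdfs_uri, bucket, scheme="s3a"):
    """Build a Hive ALTER TABLE statement pointing the table at S3."""
    new_location = to_s3_location(hdfs_uri, bucket, scheme)
    return f"ALTER TABLE {table} SET LOCATION '{new_location}';"


# Hypothetical tables and paths, for illustration only.
tables = {
    "sales.orders": "hdfs://namenode:8020/data/snapshots/orders",
    "sales.customers": "hdfs://namenode:8020/data/snapshots/customers",
}

for table, location in tables.items():
    print(alter_location_ddl(table, location, bucket="example-bucket"))
```

The generated statements could be reviewed and then run through the Hive CLI or beeline once the data has actually been copied to the bucket.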
