spark-user mailing list archives

From Sanjay Subramanian <>
Subject HDFS not supported by databricks cloud :-(
Date Tue, 16 Jun 2015 16:58:52 GMT
hey guys
After day one at Spark Summit SFO, I sadly realized that HDFS is indeed not supported
by Databricks Cloud. My speed bottleneck is transferring ~1TB of snapshot HDFS data
(250+ external Hive tables) to S3 :-(
I want to use Databricks Cloud, but to me this is a disabler right from the start. The hard
road for me will be (as I believe EVERYTHING is possible; the impossible just takes longer):
- transfer all HDFS data to S3
- our org does not permit AWS server-side encryption, so I have to figure out whether
  AWS KMS encrypted S3 files can be read by Hive/Impala/Spark
- modify all table locations in the metadata to S3
- modify all scripts to point and write to S3 instead of
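
For the table-location step, here is a minimal sketch of how the statements for 250+
tables could be generated rather than typed by hand. It only builds HiveQL text and runs
nothing against a metastore. The bucket name, prefix, table name, namenode address and the
s3a:// scheme are all assumptions for illustration, and both helper functions are
hypothetical. Partitioned tables keep a location per partition in the metastore, so they
would likely need matching ALTER TABLE ... PARTITION statements as well.

```python
from urllib.parse import urlparse


def hdfs_to_s3(location, bucket, prefix=""):
    """Map an HDFS table location to an S3 URI, keeping the path part.

    The s3a:// scheme and the bucket/prefix layout are assumptions.
    """
    parsed = urlparse(location)
    if parsed.scheme != "hdfs":
        raise ValueError(f"not an HDFS location: {location}")
    # Drop the namenode host:port and keep only the directory path.
    parts = [p for p in (prefix.strip("/"), parsed.path.strip("/")) if p]
    return f"s3a://{bucket}/" + "/".join(parts)


def alter_location_statements(tables, bucket, prefix=""):
    """Build one ALTER TABLE ... SET LOCATION statement per table.

    `tables` maps a table name to its current HDFS location.
    """
    return [
        f"ALTER TABLE {name} SET LOCATION '{hdfs_to_s3(loc, bucket, prefix)}';"
        for name, loc in sorted(tables.items())
    ]


if __name__ == "__main__":
    # Hypothetical table and paths, for illustration only.
    tables = {"sales.orders": "hdfs://namenode:8020/data/snapshots/orders"}
    for stmt in alter_location_statements(tables, "example-bucket", "hive"):
        print(stmt)
```

The generated statements could be reviewed first and then fed to the Hive CLI or beeline,
which keeps the metadata change separate from the data copy itself.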
Any ideas / thoughts will be helpful.
Until I get the above figured out, I am going ahead and working hard to make spark-sql
the main workhorse for creating datasets (right now it's Hive and Impala).

