spark-user mailing list archives

From Jörn Franke <jornfra...@gmail.com>
Subject Re: S3 Zip File Loading Advice
Date Wed, 09 Mar 2016 08:39:02 GMT

Apache Oozie may be able to do this for you: it can schedule the recurring load and run it as a Spark job.
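
Whichever scheduler triggers the job, the per-file work is roughly the same: decide which hourly prefixes to scan, then unzip each archive in memory before parsing the CSV. Below is a minimal Python sketch of those two steps using only the standard library. The helper names `hourly_prefixes` and `unzip_csv_rows` are hypothetical, and the `s3n://events/YYYY/MM/DD/HH` layout is an assumption taken from the first path in the quoted message. The sketch does not list S3 or detect which objects are newest; that would be left to the scheduler or an S3 client.

```python
import csv
import io
import zipfile
from datetime import datetime, timedelta


def hourly_prefixes(start, end, base="s3n://events"):
    """Return one prefix per hour from start (inclusive) to end (exclusive).

    Assumes a base/YYYY/MM/DD/HH layout; `base` is an illustrative default.
    """
    prefixes = []
    # Round the start down to the top of its hour.
    current = start.replace(minute=0, second=0, microsecond=0)
    while current < end:
        prefixes.append(f"{base}/{current:%Y/%m/%d/%H}")
        current += timedelta(hours=1)
    return prefixes


def unzip_csv_rows(zip_bytes, encoding="utf-8"):
    """Parse every .csv member of an in-memory zip archive into rows."""
    rows = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            # Skip anything in the archive that is not a CSV file.
            if not name.lower().endswith(".csv"):
                continue
            with archive.open(name) as member:
                text = io.TextIOWrapper(member, encoding=encoding, newline="")
                rows.extend(csv.reader(text))
    return rows
```

In PySpark, `sc.binaryFiles(prefix)` yields `(path, bytes)` pairs, so something like `sc.binaryFiles(prefix).flatMap(lambda kv: unzip_csv_rows(kv[1]))` should give an RDD of rows. This assumes each archive fits comfortably in executor memory, since zip files are not splittable.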

> On 09 Mar 2016, at 06:03, Benjamin Kim <bbuild11@gmail.com> wrote:
> 
> I am wondering if anyone can help.
> 
> Our company stores zipped CSV files in S3, which has been a big headache from the start.
> I was wondering if anyone has created a way to iterate through several subdirectories (s3n://events/2016/03/01/00,
> s3n://2016/03/01/01, etc.) in S3 to find the newest files and load them. It would be a big
> bonus to include the unzipping of the file in the process so that the CSV can be loaded directly
> into a dataframe for further processing. I’m pretty sure that the S3 part of this request
> is not uncommon. I would think the file being zipped is uncommon. If anyone can help, I would
> truly be grateful for I am new to Scala and Spark. This would be a great help in learning.
> 
> Thanks,
> Ben
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
> For additional commands, e-mail: user-help@spark.apache.org
> 
