spark-user mailing list archives

From Jörn Franke <>
Subject Re: S3 Zip File Loading Advice
Date Wed, 09 Mar 2016 08:39:02 GMT

Oozie may be able to do this for you and integrate with Spark.
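For the unzipping step in the quoted request below, here is a minimal sketch using only java.util.zip from the JDK. The names ZipCsv and linesFromZip are made up for illustration, and the sketch assumes the archives hold UTF-8 text and are small enough to read on a single executor. The Spark wiring in the trailing comment is an untested outline that assumes an existing SparkContext named sc.

```scala
import java.io.{BufferedReader, InputStream, InputStreamReader}
import java.nio.charset.StandardCharsets
import java.util.zip.ZipInputStream

// Hypothetical helper, not part of Spark or Oozie.
object ZipCsv {
  /** Reads every file entry of a zip archive and returns its text lines, in entry order. */
  def linesFromZip(in: InputStream): Vector[String] = {
    val zis = new ZipInputStream(in)
    try {
      val out = Vector.newBuilder[String]
      var entry = zis.getNextEntry
      while (entry != null) {
        if (!entry.isDirectory) {
          // The reader is deliberately not closed here: closing it would
          // close the underlying zip stream before the next entry is read.
          val reader = new BufferedReader(new InputStreamReader(zis, StandardCharsets.UTF_8))
          var line = reader.readLine()
          while (line != null) {
            out += line
            line = reader.readLine()
          }
        }
        entry = zis.getNextEntry
      }
      out.result()
    } finally zis.close()
  }
}

// Possible Spark wiring (untested outline; the glob is illustrative):
//   val lines = sc.binaryFiles("s3n://events/2016/03/01/*")
//     .flatMap { case (_, stream) => ZipCsv.linesFromZip(stream.open()) }
```

Each CSV's header row, if it has one, comes through as an ordinary line, so it would need to be filtered out before the lines are parsed into a dataframe.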

> On 09 Mar 2016, at 06:03, Benjamin Kim <> wrote:
> I am wondering if anyone can help.
> Our company stores zipped CSV files in S3, which has been a big headache from the start.
> I was wondering if anyone has created a way to iterate through several subdirectories (s3n://events/2016/03/01/00,
> s3n://2016/03/01/01, etc.) in S3 to find the newest files and load them. It would be a big
> bonus to include the unzipping of the file in the process so that the CSV can be loaded directly
> into a dataframe for further processing. I’m pretty sure that the S3 part of this request
> is not uncommon. I would think the file being zipped is uncommon. If anyone can help, I would
> truly be grateful for I am new to Scala and Spark. This would be a great help in learning.
> Thanks,
> Ben
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:
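
Because the directories described above are keyed by year/month/day/hour, one way to reach the newest files is to generate the most recent hourly prefixes instead of listing the whole bucket. The sketch below assumes that yyyy/MM/dd/HH layout under a single base path and Java 8 for java.time. HourlyPrefixes and recent are hypothetical names.

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter
import java.time.temporal.ChronoUnit

// Hypothetical helper; assumes directories are laid out as base/yyyy/MM/dd/HH.
object HourlyPrefixes {
  private val fmt = DateTimeFormatter.ofPattern("yyyy/MM/dd/HH")

  /** Prefixes for the `hours` most recent hourly directories, newest first. */
  def recent(base: String, now: LocalDateTime, hours: Int): Seq[String] = {
    // Drop minutes and seconds so every prefix lands on an hour boundary.
    val top = now.truncatedTo(ChronoUnit.HOURS)
    (0 until hours).map(i => s"$base/${fmt.format(top.minusHours(i.toLong))}")
  }
}
```

Within a single hourly directory, the newest file could then be picked by modification time, for example via Hadoop's FileSystem.listStatus and FileStatus.getModificationTime, though that part is not shown here.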
