spark-user mailing list archives

From Michael Armbrust <>
Subject Re: NEW to spark and sparksql
Date Wed, 19 Nov 2014 21:45:28 GMT
In general you should be able to read full directories of files as a single
RDD/SchemaRDD, rather than creating one per file. For documentation I'd
suggest the programming guides:

For Avro in particular, I have been working on a library for Spark SQL.
It's very early code, but you can find it here:

Bug reports welcome!
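Here is a minimal sketch of the directory-level approach for the hourly data described in the question quoted below. It assumes a hypothetical HDFS layout of base/YYYY/MM/DD/HH/*.avro. The helper names daily_globs and input_path and the path hdfs:///data/events are illustrative only; they are not part of Spark. The sketch builds one glob per day and joins the globs with commas. Hadoop's FileInputFormat, which sc.textFile uses, accepts a comma-separated list of paths and glob patterns. As a result, a day, a week or a month becomes one input string and one RDD, not one RDD per file.

```python
from datetime import date, timedelta


def daily_globs(base, start, num_days, pattern="*/*.avro"):
    """Return one Hadoop-style glob per day, e.g. base/2014/11/19/*/*.avro.

    Assumes a hypothetical layout base/YYYY/MM/DD/HH/<files>.avro,
    i.e. one sub-directory per hour under each day directory.
    """
    base = base.rstrip("/")
    globs = []
    for offset in range(num_days):
        day = start + timedelta(days=offset)
        # The '*' in the hour position matches every hourly directory.
        globs.append(f"{base}/{day:%Y/%m/%d}/{pattern}")
    return globs


def input_path(base, start, num_days):
    """Join the per-day globs into a single comma-separated path string.

    Hadoop FileInputFormat (used by sc.textFile) splits a path string on
    commas, so the whole range can be read as one RDD.
    """
    return ",".join(daily_globs(base, start, num_days))


if __name__ == "__main__":
    base = "hdfs:///data/events"  # hypothetical location
    start = date(2014, 11, 19)
    print(input_path(base, start, 1))  # hdfs:///data/events/2014/11/19/*/*.avro
    print(input_path(base, start, 7))  # seven globs, one per day
```

In PySpark the result could be passed directly, e.g. sc.textFile(input_path(base, start, 30)) for roughly a month of text data. An Avro reader, such as a Hadoop Avro input format used through sc.newAPIHadoopFile or the Spark SQL library mentioned above, should accept the same string if it hands the path to Hadoop's input-path handling. That is an assumption worth checking for the reader you actually use.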


On Wed, Nov 19, 2014 at 1:02 PM, Sam Flint <> wrote:

> Hi,
>     I am new to Spark. I have begun reading to understand Spark's RDDs
> as well as Spark SQL. My question is more about how to build out RDDs
> and best practices. I have data that is broken down by hour into
> files on HDFS in Avro format. Do I need to create a separate RDD for each
> file? Or, using Spark SQL, a separate SchemaRDD?
> I want to be able to pull, let's say, an entire day of data into Spark and
> run some analytics on it. Then possibly a week, a month, etc.
> If there is documentation on this procedure or best practices for building
> RDDs, please point me to it.
> Thanks for your time,
>    Sam
