spark-user mailing list archives

From Sam Flint <sam.fl...@magnetic.com>
Subject NEW to spark and sparksql
Date Wed, 19 Nov 2014 21:02:51 GMT
Hi,

    I am new to Spark. I have begun reading up on Spark's RDDs as well as
Spark SQL. My question is more about how to build out the RDDs and about
best practices. I have data that is broken down by hour into files on HDFS
in Avro format. Do I need to create a separate RDD for each file, or, when
using Spark SQL, a separate SchemaRDD?

I want to be able to pull, let's say, an entire day of data into Spark and
run some analytics on it, then possibly a week, a month, etc.
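For illustration, a minimal sketch of how the hourly files for a whole day, week, or month could be addressed as one input instead of one input per file. It assumes a hypothetical HDFS layout of base/YYYY/MM/DD/HH/*.avro; the function name daily_globs and the base path are made up for this example.

```python
from datetime import date, timedelta
from typing import List


def daily_globs(base: str, start: date, num_days: int) -> List[str]:
    """Return one glob pattern per day, each matching all hourly files of that day.

    Assumes a hypothetical directory layout: base/YYYY/MM/DD/HH/*.avro
    """
    paths = []
    for i in range(num_days):
        day = start + timedelta(days=i)
        # "*" in the hour position matches all 24 hourly directories of the day
        paths.append(f"{base}/{day:%Y/%m/%d}/*/*.avro")
    return paths


# A week of hourly data starting on 2014-11-19, joined into one comma-separated string
week = daily_globs("hdfs:///data/events", date(2014, 11, 19), 7)
print(",".join(week))
```

Spark's Hadoop-based file inputs (for example `SparkContext.textFile`) accept comma-separated paths and glob patterns, so a list like this could back a single RDD spanning many hourly files. The exact Avro reader to pass it to depends on the Spark version (e.g. the spark-avro data source), which is an assumption here.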


If there is documentation on this procedure or best practices for building
RDDs, please point me to it.

Thanks for your time,
   Sam
