spark-user mailing list archives

From Philip Ogren <philip.og...@oracle.com>
Subject RDD[URI]
Date Thu, 30 Jan 2014 16:18:12 GMT
In my Spark programming thus far, my unit of work has been a single row
from an HDFS file, obtained by creating an RDD[Array[String]] with something like:

spark.textFile(path).map(_.split("\t"))

Now, I'd like to do some work over a large collection of files in which 
the unit of work is a single file (rather than a row from a file).  Does 
Spark anticipate users creating an RDD[URI] or RDD[File] or some such 
and supporting actions and transformations that one might want to do on 
such an RDD?  Any advice and/or code snippets would be appreciated!
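
One possible approach, sketched below and untested: parallelize the list of 
file paths themselves and let each task open and read its whole file through 
the Hadoop FileSystem API. The SparkContext settings, the dirPath directory, 
and the per-file computation (just a character count) are placeholders, not 
anything specific to my job.

import java.net.URI

import scala.io.Source

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.SparkContext

object FilePerTaskSketch {
  def main(args: Array[String]): Unit = {
    // Placeholder context and input directory; adjust for a real cluster.
    val sc = new SparkContext("local[2]", "file-per-task-sketch")
    val dirPath = "hdfs:///data/input"

    // List the files on the driver and distribute the paths as an RDD[String],
    // so the unit of work for each task is a whole file rather than a row.
    val fs = FileSystem.get(new URI(dirPath), new Configuration())
    val paths = fs.listStatus(new Path(dirPath)).map(_.getPath.toString).toSeq
    val fileRdd = sc.parallelize(paths)

    // Each task opens its own file and computes something over the whole
    // contents -- here just the character count as a stand-in.
    val results = fileRdd.map { p =>
      val fileSystem = FileSystem.get(new URI(p), new Configuration())
      val in = fileSystem.open(new Path(p))
      val contents = try Source.fromInputStream(in).mkString finally in.close()
      (p, contents.length)
    }

    results.collect().foreach { case (path, n) => println(path + ": " + n) }

    sc.stop()
  }
}

(An alternative worth checking: newer Spark releases include 
SparkContext.wholeTextFiles, which returns an RDD of (path, contents) pairs 
directly.)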

Thanks,
Philip
