spark-user mailing list archives

From Philip Ogren
Subject RDD[URI]
Date Thu, 30 Jan 2014 16:18:12 GMT
In my Spark programming thus far, my unit of work has been a single row 
from an HDFS file, which I read by creating an RDD[Array[String]] with 
something like:


Now I'd like to do some work over a large collection of files in which 
the unit of work is a single file (rather than a row from a file).  Does 
Spark anticipate users creating an RDD[URI] or RDD[File] or some such, 
and does it support the actions and transformations that one might want 
to apply to such an RDD?  Any advice and/or code snippets would be 
appreciated!
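
For illustration, one way the file-as-unit-of-work idea could be sketched is to list the files on the driver, parallelize their URIs, and open each file inside a map. This is only a sketch under assumptions, not an established Spark idiom: the object name PerFileWork, the helper countLines, the app name, the local master, and the input directory argument are all hypothetical, and the per-file task (counting lines) stands in for whatever work is actually needed. It relies on java.net.URI being serializable and on every worker being able to reach the same file system.

```scala
import java.net.URI

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.spark.{SparkConf, SparkContext}

import scala.io.Source

object PerFileWork {

  // Hypothetical per-file task: open one file through the Hadoop
  // FileSystem API and count its lines. Runs on the workers.
  def countLines(uri: URI): Long = {
    val path = new Path(uri)
    val fs = path.getFileSystem(new Configuration())
    val in = fs.open(path)
    try Source.fromInputStream(in, "UTF-8").getLines().size.toLong
    finally in.close()
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local master for experimentation; use your cluster's.
    val conf = new SparkConf().setAppName("per-file-work").setMaster("local[2]")
    val sc = new SparkContext(conf)

    // Assumption: args(0) is a directory such as hdfs:///some/dir.
    val dir = new Path(args(0))
    val fs = dir.getFileSystem(new Configuration())

    // List the plain files on the driver; each URI becomes one element.
    val uris: Seq[URI] =
      fs.listStatus(dir).filter(_.isFile).map(_.getPath.toUri).toSeq

    // An RDD[URI]: the unit of work is now a whole file.
    val files = sc.parallelize(uris)
    val counts = files.map(uri => (uri, countLines(uri)))

    counts.collect().foreach { case (uri, n) => println(s"$uri\t$n") }
    sc.stop()
  }
}
```

Because each element is just a URI, the usual transformations (map, filter, and so on) apply unchanged; the file is only opened inside the function passed to them. Later Spark releases (1.0 onward, as far as I know) also offer SparkContext.wholeTextFiles, which returns (path, content) pairs and may be simpler when each file fits comfortably in memory.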

