spark-user mailing list archives

From Mayur Rustagi
Subject Re: Schema view of HadoopRDD
Date Fri, 16 May 2014 08:57:51 GMT
I guess what you are trying to do is get a columnar projection on your
data. Spark SQL may be a good option for you (especially if your data is
sparse and well suited to columnar projection).
If you are looking to work with simple key-value data, then you are better
off using the HBase input reader in hadoopIO and getting a pair RDD.

Mayur Rustagi
Ph: +1 (760) 203 3257
@mayur_rustagi

On Thu, May 8, 2014 at 10:51 AM, Debasish Das wrote:

> Hi,
> For each line that we read as textLine from HDFS, we have a schema. If
> there is an API that takes the schema as List[Symbol] and maps each token
> to the Symbol, it will be helpful...
> One solution is to keep data on HDFS as Avro/protobuf serialized objects,
> but not sure if that works on HBase input... We are testing HDFS right now
> but finally we will read from a persistent store like basically
> the immutableBytes need to be converted to a schema view as well, in case we
> don't want to write the whole row as a protobuf...
> Do RDDs provide a schema view of the dataset on HDFS / HBase?
> Thanks.
> Deb
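
A minimal sketch of the schema view asked about above, in plain Scala with no
Spark dependency: it pairs each token of a delimited line with the Symbol at
the same position in a List[Symbol], then projects a subset of columns. The
names SchemaView, toRecord and project, the sample field names, and the comma
delimiter are all assumptions made for illustration; none of them is an
existing Spark API.

```scala
import java.util.regex.Pattern

// Hypothetical helper, not part of Spark: a schema view over delimited text lines.
object SchemaView {

  // Pair each token with the schema symbol at the same position.
  // Returns None when the token count does not match the schema length.
  def toRecord(schema: List[Symbol],
               line: String,
               delimiter: String = ","): Option[Map[Symbol, String]] = {
    // Pattern.quote treats the delimiter literally; limit -1 keeps trailing empty tokens.
    val tokens = line.split(Pattern.quote(delimiter), -1).map(_.trim).toList
    if (tokens.length == schema.length) Some(schema.zip(tokens).toMap) else None
  }

  // Columnar projection: keep only the requested columns, in the requested order.
  // A column missing from the record yields None.
  def project(record: Map[Symbol, String],
              columns: List[Symbol]): List[Option[String]] =
    columns.map(record.get)

  def main(args: Array[String]): Unit = {
    // Assumed sample schema and data, for illustration only.
    val schema = List(Symbol("user"), Symbol("item"), Symbol("rating"))
    val lines = List("alice,book,5", "bob,pen", "carol,lamp,3")

    // Lines that do not match the schema are dropped by flatMap.
    val records = lines.flatMap(line => toRecord(schema, line))
    records.foreach(r => println(project(r, List(Symbol("user"), Symbol("rating")))))
  }
}
```

With Spark, the same toRecord function could be applied inside a map or
flatMap over the RDD of lines returned by sc.textFile. Malformed lines are
dropped here rather than failing the job, which is a design choice of this
sketch and not something the thread prescribes.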
