spark-user mailing list archives

From Mayur Rustagi <mayur.rust...@gmail.com>
Subject Re: Schema view of HadoopRDD
Date Fri, 16 May 2014 08:57:51 GMT
I guess what you are trying to do is get a columnar projection of your
data. Spark SQL may be a good option for you (especially if your data is
sparse and well suited to columnar projection).
If you only need to work with simple key-value pairs, then you are better
off using the HBase input reader through the Hadoop I/O API to get a pair RDD.
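
For the plain text-line case in the original question, the token-to-Symbol
mapping can be sketched in a few lines of plain Scala. This is only an
illustration, not an existing Spark API: the object name SchemaView, the
function parseLine, and the tab delimiter are all assumptions made for the
example.

```scala
import java.util.regex.Pattern

// Hypothetical helper (not part of Spark): pairs each token of a delimited
// line with the schema symbol at the same position.
object SchemaView {
  def parseLine(schema: List[Symbol],
                line: String,
                delimiter: String = "\t"): Map[Symbol, String] = {
    // Quote the delimiter so it is treated literally, not as a regex.
    // A limit of -1 keeps trailing empty fields instead of dropping them.
    val tokens = line.split(Pattern.quote(delimiter), -1).toList
    require(tokens.length == schema.length,
      s"expected ${schema.length} fields, got ${tokens.length}")
    schema.zip(tokens).toMap
  }
}
```

Since the function is pure, it could be passed to a map over an RDD of
lines, for example sc.textFile(path).map(line => SchemaView.parseLine(schema, line)),
where sc is a SparkContext and path and schema are your own values. Every
field stays a String here; typed columns would need a richer schema than
List[Symbol].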

Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi <https://twitter.com/mayur_rustagi>



On Thu, May 8, 2014 at 10:51 AM, Debasish Das <debasish.das83@gmail.com> wrote:

> Hi,
>
> For each line that we read as a text line from HDFS, we have a schema. If
> there is an API that takes the schema as List[Symbol] and maps each token
> to the Symbol, it will be helpful.
>
> One solution is to keep data on HDFS as Avro/protobuf serialized objects,
> but I am not sure if that works on HBase input. We are testing HDFS right now,
> but finally we will read from a persistent store like HBase, so basically
> the immutableBytes need to be converted to a schema view as well, in case we
> don't want to write the whole row as a protobuf.
>
> Do RDDs provide a schema view of the dataset on HDFS / HBase?
>
> Thanks.
> Deb
>
>
