spark-user mailing list archives

From Debasish Das
Subject Re: Schema view of HadoopRDD
Date Thu, 08 May 2014 05:21:20 GMT

Every line that we read from HDFS as a textLine follows a known schema. An
API that takes the schema as a List[Symbol] and maps each token to the
corresponding Symbol would be helpful.
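Here is a minimal sketch of that kind of mapping. It assumes comma-delimited lines, and `SchemaView.toRecord` is a hypothetical helper, not part of Spark. It pairs each token with the Symbol at the same position in the schema:

```scala
// Hypothetical helper (not a Spark API): builds a schema view of one text
// line by pairing each delimited token with the Symbol at the same position.
object SchemaView {
  def toRecord(line: String,
               schema: List[Symbol],
               delimiter: String = ","): Map[Symbol, String] = {
    // Quote the delimiter so it is matched literally, not as a regex;
    // limit -1 keeps trailing empty fields so positions stay aligned.
    val tokens = line.split(java.util.regex.Pattern.quote(delimiter), -1).toList
    require(tokens.length == schema.length,
      s"expected ${schema.length} fields, got ${tokens.length}")
    schema.zip(tokens).toMap
  }

  def main(args: Array[String]): Unit = {
    // Assumed example schema, for illustration only
    val schema = List(Symbol("userId"), Symbol("product"), Symbol("rating"))
    val record = toRecord("42,book,5", schema)
    println(record(Symbol("product"))) // prints "book"
  }
}
```

On Spark, the helper would be applied per line, e.g. `sc.textFile(path).map(line => SchemaView.toRecord(line, schema))`. Values stay as strings, so the caller would still need to convert types.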

One solution is to keep the data on HDFS as Avro/protobuf serialized
objects, but I am not sure whether that works for HBase input. We are
testing on HDFS right now, but eventually we will read from a persistent
store such as HBase. There, the immutableBytes also need to be converted to
a schema view in case we don't want to write the whole row as a protobuf.

Do RDDs provide a schema view of a dataset on HDFS/HBase?

