hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: Spark HBase Bulk load using HFileFormat
Date Thu, 14 Jul 2016 00:46:09 GMT
Can you show the code inside saveAsHFile?

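For the first issue, the records in each partition of the RDD may need to be
sorted: the HFile writer throws that exception when a key sorts before the
key written just ahead of it.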
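
To illustrate the ordering in question, here is a minimal sketch in plain Scala, with no Spark or HBase on the classpath. It assumes keys are compared as unsigned bytes, lexicographically, and it models a cell as a hypothetical (row key, qualifier) pair; a real KeyValue also orders on family, timestamp and type, which this sketch ignores. The names SortedKeysSketch, compareBytes, cellOrdering and isSorted are made up for this example and are not part of any library.

```scala
// Sketch only: shows the key ordering an HFile write is assumed to need.
object SortedKeysSketch {
  // Hypothetical stand-in for a cell: (row key, column qualifier).
  type Cell = (Array[Byte], Array[Byte])

  // Unsigned lexicographic comparison of two byte arrays.
  def compareBytes(a: Array[Byte], b: Array[Byte]): Int = {
    val n = math.min(a.length, b.length)
    var i = 0
    while (i < n) {
      val x = a(i) & 0xff
      val y = b(i) & 0xff
      if (x != y) return x - y
      i += 1
    }
    a.length - b.length
  }

  // Order by row key first, then by qualifier.
  val cellOrdering: Ordering[Cell] = new Ordering[Cell] {
    def compare(l: Cell, r: Cell): Int = {
      val c = compareBytes(l._1, r._1)
      if (c != 0) c else compareBytes(l._2, r._2)
    }
  }

  // True when no cell sorts before the cell written just ahead of it.
  def isSorted(cells: Seq[Cell]): Boolean =
    cells.zip(cells.drop(1)).forall { case (prev, next) =>
      cellOrdering.compare(prev, next) <= 0
    }

  def cell(row: String, qualifier: String): Cell =
    (row.getBytes("UTF-8"), qualifier.getBytes("UTF-8"))

  def main(args: Array[String]): Unit = {
    val unsorted = Seq(cell("row2", "mi"), cell("row1", "moc"), cell("row1", "FID"))
    val sorted = unsorted.sorted(cellOrdering)
    println(isSorted(unsorted)) // false: "row1" arrives after "row2"
    println(isSorted(sorted))   // true
  }
}
```

In the Spark job, the same property would have to hold for the records of each partition of kvRDD at the point where they are written; sorting the KeyValues of one row inside preparePUT does not by itself order the rows relative to each other.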

Cheers

On Wed, Jul 13, 2016 at 4:29 PM, yeshwanth kumar <yeshwanth43@gmail.com>
wrote:

> Hi, I am doing a bulk load into HBase in HFile format, using
> saveAsNewAPIHadoopFile.
>
> I am on HBase 1.2.0-cdh5.7.0 and Spark 1.6.
>
> When I try to write, I get an exception:
>
>  java.io.IOException: Added a key not lexically larger than previous.
>
> The following is the code snippet:
>
> case class HBaseRow(rowKey: ImmutableBytesWritable, kv: KeyValue)
>
> val kAvroDF = sqlContext.read.format("com.databricks.spark.avro").load(args(0))
> val kRDD = kAvroDF.select("seqid", "mi", "moc", "FID", "WID").rdd
> val trRDD = kRDD.map(a => preparePUT(a(1).asInstanceOf[String], a(3).asInstanceOf[String]))
> val kvRDD = trRDD.flatMap(a => a).map(a => (a.rowKey, a.kv))
> saveAsHFile(kvRDD, args(1))
>
>
> preparePUT returns a list of HBaseRow(ImmutableBytesWritable, KeyValue)
> sorted on KeyValue. I flatMap the RDD to get an
> RDD of (ImmutableBytesWritable, KeyValue) pairs and pass it to saveAsHFile.
>
> I also tried using the Put API, but it throws:
>
> java.lang.Exception: java.lang.ClassCastException:
> org.apache.hadoop.hbase.client.Put cannot be cast to
> org.apache.hadoop.hbase.Cell
>
>
> Is there any way I can skip the KeyValue API
> and still do the bulk load into HBase?
> Please help me resolve this issue.
>
> Thanks,
> -Yeshwanth
>
