hbase-user mailing list archives

From: yeshwanth kumar <yeshwant...@gmail.com>
Subject: Re: Spark HBase Bulk load using HFileFormat
Date: Thu, 14 Jul 2016 06:33:37 GMT
Following is the code snippet for saveAsHFile:

def saveAsHFile(putRDD: RDD[(ImmutableBytesWritable, KeyValue)],
                outputPath: String) = {
  val conf = ConfigFactory.getConf
  val job = Job.getInstance(conf, "HBaseBulkPut")
  job.setMapOutputKeyClass(classOf[ImmutableBytesWritable])
  job.setMapOutputValueClass(classOf[Put])
  val connection = ConnectionFactory.createConnection(conf)
  val stTable = connection.getTable(TableName.valueOf("strecords"))
  val regionLocator = new HRegionLocator(TableName.valueOf("strecords"),
    connection.asInstanceOf[ClusterConnection])
  HFileOutputFormat2.configureIncrementalLoad(job, stTable, regionLocator)

  putRDD.saveAsNewAPIHadoopFile(
    outputPath,
    classOf[ImmutableBytesWritable],
    classOf[Put],
    classOf[HFileOutputFormat2],
    conf)
}

I just saw that I am using job.setMapOutputValueClass(classOf[Put]),

whereas I am writing KeyValue. Does that cause any issue?

I will update the code and run it.
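
Roughly like this (not tested yet); switching the value class to KeyValue to match the RDD is the change, and passing job.getConfiguration instead of conf is my own assumption:

def saveAsHFile(kvRDD: RDD[(ImmutableBytesWritable, KeyValue)],
                outputPath: String) = {
  val conf = ConfigFactory.getConf
  val job = Job.getInstance(conf, "HBaseBulkPut")
  job.setMapOutputKeyClass(classOf[ImmutableBytesWritable])
  // KeyValue instead of Put, to match what the RDD actually holds
  job.setMapOutputValueClass(classOf[KeyValue])
  val connection = ConnectionFactory.createConnection(conf)
  val stTable = connection.getTable(TableName.valueOf("strecords"))
  val regionLocator = new HRegionLocator(TableName.valueOf("strecords"),
    connection.asInstanceOf[ClusterConnection])
  HFileOutputFormat2.configureIncrementalLoad(job, stTable, regionLocator)

  kvRDD.saveAsNewAPIHadoopFile(
    outputPath,
    classOf[ImmutableBytesWritable],
    // KeyValue here as well, since the RDD values are KeyValue
    classOf[KeyValue],
    classOf[HFileOutputFormat2],
    // assumption: use the job's configuration so the settings applied by
    // configureIncrementalLoad are picked up when writing
    job.getConfiguration)
}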

Can you suggest how to do the sorting on the partitions?
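
For context, this is the kind of sorting I have in mind (a rough, untested sketch; the explicit Ordering on the row key is my own addition):

import org.apache.hadoop.hbase.util.Bytes

// Assumption: order the RDD by row key before writing, since
// HFileOutputFormat2 requires keys in increasing lexicographic order.
implicit val rowKeyOrdering: Ordering[ImmutableBytesWritable] =
  new Ordering[ImmutableBytesWritable] {
    override def compare(a: ImmutableBytesWritable, b: ImmutableBytesWritable): Int =
      Bytes.compareTo(a.get, a.getOffset, a.getLength,
                      b.get, b.getOffset, b.getLength)
  }

// sortByKey gives a total order across partitions; if a row has more than
// one KeyValue, those cells would also need to be ordered by family and
// qualifier within the row.
val sortedRDD = kvRDD.sortByKey()
saveAsHFile(sortedRDD, args(1))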

Thanks,

Yeshwanth


-Yeshwanth
Can you Imagine what I would do if I could do all I can - Art of War

On Wed, Jul 13, 2016 at 7:46 PM, Ted Yu <yuzhihong@gmail.com> wrote:

> Can you show the code inside saveAsHFile?
>
> Maybe the partitions of the RDD need to be sorted (for 1st issue).
>
> Cheers
>
> On Wed, Jul 13, 2016 at 4:29 PM, yeshwanth kumar <yeshwanth43@gmail.com>
> wrote:
>
> > Hi, I am doing a bulk load into HBase in HFile format, by
> > using saveAsNewAPIHadoopFile.
> >
> > I am on HBase 1.2.0-cdh5.7.0 and Spark 1.6.
> >
> > When I try to write, I am getting an exception:
> >
> >  java.io.IOException: Added a key not lexically larger than previous.
> >
> > Following is the code snippet:
> >
> > case class HBaseRow(rowKey: ImmutableBytesWritable, kv: KeyValue)
> >
> > val kAvroDF = sqlContext.read.format("com.databricks.spark.avro").load(args(0))
> > val kRDD = kAvroDF.select("seqid", "mi", "moc", "FID", "WID").rdd
> > val trRDD = kRDD.map(a => preparePUT(a(1).asInstanceOf[String], a(3).asInstanceOf[String]))
> > val kvRDD = trRDD.flatMap(a => a).map(a => (a.rowKey, a.kv))
> > saveAsHFile(kvRDD, args(1))
> >
> >
> > preparePUT returns a list of HBaseRow(ImmutableBytesWritable, KeyValue)
> > sorted on KeyValue. I do a flatMap on the RDD to
> > prepare an RDD[(ImmutableBytesWritable, KeyValue)] and pass it to saveAsHFile.
> >
> > I tried using the Put API; it throws:
> >
> > java.lang.Exception: java.lang.ClassCastException:
> > org.apache.hadoop.hbase.client.Put cannot be cast to
> > org.apache.hadoop.hbase.Cell
> >
> >
> > Is there any way I can skip using the KeyValue API
> > and do the bulk load into HBase?
> > Please help me in resolving this issue.
> >
> > Thanks,
> > -Yeshwanth
> >
>
