spark-user mailing list archives

From innowireless TaeYun Kim
Subject RE: Bulk-load to HBase
Date Fri, 19 Sep 2014 11:17:33 GMT


After reading several documents, it seems that saveAsHadoopDataset cannot
be used with HFileOutputFormat.

The saveAsHadoopDataset method takes a JobConf, so it belongs to the old
Hadoop API (the mapred package), while HFileOutputFormat is a member of the
mapreduce package, which is for the new Hadoop API.


Am I right?

If so, is there another way to bulk-load into HBase from an RDD?
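
One candidate I have not verified is the new-API counterpart on pair RDDs,
saveAsNewAPIHadoopFile, which takes a plain Hadoop Configuration and a
mapreduce-style OutputFormat class. Below is an untested sketch of what I
have in mind. The column family "cf", the qualifier "col", the String
row-key/value layout, and the helper name writeHFiles are all made up for
illustration, and it assumes ASCII row keys so that String ordering matches
HBase's byte ordering.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, KeyValue}
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.SparkContext._ // implicit pair-RDD functions
import org.apache.spark.rdd.RDD

// Hypothetical helper: writes (rowKey, value) pairs as HFiles under outputDir.
def writeHFiles(rows: RDD[(String, String)], outputDir: String): Unit = {
  val conf = HBaseConfiguration.create()

  val cells = rows
    .sortByKey() // HFileOutputFormat expects row keys in ascending order
    .map { case (rowKey, value) =>
      val row = Bytes.toBytes(rowKey)
      // "cf" and "col" are placeholder family/qualifier names
      val kv = new KeyValue(row, Bytes.toBytes("cf"), Bytes.toBytes("col"),
        Bytes.toBytes(value))
      (new ImmutableBytesWritable(row), kv)
    }

  // New-API save method: takes a Configuration, not a JobConf
  cells.saveAsNewAPIHadoopFile(
    outputDir,
    classOf[ImmutableBytesWritable],
    classOf[KeyValue],
    classOf[HFileOutputFormat],
    conf)
}
```

If this works at all, the generated files would presumably still have to be
handed to HBase in a separate step, for example with the completebulkload
tool.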




From: innowireless TaeYun Kim
Sent: Friday, September 19, 2014 7:17 PM
Subject: Bulk-load to HBase




Is there a way to bulk-load into HBase from an RDD?

HBase offers the HFileOutputFormat class for bulk loading from a MapReduce
job, but I cannot figure out how to use it with saveAsHadoopDataset.


