I know little about your use case.

Do you mean that your data is fairly evenly distributed on the Spark side but shows skew during the bulk load phase?
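
One rough way to check is to count how many row keys would land in each region of the target table. The sketch below is only an illustration in plain Python, not part of the hbase-spark module: the function name `region_histogram`, the split points, and the sample keys are all made up. It assumes you have the table's region start keys as a sorted list and a sample of your row keys as bytes.

```python
from bisect import bisect_right
from collections import Counter


def region_histogram(row_keys, split_points):
    """Count how many row keys fall into each region.

    split_points are the sorted region start keys, excluding the empty
    start key of the first region, so N split points define N + 1 regions.
    A key equal to a split point belongs to the region that starts there.
    """
    counts = Counter(bisect_right(split_points, key) for key in row_keys)
    return [counts.get(i, 0) for i in range(len(split_points) + 1)]


# Hypothetical data: three split points and keys clustered under one prefix.
split_points = [b"g", b"n", b"t"]
row_keys = [b"user0001", b"user0002", b"user0003", b"alpha", b"kilo"]

hist = region_histogram(row_keys, split_points)
for index, count in enumerate(hist):
    print(f"region {index}: {count} keys")
```

If most of the counts pile up in one or two regions, the skew comes from how the row keys map onto region boundaries rather than from how the data is spread on the Spark side.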

On Fri, Feb 26, 2016 at 9:02 AM, Renu Yadav <yrenu21@gmail.com> wrote:

Hi Ted,

Thanks for the reply. I am already using the hbase-spark module, but when I run the bulk load the data is skewed and creating the HFiles takes a long time.

On 26 Feb 2016 10:25 p.m., "Ted Yu" <yuzhihong@gmail.com> wrote:
HBase has an hbase-spark module that supports bulk load.
This module is to be backported to the upcoming 1.3.0 release.

There is some pending work, such as HBASE-15271.

FYI

On Fri, Feb 26, 2016 at 8:50 AM, Renu Yadav <yrenu21@gmail.com> wrote:
Has anybody implemented bulk load into HBase using Spark?

I need help optimizing its performance.

Please help.


Thanks & Regards,
Renu Yadav