spark-user mailing list archives

From yeshwanth kumar <yeshwant...@gmail.com>
Subject Re: How to generate a sequential key in rdd across executors
Date Wed, 03 Aug 2016 16:35:56 GMT
Hi Andrew,

HFileOutputFormat2 needs the HBase row keys to be sorted lexicographically.

As per your suggestion of timestamp + hashed key, I might end up doing a sort
on the RDD, which I want to avoid.
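
(For reference, this is roughly how I read that suggestion; it is an untested
sketch and "recordId" is just a placeholder for whatever uniquely identifies a
record:)

import java.nio.ByteBuffer
import java.security.MessageDigest

// untested sketch: row key = timestamp prefix followed by a hash of the
// record's unique fields; "recordId" is a placeholder
def suggestedKey(recordId: String): Array[Byte] = {
  val ts = System.currentTimeMillis()
  val hash = MessageDigest.getInstance("MD5").digest(recordId.getBytes("UTF-8"))
  ByteBuffer.allocate(8 + hash.length).putLong(ts).put(hash).array()
}

Keys built that way wouldn't come out in row-key order, so the RDD would still
need a sortByKey before writing the HFiles.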

If I could generate a sequential key, I wouldn't need to do a sort; I could
just write the data into HFiles after processing.

Can you explain how I can generate a sequential key?
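
To make the question concrete, here is a rough, untested sketch of the kind of
thing I'm hoping for: number the records by collecting per-partition counts and
then offsetting each partition (all names are placeholders). Is something along
these lines reasonable, or is there a better way?

import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag

// untested sketch: assign a globally sequential Long key without a shuffle or
// sort; "records" is a placeholder for the RDD being bulk loaded
def withSequentialKeys[T: ClassTag](records: RDD[T]): RDD[(Long, T)] = {
  // small extra job: count the elements in each partition
  // (the RDD should be cached, since it gets evaluated twice)
  val counts = records
    .mapPartitionsWithIndex((pid, it) => Iterator((pid, it.size.toLong)))
    .collect()
    .sortBy(_._1)
    .map(_._2)

  // cumulative offsets: partition i starts numbering at offsets(i)
  val offsets = counts.scanLeft(0L)(_ + _)

  records.mapPartitionsWithIndex { (pid, it) =>
    var next = offsets(pid)
    it.map { rec =>
      val keyed = (next, rec)
      next += 1
      keyed
    }
  }
}

The counting pass means the data is read twice, but there is no shuffle or sort
involved.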

Thanks,
Yesh




On Sat, Jul 23, 2016 at 11:24 PM, Andrew Ehrlich <andrew@aehrlich.com>
wrote:

> It’s hard to do in a distributed system. Maybe try generating a meaningful
> key using a timestamp + hashed unique key fields in the record?
>
> > On Jul 23, 2016, at 7:53 PM, yeshwanth kumar <yeshwanth43@gmail.com>
> wrote:
> >
> > Hi,
> >
> > I am doing a bulk load to HBase using Spark,
> > in which I need to generate a sequential key for each record;
> > the key should be sequential across all the executors.
> >
> > I tried zipWithIndex, but it didn't work because zipWithIndex gives an
> > index per executor, not across all executors.
> >
> > Looking for some suggestions.
> >
> >
> > Thanks,
> > -Yeshwanth
>
>
