James, Abraham,

I apologize if I wasn't clear with my question.  I am neither struggling with uniqueness nor wondering how to generate unique numbers with a sequence.

I had two questions:

1.  Gaps in the sequence numbers - will these ever be backfilled?  I got the answer - no, they will not be.  Thank you for that.

2.  There is a concept of sequence numbers that was introduced in Apache Phoenix.
     The question is: what is the use case for this sequence?

     When should one go for UUIDs, and when should one go for sequences?

     What is the recommendation?

If I create and use a sequence, how is it stored in HBase?  Does it automatically take care of hot-spotting?  Is there documentation around this that I can read?


Hopefully that clarifies things.

I sincerely thank you all for coming forward to help.

Thanks,
-ash







On Fri, May 5, 2017 at 5:46 PM, Abraham Tom <work2much@gmail.com> wrote:
In an RDBMS, this debate has been discussed at length, with varying opinions.

Since this is a Phoenix (HBase) forum, the key will always be a string, so your performance bottleneck is the generation of the key.  If you like the incremental-number solution, I would suggest the following:
A composite key where the sequence restarts daily would address your concern about running out of numbers, and it would help with HBase (both distribution and performance).
Take the system date formatted as yyyyMMdd, cast it as a BIGINT, multiply it by 100 billion, and add your auto-generated sequence number to it.  That gives you 100 billion IDs per day, or roughly 1.1 million unique entries per second (10^11 / 86,400 seconds).
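The scheme above can be sketched in a few lines of Java.  This is just an illustration of the arithmetic described (the class and method names are made up for the example):

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class CompositeKey {
    // Date prefix (yyyyMMdd as a long) * 100 billion + a per-day
    // sequence number, per the composite-key scheme described above.
    static long compositeKey(LocalDate date, long sequence) {
        long datePrefix = Long.parseLong(
            date.format(DateTimeFormatter.ofPattern("yyyyMMdd")));
        return datePrefix * 100_000_000_000L + sequence;
    }

    public static void main(String[] args) {
        // 2017-05-05 with sequence value 42 -> 2017050500000000042
        System.out.println(compositeKey(LocalDate.of(2017, 5, 5), 42L));
    }
}
```

Note that the result still fits in a signed 64-bit long (20170505 * 10^11 is about 2.0e18, well under Long.MAX_VALUE).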



On Fri, May 5, 2017 at 12:15 AM, Ash N <742000@gmail.com> wrote:
Could anyone please help with guidance on the below, or point me to any documents?

Thanks


On May 3, 2017 1:01 AM, "Ash N" <742000@gmail.com> wrote:
John,

Thank you so much for responding.  Appreciate the link to the presentation - something I could not find, though I did read about Snowflake.
I was looking for guidance on the sequence-numbers-vs-UUID approach.

Could I use sequence numbers?  Are the gaps in the sequence numbers ever backfilled?
There is not much documentation on how this works.  If someone explains, I will be more than happy to update the documentation.


thanks again,
-ash

On Wed, May 3, 2017 at 12:51 AM, John Leach <jleach4@gmail.com> wrote:
Ash,

I built one a while back based on Twitter's Snowflake algorithm.

Here is a link to a presentation from twitter on it…


We used it as the primary key for a table that in essence had no natural primary key (we just needed uniqueness).
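For reference, here is a minimal sketch of a Snowflake-style generator (not John's implementation; the bit layout follows Twitter's published design of 41 timestamp bits, 10 worker bits, and 12 sequence bits):

```java
public class SnowflakeId {
    // Layout: 41-bit ms timestamp | 10-bit worker id | 12-bit sequence
    private static final long EPOCH = 1288834974657L; // Twitter's custom epoch
    private final long workerId;   // 0..1023, must be unique per generator
    private long lastMillis = -1L;
    private long sequence = 0L;

    public SnowflakeId(long workerId) {
        this.workerId = workerId & 0x3FF;
    }

    public synchronized long nextId() {
        long now = System.currentTimeMillis();
        if (now == lastMillis) {
            sequence = (sequence + 1) & 0xFFF;  // up to 4096 ids per ms
            if (sequence == 0) {                // exhausted: wait for next ms
                while ((now = System.currentTimeMillis()) <= lastMillis) { }
            }
        } else {
            sequence = 0;
        }
        lastMillis = now;
        return ((now - EPOCH) << 22) | (workerId << 12) | sequence;
    }
}
```

The appeal for HBase is that ids are generated locally (no coordination per id), are roughly time-ordered, and embed a worker id so concurrent generators never collide.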

Good luck.

Regards,
John Leach

On May 2, 2017, at 6:46 PM, Ash N <742000@gmail.com> wrote:

Hello,

Distributed web application.  Millions of users connecting to the site.

We are receiving about 150,000 events/sec through a Kinesis stream.
We need to store these events in a Phoenix table identified by an ID, the primary key for the table.

What is the best way to accomplish this?

Option 1
I played with sequences and they seem to work well, although with a lot of gaps.
Will the gaps be filled at all?  If not, we will run out of IDs pretty soon.

Option 2
UUIDs.

What is the best way to generate UUIDs: local or network-based?
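For context on the "local" option: a version-4 (random) UUID can be generated on any node with no network or MAC-address lookup, which Java supports out of the box.  A minimal sketch:

```java
import java.util.UUID;

public class UuidExample {
    public static void main(String[] args) {
        // "Local" generation: a version-4 (random) UUID needs no
        // coordination between nodes and no network information.
        UUID id = UUID.randomUUID();
        System.out.println(id);
        System.out.println(id.version()); // prints 4
    }
}
```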

How are folks typically handling this situation?

Which route is recommended: sequences or UUIDs?

thanks,
-ash









--
Abraham Tom 
Email:   work2much@gmail.com
Phone:  415-515-3621