hbase-user mailing list archives

From Jason Huang <jason.hu...@icare.com>
Subject Re: HBase table row key design question.
Date Tue, 02 Oct 2012 21:38:19 GMT
Thanks, Mohammad.

The issue with phone numbers is that they tend to change over time, and
we think name and DOB are more reliable. SSN is closer to unique, but
we can't force the user to provide it. Basically, we have
limited information to work with.
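
For illustration, the key layout proposed in the quoted message below
(first six characters of each name, birthday as MMDDYYYY, first-visit
timestamp) could be assembled roughly as follows. This is only a sketch:
the function name build_row_key, the lowercasing of names, and the random
suffix appended as a tie-breaker are assumptions made for this example,
not part of the original proposal. The suffix makes a same-millisecond
collision very unlikely, but it does not strictly guarantee uniqueness.

```python
import uuid
from datetime import date


def build_row_key(first_name, last_name, dob, first_seen_ms, suffix=None):
    """Assemble a row key in the layout proposed in the quoted message.

    Lowercasing and the trailing suffix are assumptions added for
    illustration; they are not part of the original proposal.
    """
    if suffix is None:
        # Hypothetical tie-breaker: 8 random hex characters, so two users
        # with the same name, birthday and timestamp get different keys.
        suffix = uuid.uuid4().hex[:8]
    parts = [
        first_name.strip().lower()[:6],  # first 6 characters of first name
        last_name.strip().lower()[:6],   # first 6 characters of last name
        dob.strftime("%m%d%Y"),          # birthday as MMDDYYYY
        str(first_seen_ms),              # first-visit timestamp (milliseconds)
        suffix,
    ]
    return "_".join(parts).encode("utf-8")


key = build_row_key("Jonathan", "Smith", date(1980, 7, 4),
                    1349213899000, suffix="a1b2c3d4")
print(key)
```

Because the name and birthday come first, every key for the same person
shares a common prefix, which is what makes the name-plus-DOB lookup a
short range scan rather than a full-table scan.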



On Tue, Oct 2, 2012 at 3:30 PM, Mohammad Tariq <dontariq@gmail.com> wrote:
> Hello Sir,
>      Although we should always try to keep the rowkey as short as
> possible, a short key that doesn't help with faster data access is
> of no use either. So it depends entirely on the particular use
> case. However, in your case, how about using "phone number" as the rowkey?
> Since it is always unique, you will always get the correct result with a much
> shorter rowkey. It's just that in this case you will have to ask for the
> user's phone number instead of name and DOB.
> Regards,
>     Mohammad Tariq
> On Tue, Oct 2, 2012 at 7:58 PM, Jason Huang <jason.huang@icare.com> wrote:
>> Hello,
>> I am designing an HBase table for users and hope to get some
>> suggestions for my row key design. Thanks...
>> This user table will have columns that hold user information such
>> as names, birthday, gender, address, phone number, etc. The first
>> time a user comes to us, we will ask for all this information and
>> generate a new row in the table with a unique row key. The next time
>> the same user comes in, we will ask for his/her name and
>> birthday, and our application should quickly get the row(s) in the
>> table that match the name and birthday provided.
>> Here is what I am thinking of as the row key:
>> {first 6 characters of user's first name}_{first 6 characters of user's last
>> name}_{birthday in MMDDYYYY}_{timestamp when user comes in for the
>> first time}
>> However, I see a few issues with this row key:
>> (1) Although it is not very likely, there is a small chance
>> that two users with the same name and birthday come in on the same
>> day, and the two requests to generate a new user arrive at the same time
>> (the timestamps are generated before calling the put method of the
>> HTable API and happen to have the same value). This means the row key
>> design above won't guarantee a unique row key. Any suggestions on how
>> to modify it and ensure a unique ID?
>> (2) Sometimes we will only have part of the user's first name and/or last
>> name. In that case, we will need to perform a scan and return multiple
>> matches to the client. To avoid scanning the whole table, if we have
>> the user's first name, we can set the start/stop row accordingly. But if
>> we only have the user's last name, we can't set up a good start/stop row.
>> What's even worse, if the user provides a "sounds-like" first or last
>> name, our scan won't be able to return good possible matches.
>> Has anyone used names as part of the row key and encountered this
>> type of issue?
>> (3) The row key seems long (30+ chars). Will this affect our
>> read/write performance? Maybe it will increase the storage a bit (say
>> we have 3 million rows per month)? In other words, does the length of
>> the row key matter a lot?
>> thanks!
>> Jason
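
On question (2) above, the start/stop row for a prefix scan can be derived
from the known leading part of the key. A minimal sketch, assuming the
usual HBase convention that the start row is inclusive and the stop row is
exclusive; the helper name prefix_scan_range is hypothetical and no HBase
client calls are made here:

```python
def prefix_scan_range(prefix: bytes):
    """Return (start_row, stop_row) covering every key that begins with prefix.

    start_row is inclusive and stop_row is exclusive. An empty stop_row
    means "scan to the end of the table".
    """
    # Trailing 0xFF bytes cannot be incremented, so drop them first.
    trimmed = prefix.rstrip(b"\xff")
    if not trimmed:
        return prefix, b""
    # Increment the last remaining byte to get the smallest key that
    # sorts after every key starting with the prefix.
    stop = trimmed[:-1] + bytes([trimmed[-1] + 1])
    return prefix, stop


# Known first name "jonath" (already truncated to 6 characters), last name unknown.
start, stop = prefix_scan_range(b"jonath_")
print(start, stop)
```

This only helps when the known part is at the front of the key. With only
a last name there is no leading prefix to build a range from, which is
exactly the limitation described in question (2).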

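On question (3), a rough way to size the storage cost: HBase stores the
row key with every cell, so the key length is multiplied by the number of
cells per row. The arithmetic below is a back-of-the-envelope sketch. The
3 million rows per month comes from the message above; the 10 cells per
row and the 16-byte alternative key are made-up figures for comparison,
and compression, block encoding and replication are ignored.

```python
def row_key_bytes_per_month(rows_per_month, cells_per_row, key_len):
    """Bytes spent on row keys alone, given that each cell repeats the key."""
    return rows_per_month * cells_per_row * key_len


ROWS_PER_MONTH = 3_000_000  # from the original message
CELLS_PER_ROW = 10          # hypothetical: one cell per stored attribute

# 36 bytes = 6 + 1 + 6 + 1 + 8 + 1 + 13 for the proposed layout, assuming
# single-byte characters and a 13-digit millisecond timestamp.
long_key = row_key_bytes_per_month(ROWS_PER_MONTH, CELLS_PER_ROW, 36)
short_key = row_key_bytes_per_month(ROWS_PER_MONTH, CELLS_PER_ROW, 16)

print("36-byte key:", long_key / 1e6, "MB per month")
print("16-byte key:", short_key / 1e6, "MB per month")
print("difference: ", (long_key - short_key) / 1e6, "MB per month")
```

Under these assumptions the longer key costs a few hundred megabytes per
month before compression, so the lookup pattern the key enables likely
matters more than the raw length.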