hbase-user mailing list archives

From Doug Meil <doug.m...@explorysmedical.com>
Subject Re: HBase table row key design question.
Date Tue, 02 Oct 2012 20:02:23 GMT

Hi there. While this isn't an answer to some of your specific design
questions, this chapter in the RefGuide can be helpful for general design.


On 10/2/12 10:28 AM, "Jason Huang" <jason.huang@icare.com> wrote:

>I am designing an HBase table for users and would appreciate some
>suggestions on my row key design. Thanks!
>This user table will have columns containing user information such
>as name, birthday, gender, address, phone number, etc. The first
>time a user comes to us, we will ask for all of this information and
>generate a new row in the table with a unique row key. The next time
>the same user comes in, we will ask for his/her name and
>birthday, and our application should quickly get the row(s) in the
>table that match the name and birthday provided.
>Here is what I am thinking of as the row key:
>{first 6 characters of user's first name}_{first 6 characters of user's
>last name}_{birthday in MMDDYYYY}_{timestamp when user comes in for the
>first time}
>However, I see a few questions about this row key:
>(1) Although it is unlikely, there is a small chance that two users
>with the same name and birthday come in on the same day, and that the
>two requests to create a new user arrive at the same time (the
>timestamps are set through the HTable API and happen to have the same
>value before the put method is called). This means the row key design
>above won't guarantee a unique row key. Any suggestions on how to
>modify it to ensure a unique ID?
>(2) Sometimes we will only have part of a user's first name and/or last
>name. In that case, we will need to perform a scan and return multiple
>matches to the client. To avoid scanning the whole table, if we have
>the user's first name, we can set the start/stop row accordingly. But
>if we only have the user's last name, we can't set up a good start/stop
>row. Even worse, if the user provides a "sounds-like" first or last
>name, our scan won't be able to return good possible matches.
>Has anyone used names as part of the row key and run into this
>type of issue?
>(3) The row key seems long (30+ characters). Will this affect our
>read/write performance? Will it noticeably increase storage (say
>we have 3 million rows per month)? In other words, does the length of
>the row key matter much?
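
The composite key layout proposed in the quoted message can be sketched in Java as below. This is a hypothetical illustration, not code from the thread: the class and method names are made up, and trimming plus lower-casing the names and encoding the key as UTF-8 are assumptions added so that "Smith" and "SMITH " map to the same key component.

```java
import java.nio.charset.StandardCharsets;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class UserRowKey {
    // Birthday rendered as MMDDYYYY, as in the proposed layout.
    private static final DateTimeFormatter MMDDYYYY = DateTimeFormatter.ofPattern("MMddyyyy");

    // First n characters of a name. Trimming and lower-casing are assumptions
    // added here so that differently cased input produces the same key component.
    static String prefix(String name, int n) {
        String normalized = name.trim().toLowerCase(Locale.ROOT);
        return normalized.length() <= n ? normalized : normalized.substring(0, n);
    }

    // {first 6 chars of first name}_{first 6 chars of last name}_{MMDDYYYY}_{first-visit millis}
    static String buildKey(String firstName, String lastName, LocalDate birthday, long firstVisitMillis) {
        return prefix(firstName, 6) + "_" + prefix(lastName, 6) + "_"
                + birthday.format(MMDDYYYY) + "_" + firstVisitMillis;
    }

    // HBase row keys are byte arrays; UTF-8 is an assumed encoding.
    static byte[] toBytes(String key) {
        return key.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String key = buildKey("Jonathan", "Richardson", LocalDate.of(1980, 3, 7), 1349186903000L);
        System.out.println(key);                 // jonath_richar_03071980_1349186903000
        System.out.println(toBytes(key).length); // 36
    }
}
```

With a 13-digit millisecond timestamp, a full-length key comes to 36 bytes, which matches the "30+ characters" estimate in question (3).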
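
Question (1) in the quoted message notes that name, birthday and timestamp can collide. The sketch below first shows that collision and then one possible tie-breaker: a random UUID appended to the key. This is an assumption and not something recommended in the thread. Another option commonly used with the HBase client is an atomic `checkAndPut` that writes only if the row does not already exist. The class and method names here are hypothetical.

```java
import java.util.UUID;

public class UniqueUserKey {
    // The natural part of the proposed key. The timestamp alone cannot guarantee
    // uniqueness if two registrations are stamped in the same millisecond.
    static String naturalPart(String first6, String last6, String mmddyyyy, long ts) {
        return first6 + "_" + last6 + "_" + mmddyyyy + "_" + ts;
    }

    // Hypothetical fix: append a random UUID so two concurrent registrations
    // with identical name, birthday and timestamp still get different row keys.
    static String withUuidSuffix(String natural) {
        return natural + "_" + UUID.randomUUID();
    }

    public static void main(String[] args) {
        long sameMillis = 1349186903000L;
        String a = naturalPart("jonath", "richar", "03071980", sameMillis);
        String b = naturalPart("jonath", "richar", "03071980", sameMillis);
        System.out.println(a.equals(b));            // true: both puts would land in the same row
        String ua = withUuidSuffix(a);
        String ub = withUuidSuffix(b);
        System.out.println(ua.equals(ub));          // false (with overwhelming probability)
        System.out.println(ua.startsWith(a + "_")); // true: prefix scans on the natural part still work
    }
}
```

The trade-off is length. The UUID string adds 37 characters, including the separator, which bears directly on question (3). Storing the UUID as 16 raw bytes instead would roughly halve that cost.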
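
For question (2), a partial first name can be turned into a start/stop row pair for a range scan. Keep in mind that HBase start rows are inclusive, stop rows are exclusive, and row keys are compared as unsigned bytes. The helper below is a hypothetical sketch that computes the smallest key sorting after every key with the given prefix. Its result could be passed to the client's `Scan(byte[] startRow, byte[] stopRow)` constructor. Newer HBase clients also provide `Scan#setRowPrefixFilter` for the same purpose.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class PrefixScanRange {
    // Exclusive stop row for a prefix scan: the smallest byte string that sorts
    // after every key beginning with `prefix`. Keys compare as unsigned bytes,
    // so trailing 0xFF bytes are dropped and the last remaining byte is
    // incremented. An empty array means "scan to the end of the table".
    static byte[] stopRow(byte[] prefix) {
        int end = prefix.length;
        while (end > 0 && prefix[end - 1] == (byte) 0xFF) {
            end--;
        }
        if (end == 0) {
            return new byte[0];
        }
        byte[] stop = Arrays.copyOf(prefix, end);
        stop[end - 1]++;
        return stop;
    }

    public static void main(String[] args) {
        // A partial first name: the start row is the prefix itself (inclusive).
        byte[] start = "jon".getBytes(StandardCharsets.UTF_8);
        byte[] stop = stopRow(start);
        System.out.println(new String(start, StandardCharsets.UTF_8)); // jon
        System.out.println(new String(stop, StandardCharsets.UTF_8));  // joo
        // Keys such as "jon_smith_..." and "jonath_richar_..." fall in [jon, joo).
        // A last-name-only search has no such range because the key starts with the first name.
    }
}
```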
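
For question (3), note that HBase stores the full row key inside every cell (KeyValue), not once per row. The raw cost of a long key therefore scales with rows times columns. The arithmetic below is a rough sketch that uses the 3 million rows per month from the question. It also assumes a hypothetical 10 cells per row and compares a 36-byte key with a hypothetical 8-byte surrogate id. The figures are uncompressed and ignore block compression, data block encoding and HDFS replication, all of which change the on-disk result.

```java
import java.util.Locale;

public class RowKeyOverhead {
    // Bytes spent on row keys alone: the row key is repeated in every cell,
    // so it is multiplied by the number of cells, not just the number of rows.
    static long rowKeyBytes(long rows, int cellsPerRow, int keyLength) {
        return rows * cellsPerRow * (long) keyLength;
    }

    public static void main(String[] args) {
        long rowsPerMonth = 3_000_000L; // figure from the question
        int cellsPerRow = 10;           // hypothetical: name, birthday, gender, address, phone, ...
        long longKey = rowKeyBytes(rowsPerMonth, cellsPerRow, 36); // e.g. jonath_richar_03071980_1349186903000
        long shortKey = rowKeyBytes(rowsPerMonth, cellsPerRow, 8); // hypothetical 8-byte surrogate id
        System.out.println(String.format(Locale.ROOT, "36-byte key: %,d bytes/month", longKey));  // 1,080,000,000
        System.out.println(String.format(Locale.ROOT, " 8-byte key: %,d bytes/month", shortKey)); // 240,000,000
    }
}
```

Under these assumptions, the longer key costs about 4.5 times as many raw key bytes. Whether that matters depends on compression and on how many columns each row actually holds.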
