hbase-user mailing list archives

From Bryan Duxbury <br...@rapleaf.com>
Subject Re: HBase Sample Schemas
Date Fri, 28 Mar 2008 14:42:07 GMT
Bloom filters in HBase, as they are currently designed, aren't a  
construct that users have to interact with directly. All retrieval  
operations take advantage of a bloom filter if it is configured.

-Bryan
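
To illustrate the point above, here is a minimal sketch of a read path that consults a bloom filter internally, so the caller never touches it. It is a toy model only: the names BloomReadPath, Filter, Store, put and get are hypothetical and are not HBase classes or methods, and the sizing (1024 bits, 3 hash probes) is an arbitrary assumption.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: none of these names are HBase APIs.
class BloomReadPath {

    // A small bloom filter backed by a BitSet.
    static class Filter {
        private final BitSet bits;
        private final int size;
        private final int hashCount;

        Filter(int size, int hashCount) {
            this.bits = new BitSet(size);
            this.size = size;
            this.hashCount = hashCount;
        }

        // Derive the i-th bit position from two base hashes (double hashing).
        private int position(String key, int i) {
            int h1 = key.hashCode();
            int h2 = (h1 >>> 16) | 1; // non-zero step so successive probes move
            return Math.floorMod(h1 + i * h2, size);
        }

        void add(String key) {
            for (int i = 0; i < hashCount; i++) {
                bits.set(position(key, i));
            }
        }

        // false means "definitely absent"; true means "possibly present".
        boolean mightContain(String key) {
            for (int i = 0; i < hashCount; i++) {
                if (!bits.get(position(key, i))) {
                    return false;
                }
            }
            return true;
        }
    }

    // A toy store whose get() checks the filter first; callers only see put/get.
    static class Store {
        private final Map<String, String> data = new HashMap<>();
        private final Filter filter = new Filter(1024, 3); // assumed sizing
        int lookups = 0; // how often the backing map was actually searched

        void put(String row, String value) {
            filter.add(row);
            data.put(row, value);
        }

        String get(String row) {
            if (!filter.mightContain(row)) {
                return null; // skip the (notionally expensive) lookup
            }
            lookups++;
            return data.get(row);
        }
    }

    public static void main(String[] args) {
        Store store = new Store();
        store.put("row1", "v1");
        System.out.println(store.get("row1"));
        System.out.println(store.get("no-such-row"));
        System.out.println("backing lookups: " + store.lookups);
    }
}
```

The caller's code is the same whether or not the filter exists; the filter only decides whether the backing lookup can be skipped. A bloom filter can report false positives, so a "possibly present" answer still requires the real lookup.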

On Mar 28, 2008, at 6:28 AM, Jim R. Wilson wrote:

> Thanks Ankur!
>
> Those are very helpful - finding example schemas has been a really
> sore point for me as well in trying to learn all this.
>
> I was wondering if you had an example that defined a bloom filter for
> a column, and an example on how to query a bloom filter once it's set
> up (shell example or rest example if possible).
>
> Thanks again!
>
> -- Jim R. Wilson (jimbojw)
>
> On Fri, Mar 28, 2008 at 1:33 AM, Goel, Ankur <Ankur.Goel@corp.aol.com> wrote:
>>
>>> ....by adding a column.
>>  Sorry, I meant colon ":"
>>
>>
>>  -----Original Message-----
>>  From: Goel, Ankur [mailto:Ankur.Goel@corp.aol.com]
>>  Sent: Friday, March 28, 2008 12:01 PM
>>  To: hbase-user@hadoop.apache.org
>>  Subject: RE: HBase Sample Schemas
>>
>>  The tables below are RDBMS tables with column names simply converted to
>>  column families by adding a column.
>>  I'd like to share ideas on how best these tables can be modified (or
>>  merged ??) to take advantage of column oriented design.
>>
>>  -----Original Message-----
>>  From: Edward J. Yoon [mailto:edward@udanax.org]
>>  Sent: Friday, March 28, 2008 11:48 AM
>>  To: hbase-user@hadoop.apache.org
>>  Subject: Re: HBase Sample Schemas
>>
>>  I don't think this is a good example.
>>
>>  Find the difference between the two physical schemas for the same
>>  logical data model: one using relationship tables on an RDBMS, and
>>  one using a list of column qualifiers on BigTable.
>>
>>  On Fri, Mar 28, 2008 at 2:28 PM, Goel, Ankur <Ankur.Goel@corp.aol.com>
>>  wrote:
>>> Hi Bryan,
>>>         Here is the sample schema I have (looks closer to RDBMS, I know)
>>>
>>> TABLE:           seed_list
>>>
>>> DESCRIPTION: Used to store seed URLs (both old and newly discovered).
>>>             Initially populated with some seed URLs. The crawl controller
>>>             picks up the seeds from this table that have status=0 (Not
>>>             Visited) or status=2 (Visited, but ready for re-crawl) and
>>>             feeds these seeds in batch to different crawl engines that
>>>             it knows about.
>>>
>>> SCHEMA:      Column families below
>>>
>>>        {"referer_id:", "100"}, // Integer here is Max_Length
>>>        {"url:", "1500"},
>>>        {"site:", "500"},
>>>        {"last_crawl_date:", "1000"},
>>>        {"next_crawl_date:", "1000"},
>>>        {"create_date:", "100"},
>>>        {"status:", "100"},
>>>        {"strike:", "100"},
>>>        {"language:", "150"},
>>>        {"topic:", "500"},
>>>        {"depth:", "100000"}
>>>
>>> Common attributes are [max versions: 1, compression: NONE, in memory:
>>> false, block cache enabled: true, max length: 100, bloom filter: none]
>>>
>>>
>>> TABLE:   web_content
>>>
>>> DESCRIPTION: Used to store information retrieved after crawling a URL.
>>>             Each crawl engine provides information about the URL it
>>>             crawled. This information is then stored in this table
>>>             depending upon the profile settings (what should be stored?)
>>> SCHEMA:  Column families below
>>>
>>>          {"url:", "1500"},
>>>          {"site:", "500"},
>>>          {"content_type:", "100"},
>>>          {"title:", "1000"},
>>>          {"content:", Integer.MAX_VALUE + ""},
>>>          {"parsed_text:", Integer.MAX_VALUE + ""},
>>>          {"crawl_date:", "1000"},
>>>          {"last_modified_date:", "100"},
>>>          {"http_headers:", "10000"},
>>>          {"content_length:", "11"},
>>>          {"outlinks_count:", "100000"}
>>>
>>> Common attributes are [max versions: 1, compression: BLOCK, in memory:
>>> false, block cache enabled: true, max length: 100, bloom filter: none]
>>>
>>> Please feel free to suggest modifications/enhancements for column
>>> oriented Design.
>>>
>>> Thanks
>>> -Ankur
>>>
>>>
>>> -----Original Message-----
>>> From: Bryan Duxbury [mailto:bryan@rapleaf.com]
>>> Sent: Friday, March 28, 2008 10:33 AM
>>> To: hbase-user@hadoop.apache.org
>>> Subject: HBase Sample Schemas
>>>
>>> All,
>>>
>>> One of the more common types of questions we get from people new to
>>> HBase is about the differences in the schema between HBase and
>>> relational databases. So that we can generate some good examples of
>>> RDBMS schemas and their counterparts as they might be represented in
>>> HBase, could you guys post some small (1-5 entities) schemas that you
>>> might be interested in using and a few sentences about how you'd like
>>> to consume them? We can then discuss possible options and see how
>>> things might look. This will also help Stack, Jim, and myself to
>>> notice interesting access patterns we might want to support.
>>>
>>> Thanks in advance,
>>>
>>> Bryan
>>>
>>
>>
>>
>>  --
>>  B. Regards,
>>  Edward J. Yoon
>>
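
For readers who want to experiment with the seed_list schema quoted above, the name/max-length pairs can be loaded into a map roughly as follows. This is only a sketch in plain Java: the class SeedListSchema and the method parse are hypothetical helpers and do not use any HBase API. The trailing-colon check reflects the convention in the quoted schema that every column family name ends with ":".

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: plain Java only, no HBase classes involved.
class SeedListSchema {

    // Column family names and max lengths copied from the quoted seed_list schema.
    static final String[][] FAMILIES = {
        {"referer_id:", "100"},
        {"url:", "1500"},
        {"site:", "500"},
        {"last_crawl_date:", "1000"},
        {"next_crawl_date:", "1000"},
        {"create_date:", "100"},
        {"status:", "100"},
        {"strike:", "100"},
        {"language:", "150"},
        {"topic:", "500"},
        {"depth:", "100000"}
    };

    // Turn {name, maxLength} pairs into an ordered map, rejecting names
    // that do not end with the ":" family separator.
    static Map<String, Integer> parse(String[][] families) {
        Map<String, Integer> out = new LinkedHashMap<>();
        for (String[] family : families) {
            String name = family[0];
            if (!name.endsWith(":")) {
                throw new IllegalArgumentException(
                    "column family name must end with ':' but was " + name);
            }
            out.put(name, Integer.parseInt(family[1]));
        }
        return out;
    }

    public static void main(String[] args) {
        parse(FAMILIES).forEach(
            (name, max) -> System.out.println(name + " max length " + max));
    }
}
```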

