hbase-user mailing list archives

From "Edward J. Yoon" <edw...@udanax.org>
Subject Re: HBase Sample Schemas
Date Fri, 28 Mar 2008 06:18:08 GMT
I don't think this is a good example.

Find the difference between two physical schemas for the same logical
data model: one that uses relationship tables on an RDBMS, and one that
uses a list of column qualifiers on Bigtable.
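To make that contrast concrete, here is a minimal Java sketch that stores one logical relationship (a page links to many pages) both ways. In-memory maps stand in for tables. This is not the HBase client API. The `outlink:` column family and all class and method names are hypothetical. The quoted web_content schema only has an `outlinks_count:` family.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

public class OutlinkLayouts {

    // RDBMS style: one row per (source, target) pair in a relationship table.
    record LinkRow(String sourceUrl, String targetUrl) {}

    // Hypothetical column family holding one qualifier per outgoing link.
    static final String OUTLINK_FAMILY = "outlink:";

    // Bigtable style: row key -> ("family:qualifier" -> cell value), both kept sorted.
    static SortedMap<String, SortedMap<String, String>> toColumnLayout(List<LinkRow> links) {
        SortedMap<String, SortedMap<String, String>> rows = new TreeMap<>();
        for (LinkRow link : links) {
            rows.computeIfAbsent(link.sourceUrl(), k -> new TreeMap<>())
                // The target URL becomes the qualifier; the value is only a placeholder.
                .put(OUTLINK_FAMILY + link.targetUrl(), "1");
        }
        return rows;
    }

    // Relational read: filter every row of the relationship table.
    static List<String> outlinksRelational(List<LinkRow> links, String source) {
        List<String> result = new ArrayList<>();
        for (LinkRow link : links) {
            if (link.sourceUrl().equals(source)) {
                result.add(link.targetUrl());
            }
        }
        return result;
    }

    // Column-oriented read: fetch one row and strip the family prefix from its qualifiers.
    static List<String> outlinksColumnar(SortedMap<String, SortedMap<String, String>> rows, String source) {
        List<String> result = new ArrayList<>();
        SortedMap<String, String> row = rows.get(source);
        if (row == null) {
            return result; // no row for this page
        }
        for (String column : row.keySet()) {
            if (column.startsWith(OUTLINK_FAMILY)) {
                result.add(column.substring(OUTLINK_FAMILY.length()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<LinkRow> links = List.of(
            new LinkRow("http://a.example/", "http://c.example/"),
            new LinkRow("http://a.example/", "http://b.example/"),
            new LinkRow("http://b.example/", "http://c.example/"));
        System.out.println(outlinksRelational(links, "http://a.example/"));
        System.out.println(outlinksColumnar(toColumnLayout(links), "http://a.example/"));
    }
}
```

In the column layout, reading one page's links is a single-row lookup. Because the row is a sorted map, the qualifiers come back in URL order. The relational version has to filter, or index and join, the separate link table.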

On Fri, Mar 28, 2008 at 2:28 PM, Goel, Ankur <Ankur.Goel@corp.aol.com> wrote:
> Hi Bryan,
>         Here is the sample schema I have (it looks closer to an RDBMS
>         design, I know):
>
> TABLE:           seed_list
>
> DESCRIPTION: Used to store seed URLs (both old and newly discovered).
>             Initially populated with some seed URLs. The crawl controller
>             picks up the seeds from this table that have status=0 (Not
>             Visited) or status=2 (Visited, but ready for re-crawl) and
>             feeds these seeds in batches to the different crawl engines
>             it knows about.
>
> SCHEMA:      Column families below
>
>        {"referer_id:", "100"}, // Integer here is Max_Length
>        {"url:", "1500"},
>        {"site:", "500"},
>        {"last_crawl_date:", "1000"},
>        {"next_crawl_date:", "1000"},
>        {"create_date:", "100"},
>        {"status:", "100"},
>        {"strike:", "100"},
>        {"language:", "150"},
>        {"topic:", "500"},
>        {"depth:", "100000"}
>
> Common attributes are [max versions: 1, compression: NONE, in memory:
> false, block cache enabled: true, max length: 100, bloom filter: none]
>
>
> TABLE:   web_content
>
> DESCRIPTION: Used to store information retrieved after crawling a URL.
>             Each crawl engine provides information about the URL it
>             crawled. This information is then stored in this table
>             depending on the profile settings (what should be stored?).
> SCHEMA:  Column families below
>
>        {"url:", "1500"},
>        {"site:", "500"},
>        {"content_type:", "100"},
>        {"title:", "1000"},
>        {"content:", Integer.MAX_VALUE + ""},
>        {"parsed_text:", Integer.MAX_VALUE + ""},
>        {"crawl_date:", "1000"},
>        {"last_modified_date:", "100"},
>        {"http_headers:", "10000"},
>        {"content_length:", "11"},
>        {"outlinks_count:", "100000"}
>
> Common attributes are [max versions: 1, compression: BLOCK, in memory:
> false, block cache enabled: true, max length: 100, bloom filter: none]
>
> Please feel free to suggest modifications/enhancements toward a
> column-oriented design.
>
> Thanks
> -Ankur
>
>
> -----Original Message-----
> From: Bryan Duxbury [mailto:bryan@rapleaf.com]
> Sent: Friday, March 28, 2008 10:33 AM
> To: hbase-user@hadoop.apache.org
> Subject: HBase Sample Schemas
>
> All,
>
> One of the more common types of questions we get from people new to
> HBase is about the differences in schema design between HBase and
> relational databases. So that we can generate some good examples of
> RDBMS schemas and their counterparts as they might be represented in
> HBase, could you guys post some small (1-5 entities) schemas that you
> might be interested in using and a few sentences about how you'd like to
> consume them? We can then discuss possible options and see how things
> might look. This will also help Stack, Jim, and me notice
> interesting access patterns we might want to support.
>
> Thanks in advance,
>
> Bryan
>
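The seed selection described for seed_list above can be sketched in a few lines. The controller picks rows with status=0 or status=2 and feeds them in batches to the crawl engines. This is a minimal in-memory illustration under several assumptions:

- cell values are plain strings;
- each row carries its URL in the `url:` column;
- batches have a fixed size;
- batches are handed to engines round-robin.

`SeedBatcher`, the engine names, and the batching policy are hypothetical. None of them comes from the quoted design or from the HBase API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;

public class SeedBatcher {

    // status=0 (not visited) and status=2 (visited, ready for re-crawl) are eligible.
    // Assumption: cell values are stored and compared as strings.
    static final Set<String> ELIGIBLE_STATUS = Set.of("0", "2");

    // Returns the "url:" value of every eligible row, in row-key (scan) order.
    static List<String> eligibleSeeds(SortedMap<String, Map<String, String>> seedList) {
        List<String> seeds = new ArrayList<>();
        for (Map<String, String> columns : seedList.values()) {
            String status = columns.get("status:");
            String url = columns.get("url:");
            if (url != null && status != null && ELIGIBLE_STATUS.contains(status)) {
                seeds.add(url);
            }
        }
        return seeds;
    }

    // Hypothetical policy: batches of at most batchSize, dealt to engines round-robin.
    static Map<String, List<List<String>>> assignBatches(
            List<String> seeds, List<String> engines, int batchSize) {
        if (engines.isEmpty() || batchSize <= 0) {
            throw new IllegalArgumentException("need at least one engine and a positive batch size");
        }
        Map<String, List<List<String>>> plan = new LinkedHashMap<>();
        for (String engine : engines) {
            plan.put(engine, new ArrayList<>());
        }
        int batchIndex = 0;
        for (int start = 0; start < seeds.size(); start += batchSize) {
            int end = Math.min(start + batchSize, seeds.size());
            String engine = engines.get(batchIndex % engines.size());
            plan.get(engine).add(new ArrayList<>(seeds.subList(start, end)));
            batchIndex++;
        }
        return plan;
    }

    public static void main(String[] args) {
        SortedMap<String, Map<String, String>> seedList = new TreeMap<>();
        seedList.put("row1", Map.of("url:", "http://a.example/", "status:", "0"));
        seedList.put("row2", Map.of("url:", "http://b.example/", "status:", "1"));
        seedList.put("row3", Map.of("url:", "http://c.example/", "status:", "2"));
        seedList.put("row4", Map.of("url:", "http://d.example/", "status:", "0"));

        List<String> seeds = eligibleSeeds(seedList);
        System.out.println(assignBatches(seeds, List.of("engine-1", "engine-2"), 2));
    }
}
```

This sketch leaves out two concerns a real controller would need to handle. First, it would scan the table rather than hold it in memory. Second, it might update `status:` when it dispatches a batch, so the same seed is not handed out twice.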



-- 
B. Regards,
Edward J. Yoon
