hbase-user mailing list archives

From "Goel, Ankur" <Ankur.G...@corp.aol.com>
Subject RE: HBase performance tuning
Date Fri, 28 Mar 2008 05:34:20 GMT
At present the HBase writer is written to cater to the schema
needs of our application. It is a two-table schema, the details of
which I posted in my reply to Bryan's post.

I'll check and let you know if the code can be contributed.
Once I get the green light, I'll make some modifications to make it
more generic and share it with you folks so we can work out how to
improve it further before posting.


-----Original Message-----
From: stack [mailto:stack@duboce.net] 
Sent: Thursday, March 27, 2008 10:08 PM
To: hbase-user@hadoop.apache.org
Subject: Re: HBase performance tuning

I have some familiarity with that crawler.

Tell us more about your writer. Is it proprietary? If not, can we get
it into a place where others could use it if wanted?


Goel, Ankur wrote:
> I am crawling the web indeed, but only the sites that are present in 
> my seedlist. The crawler used here is heritrix 2.0 - 
> http://webteam.archive.org/confluence/display/Heritrix/2.0.0.
> I developed a Heritrix specific HBase writer that can be integrated 
> with Heritrix to write the crawled content directly into Hbase.
> -Ankur
