hbase-user mailing list archives

From stack <st...@duboce.net>
Subject Re: HBase performance tuning
Date Thu, 27 Mar 2008 16:38:11 GMT
I have some familiarity with that crawler.

Tell us more about your writer. Is it proprietary? If not, can we get
it into a place where others could use it if they wanted to?


Goel, Ankur wrote:
> I am crawling the web indeed, but only the sites
> that are present in my seedlist. The crawler used
> here is Heritrix 2.0 -
> http://webteam.archive.org/confluence/display/Heritrix/2.0.0.
> I developed a Heritrix-specific HBase writer that can be integrated with
> Heritrix to write the crawled content directly into HBase.
> -Ankur
