nutch-dev mailing list archives

From Andrzej Bialecki ...@getopt.org>
Subject Re: Differences 1.x and trunk
Date Fri, 18 Mar 2011 15:57:20 GMT
On 3/18/11 4:31 PM, Markus Jelsma wrote:
> Hi all,
>
> I'm trying to port the patch for https://issues.apache.org/jira/browse/NUTCH-963
> to trunk after committing it to 1.3. There are, of course, a lot of differences,
> so I need a little advice on how to proceed:
>
> - instead of using CrawlDB and CrawlDatum we now need WebTableReader?

Actually you need to use StorageUtils to set up Mapper or Reducer 
contexts. See other tools, e.g. Fetcher or Generator.

> - trunk uses SLF4J instead of Commons Logging now?

Yes.

> - a page is now represented by storage.WebPage?

Yes. When you prepare a Job you also need to specify what fields from 
WebPage you are interested in (and only these fields will be pulled in 
from the storage). This is all handled by StorageUtils methods.
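The field-projection idea above can be illustrated with a small, self-contained Java model. This is not Nutch code: the `Field` enum, the in-memory `STORE`, and the `load` method are hypothetical stand-ins invented for illustration, and the field names are not the real WebPage schema. In trunk, the requested fields are passed to StorageUtils when the Job is prepared, and the storage layer does the actual projection.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical, stdlib-only model of field projection: a job declares which
 * page fields it needs, and only those are copied out of the backing store.
 * Field names are illustrative, not Nutch's WebPage schema.
 */
final class FieldProjectionSketch {

    // Illustrative subset of per-page fields (assumption, not the real schema).
    enum Field { BASE_URL, STATUS, FETCH_TIME, CONTENT, OUTLINKS }

    // Hypothetical in-memory "storage" holding one full row per URL.
    static final Map<String, Map<Field, String>> STORE =
        Map.of("http://example.com/", fullRow());

    static Map<Field, String> fullRow() {
        Map<Field, String> row = new EnumMap<>(Field.class);
        row.put(Field.BASE_URL, "http://example.com/");
        row.put(Field.STATUS, "fetched");
        row.put(Field.FETCH_TIME, "1300463840000");
        row.put(Field.CONTENT, "<html>...</html>");
        row.put(Field.OUTLINKS, "http://example.com/a");
        return row;
    }

    /** Returns only the requested fields of a stored row; all others are skipped. */
    static Map<Field, String> load(String key, Set<Field> requested) {
        Map<Field, String> projected = new EnumMap<>(Field.class);
        Map<Field, String> row = STORE.get(key);
        if (row == null) {
            return projected; // unknown key: nothing to load
        }
        for (Field f : requested) {
            if (row.containsKey(f)) {
                projected.put(f, row.get(f));
            }
        }
        return projected;
    }

    public static void main(String[] args) {
        // A hypothetical job that only needs two fields declares just those.
        Set<Field> fields = EnumSet.of(Field.BASE_URL, Field.STATUS);
        Map<Field, String> page = load("http://example.com/", fields);
        System.out.println(page.keySet()); // prints [BASE_URL, STATUS]
    }
}
```

The point of declaring fields up front is that a job which never asks for a large field, such as raw page content, does not pay to pull it in from storage.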

-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

