nutch-dev mailing list archives

From Andrzej Bialecki
Subject Re: Iterating spidered pages
Date Tue, 05 Jul 2005 17:38:29 GMT
Andy Liu wrote:

> However (somebody correct me if I'm wrong), I don't think you can
> update individual ArrayFile entries once they've been written. So
> while you're looping over each ParseData entry, you can write your
> updated ParseData objects to a temporary ArrayFile and replace the
> old one with it when you're done.

Yes, that's correct. Currently, the only place where you can add custom
data without changing the core classes (Content, ParseData, ParseText)
is the metadata attributes. There are actually two metadata
collections: one at the protocol level (Content.metadata) and the
other at the parse level (ParseData.metadata).
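The copy-and-swap approach Andy describes can be sketched in plain Java.
This is only an illustration of the pattern under stated assumptions: it
does NOT use Nutch's ArrayFile API, each text line is a hypothetical
stand-in for one serialized ParseData entry, and the class name
RewriteViaTempFile is made up for the example. The original file is
never modified in place; it is replaced only after the whole copy has
succeeded.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;
import java.util.function.UnaryOperator;

// Hypothetical helper: NOT part of Nutch. One text line stands in for one entry.
public class RewriteViaTempFile {

    /**
     * Streams every record of a write-once file through {@code update} into a
     * temporary file in the same directory, then moves the temporary file over
     * the original.
     */
    public static void rewrite(Path file, UnaryOperator<String> update) throws IOException {
        Path dir = file.toAbsolutePath().getParent();
        Path tmp = Files.createTempFile(dir, "rewrite-", ".tmp");
        try (BufferedReader in = Files.newBufferedReader(file, StandardCharsets.UTF_8);
             BufferedWriter out = Files.newBufferedWriter(tmp, StandardCharsets.UTF_8)) {
            String record;
            while ((record = in.readLine()) != null) {
                // Records are only appended to the new file, never patched in place.
                out.write(update.apply(record));
                out.newLine();
            }
        } catch (IOException | RuntimeException e) {
            Files.deleteIfExists(tmp); // leave the original untouched on failure
            throw e;
        }
        // The temporary file replaces the old one only after the full copy succeeded.
        Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("parsedata-", ".txt");
        Files.write(file, Arrays.asList("url=a title=A", "url=b title=B"), StandardCharsets.UTF_8);
        rewrite(file, r -> r + " custom=1");
        System.out.println(Files.readAllLines(file, StandardCharsets.UTF_8));
        Files.delete(file);
    }
}
```

Writing the temporary file next to the original keeps the final move on
the same filesystem, so it is a rename rather than a second full copy.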

Best regards,
Andrzej Bialecki     <><
Information Retrieval, Semantic Web
Embedded Unix, System Integration
Contact: info at sigram dot com
