nutch-dev mailing list archives

From Ali Nazemian <>
Subject Sending parse data from one generate-fetch-update cycle to another one
Date Tue, 10 Jun 2014 10:55:42 GMT
Hi everybody,
I am going to crawl and parse some news websites, as follows.
Each website has some important locations that contain news of higher
importance. I am going to parse each page with XPath to find these news
items, and then assign a specific score to each item based on its XPath.
This is the step where I ran into a problem: the score can only be
determined when a page is parsed with XPath, but it has to be sent to Solr
as the score of the corresponding document, which will only be fetched in
the next generate-fetch-update cycle. In other words, I have to attach this
score to a document that has not been fetched yet. How can I do this with
Nutch? Is there a built-in class or process for this purpose?
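To make the intended data flow concrete, here is a rough plain-Java sketch of it. It does not use any Nutch classes: the XPath rules, the score values, the example URLs, the default score of 1.0, and the in-memory map standing in for the crawldb are all hypothetical. In cycle N a page is parsed and each outlink is scored by the location it appears in; the update step stores those scores against the not-yet-fetched URLs; in cycle N+1 the score is looked up when the linked document is indexed.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathScoreCarry {

    // Hypothetical rules: XPath of an "important location" -> score for news linked from there.
    static final Map<String, Float> XPATH_SCORES = new LinkedHashMap<>();
    static {
        XPATH_SCORES.put("//div[@id='headline']//a[@href]", 10.0f);
        XPATH_SCORES.put("//div[@id='sidebar']//a[@href]", 2.0f);
    }

    // Hypothetical stand-in for the crawldb: scores attached to URLs not fetched yet.
    static final Map<String, Float> CRAWL_DB = new HashMap<>();

    /** Cycle N (parse step): score every outlink by the XPath location it was found in. */
    static Map<String, Float> scoreOutlinks(String xhtml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        Map<String, Float> scores = new HashMap<>();
        for (Map.Entry<String, Float> rule : XPATH_SCORES.entrySet()) {
            NodeList links = (NodeList) xpath.evaluate(rule.getKey(), doc, XPathConstants.NODESET);
            for (int i = 0; i < links.getLength(); i++) {
                String href = ((Element) links.item(i)).getAttribute("href");
                // A URL linked from several locations keeps its highest score.
                scores.merge(href, rule.getValue(), Float::max);
            }
        }
        return scores;
    }

    /** End of cycle N (update step): store the outlink scores against the target URLs. */
    static void updateDb(Map<String, Float> outlinkScores) {
        outlinkScores.forEach((url, s) -> CRAWL_DB.merge(url, s, Float::max));
    }

    /** Cycle N+1 (index step): the score to send to Solr for a newly fetched URL. */
    static float indexScore(String url) {
        return CRAWL_DB.getOrDefault(url, 1.0f); // 1.0 is a hypothetical default
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><body>"
                + "<div id='headline'><a href='http://example.com/big-news'>Big</a></div>"
                + "<div id='sidebar'><a href='http://example.com/small-news'>Small</a></div>"
                + "</body></html>";
        updateDb(scoreOutlinks(page));
        System.out.println(indexScore("http://example.com/big-news"));   // 10.0
        System.out.println(indexScore("http://example.com/small-news")); // 2.0
        System.out.println(indexScore("http://example.com/other"));      // 1.0
    }
}
```

Keeping the maximum when a URL is linked from several locations is only one choice; summing the scores or preferring the first match would work just as well in this sketch.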
Best regards.

