nutch-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (NUTCH-2202) Integration of Anthelion (Focused Crawling Module) into Nutch
Date Fri, 26 Jan 2018 19:36:00 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-2202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16341487#comment-16341487 ]

ASF GitHub Bot commented on NUTCH-2202:
---------------------------------------

lewismc commented on issue #97: NUTCH-2202 Integration of Anthelion (Focused Crawling Module)
into Nutch
URL: https://github.com/apache/nutch/pull/97#issuecomment-360883033
 
 
   @RobertMeusel @HansBrende this is ready to be tested. I would also appreciate it if folks
were able to VOTE on the current [Any23 2.2 release candidate](https://s.apache.org/PM3x).
   Finally, I've resolved all conflicts, updated some licensing information, and removed the binary
documentation resources, which are now hosted on the [Nutch wiki](https://wiki.apache.org/nutch/Anthelion).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> Integration of Anthelion (Focused Crawling Module) into Nutch
> -------------------------------------------------------------
>
>                 Key: NUTCH-2202
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2202
>             Project: Nutch
>          Issue Type: Improvement
>          Components: parser, scoring
>            Reporter: Robert Meusel
>            Assignee: Lewis John McGibbney
>            Priority: Major
>              Labels: any23, online_learning
>
> We have recently released Anthelion (https://github.com/yahoo/anthelion), a focused crawler
plugin for structured data that can be extracted with Any23. As proposed by Lewis John McGibbney,
we think integrating the parser (Any23) and the scoring function based on the online learner
could be a good improvement for Nutch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
