nutch-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <>
Subject [jira] [Commented] (NUTCH-2038) Naive Bayes classifier based url filter
Date Thu, 18 Jun 2015 15:13:01 GMT


ASF GitHub Bot commented on NUTCH-2038:

GitHub user asitang reopened a pull request:



You can merge this pull request into a Git repository by running:

    $ git pull NUTCH-2038

Alternatively you can review and apply these changes as the patch at:

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #32

commit b0ce4a157dbd0bfd8ea368f3fa230a90c7117ae2
Author: Asitang Mishra <>
Date:   2015-06-17T16:11:42Z

    patch 1.0 for NUTCH-2038

commit e243cc5e626106a4cd8dfca8d9c2ec93e9648560
Author: Asitang Mishra <>
Date:   2015-06-17T16:14:37Z

    patch 1.0 for NUTCH-2038

commit 711f44d8d4af51538ff1764145ac743445b6f43b
Author: Asitang Mishra <>
Date:   2015-06-17T16:35:28Z

    patch 1.0 for NUTCH-2038

commit e0e924e15c247d3fa3dd92f387fe53ba7effd78a
Author: Asitang Mishra <>
Date:   2015-06-18T15:09:30Z

    final commir for pattch 1.0


> Naive Bayes classifier based url filter
> ---------------------------------------
>                 Key: NUTCH-2038
>                 URL:
>             Project: Nutch
>          Issue Type: New Feature
>          Components: fetcher, injector, parser
>            Reporter: Asitang Mishra
>            Assignee: Chris A. Mattmann
>              Labels: memex, nutch
>             Fix For: 1.11
> A URL filter that runs after the parsing stage. For pages that the classifier marks as
irrelevant, it filters out their URLs and keeps only those URLs that contain one of the
"hot words" supplied in a list.
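
A minimal sketch of the filtering rule described above, written in Python for brevity. It is not the code from the pull request and does not use the Nutch plugin API. The function name `filter_outlinks`, the boolean `page_is_relevant` (standing in for the Naive Bayes classifier's verdict), and the case-insensitive substring match are all assumptions made for illustration.

```python
from typing import Iterable, List


def filter_outlinks(page_is_relevant: bool,
                    outlinks: Iterable[str],
                    hot_words: Iterable[str]) -> List[str]:
    """Return the outlinks to keep for one parsed page.

    page_is_relevant is assumed to come from a classifier that has
    already run on the page text; the classifier itself is not shown.
    """
    links = list(outlinks)

    # Assumption: outlinks of a relevant page pass through unchanged.
    if page_is_relevant:
        return links

    # For an irrelevant page, keep only URLs containing a hot word.
    # Case-insensitive substring matching is an assumption of this sketch.
    words = [w.lower() for w in hot_words if w]
    return [url for url in links
            if any(word in url.lower() for word in words)]
```

With a hot-word list of `["news"]`, an irrelevant page linking to `http://a.example/news/1` and `http://a.example/ads` would keep only the first URL under these assumptions.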

This message was sent by Atlassian JIRA
