nutch-dev mailing list archives

From "Sertac TURKEL (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (NUTCH-1727) Length of the Tlds
Date Wed, 12 Feb 2014 16:50:19 GMT

     [ https://issues.apache.org/jira/browse/NUTCH-1727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sertac TURKEL updated NUTCH-1727:
---------------------------------

    Attachment: NUTCH-1727.patch

I had a look at domain-suffix.xml and saw that the longest domain suffix is 8 characters
long (.internal). For this reason I picked 8 as the default value and prepared a patch.
Could you review it?

> Length of the Tlds
> ------------------
>
>                 Key: NUTCH-1727
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1727
>             Project: Nutch
>          Issue Type: Bug
>            Reporter: Sertac TURKEL
>            Priority: Minor
>             Fix For: 2.1
>
>         Attachments: NUTCH-1727.patch
>
>
> The maximum TLD length should be configurable. Some valid TLDs, such as .travel, are longer
> than the current limit, so the url-validator plugin filters out URLs that use them.
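
To illustrate the request, here is a minimal sketch of a TLD check whose maximum length is passed in rather than hard-coded. It is not the url-validator plugin's code or the attached patch: the class name TldLengthCheck, the method isValidTld, and the constructor parameter are all hypothetical, and the check assumes a purely alphabetic TLD of at least 2 characters.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Nutch.
class TldLengthCheck {
    private final Pattern tldPattern;

    // maxTldLength is the assumed configurable upper bound (8 would cover ".internal").
    TldLengthCheck(int maxTldLength) {
        if (maxTldLength < 2) {
            throw new IllegalArgumentException("maxTldLength must be at least 2");
        }
        // Accept an alphabetic TLD of 2 to maxTldLength characters.
        this.tldPattern = Pattern.compile("^\\p{Alpha}{2," + maxTldLength + "}$");
    }

    // Returns true when the label after the last dot fits the configured length.
    boolean isValidTld(String host) {
        int dot = host.lastIndexOf('.');
        if (dot < 0 || dot == host.length() - 1) {
            return false; // no TLD label present
        }
        return tldPattern.matcher(host.substring(dot + 1)).matches();
    }
}
```

With a limit of 4, new TldLengthCheck(4).isValidTld("example.travel") returns false, which is the filtering behaviour described in the issue. With a limit of 8, the same host is accepted.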



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
