nutch-dev mailing list archives

From Pablo Aragón (JIRA) <>
Subject [jira] Created: (NUTCH-802) Problems managing outlinks with large url length
Date Thu, 18 Mar 2010 10:40:27 GMT
Problems managing outlinks with large url length

                 Key: NUTCH-802
             Project: Nutch
          Issue Type: Bug
          Components: parser
            Reporter: Pablo Aragón

Nutch can hang while collecting outlinks if the URL of an outlink is
too long.

The maximum URL lengths accepted by the main web servers are:

    * Apache: 4,000 bytes
    * Microsoft Internet Information Server (IIS): 16,384 bytes
    * Perl HTTP::Daemon: 8,000 bytes

URLs longer than 4,000 bytes are problematic, so a limit should be set in the
nutch-default.xml configuration file.
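
As a rough illustration only (this is not the attached patch), a length check
of this kind might look like the Java sketch below. The class name, the method
names, and the hard-coded 4000-byte default are all hypothetical; a real fix
would read the limit from the Nutch configuration rather than a constant, and
measuring the length in UTF-8 bytes is an assumption as well.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Nutch: drops outlink URLs above a byte limit.
class OutlinkLengthFilter {

    // Assumed default, taken from the 4,000-byte figure quoted for Apache.
    static final int DEFAULT_MAX_URL_BYTES = 4000;

    // Returns true if the URL is non-null and within the limit.
    // A negative limit is treated here as "no limit" (an assumption).
    static boolean isAcceptable(String url, int maxBytes) {
        if (url == null) {
            return false;
        }
        if (maxBytes < 0) {
            return true;
        }
        return url.getBytes(StandardCharsets.UTF_8).length <= maxBytes;
    }

    // Keeps only the URLs that pass isAcceptable, preserving their order.
    static List<String> filter(List<String> urls, int maxBytes) {
        List<String> kept = new ArrayList<>();
        for (String url : urls) {
            if (isAcceptable(url, maxBytes)) {
                kept.add(url);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Build one URL that is clearly over the assumed limit.
        StringBuilder longUrl = new StringBuilder("http://example.com/?q=");
        while (longUrl.length() <= DEFAULT_MAX_URL_BYTES) {
            longUrl.append('a');
        }
        List<String> outlinks = Arrays.asList("http://example.com/", longUrl.toString());
        List<String> kept = filter(outlinks, DEFAULT_MAX_URL_BYTES);
        System.out.println("kept " + kept.size() + " of " + outlinks.size() + " outlinks");
    }
}
```

Skipping over-long outlinks before any further processing keeps the check
cheap; whether to drop or truncate such URLs is a separate design choice.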

I have attached a patch.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
