nutch-dev mailing list archives

From "Otis Gospodnetic (JIRA)" <>
Subject [jira] Created: (NUTCH-627) Minimize host address lookup
Date Thu, 10 Apr 2008 04:12:04 GMT
Minimize host address lookup

                 Key: NUTCH-627
             Project: Nutch
          Issue Type: Improvement
          Components: generator
            Reporter: Otis Gospodnetic
         Attachments: NUTCH-627.patch

The simple patch that I'm about to attach keeps track of hosts whose "max URLs per host" limit
has already been reached, as well as hosts whose hostname->IP lookup has already failed.  For such
hosts, further DNS lookups are skipped:
- there is no point in looking up a hostname again if we already have the max number of
URLs for that host
- there is little point in looking up a hostname again if the previous lookup
already failed
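
The idea above can be sketched roughly as follows. This is only an illustration, not the code in NUTCH-627.patch: the class HostLookupFilter, its Resolver interface, and the accept() method are hypothetical names made up for this example, and the real generator may count URLs and resolve hosts differently.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper for illustration; not taken from the Nutch source. */
class HostLookupFilter {
    /** Pluggable resolver so the sketch can be exercised without real DNS. */
    interface Resolver {
        String resolve(String host) throws UnknownHostException;
    }

    private final int maxPerHost;
    private final Resolver resolver;
    private final Map<String, Integer> urlCounts = new HashMap<>();
    private final Set<String> maxedHosts = new HashSet<>();   // limit already reached
    private final Set<String> failedHosts = new HashSet<>();  // lookup already failed
    int lookups = 0;  // number of lookups actually performed

    HostLookupFilter(int maxPerHost, Resolver resolver) {
        this.maxPerHost = maxPerHost;
        this.resolver = resolver;
    }

    /** Resolver backed by the JDK's InetAddress.getByName. */
    static Resolver dns() {
        return host -> InetAddress.getByName(host).getHostAddress();
    }

    /** Returns true if one more URL for this host should be selected. */
    boolean accept(String host) {
        // Skip the lookup entirely for hosts we already know about.
        if (maxedHosts.contains(host) || failedHosts.contains(host)) {
            return false;
        }
        lookups++;
        try {
            resolver.resolve(host);
        } catch (UnknownHostException e) {
            failedHosts.add(host);  // remember the failure; do not retry
            return false;
        }
        int count = urlCounts.merge(host, 1, Integer::sum);
        if (count >= maxPerHost) {
            maxedHosts.add(host);   // limit reached; later URLs are skipped
        }
        return true;
    }
}
```

With a limit of 2, a third URL for the same host is rejected without another lookup, and a host whose lookup failed once is not resolved again. Passing HostLookupFilter.dns() as the resolver would use real DNS.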

In a simple test, this saved a few hundred thousand lookups for the first case and a few hundred
lookups for the second case.

If nobody complains, I'll commit by the end of the week.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
