nutch-dev mailing list archives

From "Ferdy Galema (JIRA)"
Subject [jira] [Commented] (NUTCH-1356) ParseUtil use ExecutorService instead of manually thread handling.
Date Tue, 12 Jun 2012 11:08:43 GMT


Ferdy Galema commented on NUTCH-1356:


"The parser threads you refer to, is that a known problem? Can we solve it?"
To solve it correctly, every parser should check its thread's interrupted state at regular intervals.
This is a pretty huge task considering the number of parsers. For now it is something to keep
in mind. I'll create an issue for reference.
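
As a rough illustration of what "checking the interrupted state at regular intervals" could look like inside a parser, here is a minimal sketch. The class InterruptAwareTokenizer, its tokenize method and the CHECK_INTERVAL constant are hypothetical and not part of Nutch; only Thread.interrupted() and InterruptedException are standard Java.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a parser; it only splits text on whitespace. */
final class InterruptAwareTokenizer {

    // Assumed tuning knob: number of characters processed between checks.
    private static final int CHECK_INTERVAL = 1024;

    static List<String> tokenize(String content) throws InterruptedException {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < content.length(); i++) {
            // Poll the interrupt flag at regular intervals so that a cancelled
            // parse stops promptly instead of running to completion.
            if (i % CHECK_INTERVAL == 0 && Thread.interrupted()) {
                throw new InterruptedException("parse cancelled at offset " + i);
            }
            char c = content.charAt(i);
            if (Character.isWhitespace(c)) {
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }
}
```

Without a check like this, interrupting the parser thread only sets a flag that the loop never looks at, so the thread keeps running until the parse finishes on its own.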
> ParseUtil use ExecutorService instead of manually thread handling.
> ------------------------------------------------------------------
>                 Key: NUTCH-1356
>                 URL:
>             Project: Nutch
>          Issue Type: Improvement
>            Reporter: Ferdy Galema
>             Fix For: nutchgora, 1.6
>         Attachments: NUTCH-1356-trunk-v2.patch, NUTCH-1356-trunk.patch, NUTCH-1356.patch
> Because ParseUtil manages its own parser threads by creating a thread for every parse,
> specific parsers sometimes turn out to be very expensive. For example, parsers that have
> threadlocal fields will initialize them for every item to be parsed.
> By simply introducing a caching ExecutorService, ParseUtil will be able to cache threads,
> making parsing more efficient. See attached patch.
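
The idea in the description can be sketched as follows. This is not the attached patch: the class CachedParseRunner and its runWithTimeout method are hypothetical, and using Executors.newCachedThreadPool() as the "caching ExecutorService" is an assumption. The point is that idle worker threads are reused for later tasks, so thread-local state attached to a worker can survive across parses instead of being rebuilt on a fresh thread each time.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper; it runs parse tasks on pooled threads with a timeout. */
final class CachedParseRunner {

    // Idle worker threads are kept for a while and reused by later tasks.
    private final ExecutorService executor = Executors.newCachedThreadPool();

    <T> T runWithTimeout(Callable<T> parseTask, long timeoutMillis, T fallback) {
        Future<T> future = executor.submit(parseTask);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Interrupts the worker; this only stops the task if the task
            // itself checks the interrupted state or blocks interruptibly.
            future.cancel(true);
            return fallback;
        } catch (ExecutionException e) {
            // The task threw an exception; report the fallback result.
            return fallback;
        } catch (InterruptedException e) {
            future.cancel(true);
            Thread.currentThread().interrupt(); // preserve the caller's interrupt
            return fallback;
        }
    }

    void shutdown() {
        executor.shutdownNow();
    }
}
```

A caller might use it as `runner.runWithTimeout(() -> parse(content), 30_000, emptyResult)`, where parse and emptyResult are placeholders for whatever the surrounding code provides.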


