nutch-dev mailing list archives

From "Julien Nioche (JIRA)" <j...@apache.org>
Subject [jira] Commented: (NUTCH-965) Parsing takes up 100% CPU
Date Wed, 09 Feb 2011 11:23:57 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992440#comment-12992440 ]

Julien Nioche commented on NUTCH-965:
-------------------------------------

This should be optional but activated by default.
Parsing is also done within the fetching step, so that would need modifying as well.
It would be nice to have this in 1.3.
Note: changing the title to something like "skip parsing for truncated documents" would
give a more accurate description.
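
To make the suggestion concrete, here is a rough sketch of what "skip parsing for truncated
documents, optional but on by default" could look like. It is not the attached patch and does
not use Nutch's actual classes: TruncationCheck, isTruncated, shouldParse and
SKIP_TRUNCATED_DEFAULT are hypothetical names, and comparing the fetched byte count against
the Content-Length header is only one assumed way of detecting truncation (it would not be
reliable when the header is missing or refers to a compressed body).

```java
// Hypothetical helper, not part of Nutch: decides whether a fetched
// document should be handed to the parser.
final class TruncationCheck {

    // Assumed default: skipping is optional but enabled unless turned off.
    static final boolean SKIP_TRUNCATED_DEFAULT = true;

    private TruncationCheck() {
    }

    // Returns true when fewer bytes were stored than the server declared.
    // A missing or malformed header is treated as "not known to be truncated".
    static boolean isTruncated(byte[] content, String contentLengthHeader) {
        if (content == null || contentLengthHeader == null) {
            return false;
        }
        long declared;
        try {
            declared = Long.parseLong(contentLengthHeader.trim());
        } catch (NumberFormatException e) {
            return false;
        }
        return content.length < declared;
    }

    // Parse unless skipping is enabled and the document looks truncated.
    static boolean shouldParse(byte[] content, String contentLengthHeader,
                               boolean skipTruncated) {
        return !(skipTruncated && isTruncated(content, contentLengthHeader));
    }
}
```

The same check would have to be called from both places where parsing happens (the parse job
and the fetcher), passing the configured flag or SKIP_TRUNCATED_DEFAULT.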

> Parsing takes up 100% CPU
> -------------------------
>
>                 Key: NUTCH-965
>                 URL: https://issues.apache.org/jira/browse/NUTCH-965
>             Project: Nutch
>          Issue Type: Improvement
>          Components: parser
>            Reporter: Alexis
>         Attachments: parserJob.patch
>
>
> The issue you're likely to run into when parsing truncated FLV files is described here:
> http://www.mail-archive.com/user@nutch.apache.org/msg01880.html
> The parser library gets stuck in an infinite loop when it encounters corrupted data caused,
> for example, by truncating big binary files at fetch time.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
