nutch-dev mailing list archives

From "Yossi Tamari (JIRA)" <j...@apache.org>
Subject [jira] [Created] (NUTCH-2715) WARCExporter fails on large records
Date Mon, 06 May 2019 12:21:00 GMT
Yossi Tamari created NUTCH-2715:
-----------------------------------

             Summary: WARCExporter fails on large records
                 Key: NUTCH-2715
                 URL: https://issues.apache.org/jira/browse/NUTCH-2715
             Project: Nutch
          Issue Type: Bug
    Affects Versions: 1.15
            Reporter: Yossi Tamari


com.martinkl.warc.WARCRecord throws an IllegalStateException when a single line is over 10,000 bytes.
Since this exception is not caught in WARCExporter, it fails the whole export.

I doubt the validity of the limitation in WARCRecord, but regardless, I think WARCExporter
should catch the exception and skip to the next record.
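
A minimal sketch of the catch-and-skip behaviour suggested here. It does not use the real WARCExporter or com.martinkl.warc.WARCRecord code: parseRecord, exportAll, longLine and the MAX_LINE_BYTES constant are hypothetical stand-ins, and the 10,000-byte limit is taken only from the description above.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SkipOversizedRecords {
    // Assumed limit, taken from the issue text; the library's actual constant may differ.
    static final int MAX_LINE_BYTES = 10_000;

    /** Hypothetical stand-in for the record parser: rejects any line over the limit. */
    static String parseRecord(String raw) {
        for (String line : raw.split("\n", -1)) {
            if (line.getBytes(StandardCharsets.UTF_8).length > MAX_LINE_BYTES) {
                throw new IllegalStateException("line exceeds " + MAX_LINE_BYTES + " bytes");
            }
        }
        return raw;
    }

    /** Exports every record that parses; counts and skips the ones that do not. */
    static List<String> exportAll(List<String> rawRecords, int[] skipped) {
        List<String> exported = new ArrayList<>();
        for (String raw : rawRecords) {
            try {
                exported.add(parseRecord(raw));
            } catch (IllegalStateException e) {
                // One bad record no longer aborts the whole export.
                skipped[0]++;
            }
        }
        return exported;
    }

    /** Builds a single line of n ASCII characters (n bytes in UTF-8). */
    static String longLine(int n) {
        char[] chars = new char[n];
        Arrays.fill(chars, 'a');
        return new String(chars);
    }

    public static void main(String[] args) {
        int[] skipped = new int[1];
        List<String> out = exportAll(
            Arrays.asList("short\nrecord", longLine(MAX_LINE_BYTES + 1), "another"), skipped);
        System.out.println("exported=" + out.size() + " skipped=" + skipped[0]);
    }
}
```

In a real job the catch block would probably also log the URL of the skipped record and increment a Hadoop counter, so that dropped records remain visible.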

(See also [https://github.com/ept/warc-hadoop/issues/5])



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
