nutch-dev mailing list archives

From "Sebastian Nagel (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (NUTCH-2715) WARCExporter fails on large records
Date Fri, 24 May 2019 13:29:00 GMT

     [ https://issues.apache.org/jira/browse/NUTCH-2715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sebastian Nagel resolved NUTCH-2715.
------------------------------------
    Resolution: Fixed

Fixed with NUTCH-2716 ([PR #454|https://github.com/apache/nutch/pull/454]). Thanks, [~yossi]!

> WARCExporter fails on large records
> -----------------------------------
>
>                 Key: NUTCH-2715
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2715
>             Project: Nutch
>          Issue Type: Bug
>    Affects Versions: 1.15
>            Reporter: Yossi Tamari
>            Priority: Major
>             Fix For: 1.16
>
>
> com.martinkl.warc.WARCRecord throws an IllegalStateException when a single line is over
> 10,000 bytes. Since this exception is not caught in WARCExporter, it fails the whole export.
> I doubt the validity of the limitation in WARCRecord, but regardless, I think WARCExporter
> should catch the exception and skip to the next record.
> (See also [https://github.com/ept/warc-hadoop/issues/5])
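
The catch-and-skip behaviour suggested in the description could look roughly like the sketch below. It is only an illustration under stated assumptions, not the change made in PR #454 and not the real WARCExporter or WARCRecord code: the class SkipOversizedRecords, the methods buildRecord and exportAll, and the constant MAX_LINE_BYTES are hypothetical stand-ins. The only details taken from the issue are the exception type (IllegalStateException) and the 10,000-byte line limit.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SkipOversizedRecords {
    // Assumption: mirrors the 10,000-byte line limit mentioned in the issue.
    static final int MAX_LINE_BYTES = 10_000;

    // Hypothetical stand-in for record construction (NOT the real WARCRecord API).
    // Throws IllegalStateException if any single line exceeds the limit.
    static String buildRecord(String rawRecord) {
        for (String line : rawRecord.split("\n", -1)) {
            if (line.getBytes(StandardCharsets.UTF_8).length > MAX_LINE_BYTES) {
                throw new IllegalStateException(
                        "line exceeds " + MAX_LINE_BYTES + " bytes");
            }
        }
        return rawRecord;
    }

    // Exports every record that can be built; logs and skips the ones that fail,
    // so one oversized record no longer aborts the whole export.
    static List<String> exportAll(List<String> rawRecords) {
        List<String> exported = new ArrayList<>();
        int skipped = 0;
        for (String raw : rawRecords) {
            try {
                exported.add(buildRecord(raw));
            } catch (IllegalStateException e) {
                skipped++;
                System.err.println("Skipping record: " + e.getMessage());
            }
        }
        System.err.println("Records skipped: " + skipped);
        return exported;
    }

    public static void main(String[] args) {
        // One line of 10,001 bytes, just over the limit.
        String oversized = new String(new char[MAX_LINE_BYTES + 1]).replace('\0', 'x');
        List<String> input = Arrays.asList("short\nlines", oversized, "ok");
        System.out.println("Exported " + exportAll(input).size()
                + " of " + input.size() + " records");
    }
}
```

Catching the exception per record, rather than around the whole loop, is what keeps the remaining records flowing. Counting the skipped records makes the data loss visible instead of silent.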



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
