nutch-dev mailing list archives

From "Markus Jelsma (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (NUTCH-1675) NutchField to support long
Date Mon, 06 Jan 2014 14:45:56 GMT

     [ https://issues.apache.org/jira/browse/NUTCH-1675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Jelsma updated NUTCH-1675:
---------------------------------

      Component/s: indexer
         Priority: Minor  (was: Major)
    Fix Version/s: 1.8

> NutchField to support long
> --------------------------
>
>                 Key: NUTCH-1675
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1675
>             Project: Nutch
>          Issue Type: Bug
>          Components: indexer
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>            Priority: Minor
>             Fix For: 1.8
>
>         Attachments: NUTCH-1675-trunk.patch
>
>
> NutchField.readFields has no support for Long values. This usually goes unnoticed because
> reducers only write a NutchField to the output. But when a mapper emits a NutchField, the
> reducer cannot read a Long value back, and the job fails with the following exception:
> {code}
> java.lang.RuntimeException: problem advancing post rec#0
>         at org.apache.hadoop.mapred.Task$ValuesIterator.next(Task.java:1217)
>         at org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.moveToNext(ReduceTask.java:250)
>         at org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.next(ReduceTask.java:246)
>         at org.apache.nutch.fetcher.Fetcher$FetcherReducer.reduce(Fetcher.java:1440)
>         at org.apache.nutch.fetcher.Fetcher$FetcherReducer.reduce(Fetcher.java:1401)
>         at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:522)
>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:421)
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)
> Caused by: java.io.EOFException
>         at java.io.DataInputStream.readFully(DataInputStream.java:197)
>         at org.apache.hadoop.io.Text.readString(Text.java:402)
>         at org.apache.nutch.indexer.NutchField.readFields(NutchField.java:89)
>         at org.apache.nutch.indexer.NutchDocument.readFields(NutchDocument.java:112)
>         at org.apache.nutch.indexer.NutchIndexAction.readFields(NutchIndexAction.java:81)
>         at org.apache.nutch.util.GenericWritableConfigurable.readFields(GenericWritableConfigurable.java:54)
>         at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
>         at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
>         at org.apache.hadoop.mapred.Task$ValuesIterator.readNextValue(Task.java:1276)
>         at org.apache.hadoop.mapred.Task$ValuesIterator.next(Task.java:1214)
>         ... 7 more
> {code}
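
To illustrate the kind of asymmetry described above, here is a minimal sketch of a type-tagged value codec in which the read side handles every type the write side can emit. It is not the NUTCH-1675 patch and does not reproduce NutchField's actual wire format: the class name TypedValueCodec, the tag constants and the use of writeUTF are assumptions made for illustration, and only java.io is used so the example runs without Hadoop on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class TypedValueCodec {
    // Hypothetical type tags; NutchField's real encoding may differ.
    private static final byte TAG_STRING = 0;
    private static final byte TAG_LONG = 1;

    /** Serializes a value as a one-byte type tag followed by its payload. */
    static byte[] write(Object value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        if (value instanceof String) {
            out.writeByte(TAG_STRING);
            out.writeUTF((String) value);
        } else if (value instanceof Long) {
            out.writeByte(TAG_LONG);
            out.writeLong((Long) value);
        } else {
            throw new IOException("unsupported value type: " + value);
        }
        out.flush();
        return bytes.toByteArray();
    }

    /** Reads a value back; every tag emitted by write() needs a branch here. */
    static Object read(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        byte tag = in.readByte();
        switch (tag) {
            case TAG_STRING:
                return in.readUTF();
            case TAG_LONG:
                // Without this branch, a Long written by a mapper could not be
                // read by the reducer that deserializes the same bytes.
                return in.readLong();
            default:
                throw new IOException("unknown type tag: " + tag);
        }
    }

    public static void main(String[] args) throws IOException {
        Object roundTripped = read(write(1389019556000L));
        System.out.println(roundTripped + " " + roundTripped.getClass().getSimpleName());
    }
}
```

The point of the sketch is only that serialization and deserialization must agree on the set of supported types. A value that is written in the map phase is read again in the reduce phase, so a type missing from the read path surfaces only once the object crosses that boundary.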



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
