nutch-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (NUTCH-2474) CrawlDbReader -stats fails with ClassCastException
Date Fri, 08 Dec 2017 21:47:00 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16284278#comment-16284278 ]

ASF GitHub Bot commented on NUTCH-2474:
---------------------------------------

sebastian-nagel opened a new pull request #255: NUTCH-2474 CrawlDbReader -stats fails with ClassCastException
URL: https://github.com/apache/nutch/pull/255
 
 
   - replace CrawlDbStatCombiner with CrawlDbStatReducer and ensure
     that data is processed correctly regardless of whether and
     how often the combiner is called
   - simplify the calculation of minimum and maximum
   
   Tested in local mode. A large-scale test on a multi-billion CrawlDb (distributed mode) is scheduled.
   I'll report the results after the weekend.
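
The first change listed above relies on a general property: a reduce function can safely double as a combiner when it emits the same value type it consumes and its aggregation gives the same result however the inputs are grouped. The following is a minimal sketch of that property in plain Java, not the code from the pull request. The class name, the key names (`score.min`, `score.max`, `score.sum`) and the helper `combineThenReduce` are hypothetical, and `Float`/`Long` stand in for Hadoop's `FloatWritable`/`LongWritable`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CombinerSafeStats {

    // Hypothetical key names for this sketch; not taken from the Nutch source.
    static final String SCORE_MIN = "score.min";
    static final String SCORE_MAX = "score.max";
    static final String SCORE_SUM = "score.sum";

    /**
     * Folds all values of one key into a single value of the same type.
     * Score keys carry Float values; any other key is treated as a Long counter.
     */
    static Number reduce(String key, List<Number> values) {
        switch (key) {
            case SCORE_MIN: {
                float min = Float.MAX_VALUE;
                for (Number v : values) min = Math.min(min, v.floatValue());
                return min;
            }
            case SCORE_MAX: {
                float max = -Float.MAX_VALUE;
                for (Number v : values) max = Math.max(max, v.floatValue());
                return max;
            }
            case SCORE_SUM: {
                float sum = 0f;
                for (Number v : values) sum += v.floatValue();
                return sum;
            }
            default: {
                long sum = 0L;
                for (Number v : values) sum += v.longValue();
                return sum;
            }
        }
    }

    /** Simulates one combiner pass over chunks of the input, then the final reduce. */
    static Number combineThenReduce(String key, List<Number> values, int chunkSize) {
        List<Number> partials = new ArrayList<>();
        for (int i = 0; i < values.size(); i += chunkSize) {
            int end = Math.min(i + chunkSize, values.size());
            partials.add(reduce(key, values.subList(i, end)));
        }
        return reduce(key, partials);
    }

    public static void main(String[] args) {
        List<Number> scores = Arrays.<Number>asList(0.5f, 1.5f, 2.0f);
        List<Number> counts = Arrays.<Number>asList(1L, 2L, 3L);
        // Each pair should agree: no combiner pass versus one combiner pass.
        System.out.println(reduce(SCORE_MIN, scores) + " " + combineThenReduce(SCORE_MIN, scores, 2));
        System.out.println(reduce(SCORE_SUM, scores) + " " + combineThenReduce(SCORE_SUM, scores, 2));
        System.out.println(reduce("count", counts) + " " + combineThenReduce("count", counts, 2));
    }
}
```

Because the aggregation is chosen by key and the result type always matches the input type, the framework may run the function zero, one, or several times before the final reduce. Note that floating-point sums can differ in the last bits depending on how the values are grouped; the values in this sketch are chosen so that the sums are exact.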

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> CrawlDbReader -stats fails with ClassCastException
> --------------------------------------------------
>
>                 Key: NUTCH-2474
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2474
>             Project: Nutch
>          Issue Type: Bug
>          Components: crawldb
>    Affects Versions: 1.14
>         Environment: Java 8, distributed mode: Hadoop CDH 5.13.0
>            Reporter: Sebastian Nagel
>            Priority: Critical
>             Fix For: 1.14
>
>
> In distributed mode CrawlDbReader / readdb -stats fails with a ClassCastException in the combiner:
> {noformat}
> 17/12/08 04:57:13 INFO mapreduce.Job: Task Id : attempt_1512553291624_0022_m_000039_0, Status : FAILED
> Error: java.lang.ClassCastException: org.apache.hadoop.io.FloatWritable cannot be cast to org.apache.hadoop.io.LongWritable
>         at org.apache.nutch.crawl.CrawlDbReader$CrawlDbStatCombiner.reduce(CrawlDbReader.java:296)
>         at org.apache.nutch.crawl.CrawlDbReader$CrawlDbStatCombiner.reduce(CrawlDbReader.java:222)
>         at org.apache.hadoop.mapred.Task$OldCombinerRunner.combine(Task.java:1639)
>         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1946)
>         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1514)
>         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:466)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
> {noformat}
> FloatWritables have been used since NUTCH-2470, which is when this bug was introduced.
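
The failure mode in the stack trace above can be illustrated with a minimal plain-Java sketch. This is an illustration only, not the Nutch code: the class and method names are hypothetical, and `Long`/`Float` stand in for `LongWritable`/`FloatWritable`. It assumes the aggregating code casts every value to the integer type, which breaks as soon as a float value reaches it.

```java
import java.util.Arrays;

public class CastFailureSketch {

    /** Sums the values, assuming (incorrectly, once floats are emitted) that all are Long. */
    static long sumAssumingLongs(Iterable<Object> values) {
        long sum = 0L;
        for (Object v : values) {
            sum += (Long) v; // throws ClassCastException when v is a Float
        }
        return sum;
    }

    public static void main(String[] args) {
        try {
            sumAssumingLongs(Arrays.<Object>asList(1L, 2.5f));
        } catch (ClassCastException e) {
            System.out.println("Cast failed: " + e.getMessage());
        }
    }
}
```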



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
