nutch-dev mailing list archives

From Markus Jelsma <markus.jel...@openindex.io>
Subject java.sql.BatchUpdateException after fetch and wrong WebPage.protocolStatus in trunk
Date Tue, 22 Mar 2011 16:19:57 GMT
Hi,

I did a few successful fetches to test trunk's solrclean. After removing 
some pages, in order to get a few NOTFOUND entries in the WebDB (with HSQLDB 
as the storage backend), the following exception occurred:

2011-03-22 16:53:49,727 INFO  fetcher.FetcherJob - -activeThreads=0
2011-03-22 16:53:51,036 WARN  mapred.LocalJobRunner - job_local_0001
java.io.IOException: java.sql.BatchUpdateException: data exception: string data, right truncation
        at org.apache.gora.sql.store.SqlStore.flush(SqlStore.java:340)
        at org.apache.gora.sql.store.SqlStore.close(SqlStore.java:185)
        at org.apache.gora.mapreduce.GoraRecordWriter.close(GoraRecordWriter.java:55)
        at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:567)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
Caused by: java.sql.BatchUpdateException: data exception: string data, right truncation
        at org.hsqldb.jdbc.JDBCPreparedStatement.executeBatch(Unknown Source)
        at org.apache.gora.sql.store.SqlStore.flush(SqlStore.java:328)
        ... 5 more

Also, when I execute solrclean and log WebPage.ProtocolStatus(), I see wrong 
values for the pages that were removed: instead of ProtocolStatusCodes.NOTFOUND 
(13) they just have 0.

It smells like a bug, but I could be doing things the wrong way, of course ;)


Cheers,

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
