gora-dev mailing list archives

From "Lewis John McGibbney (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (GORA-170) Getting a BufferUnderflowException in class CassandraColumn, method fromByteBuffer()
Date Mon, 05 Nov 2012 23:10:13 GMT

    [ https://issues.apache.org/jira/browse/GORA-170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491024#comment-13491024 ]

Lewis John McGibbney commented on GORA-170:

When I attempt to generate a fetch list with Nutch 2.x, I get the following:

2012-11-05 22:51:03,951 DEBUG connection.HThriftClient - keyspace reseting from null to webpage
2012-11-05 22:51:04,066 DEBUG connection.HThriftClient - Transport open status true for client
2012-11-05 22:51:04,066 DEBUG connection.ConcurrentHClientPool - Status of releaseClient CassandraClient<localhost:9160-8> to queue: true
2012-11-05 22:51:04,087 WARN  mapred.FileOutputCommitter - Output path is null in cleanup
2012-11-05 22:51:04,089 WARN  mapred.LocalJobRunner - job_local_0001
	at java.nio.Buffer.nextGetIndex(Buffer.java:480)
	at java.nio.HeapByteBuffer.getInt(HeapByteBuffer.java:336)
	at me.prettyprint.cassandra.serializers.IntegerSerializer.fromByteBuffer(IntegerSerializer.java:35)
	at me.prettyprint.cassandra.serializers.FloatSerializer.fromByteBuffer(FloatSerializer.java:25)
	at me.prettyprint.cassandra.serializers.FloatSerializer.fromByteBuffer(FloatSerializer.java:10)
	at org.apache.gora.cassandra.query.CassandraColumn.fromByteBuffer(CassandraColumn.java:74)
	at org.apache.gora.cassandra.query.CassandraSubColumn.getValue(CassandraSubColumn.java:86)
	at org.apache.gora.cassandra.query.CassandraResult.updatePersistent(CassandraResult.java:90)
	at org.apache.gora.cassandra.query.CassandraResult.nextInner(CassandraResult.java:56)
	at org.apache.gora.query.impl.ResultBase.next(ResultBase.java:112)
	at org.apache.gora.mapreduce.GoraRecordReader.nextKeyValue(GoraRecordReader.java:111)
	at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)
	at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
2012-11-05 22:51:04,253 ERROR crawl.GeneratorJob - GeneratorJob: java.lang.RuntimeException: job failed: name=generate: 1352155857-1625665918, jobid=job_local_0001
	at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:54)
	at org.apache.nutch.crawl.GeneratorJob.run(GeneratorJob.java:191)
	at org.apache.nutch.crawl.GeneratorJob.generate(GeneratorJob.java:213)
	at org.apache.nutch.crawl.GeneratorJob.run(GeneratorJob.java:241)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
	at org.apache.nutch.crawl.GeneratorJob.main(GeneratorJob.java:249)

This is without the attached patch applied.
> Getting a BufferUnderflowException in class CassandraColumn, method fromByteBuffer()
> ------------------------------------------------------------------------------------
>                 Key: GORA-170
>                 URL: https://issues.apache.org/jira/browse/GORA-170
>             Project: Apache Gora
>          Issue Type: Bug
>          Components: storage-cassandra
>    Affects Versions: 0.2.1
>         Environment: Not sure environment matters for this one but Ubuntu
>            Reporter: Chris Gerken
>            Priority: Blocker
>             Fix For: 0.3
> When using CassandraStore and GoraMapper to retrieve data previously stored in Cassandra,
> a BufferUnderflowException is thrown in method fromByteBuffer() in class CassandraColumn.
> This results in a complete failure of the Hadoop job trying to use the Cassandra data.
> The problem seems to be caused by an invalid assumption in the (de)serializer logic.
> Serializers assume that the bytes in a ByteBuffer to be deserialized start at offset 0 (zero)
> in the ByteBuffer's internal buffer. In fact, there are times when a ByteBuffer passed back
> from the Hector/Thrift API will have its data start at a non-zero offset in its buffer.
> When serializers are given these non-zero offset ByteBuffers, an exception, usually
> BufferUnderflowException, is thrown.
> The suggested fix is to use the TBaseHelper class from Cassandra/Thrift:
>   import org.apache.thrift.TBaseHelper;
>   protected Object fromByteBuffer(Schema schema, ByteBuffer byteBuffer) {
>     Object value = null;
>     Serializer serializer = GoraSerializerTypeInferer.getSerializer(schema);
>     if (serializer == null) {
>       LOG.info("Schema is not supported: " + schema.toString());
>     } else {
>       ByteBuffer corrected = TBaseHelper.rightSize(byteBuffer);
>       value = serializer.fromByteBuffer(corrected);
>     }
>     return value;
>   }
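
For anyone trying to picture the offset problem described in the issue, here is a minimal sketch using only java.nio. It is not the Gora or Hector code: the class OffsetBufferDemo and the helpers offsetBuffer, naiveReadInt and rightSize are hypothetical names made up for illustration, and rightSize is only a stand-in for what TBaseHelper.rightSize is described as doing above, not the Thrift implementation. The sketch shows a read that assumes offset 0 returning the wrong bytes, rather than reproducing the exact exception in the trace.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class OffsetBufferDemo {

    // Hypothetical helper: builds a buffer whose 4-byte payload sits after
    // `headerBytes` unrelated bytes in the same backing array, so the
    // payload starts at a non-zero offset.
    public static ByteBuffer offsetBuffer(int value, int headerBytes) {
        byte[] backing = new byte[headerBytes + 4];
        Arrays.fill(backing, 0, headerBytes, (byte) 0x7f);
        // wrap(array, offset, length) sets position=offset, limit=offset+length
        ByteBuffer.wrap(backing, headerBytes, 4).putInt(value);
        return ByteBuffer.wrap(backing, headerBytes, 4);
    }

    // Naive read (the invalid assumption): treats index 0 of the backing
    // array as the start of the payload and ignores position/limit.
    public static int naiveReadInt(ByteBuffer buf) {
        return ByteBuffer.wrap(buf.array()).getInt();
    }

    // Hypothetical stand-in for a "right-sizing" step: copies only the
    // readable bytes into a fresh array so the payload starts at offset 0.
    // duplicate() is used so the caller's position is left untouched.
    public static ByteBuffer rightSize(ByteBuffer buf) {
        byte[] exact = new byte[buf.remaining()];
        buf.duplicate().get(exact);
        return ByteBuffer.wrap(exact);
    }

    public static void main(String[] args) {
        ByteBuffer buf = offsetBuffer(42, 6);
        // Reads the filler bytes in front of the payload, not the value.
        System.out.println("naive read:       " + naiveReadInt(buf));
        // After right-sizing, the same naive read sees the real payload.
        System.out.println("after right-size: " + naiveReadInt(rightSize(buf)));
    }
}
```

The copy costs one small allocation per value, which seems an acceptable trade for not depending on how the transport laid out its internal array.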

